Jari Björne edited this page May 14, 2017 · 4 revisions

The First Shared Task on Extrinsic Parser Evaluation (EPE 2017)

TEES is one of the downstream applications in the EPE 2017 shared task. Participants can use the TEES development branch to evaluate their own parses on the biological event extraction downstream task. More information will be provided on this page once it becomes available.

Evaluating your parses with TEES

This guide will show how to use TEES to evaluate your EPE shared task parse submissions. For more information on using TEES, please refer to the main documentation.

Installing TEES

The EPE task organizers will evaluate your submissions, but you can also run the downstream programs yourself to test your parses. For the biological event extraction downstream task, you will need to download or check out the development branch of TEES. You don't need to install TEES; the scripts can be used directly from the downloaded directory. However, before using TEES you need to run the 'configure.py' program to set up the datasets and programs TEES uses. To configure TEES, please run the following program:

python configure.py

The configure program will offer you various options. It is OK to accept the defaults, but if you are using TEES only for evaluating EPE parses, you only need to install the classifier (option 1) and the corpora (option 3). For the corpora, it is enough to install the BioNLP'09 corpus (option 2) and the evaluators (option 7).

Evaluating EPE parses

Once you have saved your parses for the biological event extraction downstream task in the EPE format, you can use the TEES EPE shared task evaluation script. From the main TEES directory, please go to the 'Utils' directory and run the evaluator program:

python EvaluateEPE.py -i [INPUT-DIRECTORY] -o [OUTPUT-DIRECTORY]

By default the evaluation script will look for *.epe files in the directory [INPUT-DIRECTORY] and all of its immediate subdirectories. If you need to choose which directories to import from, please use the '-s' switch. It's OK to put all the event parse files in one directory, as the document ids are unique across the train, devel and test sets. By default, the program expects parses for the 'train' and 'devel' sets. If you wish to also use the 'test' set, please use the '-t' switch.

The evaluation script will import the parses, convert them to the Interaction XML format and train an event extraction machine learning model using your parses. The training will be quite slow by default, but you can increase the number of parallel processes to use with the '-n' switch. For example, to train with five parallel processes, please use the command:

python EvaluateEPE.py -i [INPUT-DIRECTORY] -o [OUTPUT-DIRECTORY] -n 5

For example, on a desktop machine with 16 GB of memory, it is generally possible to use five parallel training processes, but if the parses are very complex and have many dependencies, you may need to use fewer. Even with five-fold parallelization, training will take a couple of hours on a regular desktop computer.

TEES results

The most important file in the [OUTPUT-DIRECTORY] is 'log.txt', which records all parts of the training process. The final result (for the devel set) will appear under the section:

------------ Check devel classification ------------

The results are shown using the official BioNLP'09 Shared Task evaluator. For example, with the default BLLIP-McClosky-Stanford parses, the following performance is reached on the devel set:

##### strict evaluation mode #####
------------------------------------------------------------------------------------
     Event Class          gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
    Gene_expression        356 (  272)      379 (  272)    76.40    71.77    74.01
     Transcription          82 (   52)       95 (   52)    63.41    54.74    58.76
  Protein_catabolism        21 (   17)       22 (   17)    80.95    77.27    79.07
    Phosphorylation         47 (   36)       62 (   36)    76.60    58.06    66.06
     Localization           53 (   35)       51 (   35)    66.04    68.63    67.31
     =[SVT-TOTAL]=         559 (  412)      609 (  412)    73.70    67.65    70.55
        Binding            249 (   88)      225 (   88)    35.34    39.11    37.13
    ==[EVT-TOTAL]==        808 (  500)      834 (  500)    61.88    59.95    60.90
------------------------------------------------------------------------------------
      Regulation           173 (   51)      154 (   51)    29.48    33.12    31.19
  Positive_regulation      618 (  226)      543 (  226)    36.57    41.62    38.93
  Negative_regulation      196 (   68)      180 (   68)    34.69    37.78    36.17
    ==[REG-TOTAL]==        987 (  345)      877 (  345)    34.95    39.34    37.02
------------------------------------------------------------------------------------
       Negation            107 (   26)       56 (   26)    24.30    46.43    31.90
      Speculation           95 (   13)       44 (   13)    13.68    29.55    18.71
    ==[MOD-TOTAL]==        202 (   39)      100 (   39)    19.31    39.00    25.83
------------------------------------------------------------------------------------
    ==[ALL-TOTAL]==       1997 (  884)     1811 (  884)    44.27    48.81    46.43
------------------------------------------------------------------------------------
##### approximate span and recursive mode #####
------------------------------------------------------------------------------------
     Event Class          gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
    Gene_expression        356 (  285)      379 (  284)    80.06    74.93    77.41
     Transcription          82 (   58)       95 (   58)    70.73    61.05    65.54
  Protein_catabolism        21 (   18)       22 (   18)    85.71    81.82    83.72
    Phosphorylation         47 (   38)       62 (   38)    80.85    61.29    69.72
     Localization           53 (   35)       51 (   35)    66.04    68.63    67.31
     =[SVT-TOTAL]=         559 (  434)      609 (  433)    77.64    71.10    74.23
        Binding            249 (   98)      225 (   98)    39.36    43.56    41.35
    ==[EVT-TOTAL]==        808 (  532)      834 (  531)    65.84    63.67    64.74
------------------------------------------------------------------------------------
      Regulation           173 (   59)      154 (   59)    34.10    38.31    36.09
  Positive_regulation      618 (  253)      539 (  253)    40.94    46.94    43.73
  Negative_regulation      196 (   77)      180 (   77)    39.29    42.78    40.96
    ==[REG-TOTAL]==        987 (  389)      873 (  389)    39.41    44.56    41.83
------------------------------------------------------------------------------------
       Negation            107 (   33)       55 (   31)    30.84    56.36    39.87
      Speculation           95 (   22)       44 (   19)    23.16    43.18    30.15
    ==[MOD-TOTAL]==        202 (   55)       99 (   50)    27.23    50.51    35.38
------------------------------------------------------------------------------------
    ==[ALL-TOTAL]==       1997 (  976)     1806 (  970)    48.87    53.71    51.18
------------------------------------------------------------------------------------
##### event decomposition in the approximate span mode #####
------------------------------------------------------------------------------------
     Event Class          gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
    Gene_expression        356 (  285)      379 (  284)    80.06    74.93    77.41
     Transcription          82 (   58)       95 (   58)    70.73    61.05    65.54
  Protein_catabolism        21 (   18)       22 (   18)    85.71    81.82    83.72
    Phosphorylation         47 (   38)       62 (   38)    80.85    61.29    69.72
     Localization           53 (   35)       51 (   35)    66.04    68.63    67.31
     =[SVT-TOTAL]=         559 (  434)      609 (  433)    77.64    71.10    74.23
        Binding            249 (   98)      225 (   98)    39.36    43.56    41.35
    ==[EVT-TOTAL]==        808 (  532)      834 (  531)    65.84    63.67    64.74
------------------------------------------------------------------------------------
      Regulation           173 (   59)      154 (   59)    34.10    38.31    36.09
  Positive_regulation      618 (  253)      539 (  253)    40.94    46.94    43.73
  Negative_regulation      196 (   77)      180 (   77)    39.29    42.78    40.96
    ==[REG-TOTAL]==        987 (  389)      873 (  389)    39.41    44.56    41.83
------------------------------------------------------------------------------------
       Negation            107 (   33)       55 (   31)    30.84    56.36    39.87
      Speculation           95 (   22)       44 (   19)    23.16    43.18    30.15
    ==[MOD-TOTAL]==        202 (   55)       99 (   50)    27.23    50.51    35.38
------------------------------------------------------------------------------------
    ==[ALL-TOTAL]==       1997 (  976)     1806 (  970)    48.87    53.71    51.18
------------------------------------------------------------------------------------

The 'ALL-TOTAL' F-score for the 'approximate span and recursive mode' is the official metric of the BioNLP'09 Shared Task, so the above system would have an overall F-score of 51.18.
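When comparing several parsers, it may be convenient to read the official number from 'log.txt' automatically. The sketch below is not part of TEES: the names `official_result` and `recompute` are hypothetical, and the regular expressions assume that the rows in the log have the same layout as the tables shown above. It takes the 'ALL-TOTAL' row of the chosen mode from the last 'Check devel classification' section, and recomputes recall, precision and F-score from the match counts as a sanity check.

```python
import re

# Mode header such as "##### approximate span and recursive mode #####".
MODE = re.compile(r"#####\s*(.+?)\s*#####")
# Result row: name, gold (match), answer (match), recall, precision, F-score.
ROW = re.compile(
    r"(?P<name>\S+)\s+(?P<gold>\d+)\s+\(\s*(?P<gold_match>\d+)\)"
    r"\s+(?P<answer>\d+)\s+\(\s*(?P<answer_match>\d+)\)"
    r"\s+(?P<recall>\d+\.\d+)\s+(?P<precision>\d+\.\d+)\s+(?P<fscore>\d+\.\d+)"
)
OFFICIAL_MODE = "approximate span and recursive mode"


def official_result(log_text, mode=OFFICIAL_MODE):
    """Return the ALL-TOTAL row (as a dict of strings) for the given mode, or None."""
    lines = log_text.splitlines()
    start = 0
    for index, line in enumerate(lines):
        if "Check devel classification" in line:
            start = index  # keep the last occurrence in the log
    current_mode, result = None, None
    for line in lines[start:]:
        header = MODE.search(line)
        if header:
            current_mode = header.group(1)
            continue
        row = ROW.search(line)
        if row and current_mode == mode and "ALL-TOTAL" in row.group("name"):
            result = row.groupdict()
    return result


def recompute(row):
    """Recompute recall, precision and F-score (in percent) from the match counts."""
    recall = 100.0 * int(row["gold_match"]) / int(row["gold"])
    precision = 100.0 * int(row["answer_match"]) / int(row["answer"])
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return recall, precision, fscore
```

For the example table above, the 'ALL-TOTAL' row of the official mode has 976 of 1997 gold events matched and 970 of 1806 predicted events matched, so `recompute` gives recall 48.87, precision 53.71 and F-score 51.18 after rounding to two decimals, which agrees with the values printed by the evaluator.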

Visualizing Interaction XML files

When you run the EvaluateEPE program, your parses are converted into the Interaction XML format and stored in the 'corpus' subdirectory under the [OUTPUT-DIRECTORY]. The TEES visualizer can be used to look at the parse graphs. From the main TEES directory, run for example the command:

python visualizer.py -i [OUTPUT-DIRECTORY]/corpus/GE09-devel.xml

In order to use the visualizer, you will need to have the GraphViz and Tk packages installed.