EPE 2017
TEES is one of the downstream applications in the EPE 2017 shared task. Using the TEES development branch, participants can evaluate their own parses on the biological event extraction downstream task. More information will be provided on this page once it becomes available.
This guide will show how to use TEES to evaluate your EPE shared task parse submissions. For more information on using TEES, please refer to the main documentation.
The EPE task organizers will evaluate your submissions, but you can also use the downstream programs yourself to test your parses. For the biological event extraction downstream task you will need to download or check out the development branch of TEES. You don't need to install TEES; the scripts can be used directly from the downloaded directory. However, before using TEES you need to run the 'configure.py' program to set up the datasets and programs TEES uses. To configure TEES, please run the following program:
python configure.py
The configure program will offer you various options. It is OK to accept the defaults, but if you are using TEES only for evaluating EPE parses, you only need to install the classifier (option 1) and the corpora (option 3). For the corpora, it is enough to install the BioNLP'09 corpus (option 2) and the evaluators (option 7).
Once you have saved your parses for the biomedical event extraction downstream task in the EPE format, you can use the TEES EPE shared task evaluation script. From the main TEES directory, please go to the directory 'Utils' and run the evaluator program:
python EvaluateEPE.py -i [INPUT-DIRECTORY] -o [OUTPUT-DIRECTORY]
By default the evaluation script will look for *.epe files in the directory [INPUT-DIRECTORY] and all of its immediate subdirectories. If you need to choose which directories to import from, please use the '-s' switch. It's OK to put all the event parse files in one directory, as the document ids are unique across the train, devel and test sets. By default, the program expects parses for the 'train' and 'devel' sets. If you wish to also use the 'test' set, please use the '-t' switch.
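Before starting a long run, it can help to check which files the evaluator is likely to pick up. The following is a minimal standalone sketch and not part of TEES: the function name `find_epe_files` is hypothetical, and it only mirrors the default search described above (the input directory plus its immediate subdirectories). The actual matching logic in EvaluateEPE.py may differ.

```python
import glob
import os


def find_epe_files(input_dir):
    """List *.epe files in input_dir and in its immediate subdirectories.

    Hypothetical helper that mirrors the default search described in this
    guide; it is not the code EvaluateEPE.py itself uses.
    """
    top_level = glob.glob(os.path.join(input_dir, "*.epe"))
    one_level_down = glob.glob(os.path.join(input_dir, "*", "*.epe"))
    return sorted(top_level + one_level_down)
```

Calling `find_epe_files("[INPUT-DIRECTORY]")` and printing the result gives a quick overview of the parse files found; files nested more than one level deep are deliberately not listed.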
The evaluation script will import the parses, convert them to the Interaction XML format and train an event extraction machine learning model using your parses. The training will be quite slow by default, but you can increase the number of parallel processes to use with the '-n' switch. For example, to train with five parallel processes, please use the command:
python EvaluateEPE.py -i [INPUT-DIRECTORY] -o [OUTPUT-DIRECTORY] -n 5
On a desktop machine with 16 GB of memory it is generally possible to use five parallel training processes, but if the parses are very complex and have many dependencies, you may need to use fewer. Even with five parallel processes, training will take a couple of hours on a regular desktop computer.
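If you evaluate several parse sets, the command above can be wrapped in a small script. The sketch below is an assumption-laden convenience wrapper, not part of TEES: the function names are hypothetical, and only the '-i', '-o' and '-n' switches shown in this guide are used. The interpreter name `python` is taken from the commands above and may need adjusting on your system.

```python
import os
import subprocess


def build_evaluate_command(input_dir, output_dir, processes=None, python="python"):
    """Assemble the EvaluateEPE.py command line as a list of arguments.

    Hypothetical helper; it only uses the -i, -o and -n switches from this guide.
    """
    command = [python, "EvaluateEPE.py", "-i", input_dir, "-o", output_dir]
    if processes is not None:
        command += ["-n", str(processes)]
    return command


def run_evaluation(utils_dir, input_dir, output_dir, processes=None):
    """Run the evaluator from the TEES 'Utils' directory.

    Paths are made absolute first, because the working directory is
    changed to utils_dir. Raises CalledProcessError on a non-zero exit.
    """
    command = build_evaluate_command(
        os.path.abspath(input_dir), os.path.abspath(output_dir), processes
    )
    subprocess.check_call(command, cwd=utils_dir)
```

For example, `run_evaluation("TEES/Utils", "my-parses", "my-output", processes=5)` would correspond to the five-process command shown above; the directory names here are placeholders.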
The most important file in the [OUTPUT-DIRECTORY] is 'log.txt', which records all parts of the training process. The final result (for the devel set) will appear under the section:
------------ Check devel classification ------------
The results are shown using the official BioNLP'09 Shared Task evaluator. For example, with the default BLLIP-McClosky-Stanford parses, the following performance is reached on the devel set:
##### strict evaluation mode #####
------------------------------------------------------------------------------------
Event Class           gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
Gene_expression        356 (  272)      379 (  272)    76.40    71.77    74.01
Transcription           82 (   52)       95 (   52)    63.41    54.74    58.76
Protein_catabolism      21 (   17)       22 (   17)    80.95    77.27    79.07
Phosphorylation         47 (   36)       62 (   36)    76.60    58.06    66.06
Localization            53 (   35)       51 (   35)    66.04    68.63    67.31
=[SVT-TOTAL]=          559 (  412)      609 (  412)    73.70    67.65    70.55
Binding                249 (   88)      225 (   88)    35.34    39.11    37.13
==[EVT-TOTAL]==        808 (  500)      834 (  500)    61.88    59.95    60.90
------------------------------------------------------------------------------------
Regulation             173 (   51)      154 (   51)    29.48    33.12    31.19
Positive_regulation    618 (  226)      543 (  226)    36.57    41.62    38.93
Negative_regulation    196 (   68)      180 (   68)    34.69    37.78    36.17
==[REG-TOTAL]==        987 (  345)      877 (  345)    34.95    39.34    37.02
------------------------------------------------------------------------------------
Negation               107 (   26)       56 (   26)    24.30    46.43    31.90
Speculation             95 (   13)       44 (   13)    13.68    29.55    18.71
==[MOD-TOTAL]==        202 (   39)      100 (   39)    19.31    39.00    25.83
------------------------------------------------------------------------------------
==[ALL-TOTAL]==       1997 (  884)     1811 (  884)    44.27    48.81    46.43
------------------------------------------------------------------------------------
##### approximate span and recursive mode #####
------------------------------------------------------------------------------------
Event Class           gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
Gene_expression        356 (  285)      379 (  284)    80.06    74.93    77.41
Transcription           82 (   58)       95 (   58)    70.73    61.05    65.54
Protein_catabolism      21 (   18)       22 (   18)    85.71    81.82    83.72
Phosphorylation         47 (   38)       62 (   38)    80.85    61.29    69.72
Localization            53 (   35)       51 (   35)    66.04    68.63    67.31
=[SVT-TOTAL]=          559 (  434)      609 (  433)    77.64    71.10    74.23
Binding                249 (   98)      225 (   98)    39.36    43.56    41.35
==[EVT-TOTAL]==        808 (  532)      834 (  531)    65.84    63.67    64.74
------------------------------------------------------------------------------------
Regulation             173 (   59)      154 (   59)    34.10    38.31    36.09
Positive_regulation    618 (  253)      539 (  253)    40.94    46.94    43.73
Negative_regulation    196 (   77)      180 (   77)    39.29    42.78    40.96
==[REG-TOTAL]==        987 (  389)      873 (  389)    39.41    44.56    41.83
------------------------------------------------------------------------------------
Negation               107 (   33)       55 (   31)    30.84    56.36    39.87
Speculation             95 (   22)       44 (   19)    23.16    43.18    30.15
==[MOD-TOTAL]==        202 (   55)       99 (   50)    27.23    50.51    35.38
------------------------------------------------------------------------------------
==[ALL-TOTAL]==       1997 (  976)     1806 (  970)    48.87    53.71    51.18
------------------------------------------------------------------------------------
##### event decomposition in the approximate span mode #####
------------------------------------------------------------------------------------
Event Class           gold (match)   answer (match)   recall    prec.   fscore
------------------------------------------------------------------------------------
Gene_expression        356 (  285)      379 (  284)    80.06    74.93    77.41
Transcription           82 (   58)       95 (   58)    70.73    61.05    65.54
Protein_catabolism      21 (   18)       22 (   18)    85.71    81.82    83.72
Phosphorylation         47 (   38)       62 (   38)    80.85    61.29    69.72
Localization            53 (   35)       51 (   35)    66.04    68.63    67.31
=[SVT-TOTAL]=          559 (  434)      609 (  433)    77.64    71.10    74.23
Binding                249 (   98)      225 (   98)    39.36    43.56    41.35
==[EVT-TOTAL]==        808 (  532)      834 (  531)    65.84    63.67    64.74
------------------------------------------------------------------------------------
Regulation             173 (   59)      154 (   59)    34.10    38.31    36.09
Positive_regulation    618 (  253)      539 (  253)    40.94    46.94    43.73
Negative_regulation    196 (   77)      180 (   77)    39.29    42.78    40.96
==[REG-TOTAL]==        987 (  389)      873 (  389)    39.41    44.56    41.83
------------------------------------------------------------------------------------
Negation               107 (   33)       55 (   31)    30.84    56.36    39.87
Speculation             95 (   22)       44 (   19)    23.16    43.18    30.15
==[MOD-TOTAL]==        202 (   55)       99 (   50)    27.23    50.51    35.38
------------------------------------------------------------------------------------
==[ALL-TOTAL]==       1997 (  976)     1806 (  970)    48.87    53.71    51.18
------------------------------------------------------------------------------------
The 'ALL-TOTAL' F-score for the 'approximate span and recursive mode' is the official metric of the BioNLP'09 Shared Task, so the above system would have an overall F-score of 51.18.
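To compare several runs, the official score can be pulled out of 'log.txt' with a short script. The sketch below is not part of TEES and rests on assumptions: that the log contains the section header and mode headers exactly as shown above, and that the first matching 'ALL-TOTAL' row after 'Check devel classification' is the final devel result. The function names are hypothetical. The `f_score` helper recomputes the F-score from the counts as the harmonic mean of recall (match / gold) and precision (match / answer), which for the example above gives 51.18.

```python
import re

SECTION = "Check devel classification"
MODE = "approximate span and recursive mode"
MODE_HEADER = re.compile(r"#####\s*(.+?)\s*#####")
# Matches a row such as:
# ==[ALL-TOTAL]== 1997 ( 976) 1806 ( 970) 48.87 53.71 51.18
ALL_TOTAL = re.compile(
    r"==\[ALL-TOTAL\]==\s+(\d+)\s+\(\s*(\d+)\)\s+(\d+)\s+\(\s*(\d+)\)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)"
)


def f_score(gold, gold_match, answer, answer_match):
    """F-score in percent: harmonic mean of recall and precision."""
    if gold == 0 or answer == 0:
        return 0.0
    recall = float(gold_match) / gold
    precision = float(answer_match) / answer
    if recall + precision == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


def official_score(log_text):
    """Return the ALL-TOTAL F-score of the official mode, or None if absent.

    Only rows that follow the 'Check devel classification' header and the
    'approximate span and recursive mode' header are considered.
    """
    in_section = False
    mode = None
    for line in log_text.splitlines():
        if SECTION in line:
            in_section = True
            mode = None
            continue
        header = MODE_HEADER.search(line)
        if header:
            mode = header.group(1)
            continue
        row = ALL_TOTAL.search(line)
        if row and in_section and mode == MODE:
            return float(row.group(7))
    return None
```

Reading the file with `open("log.txt").read()` and passing the text to `official_score` returns a single number per run, which is convenient when tabulating results for several parsers.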
When you run the EvaluateEPE program, your parses are converted into the Interaction XML format and stored in the 'corpus' subdirectory under the [OUTPUT-DIRECTORY]. The TEES visualizer can be used to look at the parse graphs. From the main TEES directory, run for example:
python visualizer.py -i [OUTPUT-DIRECTORY]/corpus/GE09-devel.xml
In order to use the visualizer, you will need to have the GraphViz and Tk packages installed.