Project for the paper "Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information" (LREC-COLING 2024).
- Python 3.8
#>pip install -r requirements.txt
#>export PYTHONPATH=<ROOT_PROJECT_FOLDER>
The main training process requires the mention pairs and embeddings from each set.
We first construct RST trees for each document. When generating mention pairs, we construct cross-document lexical chains.
Because the WEC-Eng/WEC-Zh train set contains many mentions, generating all negative pairs is very resource- and time-consuming.
To address this, we added a control for the negative:positive ratio.
#>python src/preprocess_gen_pairs.py
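The negative:positive ratio control can be illustrated with a minimal sketch. This is not the code in `src/preprocess_gen_pairs.py`; the function name `sample_pairs`, the `neg_pos_ratio` parameter, and the `(mention_id, cluster_id)` input format are all assumptions made for illustration. For clarity the sketch enumerates every pair before subsampling, which a real implementation on a large train set would likely avoid.

```python
import random
from itertools import combinations


def sample_pairs(mentions, neg_pos_ratio, seed=0):
    """Split mention pairs into positives and a capped set of negatives.

    mentions: list of (mention_id, cluster_id) tuples (hypothetical format).
    neg_pos_ratio: maximum number of negative pairs kept per positive pair.
    """
    positives, negatives = [], []
    for (id_a, cluster_a), (id_b, cluster_b) in combinations(mentions, 2):
        # Two mentions form a positive pair when they share a coreference cluster.
        if cluster_a == cluster_b:
            positives.append((id_a, id_b))
        else:
            negatives.append((id_a, id_b))

    # Cap the negatives at ratio * number of positives, sampling uniformly.
    max_negatives = int(len(positives) * neg_pos_ratio)
    if len(negatives) > max_negatives:
        negatives = random.Random(seed).sample(negatives, max_negatives)
    return positives, negatives


if __name__ == "__main__":
    toy_mentions = [("m1", "c1"), ("m2", "c1"), ("m3", "c2"),
                    ("m4", "c2"), ("m5", "c3"), ("m6", "c3")]
    pos, neg = sample_pairs(toy_mentions, neg_pos_ratio=2)
    print(len(pos), "positive pairs,", len(neg), "negative pairs")
```

A lower ratio shrinks the pair file and speeds up training at the cost of fewer negative examples.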
To generate the embeddings for WEC-Eng/WEC-Zh, run the following script and provide the location of the split files, for example:
#>python src/preprocess_embed.py
We use the generated embeddings to initialize node
#>python src/preprocess_edu_embed.py
See the train.py file header for the complete set of script parameters.
A model file is saved to the output folder for each iteration that improves.
- For training over WEC-Eng/WEC-Zh:
#> python src/train.py
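The save-on-improvement behaviour mentioned above can be sketched as follows. This is an illustrative assumption, not the logic in `src/train.py`: the helper `save_if_improved`, the `model_iter_<n>.pkl` file naming, the use of `pickle`, and the "higher score is better" convention are all hypothetical.

```python
import pickle
from pathlib import Path


def save_if_improved(model_state, score, best_score, out_dir, iteration):
    """Save model_state only when score beats best_score.

    Returns (new_best_score, saved_path), where saved_path is None
    if nothing was written. Assumes a higher score is better.
    """
    if best_score is not None and score <= best_score:
        return best_score, None

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"model_iter_{iteration}.pkl"  # hypothetical naming scheme
    with open(path, "wb") as handle:
        pickle.dump(model_state, handle)
    return score, path


if __name__ == "__main__":
    best = None
    # Toy dev scores for three iterations; only iterations 1 and 3 improve.
    for iteration, dev_score in enumerate([0.5, 0.4, 0.7], start=1):
        best, saved = save_if_improved({"iteration": iteration}, dev_score,
                                       best, "output", iteration)
        print(iteration, "saved" if saved else "skipped")
```

Writing one file per improving iteration keeps earlier checkpoints available for comparison, at the cost of extra disk space.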
#> python src/inference.py
#> python src/custer.py