This is the official repository for *Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers*. We provide code for training and evaluating the dialect classifiers, as well as code for extracting and evaluating the lexical features.
pip install -r requirements.txt
The data used in this work come from the FRMT, LSDC, ITDI, and Europarl v8 datasets. The processed data and the data-processing scripts are in the `data` directory.
The training code for both LOO and selfExplain is in the `model` directory.
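To illustrate the idea behind LOO, the sketch below assumes it refers to leave-one-out attribution: each token is scored by how much the classifier's probability for the predicted dialect drops when that one token is deleted. It is not the repository's implementation. `toy_predict_proba` and its two tiny Brazilian/European Portuguese lexicons are hypothetical stand-ins for a trained classifier, chosen only so the example runs without model weights.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy lexicons (illustration only): well-known
# Brazilian vs. European Portuguese word pairs.
BR_WORDS = {"ônibus", "trem"}
PT_WORDS = {"autocarro", "comboio"}


def toy_predict_proba(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for a trained dialect classifier.

    Returns [P(Brazilian), P(European)] from add-one smoothed counts of
    dialect-marked words, so an input with no marked words scores 0.5/0.5.
    """
    n_br = sum(t in BR_WORDS for t in tokens)
    n_pt = sum(t in PT_WORDS for t in tokens)
    p_pt = (1 + n_pt) / (2 + n_br + n_pt)
    return [1 - p_pt, p_pt]


def loo_attributions(
    tokens: Sequence[str],
    predict_proba: Callable[[List[str]], List[float]],
    label: int,
) -> List[Tuple[str, float]]:
    """Leave-one-out attribution: score each token by the drop in
    P(label) when that single token is removed from the input."""
    base = predict_proba(list(tokens))[label]
    scores = []
    for i, tok in enumerate(tokens):
        reduced = list(tokens[:i]) + list(tokens[i + 1:])
        scores.append((tok, base - predict_proba(reduced)[label]))
    return scores


if __name__ == "__main__":
    sentence = ["apanhei", "o", "autocarro"]
    for tok, score in loo_attributions(sentence, toy_predict_proba, label=1):
        print(f"{tok}\t{score:+.3f}")
```

With label 1 (European Portuguese), only `autocarro` gets a nonzero score (about +0.167), which makes it the extracted lexical feature. Calling the classifier once per token is expensive for long inputs, but the method needs nothing beyond black-box access to class probabilities.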
The evaluation code and data, covering plausibility, sufficiency, and human evaluation, are in the `evaluation` directory.
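For readers unfamiliar with sufficiency, here is a minimal sketch assuming one common definition (the one used in the ERASER benchmark): the probability of the predicted label on the full input minus its probability when the model sees only the extracted tokens. The exact metric in the `evaluation` directory may differ. `toy_predict_proba` is a hypothetical stand-in classifier, repeated here so the block is self-contained.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical toy lexicons (illustration only).
BR_WORDS = {"ônibus", "trem"}
PT_WORDS = {"autocarro", "comboio"}


def toy_predict_proba(tokens: List[str]) -> List[float]:
    """Hypothetical classifier: [P(Brazilian), P(European)] from
    add-one smoothed counts of dialect-marked words."""
    n_br = sum(t in BR_WORDS for t in tokens)
    n_pt = sum(t in PT_WORDS for t in tokens)
    p_pt = (1 + n_pt) / (2 + n_br + n_pt)
    return [1 - p_pt, p_pt]


def sufficiency(
    tokens: Sequence[str],
    rationale_idx: Iterable[int],
    predict_proba: Callable[[List[str]], List[float]],
    label: int,
) -> float:
    """P(label | full input) - P(label | rationale tokens only).

    Lower is better: a value near 0 (or negative) means the extracted
    tokens alone let the classifier keep its original decision."""
    keep = set(rationale_idx)
    rationale = [t for i, t in enumerate(tokens) if i in keep]
    return predict_proba(list(tokens))[label] - predict_proba(rationale)[label]


if __name__ == "__main__":
    sentence = ["apanhei", "o", "autocarro"]
    print(sufficiency(sentence, [2], toy_predict_proba, label=1))  # keep "autocarro"
    print(sufficiency(sentence, [1], toy_predict_proba, label=1))  # keep "o"
```

Keeping only `autocarro` preserves the full-input probability (sufficiency 0), while keeping only `o` does not (about 0.167), so the first rationale is the more sufficient one.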
If you use our tool, we would appreciate it if you cited the following paper:
@misc{xie2024extracting,
title={Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers},
author={Roy Xie and Orevaoghene Ahia and Yulia Tsvetkov and Antonios Anastasopoulos},
year={2024},
eprint={2402.17914},
archivePrefix={arXiv},
primaryClass={cs.CL}
}