Corrections for Xposition import #1

Open
lgessler opened this issue Oct 27, 2021 · 0 comments

Hi, I'm working on getting this corpus into Xposition now, as I mentioned, and we need to work through some "errors" the validator found. Some of these are probably data issues, but many are just the result of missing configuration that we need to add for Chinese. For instance, a lot of the current errors are caused by incompatible UPOS/LexCat pairs: the validator code assumes the two should always be the same except for a specific list of exceptions, and we likely need to extend this list for Mandarin.
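
To make the UPOS/LexCat point concrete, here is a rough sketch of the kind of check I mean. This is not the actual validator code: `ALLOWED_MISMATCHES`, `is_compatible`, and `MANDARIN_EXTRA` are hypothetical names, and the pairs listed are placeholders for illustration only. The real pairs would have to come from the error list.

```python
# Hypothetical sketch of a UPOS/LexCat compatibility check.
# None of these names are taken from the conllulex code base.

# Pairs that are accepted even though the two labels differ (placeholders).
ALLOWED_MISMATCHES = {
    ("NOUN", "N"),
    ("VERB", "V"),
    ("ADP", "P"),
}


def is_compatible(upos, lexcat, allowed=ALLOWED_MISMATCHES):
    """Return True if the labels match or the pair is a listed exception."""
    return upos == lexcat or (upos, lexcat) in allowed


# Extending the exception list for Mandarin would then amount to adding pairs.
# The pair below is purely illustrative, not a confirmed decision.
MANDARIN_EXTRA = {("VERB", "P")}
zh_allowed = ALLOWED_MISMATCHES | MANDARIN_EXTRA
```

With something like this, a pair that is rejected under the default list passes once the Mandarin-specific set is merged in, which is the sort of config change we'd need to agree on.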

Here's a link to the list of errors. Maybe it'd be best for us to meet sometime to discuss what's going on in here and how we can fix it in the config.

BTW, to reproduce this:

git clone https://github.com/lgessler/conllulex.git
cd conllulex
pip install -e .
wget https://raw.githubusercontent.com/nert-nlp/Chinese-SNACS/master/out.conllulex
conllulex-enrich -c prince_zh out.conllulex chinese_enriched.conllulex
conllulex2json -c prince_zh chinese_enriched.conllulex chinese.json
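
To see which UPOS/LexCat combinations actually occur before deciding on exceptions, a small tally like the one below might help. It is only a sketch: `count_pairs` is a hypothetical helper, and the column positions assume the 19-column CoNLL-U-Lex layout (UPOS in column 4, LEXCAT in column 12), which should be double-checked against the file.

```python
from collections import Counter

UPOS_COL = 3     # 4th column, as in CoNLL-U
LEXCAT_COL = 11  # 12th column, assuming the 19-column CoNLL-U-Lex layout


def count_pairs(lines):
    """Count (UPOS, LEXCAT) pairs over an iterable of .conllulex lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank sentence separators and comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip rows that are too short to contain a LEXCAT column.
        if len(cols) <= LEXCAT_COL:
            continue
        lexcat = cols[LEXCAT_COL]
        # "_" is assumed to mean the token carries no LexCat of its own.
        if lexcat == "_":
            continue
        counts[(cols[UPOS_COL], lexcat)] += 1
    return counts
```

Passing an open handle on `out.conllulex` (opened with `encoding="utf-8"`) to `count_pairs` and printing `.most_common()` would list the most frequent pairs first, which is probably the easiest place to start the discussion.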