Jakarta in Indonesia #7
Comments
Similar:
paraguay_mercopress_7.txt.tsv
Birmingham, Alabama, in the United States - where to draw the line, or is it one entity?
Where "in" really becomes a problem is when it merges multiple tags:
or
Thanks for these. I think we need to label these consistently. Maybe we can say that if only a comma separates them, it should be one entity, and if there are any other words between them, it shouldn't? What are your thoughts?
I think as long as we're consistent, we're fine, but the example where two labels overlap because the
Correct. I think the move here is that I'll edit every occurrence I can find so that entities connected only by commas form one entity span, and then we can have entities that are separated by anything else (e.g.
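For reference, the comma rule discussed above could be checked or applied automatically with something like the sketch below. This is only an illustration under assumptions, not the project's actual tooling: it assumes annotations are available as `(start, end, label)` character offsets, the helper name `merge_comma_spans` and the `LOC` label are made up for the example, and it assumes that only spans with the same label should be merged.

```python
import re

# Hypothetical span format: (start, end, label) character offsets into the text.
Span = tuple[int, int, str]

# The gap between two spans counts as "just a comma" if it is a single
# comma with optional whitespace around it.
COMMA_ONLY = re.compile(r"\s*,\s*")


def merge_comma_spans(text: str, spans: list[Span]) -> list[Span]:
    """Merge adjacent same-label spans separated only by a comma."""
    merged: list[Span] = []
    for start, end, label in sorted(spans):
        if merged:
            prev_start, prev_end, prev_label = merged[-1]
            gap = text[prev_end:start]
            # Assumption: only merge when the labels match, so a connecting
            # word such as "in" never joins two spans.
            if label == prev_label and COMMA_ONLY.fullmatch(gap):
                merged[-1] = (prev_start, end, label)
                continue
        merged.append((start, end, label))
    return merged


text = "Birmingham, Alabama, in the United States"
spans = [(0, 10, "LOC"), (12, 19, "LOC"), (28, 41, "LOC")]
for start, end, label in merge_comma_spans(text, spans):
    print(label, repr(text[start:end]))
```

With this rule, "Birmingham, Alabama" becomes one span and "United States" stays separate, because the gap between them contains "in the" rather than only a comma.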
Phrases like this: one entity or two?
CoNLL has "Old Trafford in Manchester" as two, but our standard would normally have "Jakarta, Indonesia" as one.