Senorita in English Wordnet #755

Closed
arademaker opened this issue Sep 15, 2021 · 4 comments · Fixed by #850
Labels
delete synset: A synset should be removed and deprecated in the ILI
synset member: A word in a synset should be added or removed
Comments

Member

arademaker commented Sep 15, 2021

Synset

ewn-06353552-n (Interlingual Index: i69807)

(n) Senorita senorita%1:10:00:: a Spanish title or form of address used to or of an unmarried girl or woman; similar to the English `Miss'

Motivation

It doesn't make sense to have a synset for a Spanish word in the English Wordnet. There are other cases (classifiedBy Spanish and German).

arademaker added the delete synset label Sep 15, 2021
jmccrae added this to the 2022 Release milestone Sep 16, 2021
Member

jmccrae commented Sep 16, 2021

This word is widely used by English speakers with no knowledge of Spanish, as supported by corpus evidence and other dictionaries, so I think it is part of the English language too.

Member

jmccrae commented Sep 16, 2021

As a side issue, we now support Unicode so we can add the lemma señorita also.

jmccrae added the synset member label Sep 16, 2021
Member Author

arademaker commented Sep 16, 2021

Does this argument also support the other cases I mentioned, such as the German words? As for "corpus evidence and other dictionaries", it is hard to be precise here, right? Which dictionaries?

These have an entry for it:

  1. https://en.m.wiktionary.org/wiki/senorita#English
  2. https://www.merriam-webster.com/dictionary/senorita

This one does not:

  1. https://dictionary.cambridge.org/spellcheck/english/?q=senorita

Which corpus? How many occurrences count as sufficient evidence? Sorry, I just want our decisions to rest on more precise criteria.

Member

jmccrae commented Sep 16, 2021

The other examples are "don", "dona", "senora", "Frau", "Fraulein" and "Herr"? I would assume all of these are fine too but I can check.

The guidelines (which are here https://github.com/globalwordnet/english-wordnet/blob/master/NEW_SYNSETS.md) state that there must be at least 100 occurrences in Sketch Engine's TenTen corpus*. There are 3,788 for Senorita, so this is fine.

Similarly, it is listed in at least one of our reference dictionaries (listed here https://github.com/globalwordnet/english-wordnet/blob/master/DICTIONARIES.md), so it passes that test too. As such, there would be a clear case for including this word if it were a new synset proposal. Also, the arguments for deleting a synset need to be stronger than those for creating a new one.

EnejdaN added a commit to EnejdaN/english-wordnet that referenced this issue Jun 14, 2022
New synset member: señorita.
jmccrae linked a pull request Jun 17, 2022 that will close this issue
jmccrae added a commit that referenced this issue Jun 17, 2022

2 participants