duplicate synsets vs. poorly worded glosses #161
I agree that glosses are the real problem, probably because they were introduced later in PWN. But issues should be more specific, and what we really need is to devise a methodology for what would make a good template for glosses. The link is broken; I think the stable link for this article is https://dl.acm.org/citation.cfm?id=1614064. I found other versions via Google too. Thank you for sharing. I am aware of the SUMO solutions, but I don't expect it to be easy to define a uniform approach for the ontology ~> WN clustering decisions. I lean more toward the original idea of using a semantic concordance (https://dl.acm.org/citation.cfm?id=1075742) as a guide for sense clustering. I am reading about the OntoNotes solution now, and it seems to be more closely related to the idea of a semantic concordance, i.e. corpus annotation.
This is closely related to #141. I think there are a few issues here:

I totally agree with @restinplace on:
Closing this, as it seems to be a very general discussion that will not suggest any specific changes to the resource. For discussion of the process of writing definitions, see #141. On the general question of whether to split synsets, I suggest we continue to proceed case by case. The self-hypernyms can be discussed under #237.
This is prompted by the "court #160" discussion.
IMHO poor glosses are common, but true synset duplicates
(for alternative senses of a word) are rare. I assume the reason
is that glosses were written independently to denote each sense,
rather than as a system of contrasts that both denote individual
senses and clearly partition the overlapping semantic spaces of
related synsets.
Even as a native speaker I occasionally scratch my head, but in the
end I almost invariably agree that the sense distinction is there
and is worth making (just as I usually disagree with splitting hairs
to add more senses that may exist but, IMHO, are not synset-worthy!).
The "solution" has generally been to try to coarsen the WN sense
inventory, not just in posts here; see e.g. Hovy et al. 2006,
"OntoNotes: the 90% Solution"
https://dl.acm.org/citation.cfm?id=1614064
The SUMO :: WN mappings are similar to the OntoNotes clusters,
and other approaches to WN sense coarsening have been published.
For example, for "court#n" we have these coarsened sense sets:
SUMO (n): Government: court#n#1 | court#n#8
OntoNotes (n): a sovereign regime and its assemblage (1): court#n#3 | court#n#6
I mention this because, in a sledgehammer kind of way, these are
ginormous sets of "duplicate synset" notices. I think it would be
a very bad idea to treat them as requests, but they provide an extremely
informative guide as to a) why existing glosses are confusing, and
b) how a minimal gloss change might clarify the implied contrast
(as we saw the other day).
I also think that trying to improve the gloss first is advantageous
because it:
simply compare the old vs. new glosses (or gloss sets).
In contrast, the effects of adding or deleting forms or synsets
are not always readily apparent (although obviously that is still the
best path sometimes).