Merge branch 'main' into pr_usage
1313ou committed Jan 25, 2025
2 parents 7e4c3b5 + 1274256 commit 49d263c
Showing 62 changed files with 7,271 additions and 1,419 deletions.
7 changes: 7 additions & 0 deletions .gitignore
@@ -31,3 +31,10 @@ src/yaml/unconnected-verbs
actions.yaml
english-wordnet-2023.xml
src/lua
OEWN-Wikidata - Anno 2.csv
OEWN-Wikidata - Anno u3.csv
add_qids.py
.venv
changes.yaml
protonyms.py
protonyms.txt
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -35,13 +35,13 @@ Fork, then clone the repo:

git clone git@github.com:your-username/english-wordnet.git

Please compile all your changes into a single `wn31.xml` file
Please compile all your changes into a single `wn.xml` file

python merge.py
python scripts/from_yaml.py

Please ensure that your contributions are valid XML

xmllint --noout --valid wn31.xml
xmllint --noout --valid wn.xml

Please make sure that the structure is valid

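As a quick local pre-check before running `xmllint`, a minimal Python sketch using only the standard library can confirm that the compiled file is at least well-formed. `check_well_formed` is a hypothetical helper, not one of the repository's scripts, and it does not replace `xmllint --valid`: ElementTree checks well-formedness only, not conformance to the DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(path):
    """Return True if `path` parses as well-formed XML; otherwise print why and return False.

    Hypothetical helper: this checks well-formedness only. Unlike
    `xmllint --valid`, it does not validate the document against its DTD.
    """
    try:
        ET.parse(path)
    except (ET.ParseError, OSError) as err:
        # ParseError: malformed XML; OSError: missing or unreadable file.
        print(f"ERROR: {path}: {err}")
        return False
    return True
```

For example, `check_well_formed("wn.xml")` after running `python scripts/from_yaml.py` catches unbalanced tags early; the DTD check still has to come from `xmllint`.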
20 changes: 20 additions & 0 deletions NEW_SYNSETS.md
@@ -123,3 +123,23 @@ The synset should be possible to link into the graph
* **Adverbs**: No clear guidelines but at least one link should be proposed.

The more links that can be provided the better a synset is.

## Irregular forms

If a word has an irregular inflection that is not covered by the [rules in morphy](https://wordnet.princeton.edu/documentation/morphy7wn) then it should be added as a variant form:

For example:
```
child:
  n:
    form:
    - children
```

```
outwear:
  v:
    form:
    - outwore
    - outworn
```
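To illustrate why listed variant forms matter, here is a minimal Python sketch of a lemmatizer that consults `form` entries before falling back to suffix rules, in the spirit of morphy. It assumes entries have been loaded into plain dicts shaped like the YAML above; `build_form_index`, `lemmatize` and the `cat` entry are hypothetical, and `SUFFIX_RULES` is only a small subset of morphy's detachment rules, not a reimplementation of morphy.

```python
from collections import defaultdict

# Hypothetical entries shaped like the YAML examples: lemma -> part of speech -> fields.
# Real entries carry more fields (e.g. senses); only `form` is used here.
ENTRIES = {
    "child": {"n": {"form": ["children"]}},
    "outwear": {"v": {"form": ["outwore", "outworn"]}},
    "cat": {"n": {}},
}

# A small subset of morphy's detachment rules: (suffix, replacement).
SUFFIX_RULES = {
    "n": [("s", ""), ("ies", "y"), ("men", "man")],
    "v": [("s", ""), ("ed", ""), ("ed", "e")],
}


def build_form_index(entries):
    """Map each listed variant form to the (lemma, pos) pairs it belongs to."""
    index = defaultdict(set)
    for lemma, by_pos in entries.items():
        for pos, fields in by_pos.items():
            for form in fields.get("form", []):
                index[form].add((lemma, pos))
    return index


def lemmatize(word, pos, entries, index):
    """Return candidate lemmas for `word`, checking listed forms first."""
    listed = sorted(lemma for lemma, p in index.get(word, ()) if p == pos)
    if listed:
        return listed
    # The word may already be a lemma with this part of speech.
    if pos in entries.get(word, {}):
        return [word]
    # Fall back to suffix detachment, keeping only bases that exist as lemmas.
    candidates = []
    for suffix, replacement in SUFFIX_RULES.get(pos, []):
        if word.endswith(suffix):
            base = word[: -len(suffix)] + replacement
            if pos in entries.get(base, {}) and base not in candidates:
                candidates.append(base)
    return candidates
```

Without the `form` entry, `children` would match none of these noun rules, which is why such irregular forms have to be listed explicitly.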
5 changes: 2 additions & 3 deletions README.md
@@ -49,10 +49,9 @@ The size of each resource is as follows

To compile these into a single file please use the following script(s)

python scripts/from-yaml.py
python scripts/merge.py
python scripts/from_yaml.py

This will create a file at `wn31.xml` that contains the complete wordnet.
This will create a file at `wn.xml` that contains the complete wordnet.

Further conversions are available through the converter [here](http://server1.nlp.insight-centre.org/gwn-converter/).

17 changes: 16 additions & 1 deletion scripts/validate.py
@@ -7,6 +7,7 @@
from sense_keys import unmap_sense_key
from wordnet import xml_id_char
from collections import Counter
from from_yaml import load

def check_symmetry(wn, fix):
errors = []
@@ -199,7 +200,8 @@ def is_valid_sense_id(xml_id, synset):


def main():
wn = parse_wordnet("wn.xml")
#wn = parse_wordnet("wn.xml")
wn = load()

if len(sys.argv) > 1 and sys.argv[1] == "--fix":
fix = True
@@ -278,6 +280,8 @@ def main():

instances = set()
ilis = set()
wikidatas = set()
definitions = set()

for synset in wn.synsets:
if synset.id[-1:] != synset.part_of_speech.value:
@@ -365,6 +369,11 @@ def main():
if len(defn.text) == 0:
print("ERROR: empty definition for %s" % (synset.id))
errors += 1
if defn.text in definitions:
print("ERROR: duplicate definition for %s (%s)" % (synset.id, defn.text))
errors += 1
else:
definitions.add(defn.text)

sr_counter = Counter((sr.target, sr.rel_type)
for sr in synset.synset_relations)
@@ -381,6 +390,12 @@ def main():
else:
ilis.add(synset.ili)

if synset.wikidata and synset.wikidata in wikidatas:
print(f"ERROR: QID {synset.wikidata} is duplicated")
errors += 1
else:
wikidatas.add(synset.wikidata)

for synset in wn.synsets:
for sr in synset.synset_relations:
if sr.rel_type == SynsetRelType.HYPERNYM:
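The duplicate checks added to `main()` follow a common "first seen" pattern: remember each definition text or Wikidata QID, and report any later repeat. A standalone sketch of the same idea, using hypothetical minimal `Synset` and `Definition` dataclasses in place of the project's `wordnet` classes (this is not the validator's actual code), might look like this:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Definition:
    text: str


@dataclass
class Synset:
    # Hypothetical stand-in for the project's Synset class; only the
    # attributes used by the checks are modelled.
    id: str
    definitions: List[Definition] = field(default_factory=list)
    wikidata: Optional[str] = None


def check_duplicates(synsets):
    """Return error messages for repeated definition texts and QIDs."""
    errors = []
    seen_definitions = {}  # definition text -> id of the first synset using it
    seen_qids = {}  # Wikidata QID -> id of the first synset using it
    for synset in synsets:
        for defn in synset.definitions:
            if defn.text in seen_definitions:
                errors.append(
                    f"ERROR: duplicate definition for {synset.id} ({defn.text}), "
                    f"first used by {seen_definitions[defn.text]}"
                )
            else:
                seen_definitions[defn.text] = synset.id
        # Synsets without a QID are skipped rather than compared.
        if synset.wikidata:
            if synset.wikidata in seen_qids:
                errors.append(
                    f"ERROR: QID {synset.wikidata} is duplicated "
                    f"({seen_qids[synset.wikidata]}, {synset.id})"
                )
            else:
                seen_qids[synset.wikidata] = synset.id
    return errors
```

Storing the first synset id (rather than only a set of seen values) lets each error name both sides of the duplicate, which makes merges easier to decide on.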
72 changes: 72 additions & 0 deletions src/deprecations.csv
@@ -126,3 +126,75 @@
"ewn-10809460-n","i94174","ewn-10146463-n","i90187","Compositional; use 10146463-n instead (#1068)"
"ewn-83000076-n","","ewn-10146463-n","i90187","Compositional (#1068)"
"ewn-09938325-n","i88968","ewn-10146463-n","i90187","Compositional (#1068)"
"ewn-14956360-n","i115584","ewn-14749543-n","i114384","Duplicate with wrong lemma (#1128)"
"ewn-92449518-n","","ewn-04220331-n","i58900","Duplicate (#1133)"
"ewn-09555948-n","i86747","ewn-09556053-n","i86748","Duplicate (#1127)"
"ewn-09583555-n","i86901","ewn-10941631-n","i94851","Duplicate (#1127)"
"ewn-09542327-n","i86651","ewn-09544015-n","i86663","Duplicate (#1127)"
"ewn-09549740-n","i86699","ewn-09550841-n","i86709","Duplicate (#1127)"
"ewn-09542327-n","","ewn-09544015-n","i86663","Duplicate (#1127)"
"ewn-09549740-n","","ewn-09550841-n","i86709","Duplicate (#1127)"
"ewn-09616218-n","i87078","ewn-09616022-n","i87077","Duplicate (#1172)"
"ewn-01297864-n","i42176","ewn-00955670-n","i40416","Does not exist (#1172)"
"ewn-08893294-n","i83376","ewn-08891234-n","i83374","Epithet (#1127)"
"ewn-09054580-n","","ewn-09054023-n","i84121","Epithet (#1127)"
"ewn-09020171-n","i83965","ewn-09019857-n","i83964","City is also the country (#1127)"
"ewn-09054580-n","","ewn-09054023-n","i84121","Epithet (#1127)"
"ewn-09020171-n","","ewn-09019857-n","i83964","City is also the country (#1127)"
"ewn-08893163-n","i83375","ewn-08879115-n","i83373","Epithet (#1127)"
"ewn-06780078-n","i72050","ewn-06780303-n","i72051","Duplicate (#1127)"
"ewn-92333128-n","","ewn-08985864-n","","Duplicate (#1127)"
"ewn-92333130-n","","ewn-08986176-n","","Duplicate (#1127)"
"ewn-92333131-n","","ewn-08986325-n","","Duplicate (#1127)"
"ewn-92333132-n","","ewn-08986475-n","","Duplicate (#1127)"
"ewn-92333133-n","","ewn-08986627-n","","Duplicate (#1127)"
"ewn-92333134-n","","ewn-08986776-n","","Duplicate (#1127)"
"ewn-01300469-n","i42189","ewn-01285678-n","i42122","Duplicate (#1127)"
"ewn-14381098-n","i112443","ewn-14104698-n","i110927","Duplicate (#1134)"
"ewn-02025384-v","i31846","ewn-02022224-v","i31831","Duplicate (#1134)"
"ewn-92326409-n","","ewn-07664811-n","i77096","Duplicate (#1134)"
"ewn-12144165-n","i100862","ewn-07819069-n","i78116","Duplicate (#1134)"
"ewn-01743426-v","i30434","ewn-01637966-v","i29918","Duplicate (#1134)"
"ewn-01707783-s","i9334","ewn-01705397-s","i9314","Duplicate (#1134)"
"ewn-02488985-s","i13742","ewn-02179281-s","i11936","Duplicate (#1134)"
"ewn-02463673-s","i13608","ewn-02463536-a","i13607","Duplicate (#1134)"
"ewn-00932330-v","i26262","ewn-00022092-v","i21876","Duplicate (#1134)"
"ewn-00498142-v","i24194","ewn-00158495-v","i22502","Duplicate (#1134)"
"ewn-00179205-v","i22600","ewn-00173351-v","i22577","Duplicate (#1134)"
"ewn-00572673-v","i24606","ewn-00173351-v","i22577","Duplicate (#1134)"
"ewn-86491000-n","","ewn-04771667-n","i62052","Duplicate (#1134)"
"ewn-01409889-v","i28760","ewn-01410030-v","i28761","Duplicate (#1134)"
"ewn-01607363-v","i29771","ewn-01511000-v","i29265","Duplicate (#1134)"
"ewn-07384870-n","i75426","ewn-07377946-n","i75387","Duplicate (#1134)"
"ewn-92363685-n","","ewn-14744853-n","i114358","Duplicate (#1134)"
"ewn-92363714-n","","ewn-14749988-n","i114387","Duplicate (#1134)"
"ewn-92364728-n","","ewn-14927246-n","i115401","Duplicate (#1134)"
"ewn-02228261-v","i32862","ewn-02228837-v","i32865","Duplicate (#1134)"
"ewn-01383926-n","i42585","ewn-01383685-n","i42584","Non-existent (#1134)"
"ewn-01557813-n","i43440","ewn-01550784-n","i43397","Non-existent (#1134)"
"ewn-01739337-n","i44445","ewn-01739210-n","i44444","Non-existent (#1134)"
"ewn-02635917-n","i49561","ewn-02636474-n","i49565","Duplicate (#1134)"
"ewn-02636185-n","i49563","ewn-02636474-n","i49565","Duplicate (#1134)"
"ewn-92315622-n","","ewn-05608025-n","i66342","Duplicate (#1134)"
"ewn-80147706-n","","ewn-09627401-n","i87144","Duplicate (#1134)"
"ewn-82046135-n","","ewn-09914590-n","i88827","Duplicate (#1134)"
"ewn-83218519-n","","ewn-10007754-n","i89371","Duplicate (#1134)"
"ewn-89117996-n","","ewn-81007314-n","","Duplicate (#1134)"
"ewn-08564875-n","i81893","ewn-08564718-n","i81892","Duplicate (#1134)"
"ewn-00093232-r","i18715","ewn-00034576-r","i18337","Duplicate (#1134)"
"ewn-00112752-r","i18849","ewn-00032295-r","i18320","Duplicate (#1134)"
"ewn-00431167-r","i21092","ewn-00032295-r","i18320","Duplicate (#1134)"
"ewn-00228639-r","i19701","ewn-00150568-r","i19142","Duplicate (#1134)"
"ewn-00280604-r","i20097","ewn-00256795-r","i19911","Duplicate (#1134)"
"ewn-00497644-r","i21593","ewn-00046739-r","i18410","Duplicate (#1134)"
"ewn-00497861-r","i21595","ewn-00497722-r","i21594","Duplicate (#1134)"
"ewn-92299200-n","","ewn-02700534-n","i49936","Duplicate (#1134)"
"ewn-92299310-n","","ewn-02719537-n","i50046","Duplicate (#1134)"
"ewn-92302385-n","","ewn-03237120-n","i53116","Duplicate (#1134)"
"ewn-03835818-n","i56630","ewn-03691288-n","i55794","Duplicate (#1134)"
"ewn-13062308-n","i105264","ewn-13059704-n","i105253","Non-existent (#1134)"
"ewn-13067976-n","i105293","ewn-13059704-n","i105253","Non-existent (#1134)"
"ewn-02040664-n","i46092","ewn-02040367-n","i46090","Duplicate (#1134)"
"ewn-02040983-n","i46094","ewn-02040367-n","i46090","Duplicate (#1134)"
"ewn-85556310-n","","ewn-81448123-n","","Duplicate (#1134)"
"ewn-10410299-n","i91815","ewn-10300973-n","i91146","Duplicate (#1150)"
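A minimal Python sketch for sanity-checking rows like these before committing. It assumes the five columns are: deprecated synset id, its ILI, replacement synset id, its ILI, and a reason (inferred from the rows above, not from a documented schema); `check_deprecations` is a hypothetical helper. Each line is parsed with `csv.reader(..., strict=True)`, so a stray quote inside a field raises `csv.Error` instead of being silently joined into the field, and exact repeated rows are reported.

```python
import csv
import re
from collections import Counter

# Assumed column meanings, inferred from the rows above.
FIELDS = ["old_id", "old_ili", "new_id", "new_ili", "reason"]
# Synset ids look like ewn-<8 digits>-<pos>, with pos one of n, v, a, r, s.
SYNSET_ID = re.compile(r"ewn-\d{8}-[nvars]")


def check_deprecations(text):
    """Parse deprecation rows from `text`; return (rows, problems)."""
    rows, problems = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            # strict=True makes misplaced quotes an error instead of being tolerated.
            record = next(csv.reader([line], strict=True))
        except csv.Error as err:
            problems.append(f"line {lineno}: {err}")
            continue
        if len(record) != len(FIELDS):
            problems.append(f"line {lineno}: expected {len(FIELDS)} fields, got {len(record)}")
            continue
        row = dict(zip(FIELDS, record))
        for key in ("old_id", "new_id"):
            if not SYNSET_ID.fullmatch(row[key]):
                problems.append(f"line {lineno}: malformed {key} {row[key]!r}")
        rows.append(row)
    # Exact repeats parse fine but are redundant.
    counts = Counter(tuple(row.values()) for row in rows)
    for record, count in counts.items():
        if count > 1:
            problems.append(f"row repeated {count} times: {record}")
    return rows, problems
```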
56 changes: 14 additions & 42 deletions src/yaml/adj.all.yaml
@@ -2222,6 +2222,7 @@
- 08215965-n
example:
- a newly activated unit
ili: i171
members:
- activated
partOfSpeech: s
@@ -25932,7 +25933,8 @@
ili: i2061
members:
- coppery
- copper colored
- copper-colored
- copper-coloured
partOfSpeech: s
similar:
- 00367771-a
@@ -27918,8 +27920,6 @@
- having the color of cinnamon
ili: i2241
members:
- cinnamon colored
- cinnamon coloured
- cinnamon-colored
- cinnamon-coloured
partOfSpeech: s
@@ -53187,6 +53187,7 @@
- a scenic but devious route
- a long and circuitous journey by train and boat
- a roundabout route avoided rush-hour traffic
ili: i4215
members:
- devious
- circuitous
@@ -101607,7 +101608,7 @@
- 01461111-a
01461461-s:
definition:
- (chiefly a direction or description in music) very soft
- (chiefly a direction or description in music) very very soft
ili: i7981
members:
- pianissimo assai
@@ -118591,14 +118592,15 @@
- 01707465-s
- 01707559-s
- 01707690-s
- 01707783-s
- 01707870-s
01705397-s:
definition:
- having two leaves
ili: i9314
members:
- bifoliate
- two-leaved
- two-leafed
partOfSpeech: s
similar:
- 01704867-a
@@ -118810,16 +118812,6 @@
partOfSpeech: s
similar:
- 01704867-a
01707783-s:
definition:
- having two leaves
ili: i9334
members:
- two-leaved
- two-leafed
partOfSpeech: s
similar:
- 01704867-a
01707870-s:
definition:
- having a single leaf
@@ -151952,13 +151944,17 @@
02179281-s:
definition:
- divided into two lobes
domain_topic:
- 06076105-n
example:
- a bilobate leaf
- a bifid petal
ili: i11936
members:
- bilobate
- bilobated
- bilobed
- bifid
partOfSpeech: s
similar:
- 02178581-a
@@ -172905,24 +172901,13 @@
- not traveled over or through
example:
- untraveled roads
- an untraversed region
ili: i13607
members:
- untraveled
- untravelled
partOfSpeech: a
similar:
- 02463673-s
02463673-s:
definition:
- not traveled over or through
example:
- an untraversed region
ili: i13608
members:
- untraversed
partOfSpeech: s
similar:
- 02463536-a
partOfSpeech: a
02463784-a:
definition:
- made neat and tidy by trimming
@@ -174653,7 +174638,6 @@
partOfSpeech: a
similar:
- 02488854-s
- 02488985-s
- 02489095-s
- 02489516-s
- 02489644-s
@@ -174690,19 +174674,6 @@
partOfSpeech: s
similar:
- 02488224-a
02488985-s:
definition:
- divided into two lobes
domain_topic:
- 06076105-n
example:
- a bifid petal
ili: i13742
members:
- bifid
partOfSpeech: s
similar:
- 02488224-a
02489095-s:
definition:
- resembling a fork; divided or separated into two branches
@@ -183422,6 +183393,7 @@
- 06138021-n
members:
- undecideable
- undecidable
partOfSpeech: s
similar:
- 01828578-a
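Several hunks above fold a duplicate satellite synset into an existing one (for example, `01707783-s` into `01705397-s`) and record the old id in `src/deprecations.csv`. A minimal sketch of that bookkeeping, assuming the YAML has been loaded into a dict keyed by synset id; `merge_synset` is a hypothetical helper, not a script from the repository, and it only unions `members` and `example` (fields such as `domain_topic` would need the same treatment):

```python
def merge_synset(synsets, source_id, target_id, reason):
    """Fold synset `source_id` into `target_id` in place (hypothetical helper).

    `synsets` maps synset ids to dicts shaped like the YAML entries above.
    Returns a deprecation row: (old id, old ILI, new id, new ILI, reason).
    """
    source = synsets.pop(source_id)
    target = synsets[target_id]
    # Union list fields, keeping the target's order and appending new items.
    for key in ("members", "example"):
        merged = list(target.get(key, []))
        for item in source.get(key, []):
            if item not in merged:
                merged.append(item)
        if merged:
            target[key] = merged
    # Drop pointers to the removed synset so no `similar` link dangles.
    for synset in synsets.values():
        if source_id in synset.get("similar", []):
            synset["similar"] = [s for s in synset["similar"] if s != source_id]
    return ("ewn-" + source_id, source.get("ili", ""),
            "ewn-" + target_id, target.get("ili", ""), reason)
```

Applied to the two-leaved example, this yields the row `"ewn-01707783-s","i9334","ewn-01705397-s","i9314","Duplicate (#1134)"`, matching the entry added to `src/deprecations.csv`.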