Commit

Merge branch 'main' into pr_usage
1313ou committed Nov 2, 2024
2 parents 4fc62bb + 50eef25 commit 2f72a35
Showing 16 changed files with 50 additions and 43 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/delete-synset.md
@@ -1,6 +1,6 @@
 ---
 name: Delete synset
-about: Propose that a synset be removed from WordNet (do not use for merging synsets)
+about: Propose that a synset be removed from Wordnet (do not use for merging synsets)
 title: ''
 labels: delete synset
 assignees: ''
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/new-synset.md
@@ -1,6 +1,6 @@
 ---
 name: New synset
-about: Suggest a novel word that does not correspond to an existing concept in WordNet
+about: Suggest a novel word that does not correspond to an existing concept in Wordnet
 title: ''
 labels: new synset
 assignees: ''
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/other-change-to-wordnet.md
@@ -1,5 +1,5 @@
 ---
-name: Other change to WordNet
+name: Other change to Wordnet
 about: Some other change to the content of the resource
 title: ''
 labels: ''
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -90,7 +90,7 @@ Finally, don't forget that it is human to make mistakes! We all do. Let’s work

 ### Neutrality

-Open English WordNet aims to be a politically neutral resource, however we accept that the description of the language necessarily involves making political statements. In the case where politics are relevant to the particular word, we follow Wikipedia in the definition of political entities, states and so forth. We ask all contributors to avoid political statements when not relevant to the discussion.
+Open English Wordnet aims to be a politically neutral resource, however we accept that the description of the language necessarily involves making political statements. In the case where politics are relevant to the particular word, we follow Wikipedia in the definition of political entities, states and so forth. We ask all contributors to avoid political statements when not relevant to the discussion.

 ### Thanks
 Derived from [thoughbot's code of conduct](https://thoughtbot.com/open-source-code-of-conduct)
2 changes: 1 addition & 1 deletion DICTIONARIES.md
@@ -1,7 +1,7 @@
 # Reference Dictionaries

 This is a list of reference dictionaries that should be used when considering
-changes to Open English WordNet
+changes to Open English Wordnet

 * [American Heritage Dictionary](https://www.ahdictionary.com/)
 * [Cambridge Dictionary](https://dictionary.cambridge.org/)
4 changes: 2 additions & 2 deletions EDITING.md
@@ -1,6 +1,6 @@
-# Editing Open English WordNet
+# Editing Open English Wordnet

-This document contains guidelines for making changes to Open English WordNet and fixing issues using GitHub and the English WordNet Editor (EWE)
+This document contains guidelines for making changes to Open English Wordnet and fixing issues using GitHub and the English Wordnet Editor (EWE)

 You should take the following steps to implement a change in the resource

2 changes: 1 addition & 1 deletion LICENSE.md
@@ -1,7 +1,7 @@
 This resource is derived from Princeton WordNet under the WordNet License
 and further developed under the Creative Commons Attribution 4.0 International License.
 You may share and adapt this resource providing attribution is given to both
-Princeton WordNet and the Open English WordNet team.
+Princeton WordNet and the Open English Wordnet team.

 Creative Commons Attribution 4.0 International

18 changes: 9 additions & 9 deletions NEW_SYNSETS.md
@@ -1,6 +1,6 @@
 # Guidelines for the addition of new synsets

-New synsets should be added with some caution to Open English WordNet. This document
+New synsets should be added with some caution to Open English Wordnet. This document
 describes the criteria that should be applied before introducing a new synset
 to the resource.

@@ -20,7 +20,7 @@ a reason to add more synsets of poor quality*

 ## Significance

-A concept in Open English WordNet should be significant, this means that it should
+A concept in Open English Wordnet should be significant, this means that it should
 be possible to easily find **at least 100 examples** of the usage of the word
 with this meaning. This can be done by using a search interface such as
 [Sketch Engine](http://sketchengine.eu) or other corpus search interface.
@@ -31,21 +31,21 @@ In the case that a new sense of an existing word is being proposed, then it
 should be possible to propose collocates that occur with this sense of the word
 and these can be used to find and distinguish examples.

-Open English WordNet is a dictionary not an encyclopedia. For this reason, it should
+Open English Wordnet is a dictionary not an encyclopedia. For this reason, it should
 not contain long lists of people, places, organizations, etc. Proper nouns are
 generally not expected to be included in the resource and many kinds of common
 nouns for narrow domains or geographical usage should not be included, examples
 of this would include elements of different cuisines around the world. As a rule
 of thumb, if there is a Wikipedia page for this concept it should not be in
-Open English WordNet
+Open English Wordnet

 NB: *Deeper integration of Wikipedia/Wikidata is planned to allow these concepts
-to be referred to from Open English WordNet*
+to be referred to from Open English Wordnet*

 ## Non-compositionality

-One of the goals of Open English WordNet is to support annotation. If a word or term
-is already covered by Open English WordNet it should not be added.
+One of the goals of Open English Wordnet is to support annotation. If a word or term
+is already covered by Open English Wordnet it should not be added.

 For multiword terms, this means that the meaning of the term should not be
 derivable from its components, e.g., "French Army" could be tagged with the
@@ -63,7 +63,7 @@ include:

 ## Distinction

-The concept should be distinct from other concepts in the WordNet. You should
+The concept should be distinct from other concepts in the wordnet. You should
 think about and check relevant synonyms. This should probably be considered in
 terms of a substitution check, e.g.,

@@ -81,7 +81,7 @@ as the subject has a different semantic role.
 ## Well-defined

 It should be possible to easily write a definition for this concept that is
-distinct from other concepts in Open English WordNet. A good definition consists of
+distinct from other concepts in Open English Wordnet. A good definition consists of
 a *genus* and a *differentia*

 * **Genus**: The type of the thing, often the hypernym
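The significance criterion in `NEW_SYNSETS.md` asks for at least 100 usage examples of the word in the proposed sense, which can be located with the help of collocates. A rough offline approximation might look like the sketch below. It assumes a plain-text corpus with one sentence per line and a single-word lemma. The helpers `count_sense_examples` and `is_significant` are hypothetical and not part of this repository. Whole-token matching on a collocate is only a crude proxy for the manual checking done in Sketch Engine or another corpus interface.

```python
import re
from typing import Iterable


def count_sense_examples(lines: Iterable[str], word: str,
                         collocates: Iterable[str]) -> int:
    """Count lines that contain `word` and at least one collocate.

    Hypothetical helper: case-insensitive, whole-token matching only;
    no lemmatisation and no real sense disambiguation.
    """
    target = word.lower()
    cues = {c.lower() for c in collocates}
    count = 0
    for line in lines:
        tokens = set(re.findall(r"[\w'-]+", line.lower()))
        if target in tokens and tokens & cues:
            count += 1
    return count


def is_significant(lines: Iterable[str], word: str,
                   collocates: Iterable[str], threshold: int = 100) -> bool:
    # NEW_SYNSETS.md asks for at least 100 examples of the word in this sense.
    return count_sense_examples(lines, word, collocates) >= threshold


if __name__ == "__main__":
    # Made-up corpus lines for illustration only.
    corpus = [
        "The regent ruled until the prince came of age.",
        "A regent governed the kingdom for ten years.",
        "The board of regents met on Tuesday.",
    ]
    print(count_sense_examples(corpus, "regent", ["ruled", "governed"]))
    print(is_significant(corpus, "regent", ["ruled", "governed"]))
```

Requiring a collocate alongside the word is the filtering idea the guideline describes. A real check would still need a human to confirm that each hit uses the intended sense.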
20 changes: 12 additions & 8 deletions README.md
@@ -1,19 +1,22 @@
-# Open English WordNet
+# Open English Wordnet

-Open English WordNet is a lexical network of the English language grouping words into synsets and linking them according
+Open English Wordnet is a lexical network of the English language grouping words into synsets and linking them according
 to relationships such as hypernymy, antonymy and meronymy. It is intended to be used in natural language processing
 applications and provides deep lexical information about the English language as a graph.

-Open English WordNet is a fork of the [Princeton Wordnet](https://wordnet.princeton.edu/) developed under
+Open English Wordnet is a fork of the [Princeton WordNet](https://wordnet.princeton.edu/) developed under
 an open source methodology. The quality and veracity of the resource may differ from the Princeton
-WordNet and we welcome contributions. Contributions to this wordnet may eventually be incorporated into
+Wordnet and we welcome contributions. Contributions to this wordnet may eventually be incorporated into
 future releases of Princeton WordNet. Correspondance to previous versions and wordnets in other language is provided
-through the [Collaborative Interlingual Index (CILI)](https://github.com/globalwordnet/cili). The Open English WordNet is available as individual files in [GWN-LMF](http://globalwordnet.github.io/schemas/) format.
+through the [Collaborative Interlingual Index (CILI)](https://github.com/globalwordnet/cili). The Open English Wordnet is available as individual files in [GWN-LMF](http://globalwordnet.github.io/schemas/) format.

 ## Releases

-Open English WordNet is released through the [Open English WordNet website](https://en-word.net/). The versions released are
+Open English Wordnet is released through the [Open English Wordnet website](https://en-word.net/). The versions released are

+* **2024 Edition** (Released 1st November 2024). [(LMF)](https://en-word.net/static/english-wordnet-2024.xml.gz)
+[(RDF)](https://en-word.net/static/english-wordnet-2024.ttl.gz)
+[(WNDB)](https://en-word.net/static/english-wordnet-2024.zip)
 * **2023 Edition** (Released 31st October 2023). [(LMF)](https://en-word.net/static/english-wordnet-2023.xml.gz)
 [(RDF)](https://en-word.net/static/english-wordnet-2023.ttl.gz)
 [(WNDB)](https://en-word.net/static/english-wordnet-2023.zip)
@@ -34,6 +37,7 @@ The size of each resource is as follows

 | Edition | Words | Synsets | Relations |
 |---------|---------|---------|-----------|
+| 2024 | 161,705 | 120,630 | 418,168 |
 | 2023 | 161,338 | 120,135 | 415,905 |
 | 2022 | 161,221 | 120,068 | 386,437 |
 | 2021 | 163,161 | 120,039 | 384,505 |
@@ -57,14 +61,14 @@ Further conversions are available through the converter [here](http://server1.nl
 We welcome changes, to make a change please read our [contributing guidelines](CONTRIBUTING.md)
 and make a pull request.

-Open English WordNet is a high-quality resource that acts as a gold-standard for natural language processing,
+Open English Wordnet is a high-quality resource that acts as a gold-standard for natural language processing,
 as such we cannot accept any automatically generated results that have not been manually validated.

 Please be aware that we use the [Global WordNet Association LMF](https://globalwordnet.github.io/schemas/) and please read the guidelines for using the [format](FORMAT.md)

 ## License

-WordNet is released under [CC-BY 4.0](LICENSE.md)
+Open English Wordnet is released under [CC-BY 4.0](LICENSE.md)

 ## References

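The README notes that releases are distributed in GWN-LMF format, for example the gzipped `english-wordnet-2024.xml.gz` linked above. The sketch below shows one way to consume such a file using only the Python standard library: it streams a gzipped LMF document and counts its `LexicalEntry` and `Synset` elements. The function names are hypothetical. The resulting counts are not guaranteed to match the Words and Synsets columns of the size table, because the counting method behind that table is not stated here.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO


def local_name(tag: str) -> str:
    # Drop a "{namespace}" prefix, if any, so plain and namespaced tags both match.
    return tag.rsplit("}", 1)[-1]


def count_lmf_elements(stream: BinaryIO) -> Counter:
    """Count LexicalEntry and Synset elements in a gzipped GWN-LMF document.

    Hypothetical helper: iterparse streams the XML, and each counted element
    is cleared afterwards to keep memory use down on a full release file.
    """
    counts: Counter = Counter()
    with gzip.open(stream, "rb") as xml_file:
        for _event, elem in ET.iterparse(xml_file, events=("end",)):
            name = local_name(elem.tag)
            if name in ("LexicalEntry", "Synset"):
                counts[name] += 1
                elem.clear()
    return counts


if __name__ == "__main__":
    # Made-up miniature document; a downloaded release could instead be
    # passed in as open("english-wordnet-2024.xml.gz", "rb").
    sample = (
        b'<LexicalResource><Lexicon id="oewn" label="Open English Wordnet" language="en">'
        b'<LexicalEntry id="e1"><Lemma writtenForm="trust" partOfSpeech="v"/></LexicalEntry>'
        b'<Synset id="s1" partOfSpeech="v"/>'
        b"</Lexicon></LexicalResource>"
    )
    print(count_lmf_elements(io.BytesIO(gzip.compress(sample))))
```

Streaming with `iterparse` avoids loading the complete release into memory at once. For richer queries, a dedicated library such as the `wn` package listed in `TOOLS.md` is probably the more practical route.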
13 changes: 6 additions & 7 deletions RELEASING.md
@@ -8,8 +8,7 @@ This is generated by Python scripts in this repo
 Edit the file `scripts/merge.py` and `WNDB_License.txt` to update the version.

 ```
-python scripts/from-yaml.py
-python scripts/merge.py
+python scripts/from_yaml.py
 cp wn.xml english-wordnet-20XX.xml
 gzip english-wordnet-20XX.xml
 ```
@@ -22,7 +21,7 @@ First unzip the previous release into a folder called `oewn20XX/`

 ```
 ./gwn -i english-wordnet-20XX.xml -o oewn20XX/ -f WNLMF -t WNDB --wordnet-license WNDB_License.txt
-zip english-wordnet-20XX.zip -R oewn20XX/
+zip -r english-wordnet-20XX.zip oewn20XX/
 ```

 Testing:
@@ -34,20 +33,20 @@ wordnet novel_lemma_in_this_edition -over

 ## RDF Release and en-word.net site

-This is based on [WordNet Angular](https://github.com/jmccrae/wordnet-angular)
+This is based on [Wordnet Angular](https://github.com/jmccrae/wordnet-angular)

 Firstly, load the database from the XML file

 ```
-cargo +nightly build --release
+cargo build --release
 rm -f wordnet.db
-cargo +nightly run --release --bin wordnet-angular -- --reload --wn ../../globalwordnet/english-wordnet/wn.xml -s en
+cargo run --release --bin wordnet-angular -- --reload --wn ../../globalwordnet/english-wordnet/wn.xml -s en
 ```

 Now dump the RDF data

 ```
-cargo +nightly run --release --bin wordnet-rdf-dump -- -s en > wn.ttl
+cargo run --release --bin wordnet-rdf-dump -- -s en > wn.ttl
 rapper -i turtle -o turtle wn.ttl > english-wordnet-2023.ttl
 gzip english-wordnet-2023.ttl
 ```
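The release steps in `RELEASING.md` copy `wn.xml` to a year-stamped name and gzip it. After this change they also use `zip -r` to archive the WNDB folder recursively. Where those command-line tools are unavailable, the sketch below gives a rough standard-library equivalent. `package_release` and `demo` are hypothetical helpers, not scripts from this repository, and the file names in `demo` are made up. Unlike the shell steps, the sketch never writes the uncompressed year-stamped XML copy, and it stores only files in the zip (no directory entries).

```python
import gzip
import shutil
import tempfile
import zipfile
from pathlib import Path


def package_release(wn_xml: Path, wndb_dir: Path, year: str,
                    out_dir: Path) -> tuple[Path, Path]:
    """Write english-wordnet-YEAR.xml.gz and english-wordnet-YEAR.zip."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Equivalent of: cp wn.xml english-wordnet-YEAR.xml && gzip english-wordnet-YEAR.xml
    xml_gz = out_dir / f"english-wordnet-{year}.xml.gz"
    with wn_xml.open("rb") as src, gzip.open(xml_gz, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Equivalent of: zip -r english-wordnet-YEAR.zip oewnYEAR/
    # Archive paths keep the top-level folder name, e.g. "oewn2024/...".
    archive = out_dir / f"english-wordnet-{year}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(wndb_dir.rglob("*")):
            if path.is_file():
                zf.write(path, arcname=path.relative_to(wndb_dir.parent).as_posix())
    return xml_gz, archive


def demo() -> tuple[str, bytes, list[str]]:
    # Builds a throwaway layout with made-up file names and packages it.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "wn.xml").write_bytes(b"<LexicalResource/>")
        wndb = root / "oewn2024"
        wndb.mkdir()
        (wndb / "index.noun").write_text("placeholder\n", encoding="utf-8")
        xml_gz, archive = package_release(root / "wn.xml", wndb, "2024", root / "dist")
        with zipfile.ZipFile(archive) as zf:
            members = zf.namelist()
        return xml_gz.name, gzip.decompress(xml_gz.read_bytes()), members


if __name__ == "__main__":
    print(demo())
```

The sketch keeps the `oewn20XX/` prefix inside the archive, which is what `zip -r english-wordnet-20XX.zip oewn20XX/` does when run from the parent directory.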
12 changes: 6 additions & 6 deletions TOOLS.md
@@ -1,18 +1,18 @@
-# Tools for Working with Open English WordNet
+# Tools for Working with Open English Wordnet

-## English WordNet Editor
+## English Wordnet Editor

-This tool is a command-line editor for English WordNet. It allows you to add, delete, and modify synsets, words, and relations in English WordNet. It is written in Python and uses the `nltk` library to interact with WordNet. The tool is available on GitHub at https://github.com/jmccrae/ewe
+This tool is a command-line editor for English Wordnet. It allows you to add, delete, and modify synsets, words, and relations in English Wordnet. It is written in Python and uses the `nltk` library to interact with Wordnet. The tool is available on GitHub at https://github.com/jmccrae/ewe

 ## WN Python Library

-This is a Python library for working with WordNet. It provides a simple interface for querying WordNet and accessing synsets, words, and relations. The library is available on GitHub at https://github.com/goodmami/wn
+This is a Python library for working with Wordnet. It provides a simple interface for querying Wordnet and accessing synsets, words, and relations. The library is available on GitHub at https://github.com/goodmami/wn

 ## OEWN-CORE Python Library

-This is a Python core IO library for Open English WordNet. It provides a simple, stripped-down OEWN model, loaded from YAML, that can be saved to the same format. Extension supports XML for loading and saving. The library is available on GitHub at https://github.com/oewntk/oewn-core
+This is a Python core IO library for Open English Wordnet. It provides a simple, stripped-down OEWN model, loaded from YAML, that can be saved to the same format. Extension supports XML for loading and saving. The library is available on GitHub at https://github.com/oewntk/oewn-core

 ## OEWNTK Kotlin Library

-This is a Kotlin library for Open English WordNet. It provides a (JVM) OEWN model and a number of modules for loading or saving it notably to WNDB, SQL (Sqlite and MySql) and JSON formats. The library is available on GitHub at https://github.com/oewntk/oewntk. Binaries are available on [Maven Central](https://central.sonatype.com/namespace/io.github.oewntk).
+This is a Kotlin library for Open English Wordnet. It provides a (JVM) OEWN model and a number of modules for loading or saving it notably to WNDB, SQL (Sqlite and MySql) and JSON formats. The library is available on GitHub at https://github.com/oewntk/oewntk. Binaries are available on [Maven Central](https://central.sonatype.com/namespace/io.github.oewntk).

8 changes: 4 additions & 4 deletions scripts/from_yaml.py
@@ -174,7 +174,7 @@ def load(year="2022"):
 """
 Load wordnet from YAML files
 """
-wn = Lexicon("oewn", "Open Engish WordNet", "en",
+wn = Lexicon("oewn", "Open Engish Wordnet", "en",
 "[email protected]",
 "https://creativecommons.org/licenses/by/4.0",
 year,
@@ -220,9 +220,9 @@ def load(year="2022"):
 for synset in wn.synsets:
 if synset.lex_name not in by_lex_name:
 by_lex_name[synset.lex_name] = Lexicon(
-"oewn", "Open English WordNet", "en",
+"oewn", "Open English Wordnet", "en",
 "[email protected]", "https://wordnet.princeton.edu/license-and-commercial-use",
-"2019", "https://github.com/globalwordnet/english-wordnet")
+year, "https://github.com/globalwordnet/english-wordnet")
 by_lex_name[synset.lex_name].add_synset(synset)

 return wn
@@ -369,7 +369,7 @@ def main():
 year = "2024"
 wn = load(year)
 with codecs.open("wn.xml", "w", "utf-8") as outp:
-wn.to_xml(outp, True)
+wn.to_xml(outp)


 if __name__ == "__main__":
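In `scripts/from_yaml.py`, `load` also builds one `Lexicon` per `lex_name`. With this change, those per-file lexicons receive the release `year` instead of the hard-coded `"2019"`. The sketch below shows the grouping pattern with minimal stand-in dataclasses. `Synset`, `Lexicon` and `split_by_lex_name` here are hypothetical simplifications, not the script's own classes; the real constructor also takes an email, a license URL and a repository URL.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    id: str
    lex_name: str


@dataclass
class Lexicon:
    id: str
    label: str
    language: str
    version: str
    synsets: list[Synset] = field(default_factory=list)

    def add_synset(self, synset: Synset) -> None:
        self.synsets.append(synset)


def split_by_lex_name(synsets: list[Synset], year: str) -> dict[str, Lexicon]:
    """Group synsets into one Lexicon per lexicographer file name."""
    by_lex_name: dict[str, Lexicon] = {}
    for synset in synsets:
        if synset.lex_name not in by_lex_name:
            # Each per-file lexicon carries the release year, not a fixed "2019".
            by_lex_name[synset.lex_name] = Lexicon(
                "oewn", "Open English Wordnet", "en", year)
        by_lex_name[synset.lex_name].add_synset(synset)
    return by_lex_name


if __name__ == "__main__":
    # Placeholder synset ids, for illustration only.
    parts = split_by_lex_name(
        [Synset("s1", "adj.pert"), Synset("s2", "verb.cognition"),
         Synset("s3", "adj.pert")],
        "2024")
    for name, lex in parts.items():
        print(name, lex.version, [s.id for s in lex.synsets])
```

Threading `year` through as a parameter means a new edition only needs the value changed in `main()`. With a literal buried in the loop, per-file output could silently carry a stale version.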
2 changes: 2 additions & 0 deletions src/yaml/adj.all.yaml
@@ -127777,6 +127777,7 @@
 - prince-regent
 exemplifies:
 - 06318142-n
+ili: i10035
 members:
 - regent
 partOfSpeech: s
@@ -171893,6 +171894,7 @@
 - 06067070-n
 example:
 - inferior alveolar artery
+ili: i13521
 members:
 - inferior
 partOfSpeech: s
2 changes: 1 addition & 1 deletion src/yaml/adj.pert.yaml
@@ -20329,14 +20329,14 @@
- of or relating to or characteristic of Protestant fundamentalism or its adherents
domain_topic:
- 06191860-n
ili: i16842
members:
- fundamentalist
- fundamentalistic
partOfSpeech: a
02964788-a:
definition:
- of or relating to or tending toward ideological fundamentalism
ili: i16842
members:
- fundamentalist
- fundamentalistic
1 change: 1 addition & 0 deletions src/yaml/noun.artifact.yaml
@@ -22582,6 +22582,7 @@
 - an alkylating agent (trade name Leukeran) used to treat some kinds of cancer
 hypernym:
 - 02700297-n
+ili: i51874
 members:
 - chlorambucil
 - Leukeran
1 change: 1 addition & 0 deletions src/yaml/verb.cognition.yaml
@@ -5517,6 +5517,7 @@
 - We can trust in our government
 hypernym:
 - 00685199-v
+ili: i25156
 members:
 - trust
 partOfSpeech: v
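Several of the YAML changes add an `ili` field to a synset, and the `adj.pert.yaml` hunk touches `ili: i16842` lines on two neighbouring synsets. A quick consistency check after edits like these is to list ILI identifiers that end up attached to more than one synset. The sketch below assumes the YAML files have already been parsed (for example with PyYAML's `safe_load`) and merged into one dictionary keyed by synset ID, with fields shaped as in the hunks above. `find_shared_ilis` is a hypothetical helper. Whether a shared ILI is actually an error is for the editors to judge.

```python
from collections import defaultdict
from typing import Any, Mapping


def find_shared_ilis(
        synsets: Mapping[str, Mapping[str, Any]]) -> dict[str, list[str]]:
    """Return ILI ids that are attached to more than one synset.

    `synsets` maps synset ids (e.g. "02964788-a") to their YAML fields.
    Synsets without an `ili` key are ignored.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for synset_id, fields in synsets.items():
        ili = fields.get("ili")
        if ili is not None:
            owners[ili].append(synset_id)
    return {ili: sorted(ids) for ili, ids in owners.items() if len(ids) > 1}


if __name__ == "__main__":
    # Made-up parsed data for illustration only.
    parsed = {
        "00000001-a": {"ili": "i16842", "members": ["fundamentalist"]},
        "00000002-a": {"ili": "i16842", "members": ["fundamentalist"]},
        "00000003-a": {"members": ["regent"]},
    }
    print(find_shared_ilis(parsed))
```

Building an ILI-to-synsets index in one pass keeps the check linear in the number of synsets, which matters for files the size of `adj.all.yaml`.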
