From fde01a6cfb895e19fffaed31f1e9c587b529831f Mon Sep 17 00:00:00 2001 From: "John P. McCrae" Date: Fri, 11 Oct 2024 10:19:25 +0100 Subject: [PATCH 1/5] Change project name to Open English Wordnet "WordNet" with a capital N is a trademark of Princeton University; to avoid any issues, the project will from now on use a lowercase "n" --- .github/ISSUE_TEMPLATE/delete-synset.md | 2 +- .github/ISSUE_TEMPLATE/new-synset.md | 2 +- .../ISSUE_TEMPLATE/other-change-to-wordnet.md | 2 +- CONTRIBUTING.md | 2 +- DICTIONARIES.md | 2 +- EDITING.md | 4 ++-- LICENSE.md | 2 +- NEW_SYNSETS.md | 18 +++++++++--------- README.md | 16 ++++++++-------- RELEASING.md | 2 +- TOOLS.md | 12 ++++++------ scripts/from_yaml.py | 6 +++--- 12 files changed, 35 insertions(+), 35 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/delete-synset.md b/.github/ISSUE_TEMPLATE/delete-synset.md index f72f0310..2c691be9 100644 --- a/.github/ISSUE_TEMPLATE/delete-synset.md +++ b/.github/ISSUE_TEMPLATE/delete-synset.md @@ -1,6 +1,6 @@ --- name: Delete synset -about: Propose that a synset be removed from WordNet (do not use for merging synsets) +about: Propose that a synset be removed from Wordnet (do not use for merging synsets) title: '' labels: delete synset assignees: '' diff --git a/.github/ISSUE_TEMPLATE/new-synset.md b/.github/ISSUE_TEMPLATE/new-synset.md index 994018ee..763b3df8 100644 --- a/.github/ISSUE_TEMPLATE/new-synset.md +++ b/.github/ISSUE_TEMPLATE/new-synset.md @@ -1,6 +1,6 @@ --- name: New synset -about: Suggest a novel word that does not correspond to an existing concept in WordNet +about: Suggest a novel word that does not correspond to an existing concept in Wordnet title: '' labels: new synset assignees: '' diff --git a/.github/ISSUE_TEMPLATE/other-change-to-wordnet.md b/.github/ISSUE_TEMPLATE/other-change-to-wordnet.md index ade70cfc..4a8f4222 100644 --- a/.github/ISSUE_TEMPLATE/other-change-to-wordnet.md +++ b/.github/ISSUE_TEMPLATE/other-change-to-wordnet.md @@ -1,5 +1,5 @@ --- -name: Other 
change to WordNet +name: Other change to Wordnet about: Some other change to the content of the resource title: '' labels: '' diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 486c4bfd..109fc765 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -90,7 +90,7 @@ Finally, don't forget that it is human to make mistakes! We all do. Let’s work ### Neutrality -Open English WordNet aims to be a politically neutral resource, however we accept that the description of the language necessarily involves making political statements. In the case where politics are relevant to the particular word, we follow Wikipedia in the definition of political entities, states and so forth. We ask all contributors to avoid political statements when not relevant to the discussion. +Open English Wordnet aims to be a politically neutral resource, however we accept that the description of the language necessarily involves making political statements. In the case where politics are relevant to the particular word, we follow Wikipedia in the definition of political entities, states and so forth. We ask all contributors to avoid political statements when not relevant to the discussion. 
### Thanks Derived from [thoughbot's code of conduct](https://thoughtbot.com/open-source-code-of-conduct) diff --git a/DICTIONARIES.md b/DICTIONARIES.md index 227533b2..c0836fef 100644 --- a/DICTIONARIES.md +++ b/DICTIONARIES.md @@ -1,7 +1,7 @@ # Reference Dictionaries This is a list of reference dictionaries that should be used when considering -changes to Open English WordNet +changes to Open English Wordnet * [American Heritage Dictionary](https://www.ahdictionary.com/) * [Cambridge Dictionary](https://dictionary.cambridge.org/) diff --git a/EDITING.md b/EDITING.md index d8c7d91c..552e6f27 100644 --- a/EDITING.md +++ b/EDITING.md @@ -1,6 +1,6 @@ -# Editing Open English WordNet +# Editing Open English Wordnet -This document contains guidelines for making changes to Open English WordNet and fixing issues using GitHub and the English WordNet Editor (EWE) +This document contains guidelines for making changes to Open English Wordnet and fixing issues using GitHub and the English Wordnet Editor (EWE) You should take the following steps to implement a change in the resource diff --git a/LICENSE.md b/LICENSE.md index 8c95f7c9..a4639699 100644 --- a/LICENSE.md +++ b/LICENSE.md @@ -1,7 +1,7 @@ This resource is derived from Princeton WordNet under the WordNet License and further developed under the Creative Commons Attribution 4.0 International License. You may share and adapt this resource providing attribution is given to both -Princeton WordNet and the Open English WordNet team. +Princeton WordNet and the Open English Wordnet team. Creative Commons Attribution 4.0 International diff --git a/NEW_SYNSETS.md b/NEW_SYNSETS.md index 75a552af..8798dbe4 100644 --- a/NEW_SYNSETS.md +++ b/NEW_SYNSETS.md @@ -1,6 +1,6 @@ # Guidelines for the addition of new synsets -New synsets should be added with some caution to Open English WordNet. This document +New synsets should be added with some caution to Open English Wordnet. 
This document describes the criteria that should be applied before introducing a new synset to the resource. @@ -20,7 +20,7 @@ a reason to add more synsets of poor quality* ## Significance -A concept in Open English WordNet should be significant, this means that it should +A concept in Open English Wordnet should be significant, this means that it should be possible to easily find **at least 100 examples** of the usage of the word with this meaning. This can be done by using a search interface such as [Sketch Engine](http://sketchengine.eu) or other corpus search interface. @@ -31,21 +31,21 @@ In the case that a new sense of an existing word is being proposed, then it should be possible to propose collocates that occur with this sense of the word and these can be used to find and distinguish examples. -Open English WordNet is a dictionary not an encyclopedia. For this reason, it should +Open English Wordnet is a dictionary not an encyclopedia. For this reason, it should not contain long lists of people, places, organizations, etc. Proper nouns are generally not expected to be included in the resource and many kinds of common nouns for narrow domains or geographical usage should not be included, examples of this would include elements of different cuisines around the world. As a rule of thumb, if there is a Wikipedia page for this concept it should not be in -Open English WordNet +Open English Wordnet NB: *Deeper integration of Wikipedia/Wikidata is planned to allow these concepts -to be referred to from Open English WordNet* +to be referred to from Open English Wordnet* ## Non-compositionality -One of the goals of Open English WordNet is to support annotation. If a word or term -is already covered by Open English WordNet it should not be added. +One of the goals of Open English Wordnet is to support annotation. If a word or term +is already covered by Open English Wordnet it should not be added. 
For multiword terms, this means that the meaning of the term should not be derivable from its components, e.g., "French Army" could be tagged with the @@ -63,7 +63,7 @@ include: ## Distinction -The concept should be distinct from other concepts in the WordNet. You should +The concept should be distinct from other concepts in the wordnet. You should think about and check relevant synonyms. This should probably be considered in terms of a substitution check, e.g., @@ -76,7 +76,7 @@ that they can be substituted in every sense, e.g., "happy to help" but not ## Well-defined It should be possible to easily write a definition for this concept that is -distinct from other concepts in Open English WordNet. A good definition consists of +distinct from other concepts in Open English Wordnet. A good definition consists of a *genus* and a *differentia* * **Genus**: The type of the thing, often the hypernym diff --git a/README.md b/README.md index 194969ee..0d1590e5 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,18 @@ -# Open English WordNet +# Open English Wordnet -Open English WordNet is a lexical network of the English language grouping words into synsets and linking them according +Open English Wordnet is a lexical network of the English language grouping words into synsets and linking them according to relationships such as hypernymy, antonymy and meronymy. It is intended to be used in natural language processing applications and provides deep lexical information about the English language as a graph. -Open English WordNet is a fork of the [Princeton Wordnet](https://wordnet.princeton.edu/) developed under +Open English Wordnet is a fork of the [Princeton WordNet](https://wordnet.princeton.edu/) developed under an open source methodology. The quality and veracity of the resource may differ from the Princeton -WordNet and we welcome contributions. Contributions to this wordnet may eventually be incorporated into +Wordnet and we welcome contributions. 
Contributions to this wordnet may eventually be incorporated into future releases of Princeton WordNet. Correspondance to previous versions and wordnets in other language is provided -through the [Collaborative Interlingual Index (CILI)](https://github.com/globalwordnet/cili). The Open English WordNet is available as individual files in [GWN-LMF](http://globalwordnet.github.io/schemas/) format. +through the [Collaborative Interlingual Index (CILI)](https://github.com/globalwordnet/cili). The Open English Wordnet is available as individual files in [GWN-LMF](http://globalwordnet.github.io/schemas/) format. ## Releases -Open English WordNet is released through the [Open English WordNet website](https://en-word.net/). The versions released are +Open English Wordnet is released through the [Open English Wordnet website](https://en-word.net/). The versions released are * **2023 Edition** (Released 31st October 2023). [(LMF)](https://en-word.net/static/english-wordnet-2023.xml.gz) [(RDF)](https://en-word.net/static/english-wordnet-2023.ttl.gz) @@ -57,14 +57,14 @@ Further conversions are available through the converter [here](http://server1.nl We welcome changes, to make a change please read our [contributing guidelines](CONTRIBUTING.md) and make a pull request. -Open English WordNet is a high-quality resource that acts as a gold-standard for natural language processing, +Open English Wordnet is a high-quality resource that acts as a gold-standard for natural language processing, as such we cannot accept any automatically generated results that have not been manually validated. 
Please be aware that we use the [Global WordNet Association LMF](https://globalwordnet.github.io/schemas/) and please read the guidelines for using the [format](FORMAT.md) ## License -WordNet is released under [CC-BY 4.0](LICENSE.md) +Open English Wordnet is released under [CC-BY 4.0](LICENSE.md) ## References diff --git a/RELEASING.md b/RELEASING.md index d3f80d86..3a19a96e 100644 --- a/RELEASING.md +++ b/RELEASING.md @@ -34,7 +34,7 @@ wordnet novel_lemma_in_this_edition -over ## RDF Release and en-word.net site -This is based on [WordNet Angular](https://github.com/jmccrae/wordnet-angular) +This is based on [Wordnet Angular](https://github.com/jmccrae/wordnet-angular) Firstly, load the database from the XML file diff --git a/TOOLS.md b/TOOLS.md index 252986c0..5039a4f7 100644 --- a/TOOLS.md +++ b/TOOLS.md @@ -1,18 +1,18 @@ -# Tools for Working with Open English WordNet +# Tools for Working with Open English Wordnet -## English WordNet Editor +## English Wordnet Editor -This tool is a command-line editor for English WordNet. It allows you to add, delete, and modify synsets, words, and relations in English WordNet. It is written in Python and uses the `nltk` library to interact with WordNet. The tool is available on GitHub at https://github.com/jmccrae/ewe +This tool is a command-line editor for English Wordnet. It allows you to add, delete, and modify synsets, words, and relations in English Wordnet. It is written in Python and uses the `nltk` library to interact with Wordnet. The tool is available on GitHub at https://github.com/jmccrae/ewe ## WN Python Library -This is a Python library for working with WordNet. It provides a simple interface for querying WordNet and accessing synsets, words, and relations. The library is available on GitHub at https://github.com/goodmami/wn +This is a Python library for working with Wordnet. It provides a simple interface for querying Wordnet and accessing synsets, words, and relations. 
The library is available on GitHub at https://github.com/goodmami/wn ## OEWN-CORE Python Library -This is a Python core IO library for Open English WordNet. It provides a simple, stripped-down OEWN model, loaded from YAML, that can be saved to the same format. Extension supports XML for loading and saving. The library is available on GitHub at https://github.com/oewntk/oewn-core +This is a Python core IO library for Open English Wordnet. It provides a simple, stripped-down OEWN model, loaded from YAML, that can be saved to the same format. Extension supports XML for loading and saving. The library is available on GitHub at https://github.com/oewntk/oewn-core ## OEWNTK Kotlin Library -This is a Kotlin library for Open English WordNet. It provides a (JVM) OEWN model and a number of modules for loading or saving it notably to WNDB, SQL (Sqlite and MySql) and JSON formats. The library is available on GitHub at https://github.com/oewntk/oewntk. Binaries are available on [Maven Central](https://central.sonatype.com/namespace/io.github.oewntk). +This is a Kotlin library for Open English Wordnet. It provides a (JVM) OEWN model and a number of modules for loading or saving it notably to WNDB, SQL (Sqlite and MySql) and JSON formats. The library is available on GitHub at https://github.com/oewntk/oewntk. Binaries are available on [Maven Central](https://central.sonatype.com/namespace/io.github.oewntk). 
diff --git a/scripts/from_yaml.py b/scripts/from_yaml.py index 34ba18e4..92700a4f 100644 --- a/scripts/from_yaml.py +++ b/scripts/from_yaml.py @@ -173,7 +173,7 @@ def load(year="2022"): """ Load wordnet from YAML files """ - wn = Lexicon("oewn", "Open Engish WordNet", "en", + wn = Lexicon("oewn", "Open Engish Wordnet", "en", "english-wordnet@googlegroups.com", "https://creativecommons.org/licenses/by/4.0", year, @@ -219,9 +219,9 @@ def load(year="2022"): for synset in wn.synsets: if synset.lex_name not in by_lex_name: by_lex_name[synset.lex_name] = Lexicon( - "oewn", "Open English WordNet", "en", + "oewn", "Open English Wordnet", "en", "john@mccr.ae", "https://wordnet.princeton.edu/license-and-commercial-use", - "2019", "https://github.com/globalwordnet/english-wordnet") + year, "https://github.com/globalwordnet/english-wordnet") by_lex_name[synset.lex_name].add_synset(synset) return wn From 6c3d5341fa4191709022965c6c3c3891f9650369 Mon Sep 17 00:00:00 2001 From: "John P. McCrae" Date: Tue, 29 Oct 2024 09:59:24 +0000 Subject: [PATCH 2/5] Change from_yaml.py to not use the relaxed schema --- scripts/from_yaml.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/from_yaml.py b/scripts/from_yaml.py index e2e07acb..6d87f33c 100644 --- a/scripts/from_yaml.py +++ b/scripts/from_yaml.py @@ -369,7 +369,7 @@ def main(): year = "2024" wn = load(year) with codecs.open("wn.xml", "w", "utf-8") as outp: - wn.to_xml(outp, True) + wn.to_xml(outp) if __name__ == "__main__": From 0aea1461ac37c816bf0f293ec7ba25e89d5fd18c Mon Sep 17 00:00:00 2001 From: "John P. 
McCrae" Date: Thu, 31 Oct 2024 14:51:14 +0000 Subject: [PATCH 3/5] Add a few missing ILI links --- src/yaml/adj.all.yaml | 2 ++ src/yaml/adj.pert.yaml | 2 +- src/yaml/noun.artifact.yaml | 1 + src/yaml/verb.cognition.yaml | 1 + 4 files changed, 5 insertions(+), 1 deletion(-) diff --git a/src/yaml/adj.all.yaml b/src/yaml/adj.all.yaml index 96af9f18..bfdd1385 100644 --- a/src/yaml/adj.all.yaml +++ b/src/yaml/adj.all.yaml @@ -127764,6 +127764,7 @@ - prince-regent exemplifies: - 06318142-n + ili: i10035 members: - regent partOfSpeech: s @@ -171877,6 +171878,7 @@ - 06067070-n example: - inferior alveolar artery + ili: i13521 members: - inferior partOfSpeech: s diff --git a/src/yaml/adj.pert.yaml b/src/yaml/adj.pert.yaml index 4e655ee5..a9789776 100644 --- a/src/yaml/adj.pert.yaml +++ b/src/yaml/adj.pert.yaml @@ -20329,7 +20329,6 @@ - of or relating to or characteristic of Protestant fundamentalism or its adherents domain_topic: - 06191860-n - ili: i16842 members: - fundamentalist - fundamentalistic @@ -20337,6 +20336,7 @@ 02964788-a: definition: - of or relating to or tending toward ideological fundamentalism + ili: i16842 members: - fundamentalist - fundamentalistic diff --git a/src/yaml/noun.artifact.yaml b/src/yaml/noun.artifact.yaml index a21088b3..0f36e7f9 100644 --- a/src/yaml/noun.artifact.yaml +++ b/src/yaml/noun.artifact.yaml @@ -22582,6 +22582,7 @@ - an alkylating agent (trade name Leukeran) used to treat some kinds of cancer hypernym: - 02700297-n + ili: i51874 members: - chlorambucil - Leukeran diff --git a/src/yaml/verb.cognition.yaml b/src/yaml/verb.cognition.yaml index 0f62acdb..406a41fa 100644 --- a/src/yaml/verb.cognition.yaml +++ b/src/yaml/verb.cognition.yaml @@ -5517,6 +5517,7 @@ - We can trust in our government hypernym: - 00685199-v + ili: i25156 members: - trust partOfSpeech: v From 8343b418da2d355edaf5f0310344cb24227b190c Mon Sep 17 00:00:00 2001 From: "John P. 
McCrae" Date: Fri, 1 Nov 2024 14:42:05 +0000 Subject: [PATCH 4/5] Update release instructions --- RELEASING.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/RELEASING.md b/RELEASING.md index 3a19a96e..5170ddf3 100644 --- a/RELEASING.md +++ b/RELEASING.md @@ -8,8 +8,7 @@ This is generated by Python scripts in this repo Edit the file `scripts/merge.py` and `WNDB_License.txt` to update the version. ``` -python scripts/from-yaml.py -python scripts/merge.py +python scripts/from_yaml.py cp wn.xml english-wordnet-20XX.xml gzip english-wordnet-20XX.xml ``` @@ -22,7 +21,7 @@ First unzip the previous release into a folder called `oewn20XX/` ``` ./gwn -i english-wordnet-20XX.xml -o oewn20XX/ -f WNLMF -t WNDB --wordnet-license WNDB_License.txt -zip english-wordnet-20XX.zip -R oewn20XX/ +zip -r english-wordnet-20XX.zip oewn20XX/ ``` Testing: @@ -39,15 +38,15 @@ This is based on [Wordnet Angular](https://github.com/jmccrae/wordnet-angular) Firstly, load the database from the XML file ``` -cargo +nightly build --release +cargo build --release rm -f wordnet.db -cargo +nightly run --release --bin wordnet-angular -- --reload --wn ../../globalwordnet/english-wordnet/wn.xml -s en +cargo run --release --bin wordnet-angular -- --reload --wn ../../globalwordnet/english-wordnet/wn.xml -s en ``` Now dump the RDF data ``` -cargo +nightly run --release --bin wordnet-rdf-dump -- -s en > wn.ttl +cargo run --release --bin wordnet-rdf-dump -- -s en > wn.ttl rapper -i turtle -o turtle wn.ttl > english-wordnet-2023.ttl gzip english-wordnet-2023.ttl ``` From 2ccb86a5b584ad913775592c5e93ed8fc29e690e Mon Sep 17 00:00:00 2001 From: "John P. 
McCrae" Date: Fri, 1 Nov 2024 15:01:21 +0000 Subject: [PATCH 5/5] Add details of 2024 release --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 0d1590e5..5a2f8bf8 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,9 @@ through the [Collaborative Interlingual Index (CILI)](https://github.com/globalw Open English Wordnet is released through the [Open English Wordnet website](https://en-word.net/). The versions released are +* **2024 Edition** (Released 1st November 2024). [(LMF)](https://en-word.net/static/english-wordnet-2024.xml.gz) +[(RDF)](https://en-word.net/static/english-wordnet-2024.ttl.gz) +[(WNDB)](https://en-word.net/static/english-wordnet-2024.zip) * **2023 Edition** (Released 31st October 2023). [(LMF)](https://en-word.net/static/english-wordnet-2023.xml.gz) [(RDF)](https://en-word.net/static/english-wordnet-2023.ttl.gz) [(WNDB)](https://en-word.net/static/english-wordnet-2023.zip) @@ -34,6 +37,7 @@ The size of each resource is as follows | Edition | Words | Synsets | Relations | |---------|---------|---------|-----------| +| 2024 | 161,705 | 120,630 | 418,168 | | 2023 | 161,338 | 120,135 | 415,905 | | 2022 | 161,221 | 120,068 | 386,437 | | 2021 | 163,161 | 120,039 | 384,505 |