This repository contains the data from the original Cantonese Wordnet project developed by Sio, Joanna Ut-Seong, & Morgado da Costa, Luis. This project was created and is continuously updated by Joanna Ut-Seong Sio (Palacký University, Czech Republic) and Luis Morgado da Costa (Palacký University, Czech Republic).
Our wordnet contains data both in traditional characters and in Jyutping (a romanisation system for Cantonese developed by the Linguistic Society of Hong Kong in 1993). The Cantonese wordnet is currently supported in two formats:
- the Lexical Markup Framework (LMF) compatible XML, released and maintained by the Global Wordnet Association;
- a legacy TSV format adopted by the original version of the Open Multilingual Wordnet (due to format constraints, some data, i.e. the Jyutping forms, are not available in the legacy format).
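As a rough illustration of how the legacy TSV data might be read, here is a minimal Python sketch. It assumes the usual Open Multilingual Wordnet tab layout (a `#` header line, then rows of synset ID, `lang:lemma`, and the lemma itself). The sample rows, the synset IDs, the `yue` language code, and the `parse_legacy_tsv` helper are all hypothetical and are not taken from this repository, so check the actual file before relying on it.

```python
from collections import defaultdict

# Hypothetical sample rows; the real file's header, IDs and language code may differ.
SAMPLE_TSV = (
    "# Cantonese Wordnet\tyue\turl-placeholder\tCC BY 4.0\n"
    "00000001-n\tyue:lemma\t狗\n"
    "00000001-n\tyue:lemma\t狗仔\n"
    "00000002-v\tyue:lemma\t食\n"
)


def parse_legacy_tsv(text):
    """Group lemmas by synset ID from OMW-style tab-separated text."""
    lemmas = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and the '#' header/comment line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # ignore malformed rows
        synset_id, relation, value = fields[0], fields[1], fields[2]
        # Keep only lemma rows (assumed to be tagged "<lang>:lemma").
        if relation.endswith(":lemma"):
            lemmas[synset_id].append(value)
    return dict(lemmas)


lemmas_by_synset = parse_legacy_tsv(SAMPLE_TSV)
for synset_id, lemmas in lemmas_by_synset.items():
    print(synset_id, ", ".join(lemmas))
```

To use this on the real data, read the TSV file with `encoding="utf-8"` and pass its contents to the function. Jyutping forms would have to come from the LMF XML, since the legacy format does not carry them.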
Currently the Cantonese Wordnet contains over 16,500 hand-checked lemmas and respective romanizations, distributed across all major parts-of-speech. More descriptive statistics and methodology can be found in its canonical citation (see below).
We appreciate the first release of CantoneseWN by Sio, Joanna Ut-Seong, & Morgado da Costa, Luis, and its effort to help preserve this traditional language. In version 2.0 of CantoneseWN, we aim to add even more functionality by incorporating resources and data from Duolingo.
This process may take more than 6 months, since we are currently encountering a few technical issues.
The Cantonese Wordnet is released under a Creative Commons Attribution 4.0 International License (CC BY 4.0) and its canonical citation is:
Sio, Joanna Ut-Seong, & Morgado da Costa, Luis. (2019). Building the Cantonese Wordnet. In Proceedings of the Tenth Global Wordnet Conference (GWC 2019), pp. 206-215. Wroclaw, Poland.