From 0d5c6949f17f966611bebc9e8dd372db8c0c3811 Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Thu, 3 Aug 2023 23:49:31 +0200 Subject: [PATCH 1/9] inuktitut data from GDrive --- data/polysynthetic/Readme.md | 305 ++++++++++++++++++++++ data/polysynthetic/atausiulugu.morphs.tsv | 23 ++ data/polysynthetic/atausiulugu.tsv | 25 ++ draft.md | 4 +- 4 files changed, 355 insertions(+), 2 deletions(-) create mode 100644 data/polysynthetic/Readme.md create mode 100644 data/polysynthetic/atausiulugu.morphs.tsv create mode 100644 data/polysynthetic/atausiulugu.tsv diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md new file mode 100644 index 0000000..51a5c5e --- /dev/null +++ b/data/polysynthetic/Readme.md @@ -0,0 +1,305 @@ +# Polysynthetic languages + +Example: Inuktitut, an Eskimo-Aleut language from the Eastern Canadian Arctic, official language in Nunavut (Canada). Our analysis is based on the Uqailaut analyzer. Uqailaut is an ad-hoc implementation in Java, but roughly equivalent to an FST implementation (as available for the closely related Kalaallisut language from Western Greenland). Inuktitut poses a number of unique challenges because of its extremely rich morphology. + +The Uqailaut Inuktitut data was originally included in the OntoLex-Morph GDrive and migrated to OntoLex-Morph GitHub on 2021-10-06. 
+ +## Inuktitut + +Features; agglutination, assimilation, incorporation, polypersonal +agreement, head marking + +- Web demo analyzer (sample output): + > [[http://www.inuktitutcomputing.ca/Uqailaut/Demo/demoword.php?lang=en&demoword=imiqtarvingmunngauliqhlutik]{.underline}](http://www.inuktitutcomputing.ca/Uqailaut/Demo/demoword.php?lang=en&demoword=imiqtarvingmunngauliqhlutik) + +- Morphological inventory: + > [[http://www.inuktitutcomputing.ca/Technocrats/ILFT.php#morphology]{.underline}](http://www.inuktitutcomputing.ca/Technocrats/ILFT.php#morphology), + > [[https://uqausiit.ca/morpheme-list/infix]{.underline}](https://uqausiit.ca/morpheme-list/infix) + +- We consider a single full form + +## Modelling challenges + +### 1. Encode ambiguity in derivation + +- **sample**: [the word *atausiulugu*](atausiulugu.tsv): Verb feat. incorporation and polypersonal agreement, produced by Uqailaut analyzer + +- Many morphemes are ambiguous + +- It would be good to have a compact representation where all possible + > segmentations are represented in a directed acyclic graph (DAG) + > rather than as a sequence. + +- If not, we run into a combinatoric explosion, here; + + - Given *abcedefg* + + - If the sequence *bc* can always be analysed as either *b-c* or + > *bc* + + - + + - *8,7,7,7,6,6,6,5* + + - And the sequence *def* can always be analysed as either *d-e-f* + > or *de-f* or *d-ef* or *def* + + - And everything else is unambiguous + + - Then a there are 2 \* 4 possible morphological analyses, with 52 + > (!) 
different morphological segments + + - But as a DAG, this can be represented as one path (here using \| + > to separate possible alternative sub-paths): + + - *a-(b-c\|bc)-e-(d-e\|de-f\|de-f\|d-ef\|def)-g* + + - This requires only 15 morphological segments (and clever + > compression can reduce that a bit mit) + + +# not properly integrated yet + +### For modelling: Necessary morphemes and allomorphs + +Morphs from parser: [atausiulugu.morphs.tsv](atausiulugu.morphs.tsv) + +Data from http://www.inuktitutcomputing.ca/DataBase/info.php +(does not contain inflectional morphemes) + +### Roots + +{ata:ata/1n} + +ata ᐊᑕ + +Meaning bottom + +Type nominal root + +Source Spalding, Alex, \"Inuktitut - A Multi-Dialectal Outline +Dictionary\". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998. + +{atausi:atausiq/1n} + +atausiq ᐊᑕᐅᓯᖅ + +Meaning one + +Semantic category number; quantity; + +Type nominal root + +Source Spalding, Alex, \"Inuktitut - A Multi-Dialectal Outline +Dictionary\". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998. + +### Verbal inflection + +{gu:guk/tv-imp-2s-3s} + +\[abbrev means transitive verb + 2s subject/ergative argument + 3s +object/abs argument\] + +{order: you \...him/her/it} + +### Derivation and incorporation + +#### Sample entry + +{lu:luk/3vv} + +luk ᓗᒃ + +Meaning to perform an action in a poor or bad manner + +Type verb-to-verb (VV) suffix: attaches to a verb root or verb stem, and +produces a verb stem + +Mobility this suffix is mobile: it can be used at will with all roots +and stems of the proper type + +Position this suffix must be followed by another suffix, i.e. it cannot +occur in word-final position + +Forms and Behaviours + +After \'a\', \'i\', \'u\' When the stem ends with \'a\', \'i\' or \'u\', +this affix takes the form luk ; it has no effect on the stem. + +After \'t\' When the stem ends with \'t\', this affix takes the form luk +; it deletes the end character of the stem \[\_t + luk → \_luk\]. 
+ +After \'k\' When the stem ends with \'k\', this affix takes the form luk +; it deletes the end character of the stem \[\_k + luk → \_luk\]. + +After \'q\' When the stem ends with \'q\', this affix takes the form luk +; it deletes the end character of the stem \[\_q + luk → \_luk\]. + +**Modelling challenges**: + +- allomorphy rules + +- Cardinality and type restrictions (vn morpheme requires verb and + > produces n, \[most\] verbal morphemes cannot be final, etc.) + +#### Incorporation (basically a verbal derivation from a noun): + +{si:liq/2nv} + +{u:u/1nv} + +\[read: if applied to a noun, return a verb; number is number of lexical +entry for a particular lemma\] + +#### Verb-to-verb derivations: + +{si:si/2vv} + +{si:siq/1vv} + +{u:uq/3vv} + +#### Verb-to-noun derivations: + +{si:siq/2vn} + +#### Noun-to-noun derivations: + +{u:ut/2nn} + +# FYI: Other data + +Not for modelling but FYI + +- FYI: Corpus data (CoNLL data, not to be modelled, but the individual + > morphemes and their combinatorics need to be modelled + +\# Hansard + +1 Hansard Hansard \_ \_ + +\# nunavut kanata + +1 nunavut nunavut {nunavut:nunavut/1n} {Nunavut} + +1 nunavut nunavut {nuna:nuna/1n}{vut:vut/tn-nom-s-1p} {(1) land (2) +country}{nominative: our (one thing to us many) } + +1 nunavut nunavut {nuna:nuna/1n}{vut:vut/tn-nom-p-1p} {(1) land (2) +country}{nominative: our (many things to us many) } + +2 kanata kanata {kanata:kanata/1n} {Canada} + +\# nunavut maligaliurvia + +1 nunavut nunavut {nunavut:nunavut/1n} {Nunavut} + +1 nunavut nunavut {nuna:nuna/1n}{vut:vut/tn-nom-s-1p} {(1) land (2) +country}{nominative: our (one thing to us many) } + +1 nunavut nunavut {nuna:nuna/1n}{vut:vut/tn-nom-p-1p} {(1) land (2) +country}{nominative: our (many things to us many) } + +2 maligaliurvia maligaliurvia +{maligaliurvi:maligaliurvik/1n}{a:nga/tn-nom-s-4s} {legislative +assembly}{nominative: his;her;its (one thing, different person) } + +2 maligaliurvia maligaliurvia 
+{maliga:maligaq/1n}{liur:liuq/1nv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{bill}{construction in progress: \'to be building s.t.\' (trans.: for +s.o.)}{place where the action of the verb takes place}{nominative: +his;her;its (one thing, different person) } + +2 maligaliurvia maligaliurvia +{maliga:maligaq/1n}{li:li/3nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{bill}{to build, make something (trans.: for s.o.); with certain words: +find s.t.}{frequentative: many subjects; many objects}{place where the +action of the verb takes place}{nominative: his;her;its (one thing, +different person) } + +2 maligaliurvia maligaliurvia +{maliga:maligaq/1n}{li:lik/3nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{bill}{to give, to provide, to offer, to fetch s.t. (trans.: to +s.o.)}{frequentative: many subjects; many objects}{place where the +action of the verb takes place}{nominative: his;her;its (one thing, +different person) } + +2 maligaliurvia maligaliurvia +{maliga:maligaq/1n}{li:lik/2nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{bill}{to go to; to have gone to; to come to; to find by +chance}{frequentative: many subjects; many objects}{place where the +action of the verb takes place}{nominative: his;her;its (one thing, +different person) } + +2 maligaliurvia maligaliurvia +{maliga:maligaq/1n}{li:liq/3nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{bill}{to go to, toward}{frequentative: many subjects; many +objects}{place where the action of the verb takes place}{nominative: +his;her;its (one thing, different person) } + +2 maligaliurvia maligaliurvia +{maliga:maligaq/1n}{li:liq/2nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{bill}{to provide, supply; to put s.t. (trans.: to, on +s.o.)}{frequentative: many subjects; many objects}{place where the +action of the verb takes place}{nominative: his;her;its (one thing, +different person) } + +2 maligaliurvia maligaliurvia +{mali:malik/1v}{ga:gaq/2vv}{li:li/2vv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{(1) to follow (trans. 
\[-mik\]: s.o. or s.t.) (2) to obey (trans. +\[-mik\]: s.o.)}{frequentative: several times}{to make that s.t. or s.o. +\... (refl.: to become); to make s.t. (trans.: to s.o.)}{frequentative: +many subjects; many objects}{place where the action of the verb takes +place}{nominative: his;her;its (one thing, different person) } + +2 maligaliurvia maligaliurvia +{mali:malik/1v}{ga:gaq/1vn}{li:li/3nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{(1) to follow (trans. \[-mik\]: s.o. or s.t.) (2) to obey (trans. +\[-mik\]: s.o.)}{forms a noun with an inherently passive meaning: +someone/something that one \...}{to build, make something (trans.: for +s.o.); with certain words: find s.t.}{frequentative: many subjects; many +objects}{place where the action of the verb takes place}{nominative: +his;her;its (one thing, different person) } + +2 maligaliurvia maligaliurvia +{mali:malik/1v}{ga:gaq/1vn}{li:lik/3nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{(1) to follow (trans. \[-mik\]: s.o. or s.t.) (2) to obey (trans. +\[-mik\]: s.o.)}{forms a noun with an inherently passive meaning: +someone/something that one \...}{to give, to provide, to offer, to fetch +s.t. (trans.: to s.o.)}{frequentative: many subjects; many +objects}{place where the action of the verb takes place}{nominative: +his;her;its (one thing, different person) } + +2 maligaliurvia maligaliurvia +{mali:malik/1v}{ga:gaq/1vn}{li:lik/2nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{(1) to follow (trans. \[-mik\]: s.o. or s.t.) (2) to obey (trans. +\[-mik\]: s.o.)}{forms a noun with an inherently passive meaning: +someone/something that one \...}{to go to; to have gone to; to come to; +to find by chance}{frequentative: many subjects; many objects}{place +where the action of the verb takes place}{nominative: his;her;its (one +thing, different person) } + +2 maligaliurvia maligaliurvia +{mali:malik/1v}{ga:gaq/1vn}{li:liq/3nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{(1) to follow (trans. \[-mik\]: s.o. or s.t.) 
(2) to obey (trans. +\[-mik\]: s.o.)}{forms a noun with an inherently passive meaning: +someone/something that one \...}{to go to, toward}{frequentative: many +subjects; many objects}{place where the action of the verb takes +place}{nominative: his;her;its (one thing, different person) } + +2 maligaliurvia maligaliurvia +{mali:malik/1v}{ga:gaq/2vv}{li:liq/1vv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{(1) to follow (trans. \[-mik\]: s.o. or s.t.) (2) to obey (trans. +\[-mik\]: s.o.)}{frequentative: several times}{on-going action; present +progressive tense}{frequentative: many subjects; many objects}{place +where the action of the verb takes place}{nominative: his;her;its (one +thing, different person) } + +2 maligaliurvia maligaliurvia +{mali:malik/1v}{ga:gaq/1vn}{li:liq/2nv}{ur:uq/3vv}{vi:vik/3vn}{a:nga/tn-nom-s-4s} +{(1) to follow (trans. \[-mik\]: s.o. or s.t.) (2) to obey (trans. +\[-mik\]: s.o.)}{forms a noun with an inherently passive meaning: +someone/something that one \...}{to provide, supply; to put s.t. +(trans.: to, on s.o.)}{frequentative: many subjects; many objects}{place +where the action of the verb takes place}{nominative: his;her;its (one +thing, different person) } diff --git a/data/polysynthetic/atausiulugu.morphs.tsv b/data/polysynthetic/atausiulugu.morphs.tsv new file mode 100644 index 0000000..c063b93 --- /dev/null +++ b/data/polysynthetic/atausiulugu.morphs.tsv @@ -0,0 +1,23 @@ +{ata:ata/1n} {bottom} +{atausi:atausiq/1n} {one} +{gu:guk/tv-imp-2s-3s} {order: you \...him/her/it} +{gu:guk/tv-imp-2s-3s} {order: you \...him/her/it} +{gu:guk/tv-imp-2s-3s} {order: you \...him/her/it} +{lugu:lugu/tv-part-1d-3s-fut} {part. future: while we (two) \...him/her/it} +{lugu:lugu/tv-part-1p-3s-fut} {part. future: while we (many) \...him/her/it} +{lugu:lugu/tv-part-1s-3s-fut} {part. future: while I \...him/her/it} +{lugu:lugu/tv-part-2d-3s-fut} {part. future: while you (two) \...him/her/it} +{lugu:lugu/tv-part-2p-3s-fut} {part. 
future: while you (many) \...him/her/it} +{lugu:lugu/tv-part-2s-3s-fut} {part. future: while you \...him/her/it} +{lugu:lugu/tv-part-4d-3s-fut} {part. future: while they (two) \...him/her/it} +{lugu:lugu/tv-part-4p-3s-fut} {part. future: while they (many) \...him/her/it} +{lugu:lugu/tv-part-4s-3s-fut} {part. future: while he/she/it \...him/her/it} +{lu:luk/3vv} {to perform an action in a poor or bad manner} +{lu:luk/3vv} {to perform an action in a poor or bad manner} +{si:liq/2nv} {to provide, supply; to put s.t. (trans.: to, on s.o.)} +{si:si/2vv} {the action is being done now, where it was not the case before; readiness, commencement of action or motion} +{si:siq/1vv} {to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed} +{si:siq/2vn} {custom; way; habit; manner of doing s.t.} +{u:u/1nv} {existence; is} +{u:uq/3vv} {frequentative: many subjects; many objects} +{u:ut/2nn} {bag, container for; s.t. which has\...} diff --git a/data/polysynthetic/atausiulugu.tsv b/data/polysynthetic/atausiulugu.tsv new file mode 100644 index 0000000..f1f50b3 --- /dev/null +++ b/data/polysynthetic/atausiulugu.tsv @@ -0,0 +1,25 @@ +# form: atausiulugu +# COL1: morphological segmentation, COL2: morphological glosses +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-1s-3s-fut} {one}{existence; is}{part. future: while I \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-2s-3s-fut} {one}{existence; is}{part. future: while you \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-4s-3s-fut} {one}{existence; is}{part. future: while he/she/it \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-1d-3s-fut} {one}{existence; is}{part. future: while we (two) \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-2d-3s-fut} {one}{existence; is}{part. future: while you (two) \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-4d-3s-fut} {one}{existence; is}{part. 
future: while they (two) \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-1p-3s-fut} {one}{existence; is}{part. future: while we (many) \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-2p-3s-fut} {one}{existence; is}{part. future: while you (many) \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-4p-3s-fut} {one}{existence; is}{part. future: while they (many) \...him/her/it} +{atausi:atausiq/1n}{u:u/1nv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} {one}{existence; is}{to perform an action in a poor or bad manner}{order: you \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-1s-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. future: while I \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-2s-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. future: while you \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-4s-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. future: while he/she/it \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-1d-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. future: while we (two) \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-2d-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. future: while you (two) \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-4d-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. future: while they (two) \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-1p-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. 
future: while we (many) \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-2p-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. future: while you (many) \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-4p-3s-fut} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{part. future: while they (many) \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} {bottom}{existence; is}{custom; way; habit; manner of doing s.t.}{existence; is}{to perform an action in a poor or bad manner}{order: you \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:si/2vv}{u:uq/3vv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} {bottom}{existence; is}{the action is being done now, where it was not the case before; readiness, commencement of action or motion}{frequentative: many subjects; many objects}{to perform an action in a poor or bad manner}{order: you \...him/her/it} +{ata:ata/1n}{u:ut/2nn}{si:liq/2nv}{u:uq/3vv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} {bottom}{bag, container for; s.t. which has\...}{to provide, supply; to put s.t. (trans.: to, on s.o.)}{frequentative: many subjects; many objects}{to perform an action in a poor or bad manner}{order: you \...him/her/it} +{ata:ata/1n}{u:u/1nv}{si:siq/1vv}{u:uq/3vv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} {bottom}{existence; is}{to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed}{frequentative: many subjects; many objects}{to perform an action in a poor or bad manner}{order: you \...him/her/it} \ No newline at end of file diff --git a/draft.md b/draft.md index 38d0f98..8b5ad4c 100644 --- a/draft.md +++ b/draft.md @@ -496,13 +496,13 @@ Accordingly, the morphological derivation of German *Schönheit* "beauty" can be ontolex:lexicalForm [ ontolex:writtenRep "-heit"@de ]. 
> --- -> class **morph:CompoundRelation** is a `morph:WordFormationRelation` that connects a (lexical entry representing a) morphological consituent of a compound with the (lexical entry representing the) compound. This is a reification of `decomp:subTerm`: A compound relation entails that the constituent is a subterm of the compound. +> class **morph:CompoundRelation** is a `morph:WordFormationRelation` that connects a (lexical entry representing a) morphological consituent of a compound with the (lexical entry representing the) compound. This is a reification of `decomp:subterm`: A compound relation entails that the constituent is a subterm of the compound. > > ---- CC 2022-02-23 (offline): is that really necessary? why can't we just use `decomp` ? I guess this is for cases in which we don't know the head, but then, we can also just omit the source. -telco 2022-04-20: is to be redefined as a reification of `decomp:subTerm`. +telco 2022-04-20: is to be redefined as a reification of `decomp:subterm`. > --- > class **morph:CompoundHead** is a `morph:WordFormationRelation` that connects the (lexical entry representing the) morphological head of a compound with the (lexical entry representing the) compound. From 86cc20016ba8212ee78a25d47912fe13d136fc39 Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Thu, 3 Aug 2023 23:54:05 +0200 Subject: [PATCH 2/9] spalding --- data/polysynthetic/Readme.md | 79 ++-------------------- data/polysynthetic/atausiulugu.spalding.md | 69 +++++++++++++++++++ 2 files changed, 73 insertions(+), 75 deletions(-) create mode 100644 data/polysynthetic/atausiulugu.spalding.md diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md index 51a5c5e..233b952 100644 --- a/data/polysynthetic/Readme.md +++ b/data/polysynthetic/Readme.md @@ -22,7 +22,10 @@ agreement, head marking ### 1. Encode ambiguity in derivation -- **sample**: [the word *atausiulugu*](atausiulugu.tsv): Verb feat. 
incorporation and polypersonal agreement, produced by Uqailaut analyzer +- **sample** + - [the word *atausiulugu*](atausiulugu.tsv): Verb feat. incorporation and polypersonal agreement, produced by Uqailaut analyzer + - Necessary morphemes and allomorphs from parser: [atausiulugu.morphs.tsv](atausiulugu.morphs.tsv) + - Root and derivational morphology from Spalding (1998): [atausiulugu.spalding.md](atausiulugu.spalding.md) - Many morphemes are ambiguous @@ -60,80 +63,6 @@ agreement, head marking # not properly integrated yet -### For modelling: Necessary morphemes and allomorphs - -Morphs from parser: [atausiulugu.morphs.tsv](atausiulugu.morphs.tsv) - -Data from http://www.inuktitutcomputing.ca/DataBase/info.php -(does not contain inflectional morphemes) - -### Roots - -{ata:ata/1n} - -ata ᐊᑕ - -Meaning bottom - -Type nominal root - -Source Spalding, Alex, \"Inuktitut - A Multi-Dialectal Outline -Dictionary\". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998. - -{atausi:atausiq/1n} - -atausiq ᐊᑕᐅᓯᖅ - -Meaning one - -Semantic category number; quantity; - -Type nominal root - -Source Spalding, Alex, \"Inuktitut - A Multi-Dialectal Outline -Dictionary\". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998. - -### Verbal inflection - -{gu:guk/tv-imp-2s-3s} - -\[abbrev means transitive verb + 2s subject/ergative argument + 3s -object/abs argument\] - -{order: you \...him/her/it} - -### Derivation and incorporation - -#### Sample entry - -{lu:luk/3vv} - -luk ᓗᒃ - -Meaning to perform an action in a poor or bad manner - -Type verb-to-verb (VV) suffix: attaches to a verb root or verb stem, and -produces a verb stem - -Mobility this suffix is mobile: it can be used at will with all roots -and stems of the proper type - -Position this suffix must be followed by another suffix, i.e. 
it cannot -occur in word-final position - -Forms and Behaviours - -After \'a\', \'i\', \'u\' When the stem ends with \'a\', \'i\' or \'u\', -this affix takes the form luk ; it has no effect on the stem. - -After \'t\' When the stem ends with \'t\', this affix takes the form luk -; it deletes the end character of the stem \[\_t + luk → \_luk\]. - -After \'k\' When the stem ends with \'k\', this affix takes the form luk -; it deletes the end character of the stem \[\_k + luk → \_luk\]. - -After \'q\' When the stem ends with \'q\', this affix takes the form luk -; it deletes the end character of the stem \[\_q + luk → \_luk\]. **Modelling challenges**: diff --git a/data/polysynthetic/atausiulugu.spalding.md b/data/polysynthetic/atausiulugu.spalding.md new file mode 100644 index 0000000..58a4999 --- /dev/null +++ b/data/polysynthetic/atausiulugu.spalding.md @@ -0,0 +1,69 @@ +Complementary data from http://www.inuktitutcomputing.ca/DataBase/info.php (does not contain inflectional morphemes) + +### Roots + +{ata:ata/1n} + +ata ᐊᑕ + +Meaning bottom + +Type nominal root + +Source Spalding, Alex, \"Inuktitut - A Multi-Dialectal Outline +Dictionary\". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998. + +{atausi:atausiq/1n} + +atausiq ᐊᑕᐅᓯᖅ + +Meaning one + +Semantic category number; quantity; + +Type nominal root + +Source Spalding, Alex, \"Inuktitut - A Multi-Dialectal Outline +Dictionary\". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998. 
+ +### Verbal inflection + +{gu:guk/tv-imp-2s-3s} + +\[abbrev means transitive verb + 2s subject/ergative argument + 3s +object/abs argument\] + +{order: you \...him/her/it} + +### Derivation and incorporation + +#### Sample entry + +{lu:luk/3vv} + +luk ᓗᒃ + +Meaning to perform an action in a poor or bad manner + +Type verb-to-verb (VV) suffix: attaches to a verb root or verb stem, and +produces a verb stem + +Mobility this suffix is mobile: it can be used at will with all roots +and stems of the proper type + +Position this suffix must be followed by another suffix, i.e. it cannot +occur in word-final position + +Forms and Behaviours + +After \'a\', \'i\', \'u\' When the stem ends with \'a\', \'i\' or \'u\', +this affix takes the form luk ; it has no effect on the stem. + +After \'t\' When the stem ends with \'t\', this affix takes the form luk +; it deletes the end character of the stem \[\_t + luk → \_luk\]. + +After \'k\' When the stem ends with \'k\', this affix takes the form luk +; it deletes the end character of the stem \[\_k + luk → \_luk\]. + +After \'q\' When the stem ends with \'q\', this affix takes the form luk +; it deletes the end character of the stem \[\_q + luk → \_luk\]. From 678dcc31d5dc690889d1da6d9598fb3db4f0690e Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Sun, 6 Aug 2023 10:03:03 +0200 Subject: [PATCH 3/9] Inuktitut sample morphs as OntoLex-Morph --- data/polysynthetic/Readme.md | 45 ++--- data/polysynthetic/atausiulugu.ttl | 280 +++++++++++++++++++++++++++++ 2 files changed, 292 insertions(+), 33 deletions(-) create mode 100644 data/polysynthetic/atausiulugu.ttl diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md index 233b952..21cfea5 100644 --- a/data/polysynthetic/Readme.md +++ b/data/polysynthetic/Readme.md @@ -25,40 +25,19 @@ agreement, head marking - **sample** - [the word *atausiulugu*](atausiulugu.tsv): Verb feat. 
incorporation and polypersonal agreement, produced by Uqailaut analyzer - Necessary morphemes and allomorphs from parser: [atausiulugu.morphs.tsv](atausiulugu.morphs.tsv) + - direct OntoLex rendering (no particular problems, but illustrates `morph:baseConstraint` and treatment of lexinfo gaps): [atausiulugu.ttl](atausiulugu.ttl). - Root and derivational morphology from Spalding (1998): [atausiulugu.spalding.md](atausiulugu.spalding.md) - -- Many morphemes are ambiguous - -- It would be good to have a compact representation where all possible - > segmentations are represented in a directed acyclic graph (DAG) - > rather than as a sequence. - -- If not, we run into a combinatoric explosion, here; - - - Given *abcedefg* - - - If the sequence *bc* can always be analysed as either *b-c* or - > *bc* - - - - - - *8,7,7,7,6,6,6,5* - - - And the sequence *def* can always be analysed as either *d-e-f* - > or *de-f* or *d-ef* or *def* - - - And everything else is unambiguous - - - Then a there are 2 \* 4 possible morphological analyses, with 52 - > (!) different morphological segments - - - But as a DAG, this can be represented as one path (here using \| - > to separate possible alternative sub-paths): - - - *a-(b-c\|bc)-e-(d-e\|de-f\|de-f\|d-ef\|def)-g* - - - This requires only 15 morphological segments (and clever - > compression can reduce that a bit mit) +- **problem** + - Many morphemes are ambiguous + - It would be good to have a compact representation where all possible segmentations are represented in a directed acyclic graph (DAG) rather than as a sequence. If not, we run into a combinatoric explosion, here: + - Given *abcedefg* + - If the sequence *bc* can always be analysed as either *b-c* or *bc* + - And the sequence *def* can always be analysed as either *d-e-f* or *de-f* or *d-ef* or *def* + - And everything else is unambiguous + - Then there are 2 \* 4 possible morphological analyses, with 52 (!) 
different morphological segments + - But as a DAG, this can be represented as one path (here using \| to separate possible alternative sub-paths): + - *a-(b-c\|bc)-e-(d-e-f\|de-f\|d-ef\|def)-g* + - This requires only 14 morphological segments (and clever compression can reduce that a bit further) # not properly integrated yet diff --git a/data/polysynthetic/atausiulugu.ttl b/data/polysynthetic/atausiulugu.ttl new file mode 100644 index 0000000..9376eb3 --- /dev/null +++ b/data/polysynthetic/atausiulugu.ttl @@ -0,0 +1,280 @@ +# Naming conventions + +# as morph:grammaticalMeaning is a morph:Morph-level feature, the following two lines must be put into different entries +# that means that we need to mint distinct URIs for them. I take the number (.../1...) to identify senses for all homographic morphemes +# in an unambiguous fashion, so a URI of the type CANONICAL_FORM + "_" + NUMBER + "_le" should be unambiguous + +# Root morphemes (nothing really morphy here, except for lexinfo:RootMorph) + +# {ata:ata/1n} {bottom} +:ata_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; # .../1n + ontolex:canonicalForm :ata_ata_f; # ata:, no :otherForm, because :ata is identical + ontolex:sense :ata_1n; # {bottom}, .../1n is the sense number + lexinfo:partOfSpeech lexinfo:noun. # .../1n +:ata_ata_f a ontolex:Form; # for forms, we provide both the canonical and the current form in the URI, + # so that assimilation patterns are respected. + # Form URIs created in this way are not unique, but this is not an OntoLex requirement. + ontolex:writtenRep "ata"@iu-Latn. # latin transliteration +:ata_1n a ontolex:LexicalSense; # we encode the sense number (.../1n) only in URI + skos:definition "bottom"@en. + +# {atausi:atausiq/1n} {one} +# this is formed by ata/1n u/1nv siq/2vn "the thing which is at the bottom", lit. 
"bottom is (its) habit" +# but the analyzer doesn't tell us +:atausiq_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; # .../1n + ontolex:canonicalForm :atausiq_atausiq_f; # :atausiq + ontolex:otherForm :atausiq_atausi_f; # atausi:, assimilated form + ontolex:sense :atausiq_1n; # {one}, .../1n + lexinfo:partOfSpeech lexinfo:noun. # .../1n +:atausiq_atausi_f a ontolex:Form; + ontolex:writtenRep "atausi"@iu-Latn. +:atausiq_atausiq_f a ontolex:Form; + ontolex:writtenRep "atausiq"@iu-Latn. +:atausiq_1n a ontolex:LexicalSense; + skos:definition "one"@en. + +# Derivation + +:verb a morph:GrammaticalMeaning; + rdfs:comment "for derivation morphemes that produce or require verbs"@en; + lexinfo:partOfSpeech lexinfo:verb. +:noun a morph:GrammaticalMeaning; + rdfs:comment "for derivation morphemes that produce or require nouns"@en; + lexinfo:partOfSpeech lexinfo:noun. + +# {u:u/1nv} {existence; is} +:u_1_le a ontolex:Affix; # morph:Morph is redundant here, but ok ... + ontolex:canonicalForm :u_u_f; # we have more than one u-form, so the ids have to be more specific, + # as these forms differ in their phonological context + ontolex:sense :u_1nv; + morph:grammaticalMeaning :verb; # ../1n*v* + morph:baseConstraint :noun. # ../1*n*v +:u_1nv a ontolex:LexicalSense; + skos:definition "existence; is"@en. +:u_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +# BTW: ata=u- is an example of incorporation, so this is covered here, as well + +# {u:uq/3vv} {frequentative: many subjects; many objects} +:uq_3_le a ontolex:Affix; + ontolex:canonicalForm :uq_uq_f; + ontolex:otherForm :uq_u_f; + ontolex:sense :uq_3vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:uq_uq_f a ontolex:Form; + ontolex:writtenRep "uq"@iu-Latn. +:uq_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +:uq_3vv a ontolex:LexicalSense; + skos:definition "frequentative: many subjects; many objects"@en. + +# {u:ut/2nn} {bag, container for; s.t. 
which has\...} +:ut_2_le a ontolex:Affix; + ontolex:canonicalForm :ut_ut_f; + ontolex:otherForm :ut_u_f; + ontolex:sense :ut_2nn; + morph:grammaticalMeaning :noun; + morph:baseConstraint :noun. +:ut_ut_f a ontolex:Form; + ontolex:writtenRep "ut"@iu-Latn. +:ut_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +:ut_2nn a ontolex:LexicalSense; + skos:definition "bag, container for; s.t. which has..."@en. + +# {si:liq/2nv} {to provide, supply; to put s.t. (trans.: to, on s.o.)} +:liq_2_le a ontolex:Affix; + ontolex:canonicalForm :liq_liq_f; + ontolex:otherForm :liq_si_f; + ontolex:sense :liq_2nv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :noun. +:liq_liq_f a ontolex:Form; + ontolex:writtenRep "liq"@iu-Latn. +:liq_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. +:liq_2nv a ontolex:LexicalSense; + skos:definition "to provide, supply; to put s.t. (trans.: to, on s.o.)"@en. + +# {si:si/2vv} {the action is being done now, where it was not the case before; readiness, commencement of action or motion} +:si_2_le a ontolex:Affix; + ontolex:canonicalForm :si_si_f; + ontolex:sense :si_2vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:si_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. +:si_2vv a ontolex:LexicalSense; + skos:definition "the action is being done now, where it was not the case before; readiness, commencement of action or motion"@en. + +# {si:siq/1vv} {to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed} +:siq_1_le a ontolex:Affix; + ontolex:canonicalForm :siq_siq_f; + ontolex:otherForm :siq_si_f; + ontolex:sense :siq_1vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:siq_1vv a ontolex:LexicalSense; + skos:definition "to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed"@en. +:siq_siq_f a ontolex:Form; + ontolex:writtenRep "siq"@iu-Latn. 
+:siq_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. + +# {si:siq/2vn} {custom; way; habit; manner of doing s.t.} +# Note: must be a different entry because it has a different grammaticalMeaning, coupled with a different sense +:siq_2_le a ontolex:Affix; + ontolex:canonicalForm :siq_siq_f; + ontolex:otherForm :siq_si_f; + ontolex:sense :siq_2vn; + morph:grammaticalMeaning :noun; + morph:baseConstraint :verb. +:siq_2vn a ontolex:LexicalSense; + skos:definition "custom; way; habit; manner of doing s.t."@en. +# :siq_siq_f and :siq_si_f are the same as for si:siq/1vv, and this is ok, because they have the same assimilation patterns + +# {lu:luk/3vv} {to perform an action in a poor or bad manner} +# Note: ...lugu can be either -lu=gu or -lugu +:luk_3_le a ontolex:Affix; + ontolex:canonicalForm :luk_luk_f; + ontolex:otherForm :luk_lu_f; + ontolex:sense :luk_3vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:luk_luk_f a ontolex:Form; + ontolex:writtenRep "luk"@iu-Latn. +:luk_lu_f a ontolex:Form; + ontolex:writtenRep "lu"@iu-Latn. +:luk_3vv a ontolex:LexicalSense; + skos:definition "to perform an action in a poor or bad manner"@en. + +# Note: by using grammatical meaning and base constraint, we can ensure proper combinatoric semantics, +# BUT we cannot indicate whether a form is complete or not: all the verbal forms *require* inflection morphemes + +# Inflection morphemes + +# for -lugu: conventionally, Inuktitut inflection morphemes for transitive +# verbs indicate subject and object agreement.
-lugu is different in +# indicating only object agreement (i.e., with absolutive argument) +# However, it is also a switch-reference marker, so it cannot refer to a +# third person subject, but only to a fourth (obviative/distal third) person subject +# here, we use multiple grammatical meanings to express different agreement patterns + +# Lexinfo extensions (as for number/person agreement, there may be better ways to do that) +:fourthPerson a lexinfo:Person; + rdfs:comment "third person argument marked for switch reference (*different* third person)"@en. +:sbjPerson rdfs:subPropertyOf lexinfo:person; + rdfs:comment "person of subject argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. +:objPerson rdfs:subPropertyOf lexinfo:person; + rdfs:comment "person of direct object argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. +:sbjNumber rdfs:subPropertyOf lexinfo:number; + rdfs:comment "number of subject argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. +:objNumber rdfs:subPropertyOf lexinfo:number; + rdfs:comment "number of direct object argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. + +# grammatical meanings for inflection morphemes +:tv_1d_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_1s_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_1p_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_2d_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular.
+ +:tv_2s_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_2p_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4d_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4s_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4p_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:verbal_participle a lexinfo:Mood; + rdfs:comment "Inuktitut verbal participles are finite verbs (sic!) in subordinate clauses. They tend to be translated by present participles in English, hence the name."@en. + +# for -lugu +# {lugu:lugu/tv-part-1d-3s-fut} {part. future: while we (two) \...him/her/it} +# {lugu:lugu/tv-part-1p-3s-fut} {part. future: while we (many) \...him/her/it} +# {lugu:lugu/tv-part-1s-3s-fut} {part. future: while I \...him/her/it} +# {lugu:lugu/tv-part-2d-3s-fut} {part. future: while you (two) \...him/her/it} +# {lugu:lugu/tv-part-2p-3s-fut} {part. future: while you (many) \...him/her/it} +# {lugu:lugu/tv-part-2s-3s-fut} {part. future: while you \...him/her/it} +# {lugu:lugu/tv-part-4d-3s-fut} {part. future: while they (two) \...him/her/it} +# {lugu:lugu/tv-part-4p-3s-fut} {part. future: while they (many) \...him/her/it} +# {lugu:lugu/tv-part-4s-3s-fut} {part. 
future: while he/she/it \...him/her/it} +:lugu_tv_le a ontolex:Affix; + ontolex:canonicalForm :lugu_lugu_f; + ontolex:sense :lugu_tv_part_fut; + morph:baseConstraint :verb; # we're doing verbal inflection here, so we can attach to a verbal base, only + morph:grammaticalMeaning + :tv_1d_3s, :tv_1p_3s, :tv_1s_3s, # these are alternative meanings + :tv_2d_3s, :tv_2p_3s, :tv_2s_3s, + :tv_4d_3s, :tv_4p_3s, :tv_4s_3s; + # Note: In this way, we cannot disambiguate forms for their different grammatical meanings + # If that would be intended, we would need to create one lexical entry per feature combination. + # the following features are shared across all forms (meanings), so we encode them directly at the entry level + lexinfo:mood :verbal_participle; + lexinfo:tense :future. + # Note: Inuktitut does not inflect for grammatical tense, but only for mood. Some moods have future readings, though. + # Explicit tense (temporal) information is provided in derivation morphemes -- which is optional, though. + +:lugu_tv_part_fut a ontolex:LexicalSense; + skos:definition "part. future: while s.o. does something to s.t. (object)"@en. +:lugu_lugu_f a ontolex:Form; + ontolex:writtenRep "lugu"@iu-Latn. + +# the following is for the analysis -lu=gu +# {gu:guk/tv-imp-2s-3s} {order: you \...him/her/it} +:guk_tv_le a ontolex:Affix; + ontolex:canonicalForm :guk_gu_f; + ontolex:otherForm :guk_guk_f; + ontolex:sense :guk_tv_imp; + morph:baseConstraint :verb; + morph:grammaticalMeaning :tv_2s_3s; + lexinfo:mood lexinfo:imperative. # yes, that's there, already + +:guk_gu_f a ontolex:Form; + ontolex:writtenRep "gu"@iu-Latn. +:guk_guk_f a ontolex:Form; + ontolex:writtenRep "guk"@iu-Latn. +:guk_tv_imp a ontolex:LexicalSense; + skos:definition "order: you ...him/her/it"@en. From 2ee8faa7f871a59fcff8d9013a0efb2dac30b288 Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Sun, 6 Aug 2023 10:52:45 +0200 Subject: [PATCH 4/9] spalding addenda, excl.
assimilation rules --- data/polysynthetic/Readme.md | 7 ++- ...atausiulugu.ttl => atausiulugu.morphs.ttl} | 0 data/polysynthetic/atausiulugu.spalding.ttl | 58 +++++++++++++++++++ 3 files changed, 62 insertions(+), 3 deletions(-) rename data/polysynthetic/{atausiulugu.ttl => atausiulugu.morphs.ttl} (100%) create mode 100644 data/polysynthetic/atausiulugu.spalding.ttl diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md index 21cfea5..f766326 100644 --- a/data/polysynthetic/Readme.md +++ b/data/polysynthetic/Readme.md @@ -23,10 +23,11 @@ agreement, head marking ### 1. Encode ambiguity in derivation - **sample** - - [the word *atausiulugu*](atausiulugu.tsv): Verb feat. incorporation and polypersonal agreement, produced by Uqailaut analyzer + - [the word *atausiulugu*](atausiulugu.tsv): Verb feat. incorporation and polypersonal agreement, produced by Uqailaut analyzer. Analyzer doesn't disambiguate, so we need to represent all analyses in a compact way - Necessary morphemes and allomorphs from parser: [atausiulugu.morphs.tsv](atausiulugu.morphs.tsv) - - direct OntoLex rendering (no particular problems, but illustrates `morph:baseConstraint` and treatment of lexinfo gaps): [atausiulugu.ttl](atausiulugu.ttl). - - Root and derivational morphology from Spalding (1998): [atausiulugu.spalding.md](atausiulugu.spalding.md) + - direct OntoLex rendering of morph(eme) inventory (no particular problems, but illustrates `morph:baseConstraint` and treatment of lexinfo gaps): [atausiulugu.morph.ttl](atausiulugu.morphs.ttl). + - Root and derivational morphology from Spalding (1998): [atausiulugu.spalding.md](atausiulugu.spalding.md), this is to check whether there is additional information in conventional dictionaries that we might need to add + - direct OntoLex rendering of selected Spalding information: [atausiulugu.spalding.ttl](atausiulugu.spalding.ttl). 
A potential problem is that the assimilation rules are not implemented at morph(eme) level - **problem** - Many morphemes are ambiguous - It would be good to have a compact representation where all possible segmentations are represented in a directed acyclic graph (DAG) rather than as a sequence. If not, we run into a combinatoric explosion, here: diff --git a/data/polysynthetic/atausiulugu.ttl b/data/polysynthetic/atausiulugu.morphs.ttl similarity index 100% rename from data/polysynthetic/atausiulugu.ttl rename to data/polysynthetic/atausiulugu.morphs.ttl diff --git a/data/polysynthetic/atausiulugu.spalding.ttl b/data/polysynthetic/atausiulugu.spalding.ttl new file mode 100644 index 0000000..ecc3995 --- /dev/null +++ b/data/polysynthetic/atausiulugu.spalding.ttl @@ -0,0 +1,58 @@ +# Addenda from Spalding + +:spalding1998 a lime:Lexicon; + rdfs:comment """Source Spalding, Alex, \"Inuktitut - A Multi-Dialectal Outline Dictionary\". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998."""@en; # NEW + lime:entry :ata_1_le, :atausiq_1_le, :luk_3_le. + +# we keep original URIs, but limit ourselves to information provided by Spalding +# Uqailaut is very closely based on Spalding, so these could be put into the same graph/lime:Lexicon, but here, we assume that we have distinct graphs to reflect the provenance differences. + +:ata_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; + ontolex:canonicalForm :ata_ata_f; + ontolex:sense :ata_1n; + lexinfo:partOfSpeech lexinfo:noun. # SAME: "type: nominal root" +:ata_ata_f a ontolex:Form; + ontolex:writtenRep + "ata"@iu-Latn, # SAME: Latin transliteration + "ᐊᑕ"@iu-Cans. # NEW: Unified Canadian Aboriginal Syllabics +:ata_1n a ontolex:LexicalSense; # SAME: "meaning: bottom" + skos:definition "bottom"@en. + +:atausiq_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; + ontolex:canonicalForm :atausiq_atausiq_f; + ontolex:sense :atausiq_1n; + lexinfo:partOfSpeech lexinfo:noun.
# SAME: "type: nominal root" +:atausiq_atausiq_f a ontolex:Form; + ontolex:writtenRep "atausiq"@iu-Latn, "ᐊᑕᐅᓯᖅ"@iu-Cans. +:atausiq_1n a ontolex:LexicalSense; + skos:definition "one"@en; # SAME + ontolex:concept :number_quantity. # NEW + +:number_quantity a ontolex:LexicalConcept; # NEW + rdfs:comment "Semantic category number; quantity"@en. # NEW + # Note: this is more like a domain, rather than a concept, may be controversial ... + +# URI from Uqailaut ... +:luk_3_le a ontolex:Affix; + ontolex:canonicalForm :luk_luk_f; + ontolex:sense :luk_3vv; + morph:grammaticalMeaning :verb; # SAME + morph:baseConstraint :verb; # SAME + # "Type verb-to-verb (VV) suffix: attaches to a verb root or verb stem, and produces a verb stem" + rdfs:comment "Mobility this suffix is mobile: it can be used at will with all roots and stems of the proper type"@en; + # this is default, so no need to encode it, I guess + rdfs:comment "Position this suffix must be followed by another suffix, i.e. it cannot occur in word-final position"@en; + # Note: this may be a gap in OntoLex-Morph: we cannot encode whether an affix produces a complete word or not, fallback: encode as part of grammatical meaning + # forms and behaviours are partially replicated in OntoLex-Morph, by providing different "otherForms", but not with the exact assimilation rules + +:luk_luk_f a ontolex:Form; + ontolex:writtenRep "luk"@iu-Latn, "ᓗᒃ"@iu-Cans; + rdfs:comment # NEW + "After \'a\', \'i\', \'u\' When the stem ends with \'a\', \'i\' or \'u\', this affix takes the form luk ; it has no effect on the stem"@en, + "After \'t\' When the stem ends with \'t\', this affix takes the form luk; it deletes the end character of the stem \[\_t + luk → \_luk\]."@en, + "After \'k\' When the stem ends with \'k\', this affix takes the form luk; it deletes the end character of the stem \[\_k + luk → \_luk\]."@en, + "After \'q\' When the stem ends with \'q\', this affix takes the form luk; it deletes the end character of the stem \[\_q + luk 
→ \_luk\]."@en . + +:luk_3vv a ontolex:LexicalSense; + skos:definition "to perform an action in a poor or bad manner"@en. # SAME + From 97413e29cd1de283bf0dd8c1c73700fc7f61e0d8 Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Tue, 8 Aug 2023 01:54:00 +0200 Subject: [PATCH 5/9] segmentation experiment --- data/polysynthetic/Readme.md | 110 +++++++----- data/polysynthetic/atausiulugu.spalding.ttl | 20 ++- data/polysynthetic/atausiulugu.ttl | 178 ++++++++++++++++++++ 3 files changed, 261 insertions(+), 47 deletions(-) create mode 100644 data/polysynthetic/atausiulugu.ttl diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md index f766326..bed5a43 100644 --- a/data/polysynthetic/Readme.md +++ b/data/polysynthetic/Readme.md @@ -1,6 +1,8 @@ # Polysynthetic languages -Example: Inuktitut, an Eskimo-Aleutic language from the Eastern Canadian Arctic, official language in Nunavut (Canada). Our analysis is based on the Uqialaut analyzer. Uqailaut is an ad-hoc implementation in Java, but roughly equivalent to an FST implementation (as available for the closely related Kalaallisut language from Western Greenland). Inuktitut poses a number of unique challenges because of its extremely rich morphology. +Example: Inuktitut, an Eskimo-Aleut* language from the Eastern Canadian Arctic, official language in Nunavut (Canada). Our analysis is based on the Uqialaut analyzer. Uqailaut is an ad-hoc implementation in Java, but roughly equivalent to an FST implementation (as available for the closely related Kalaallisut language from Western Greenland). Inuktitut poses a number of unique challenges because of its extremely rich morphology. + +* We are aware that the exonym "Eskimo" is considered derogative in Canada, and that "Inuit" is preferred. 
However, "Inuit" excludes the Yupik languages of Alaska, so that in absence of a better designation, we stay with the traditional term when referring to (features of) the group of languages that includes both Inuit and Yupik. The Uqailaut Inuktitut data was originally included in the OntoLex-Morph GDrive and migrated to OntoLex-Morph GitHub on 2021-10-06. @@ -16,72 +18,94 @@ agreement, head marking > [[http://www.inuktitutcomputing.ca/Technocrats/ILFT.php#morphology]{.underline}](http://www.inuktitutcomputing.ca/Technocrats/ILFT.php#morphology), > [[https://uqausiit.ca/morpheme-list/infix]{.underline}](https://uqausiit.ca/morpheme-list/infix) -- We consider a single full form - -## Modelling challenges - -### 1. Encode ambiguity in derivation +- We consider a single full form, *atausiulugu*. - **sample** - [the word *atausiulugu*](atausiulugu.tsv): Verb feat. incorporation and polypersonal agreement, produced by Uqailaut analyzer. Analyzer doesn't disambiguate, so we need to represent all analyses in a compact way - - Necessary morphemes and allomorphs from parser: [atausiulugu.morphs.tsv](atausiulugu.morphs.tsv) - - direct OntoLex rendering of morph(eme) inventory (no particular problems, but illustrates `morph:baseConstraint` and treatment of lexinfo gaps): [atausiulugu.morph.ttl](atausiulugu.morphs.ttl). - - Root and derivational morphology from Spalding (1998): [atausiulugu.spalding.md](atausiulugu.spalding.md), this is to check whether there is additional information in conventional dictionaries that we might need to add - - direct OntoLex rendering of selected Spalding information: [atausiulugu.spalding.ttl](atausiulugu.spalding.ttl). A potential problem is that the assimilation rules are not implemented at morph(eme) level -- **problem** - - Many morphemes are ambiguous - - It would be good to have a compact representation where all possible segmentations are represented in a directed acyclic graph (DAG) rather than as a sequence. 
If not, we run into a combinatoric explosion, here: - - Given *abcedefg* - - If the sequence *bc* can always be analysed as either *b-c* or *bc* - - And the sequence *def* can always be analysed as either *d-e-f* or *de-f* or *d-ef* or *def* - - And everything else is unambiguous - - Then a there are 2 \* 4 possible morphological analyses, with 52 (!) different morphological segments - - But as a DAG, this can be represented as one path (here using \| to separate possible alternative sub-paths): - - *a-(b-c\|bc)-e-(d-e\|de-f\|de-f\|d-ef\|def)-g* - - This requires only 15 morphological segments (and clever compression can reduce that a bit mit) + - Necessary morphemes and allomorphs from parser: [atausiulugu.morphs.tsv](atausiulugu.morphs.tsv) + - Root and derivational morphology from Spalding (1998): [atausiulugu.spalding.md](atausiulugu.spalding.md), this is to check whether there is additional information in conventional dictionaries that we might need to add +- **modelling** + - Uqailaut morph(eme) inventory (for atausiulugu.morphs.tsv) + - direct OntoLex rendering: [atausiulugu.morph.ttl](atausiulugu.morphs.ttl). + + > NOTE: no particular problems, but illustrates `morph:baseConstraint` and treatment of lexinfo gaps) + + - selected Spalding information (atausiulugu.spalding.md) + - direct OntoLex rendering: [atausiulugu.spalding.ttl](atausiulugu.spalding.ttl). Also shows the application of an assimilation (allomorphy) rule. A challenge is to mark whether a word form is complete or not. The current workaround is to append a special symbol in replacements, final `-` for Inuktitut. This works nicely, but it would be more natural to encode this directly as a property of morphs. -# not properly integrated yet + - approaches to model full segmentations using the morpheme inventory under [atausiulugu.ttl](atausiulugu.ttl) +## Modelling challenges -**Modelling challenges**: +### 1. 
Allomorphy rules -**Modelling challenges**: +[atausiulugu.spalding.ttl](atausiulugu.spalding.ttl) shows an assimilation (allomorphy) rule implemented as `morph:Replacement` with capturing groups. No particular difficulties, but the translation of human-readable assimilation rules to regular expressions with capturing groups cannot be automated. -- allomorphy rules +### 2. Cardinality and type restrictions -- Cardinality and type restrictions (vn morpheme requires verb and - > produces n, \[most\] verbal morphemes cannot be final, etc.) +- a `vn` morpheme requires a verb and produces a noun + - implemented using `morph:baseConstraint` and `morph:grammaticalMeaning`. Here, grammatical meaning represents the result state. -#### Incorporation (basically a verbal derivation from a noun): +### 3. Distinguish complete and incomplete forms -{si:liq/2nv} +- verbal derivation morphemes cannot be final + - **CANNOT** be directly modelled at `morph:Morph`, in atausiulugu.spalding.ttl implemented in `morph:Replacement`, using a special placeholder symbol + - for derived forms, it can be encoded by identifying them as `lexinfo:StemMorph` (i.e., not an `ontolex:Word`, this would be a complete form, then) + - **TODO**: put this aspect into the definition of `lexinfo:StemMorph`: "lexinfo:StemMorph is to be used only for morphs that do not represent complete words and that require additional markers (e.g., inflection) in order to occur in natural language." -{u:u/1nv} +> NOTE: we may need a representation of incomplete or otherwise special forms in lexinfo, e.g., `lexinfo:constructed` (for constructed, reconstructed or hypothetical forms, similar to `*`), `lexinfo:attested` (default, should only be used in resources that provide attested along with (re)constructed or otherwise non-attested forms), `lexinfo:incorrect` (sometimes, resources provide counterexamples, conventionally marked by `**`).
With that information, we may flag all automatically predicted forms as `lexinfo:constructed` and if a resource provides a `lexinfo:attested` form, this overrides the constructed forms (and users can provide a SPARQL filter to do that) -#### Verb-to-verb derivations: -{si:si/2vv} +### 4. Incorporation -{si:siq/1vv} +Incorporation is usually seen as a process separate from other word formation rules such as compounding or derivation. Incorporation is typical for Inuit and Yupik. The Inuktitut data, however, does not require us to differentiate between incorporation and the derivation of verbs from nouns. -{u:uq/3vv} +Uqailaut examples: -#### Verb-to-noun derivations: +- {si:liq/2nv} +- {u:u/1nv} -{si:siq/2vn} +read: if applied to a noun, return a verb; number is number of lexical entry for a particular lemma -#### Noun-to-noun derivations: +- implemented using `morph:baseConstraint` and `morph:grammaticalMeaning`. Here, grammatical meaning represents the result state. No particular difficulties. -{u:ut/2nn} +This is fully equivalent to other cases of derivation: +- verb-to-verb: {si:si/2vv}, {si:siq/1vv}, {u:uq/3vv} +- verb-to-noun: {si:siq/2vn} +- noun-to-noun: {u:ut/2nn} -# FYI: Other data -Not for modelling but FYI +### 5. Encode ambiguity in derivation -- FYI: Corpus data (CoNLL data, not to be modelled, but the individual - > morphemes and their combinatorics need to be modelled +- Many morphemes are ambiguous + - It would be good to have a compact representation where all possible segmentations are represented in a directed acyclic graph (DAG) rather than as a sequence. If not, we run into a combinatoric explosion, here: + - Given *abcedefg* + - If the sequence *bc* can always be analysed as either *b-c* or *bc* + - And the sequence *def* can always be analysed as either *d-e-f* or *de-f* or *d-ef* or *def* + - And everything else is unambiguous + - Then there are 2 \* 4 = 8 possible morphological analyses, with 52 (!)
different morphological segments + - But as a DAG, this can be represented as one path (here using \| to separate possible alternative sub-paths): + - *a-(b-c\|bc)-e-(d-e-f\|de-f\|d-ef\|def)-g* + - This requires only 14 morphological segments (and clever compression can reduce that a bit) + +- two segmentation modelling options under [atausiulugu.ttl](atausiulugu.ttl) + - current Seq modelling: + - FAILS to provide more than one segmentation + => this is solved with current WordFormationRelation + - FAILS to identify individual allomorphs (forms) + => **suggested REVISION**: make ontolex:Forms a sequence of ontolex:Forms, then, allomorphs are/can be different forms of the same morph(eme) + (it is still possible -- and, for other resources, required --, to model each allomorphic variant as separate morph) + - current WordFormationRelation: + - works seamlessly for multiple segmentation, encoding as DAG + - FAILS to identify individual allomorphs (forms) + => **OPEN ISSUE**: ignore this problem, in most existing resources, there will be a unique segmentation. if a resource (e.g., a software tool) requires that explicitly, the recommended workaround is to create one morph per allomorph and to use these here.
+ => **suggested REVISION**: add a property `morph:allomorph` as a subproperty of `vartrans:lexicalRel` that connects a morph with its allomorphic variants (if these are represented as individual morphs) + => **TODO** longer descriptions of allomorphy in guidelines + +# Corpus data (FYI) + +Corpus data (CoNLL data, not to be modelled in OntoLex, but the individual morphemes and their combinatorics need to be modelled \# Hansard diff --git a/data/polysynthetic/atausiulugu.spalding.ttl b/data/polysynthetic/atausiulugu.spalding.ttl index ecc3995..eb4e8dd 100644 --- a/data/polysynthetic/atausiulugu.spalding.ttl +++ b/data/polysynthetic/atausiulugu.spalding.ttl @@ -43,16 +43,28 @@ # this is default, so no need to encode it, I guess rdfs:comment "Position this suffix must be followed by another suffix, i.e. it cannot occur in word-final position"@en; # Note: this may be a gap in OntoLex-Morph: we cannot encode whether an affix produces a complete word or not, fallback: encode as part of grammatical meaning - # forms and behaviours are partially replicated in OntoLex-Morph, by providing different "otherForms", but not with the exact assimilation rules + # forms and behaviours are partially replicated in OntoLex-Morph, by providing different "otherForms", but not with the exact assimilation rules. + # Fallback is to enclose delimiters in replacement rules + +:luk_3vv a ontolex:LexicalSense; + skos:definition "to perform an action in a poor or bad manner"@en. 
# SAME :luk_luk_f a ontolex:Form; ontolex:writtenRep "luk"@iu-Latn, "ᓗᒃ"@iu-Cans; - rdfs:comment # NEW + rdfs:comment # NEW, also encoded as DerivationRule and Replacement "After \'a\', \'i\', \'u\' When the stem ends with \'a\', \'i\' or \'u\', this affix takes the form luk ; it has no effect on the stem"@en, "After \'t\' When the stem ends with \'t\', this affix takes the form luk; it deletes the end character of the stem \[\_t + luk → \_luk\]."@en, "After \'k\' When the stem ends with \'k\', this affix takes the form luk; it deletes the end character of the stem \[\_k + luk → \_luk\]."@en, "After \'q\' When the stem ends with \'q\', this affix takes the form luk; it deletes the end character of the stem \[\_q + luk → \_luk\]."@en . -:luk_3vv a ontolex:LexicalSense; - skos:definition "to perform an action in a poor or bad manner"@en. # SAME +# implementation of the assimilation rule for :luk_luk_f +:luk_3_rule a morph:DerivationRule; + morph:involves :luk_3_le; + morph:replacement :luk_3_luk_replacement. +:luk_3_luk_replacement a morph:Replacement; + morph:source "(([aiu])|[tqk]?)-$"; + morph:target "\\2luk-". +# Note: In that replacement, the final - is used to mark incomplete forms +# We use a capturing group to keep [aiu] (= second group), we delete [qtk] if present. If neither is found the source condition is not applicable +# and, thus, the rule is not. \ No newline at end of file diff --git a/data/polysynthetic/atausiulugu.ttl b/data/polysynthetic/atausiulugu.ttl new file mode 100644 index 0000000..6a08fbc --- /dev/null +++ b/data/polysynthetic/atausiulugu.ttl @@ -0,0 +1,178 @@ +# combinatorics of atausiulugu + +# 1. modelling as a rdf:Seq +# - one seq per analysis => 23 different forms!?
+# => this modelling can only be applied if the analysis is unambiguous +# - we need to point to allomorphs (i.e., forms of morphs), not morphs, this is not possible with the current modelling +# can we change + +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-1s-3s-fut} {one}{existence; is}{part. future: while I \...him/her/it} + +# CURRENT Seq with morphs +# FAIL 1: we lose information about the actual forms +# FAIL 2: we cannot encode more than one possible analysis +:atausiulugu_le a ontolex:Word; + ontolex:canonicalForm :atausiulugu_atausiulugu_f. +:atausiulugu_atausiulugu_f a ontolex:Form; + ontolex:writtenRep "atausiulugu"@iu-Latn. +:atausiulugu_atausiulugu_f a rdf:Seq; + rdf:_1 :atausiq_1_le; + rdf:_2 :u_1_le; + rdf:_3 :lugu_tv_le. + +# REVISION Seq with forms +# FAIL 1: SOLVED +# FAIL 2: we cannot encode more than one possible analysis +:atausiulugu_le a ontolex:Word; + ontolex:canonicalForm :atausiulugu_atausiulugu_f. +:atausiulugu_atausiulugu_f a ontolex:Form; + ontolex:writtenRep "atausiulugu"@iu-Latn. +:atausiulugu_atausiulugu_f a rdf:Seq; + rdf:_1 :atausiq_atausi_f; + rdf:_2 :u_u_f; + rdf:_3 :lugu_lugu_f. + # note that *in the data*, forms are not guaranteed to be unique for lexical entries + +# 2. modelling with WordFormationRelations + +# we skip inflections here + +# CURRENT +# FAIL 1: we lose information about the actual forms + # with the revision to define an ontolex:Form as a Seq of other forms, this can be solved. + # we just need to give all forms their full segmentation. + # problem is that we can represent a single form only +# FAIL 2 SOLVED: we encode more than one possible analysis + +# analysis 1: the following are equivalent except for inflection +# note that the alternative analyses (their grammatical meanings) are encoded as grammatical meaning of :lugu_tv_le.
+# if we *want* to disambiguate and have one form per analysis, we can carry over selected morphosyntactic properties into the form at hand +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-1s-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-1s-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-2s-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-4s-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-1d-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-2d-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-4d-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-1p-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-2p-3s-fut} +# {atausi:atausiq/1n}{u:u/1nv}{lugu:lugu/tv-part-4p-3s-fut} + +:rel_atausiq_1_u_1 a morph:WordFormationRelation; + vartrans:source :atausiq_1_le; + vartrans:target :atausiu_verb; + morph:wordFormationRule [ morph:involves :u_1_le ] . + +:atausiu_verb a lexinfo:StemMorph; # this is incomplete + lexinfo:partOfSpeech lexinfo:verb; + ontolex:lexicalForm :atausiu_atausiulugu_f. + +:atausiu_atausiulugu_f a ontolex:Form; + ontolex:writtenRep "atausiulugu"@iu-Latn; + morph:inflectionRule [ morph:involves :lugu_tv_le ]. + +# analysis 2: ...lugu analyzed as -luk=guk, the second morpheme is inflection +# {atausi:atausiq/1n}{u:u/1nv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} + +:rel_atausiu_v_luk_3 a morph:WordFormationRelation; + vartrans:source :atausiu_verb; + vartrans:target :atausiuluk_verb; + morph:wordFormationRule [ morph:involves :luk_3_le ] . + +:atausiuluk_verb a lexinfo:StemMorph; + lexinfo:partOfSpeech lexinfo:verb; + ontolex:lexicalForm :atausiuluk_atausiulugu_f. + +:atausiuluk_atausiulugu_f a ontolex:Form; + ontolex:writtenRep "atausiulugu"@iu-Latn; + morph:inflectionRule [ morph:involves :guk_tv_le ].
+ +# analysis 3: ata=u=siq=u- => :atausiu_verb (everything from there is the same) +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-1s-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-2s-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-4s-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-1d-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-2d-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-4d-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-1p-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-2p-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lugu:lugu/tv-part-4p-3s-fut} +# {ata:ata/1n}{u:u/1nv}{si:siq/2vn}{u:u/1nv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} + +:rel_ata_1_u_1 a morph:WordFormationRelation; + vartrans:source :ata_1_le; + vartrans:target :atau_verb; + morph:wordFormationRule [ morph:involves :u_1_le ] . + +:atau_verb a lexinfo:StemMorph; + lexinfo:partOfSpeech lexinfo:verb. + +:rel_atau_v_siq_2 a morph:WordFormationRelation; + vartrans:source :atau_verb; + vartrans:target :atausiq_noun; + morph:wordFormationRule [ morph:involves :siq_2_le ]. + +:atausiq_noun a lexinfo:StemMorph; + lexinfo:partOfSpeech lexinfo:noun. + +:rel_atausiq_n_u_1 a morph:WordFormationRelation; + # note that this is a different relation from :rel_atausiq_1_u_1, which starts from the root atausiq, instead + vartrans:source :atausiq_noun; + vartrans:target :atausiu_verb; # but it returns the same base + morph:wordFormationRule [ morph:involves :u_1_le ] . + +# analysis 4: ata=u=si=uq=luk- => atau_verb -> atausiuluk_verb +# {ata:ata/1n}{u:u/1nv}{si:si/2vv}{u:uq/3vv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} + +:rel_atau_v_si_2 a morph:WordFormationRelation; + vartrans:source :atau_verb; + vartrans:target :atausi_verb; + morph:wordFormationRule [ morph:involves :si_2_le ].
+ +:atausi_verb a lexinfo:StemMorph; + lexinfo:partOfSpeech lexinfo:verb. + +:rel_atausi_v_uq_3 a morph:WordFormationRelation; + vartrans:source :atausi_verb; + vartrans:target :atausiuq_verb; + morph:wordFormationRelation [ morph:involves :uq_3_le ] . + +:atausiuq_verb a lexinfo:StemMorph; + lexinfo:partOfSpeech lexinfo:verb. + +:rel_atausiuq_v_luk_3 a morph:WordFormationRelation; + vartrans:source :atausiuq_verb; + vartrans:target :atausiuluk_verb; + morph:wordFormationRelation [ morph:involves :luk_3_le ]. + +# analysis 5: ata=ut=liq=uq=luk- => atausiuq_verb +# {ata:ata/1n}{u:ut/2nn}{si:liq/2nv}{u:uq/3vv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} + +:rel_ata_1_ut_2 a morph:WordFormationRelation; + vartrans:source :ata_1_le; + vartrans:target :ataut_noun; + morph:wordFormationRelation [ morph:involves :ut_2_le ]. + +:ataut_noun a lexinfo:StemMorph; + lexinfo:partOfSpeech lexinfo:noun. + +:rel_ataut_n_liq_2 a morph:WordFormationRelation; + vartrans:source :ataut_noun; + vartrans:target :atausiq_verb; + morph:wordFormationRelation [ morph:involves :liq_2_le ]. + +:atausiq_verb a lexinfo:StemMorph; + lexinfo:partOfSpeech lexinfo:verb. + +:rel_atausiq_v_uq_3 a morph:WordFormationRelation; + vartrans:source :atausiq_verb; + vartrans:target :atausiuq_verb; + morph:wordFormationRelation [ morph:involves :uq_3_le ]. + +# analysis 6: ata=u=siq=uq=luk- => :atau_verb -> :atausiq_verb +# {ata:ata/1n}{u:u/1nv}{si:siq/1vv}{u:uq/3vv}{lu:luk/3vv}{gu:guk/tv-imp-2s-3s} + +:rel_atau_v_siq_1 a morph:WordFormationRelation; + vartrans:source :atau_verb; + vartrans:target :atausiq_verb; + morph:wordFormationRelation [ morph:involves :siq_1_le ]. 
\ No newline at end of file From 57cbc649922988a592e94785c8028b553a43f84b Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Tue, 8 Aug 2023 02:04:22 +0200 Subject: [PATCH 6/9] tl/dr --- data/polysynthetic/Readme.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md index bed5a43..d243c4f 100644 --- a/data/polysynthetic/Readme.md +++ b/data/polysynthetic/Readme.md @@ -1,12 +1,23 @@ # Polysynthetic languages +## TL/DR + +- **FAIL**: *this data* requires modelling of allomorphs as form variants of a single morph. so, we cannot connect forms with their morphological segments, but only with the morphemes + - **SUGGESTED REVISION**: make ontolex:Form an rdfs:Seq of ontolex:Forms rather than morph:Morphs (preserve current `morph:contains`) +- **MINOR**: lexinfo property for attested/correct, constructed/hypothetical (`*`), and incorrect (`**`) forms +- **MINOR**: illustrates usage and need for `morph:baseConstraint` (=> example for documentation, also to show grammatical meaning of morphs used for features of derived forms) +- **MINOR**: illustrates need for `lexinfo:StemMorph`. TODO: add to definition that this should be used for cases in which the canonical form cannot be used as a standalone word (but requires additional markers, e.g., inflectional morphology) +- **MINOR** longer descriptions of allomorphy in guidelines + +## Inuktitut + Example: Inuktitut, an Eskimo-Aleut* language from the Eastern Canadian Arctic, official language in Nunavut (Canada). Our analysis is based on the Uqialaut analyzer. Uqailaut is an ad-hoc implementation in Java, but roughly equivalent to an FST implementation (as available for the closely related Kalaallisut language from Western Greenland). Inuktitut poses a number of unique challenges because of its extremely rich morphology. * We are aware that the exonym "Eskimo" is considered derogative in Canada, and that "Inuit" is preferred. 
However, "Inuit" excludes the Yupik languages of Alaska, so that in absence of a better designation, we stay with the traditional term when referring to (features of) the group of languages that includes both Inuit and Yupik. The Uqailaut Inuktitut data was originally included in the OntoLex-Morph GDrive and migrated to OntoLex-Morph GitHub on 2021-10-06. -## Inuktitut +## Grammar and Data Features; agglutination, assimilation, incorporation, polypersonal agreement, head marking From 3b6211af026cb45c2b68bd18c5fc9f1513198628 Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Tue, 8 Aug 2023 02:09:10 +0200 Subject: [PATCH 7/9] tl/dr --- data/polysynthetic/Readme.md | 1 + 1 file changed, 1 insertion(+) diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md index d243c4f..d1e2767 100644 --- a/data/polysynthetic/Readme.md +++ b/data/polysynthetic/Readme.md @@ -4,6 +4,7 @@ - **FAIL**: *this data* requires modelling of allomorphs as form variants of a single morph. so, we cannot connect forms with their morphological segments, but only with the morphemes - **SUGGESTED REVISION**: make ontolex:Form an rdfs:Seq of ontolex:Forms rather than morph:Morphs (preserve current `morph:contains`) +- **ADDITION**: if data sets or tools require the encoding of allomorphs as individual morphs (which is allowed, but not in line with the resource we looked into here), these variants should be linked by `morph:allomorph` (sub-property of `vartrans:lexicalRel`) - **MINOR**: lexinfo property for attested/correct, constructed/hypothetical (`*`), and incorrect (`**`) forms - **MINOR**: illustrates usage and need for `morph:baseConstraint` (=> example for documentation, also to show grammatical meaning of morphs used for features of derived forms) - **MINOR**: illustrates need for `lexinfo:StemMorph`. 
TODO: add to definition that this should be used for cases in which the canonical form cannot be used as a standalone word (but requires additional markers, e.g., inflectional morphology) From 02a75d25fd22b5dba8d2480b4fa396b557e321ba Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Tue, 8 Aug 2023 02:13:59 +0200 Subject: [PATCH 8/9] tl/dr --- data/polysynthetic/Readme.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md index d1e2767..44f98f4 100644 --- a/data/polysynthetic/Readme.md +++ b/data/polysynthetic/Readme.md @@ -6,6 +6,8 @@ - **SUGGESTED REVISION**: make ontolex:Form an rdfs:Seq of ontolex:Forms rather than morph:Morphs (preserve current `morph:contains`) - **ADDITION**: if data sets or tools require the encoding of allomorphs as individual morphs (which is allowed, but not in line with the resource we looked into here), these variants should be linked by `morph:allomorph` (sub-property of `vartrans:lexicalRel`) - **MINOR**: lexinfo property for attested/correct, constructed/hypothetical (`*`), and incorrect (`**`) forms +- **MINOR**: add 4th person (obviative) to lexinfo +- **MINOR**: add properties for subject and object agreement to lexinfo - **MINOR**: illustrates usage and need for `morph:baseConstraint` (=> example for documentation, also to show grammatical meaning of morphs used for features of derived forms) - **MINOR**: illustrates need for `lexinfo:StemMorph`.
TODO: add to definition that this should be used for cases in which the canonical form cannot be used as a standalone word (but requires additional markers, e.g., inflectional morphology) - **MINOR** longer descriptions of allomorphy in guidelines From 59e660ac749857cc92c5c6f77bf925ffa1f13889 Mon Sep 17 00:00:00 2001 From: Christian Chiarcos Date: Thu, 16 Nov 2023 05:31:55 +0100 Subject: [PATCH 9/9] different rdfs:Seq variants --- data/polysynthetic/Readme.md | 11 + data/polysynthetic/atausiulugu.ttl | 4 +- .../0-prefixes.ttl | 1 + .../atausiulugu.morphs.ttl | 347 ++++++++++++++++++ .../atausiulugu.spalding.ttl | 66 ++++ .../atausiulugu.ttl | 10 + .../0-prefixes.ttl | 8 + .../atausiulugu.morphs.ttl | 280 ++++++++++++++ .../atausiulugu.spalding.ttl | 65 ++++ .../atausiulugu.ttl | 13 + 10 files changed, 803 insertions(+), 2 deletions(-) create mode 120000 data/polysynthetic/every-form-variant-one-morph/0-prefixes.ttl create mode 100644 data/polysynthetic/every-form-variant-one-morph/atausiulugu.morphs.ttl create mode 100644 data/polysynthetic/every-form-variant-one-morph/atausiulugu.spalding.ttl create mode 100644 data/polysynthetic/every-form-variant-one-morph/atausiulugu.ttl create mode 100644 data/polysynthetic/one-morph-with-multiple-forms/0-prefixes.ttl create mode 100644 data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.morphs.ttl create mode 100644 data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.spalding.ttl create mode 100644 data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.ttl diff --git a/data/polysynthetic/Readme.md b/data/polysynthetic/Readme.md index 44f98f4..046801c 100644 --- a/data/polysynthetic/Readme.md +++ b/data/polysynthetic/Readme.md @@ -1,5 +1,16 @@ # Polysynthetic languages +note: + +- this is christian's original proposal, which makes use of ontolex:otherForm to encode contextual variants of the same morph(eme). 
note that this modelling makes no claim as to whether morph:Morph and its forms are allomorphs or just notational variants, because there is no formal notion of allomorphy here. +- as the proposal to use forms (of morphs) in the rdfs:Seq of ontolex:Forms caused some controversy, we now provide both ways of modelling to better compare. + - this directory: discussion and original modelling (including both these variants, sources and more details) + - `one-morph-with-multiple-forms/`: proposal by christian, ttl only + - 260 triples + - `every-form-variant-one-morph/`: proposal by max, ttl only + - 312 triples (+20%) + - no linking across form variants of the same morph, yet + ## TL/DR - **FAIL**: *this data* requires modelling of allomorphs as form variants of a single morph. so, we cannot connect forms with their morphological segments, but only with the morphemes diff --git a/data/polysynthetic/atausiulugu.ttl b/data/polysynthetic/atausiulugu.ttl index 6a08fbc..d54706a 100644 --- a/data/polysynthetic/atausiulugu.ttl +++ b/data/polysynthetic/atausiulugu.ttl @@ -28,7 +28,7 @@ :atausiulugu_atausiulugu_f a ontolex:Form; ontolex:writtenRep "atausiulugu"@iu-Latn. :atausiulugu_atausiulugu_f a rdfs:Seq; - rdf:_1 :atausiq_atausi_le; + rdf:_1 :atausiq_atausi_f; rdf:_2 :u_u_f; rdf:_3 :lugu_lugu_f. # note that *in the data*, forms are not guaranteed to be unique for lexical entries @@ -175,4 +175,4 @@ :rel_atau_v_siq_1 a morph:WordFormationRelation; vartrans:source :atau_verb; vartrans:target :atausiq_verb; - morph:wordFormationRelation [ morph:involves :siq_1_le ]. \ No newline at end of file + morph:wordFormationRelation [ morph:involves :siq_1_le ]. 
diff --git a/data/polysynthetic/every-form-variant-one-morph/0-prefixes.ttl b/data/polysynthetic/every-form-variant-one-morph/0-prefixes.ttl new file mode 120000 index 0000000..2ceb386 --- /dev/null +++ b/data/polysynthetic/every-form-variant-one-morph/0-prefixes.ttl @@ -0,0 +1 @@ +../one-morph-with-multiple-forms/0-prefixes.ttl \ No newline at end of file diff --git a/data/polysynthetic/every-form-variant-one-morph/atausiulugu.morphs.ttl b/data/polysynthetic/every-form-variant-one-morph/atausiulugu.morphs.ttl new file mode 100644 index 0000000..21e8392 --- /dev/null +++ b/data/polysynthetic/every-form-variant-one-morph/atausiulugu.morphs.ttl @@ -0,0 +1,347 @@ +# one morph per form variant +# - this means we create separate lexical entries for each variant (separate lexical entries, separate lexical senses) +# - this has been done for root morphs and for derivation morphs (doesn't apply to inflection morphs in this particular word) +# UNCLEAR AND NOT MODELLED: how to connect allomorphs with each other? + +# Root morphemes (nothing really morphy here, except for lexinfo:RootMorph) + +# {ata:ata/1n} {bottom} +:ata_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; # .../1n + ontolex:canonicalForm :ata_ata_f; # ata:, no :otherForm, because :ata is identical + ontolex:sense :ata_1n; # {bottom}, .../1n is the sense number + lexinfo:partOfSpeech lexinfo:noun. # .../1n +:ata_ata_f a ontolex:Form; # for forms, we provide both the canonical and the current form in the URI, + # so that assimilation patterns are respected. + # Form URIs created in this way are not unique, but this is not an OntoLex requirement. + ontolex:writtenRep "ata"@iu-Latn. # latin transliteration +:ata_1n a ontolex:LexicalSense; # we encode the sense number (.../1n) only in URI + skos:definition "bottom"@en. 
+ +# {atausi:atausiq/1n} {one}, canonical form +:atausiq_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; # .../1n + ontolex:canonicalForm :atausiq_atausiq_f; # :atausiq + ontolex:sense :atausiq_1n; # {one}, .../1n + lexinfo:partOfSpeech lexinfo:noun. # .../1n +:atausiq_atausiq_f a ontolex:Form; + ontolex:writtenRep "atausiq"@iu-Latn. +:atausiq_1n a ontolex:LexicalSense; + skos:definition "one"@en. + +# {atausi:atausiq/1n} {one}, contextual variant +:atausi_for_atausiq_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; # .../1n + ontolex:canonicalForm :atausiq_atausi_f; # atausi:, assimilated form + ontolex:sense :atausi_for_atausiq_1n; # {one}, .../1n + lexinfo:partOfSpeech lexinfo:noun. # .../1n +:atausiq_atausi_f a ontolex:Form; + ontolex:writtenRep "atausi"@iu-Latn. +:atausi_for_atausiq_1n a ontolex:LexicalSense; + skos:definition "one"@en. + + +# Derivation + +:verb a morph:GrammaticalMeaning; + rdfs:comment "for derivation morphemes that produce or require verbs"@en; + lexinfo:partOfSpeech lexinfo:verb. +:noun a morph:GrammaticalMeaning; + rdfs:comment "for derivation morphemes that produce or require nouns"@en; + lexinfo:partOfSpeech lexinfo:noun. + +# {u:u/1nv} {existence; is} +:u_1_le a ontolex:Affix; # morph:Morph is redundant here, but ok ... + ontolex:canonicalForm :u_u_f; # we have more than one u-form, so the ids have to be more specific, + # as these forms differ in their phonological context + ontolex:sense :u_1nv; + morph:grammaticalMeaning :verb; # ../1n*v* + morph:baseConstraint :noun. # ../1*n*v +:u_1nv a ontolex:LexicalSense; + skos:definition "existence; is"@en. +:u_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +# BTW: ata=u- is an example of incorporation, so this is covered here, as well + +# {u:uq/3vv} {frequentative: many subjects; many objects}, canonical form +:uq_3_le a ontolex:Affix; + ontolex:canonicalForm :uq_uq_f; + ontolex:otherForm :uq_u_f; + ontolex:sense :uq_3vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb.
+:uq_uq_f a ontolex:Form; + ontolex:writtenRep "uq"@iu-Latn. +:uq_3vv a ontolex:LexicalSense; + skos:definition "frequentative: many subjects; many objects"@en. + +# {u:uq/3vv} {frequentative: many subjects; many objects}, contextual variant +:u_for_uq_3_le a ontolex:Affix; + ontolex:canonicalForm :uq_u_f; + ontolex:sense :u_for_uq_3vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:uq_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +:u_for_uq_3vv a ontolex:LexicalSense; + skos:definition "frequentative: many subjects; many objects"@en. + # note that we need one sense per form variant, now + +# ??? how to encode that :u_for_uq_3_le is a contextual variant of :uq_3_le? + +# {u:ut/2nn} {bag, container for; s.t. which has\...}, canonical form +:ut_2_le a ontolex:Affix; + ontolex:canonicalForm :ut_ut_f; + ontolex:sense :ut_2nn; + morph:grammaticalMeaning :noun; + morph:baseConstraint :noun. +:ut_ut_f a ontolex:Form; + ontolex:writtenRep "ut"@iu-Latn. +:ut_2nn a ontolex:LexicalSense; + skos:definition "bag, container for; s.t. which has\..."@en. + +# {u:ut/2nn} {bag, container for; s.t. which has\...}, contextual variant +:u_for_ut_2_le a ontolex:Affix; + ontolex:canonicalForm :ut_u_f; + ontolex:sense :u_for_ut_2nn; + morph:grammaticalMeaning :noun; + morph:baseConstraint :noun. +:ut_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +:u_for_ut_2nn a ontolex:LexicalSense; + skos:definition "bag, container for; s.t. which has\..."@en. + +# {si:liq/2nv} {to provide, supply; to put s.t. (trans.: to, on s.o.)}, canonical form +:liq_2_le a ontolex:Affix; + ontolex:canonicalForm :liq_liq_f; + ontolex:otherForm :liq_si_f; + ontolex:sense :liq_2nv; + morph:grammaticalMeaning :noun; + morph:baseConstraint :verb. +:liq_liq_f a ontolex:Form; + ontolex:writtenRep "liq"@iu-Latn. +:liq_2nv a ontolex:LexicalSense; + skos:definition "to provide, supply; to put s.t. (trans.: to, on s.o.)"@en. + +# {si:liq/2nv} {to provide, supply; to put s.t. 
(trans.: to, on s.o.)}, contextual variant +:si_for_liq_2_le a ontolex:Affix; + ontolex:canonicalForm :liq_si_f; + ontolex:sense :si_for_liq_2nv; + morph:grammaticalMeaning :noun; + morph:baseConstraint :verb. +:liq_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. +:si_for_liq_2nv a ontolex:LexicalSense; + skos:definition "to provide, supply; to put s.t. (trans.: to, on s.o.)"@en. + + +# {si:si/2vv} {the action is being done now, where it was not the case before; readiness, commencement of action or motion} +:si_2_le a ontolex:Affix; + ontolex:canonicalForm :si_si_f; + ontolex:sense :si_2vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:si_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. +:si_2vv a ontolex:LexicalSense; + skos:definition "the action is being done now, where it was not the case before; readiness, commencement of action or motion"@en. + +# {si:siq/1vv} {to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed}, canonical form +:siq_1_le a ontolex:Affix; + ontolex:canonicalForm :siq_siq_f; + ontolex:otherForm :siq_si_f; + ontolex:sense :siq_1vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:siq_1vv a ontolex:LexicalSense; + skos:definition "to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed"@en. +:siq_siq_f a ontolex:Form; + ontolex:writtenRep "siq"@iu-Latn. + +# {si:siq/1vv} {to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed}, contextual variant +:si_for_siq_1_le a ontolex:Affix; + ontolex:canonicalForm :siq_si_f; + ontolex:sense :si_for_siq_1vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb.
+:si_for_siq_1vv a ontolex:LexicalSense; + skos:definition "to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed"@en. +:siq_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. + + +# {si:siq/2vn} {custom; way; habit; manner of doing s.t.}, canonical form +# Note: must be a different entry because it has a different grammaticalMeaning, coupled with a different sense +:siq_2_le a ontolex:Affix; + ontolex:canonicalForm :siq_siq_f; + ontolex:sense :siq_2vn; + morph:grammaticalMeaning :noun; + morph:baseConstraint :verb. +:siq_2vn a ontolex:LexicalSense; + skos:definition "custom; way; habit; manner of doing s.t."@en. + +# {si:siq/2vn} {custom; way; habit; manner of doing s.t.}, contextual variant +:si_for_siq_2_le a ontolex:Affix; + ontolex:canonicalForm :siq_si_f; + ontolex:sense :si_for_siq_2vn; + morph:grammaticalMeaning :noun; + morph:baseConstraint :verb. +:si_for_siq_2vn a ontolex:LexicalSense; + skos:definition "custom; way; habit; manner of doing s.t."@en. + +# {lu:luk/3vv} {to perform an action in a poor or bad manner}, canonical form +:luk_3_le a ontolex:Affix; + ontolex:canonicalForm :luk_luk_f; + ontolex:sense :luk_3vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:luk_luk_f a ontolex:Form; + ontolex:writtenRep "luk"@iu-Latn. +:luk_3vv a ontolex:LexicalSense; + skos:definition "to perform an action in a poor or bad manner"@en. + +# {lu:luk/3vv} {to perform an action in a poor or bad manner}, contextual variant +:lu_for_luk_3_le a ontolex:Affix; + ontolex:canonicalForm :luk_lu_f; + ontolex:sense :lu_for_luk_3vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:luk_lu_f a ontolex:Form; + ontolex:writtenRep "lu"@iu-Latn. +:lu_for_luk_3vv a ontolex:LexicalSense; + skos:definition "to perform an action in a poor or bad manner"@en.
+ +# Inflection morphemes + +# for -lugu: conventionally, Inuktitut inflection morphemes for transitive +# verbs indicate subject and object agreement. -lugu is different in +# indicating only object agreement (i.e., with absolutive argument) +# However, it is also a switch-reference marker, so it cannot refer to a +# third person subject, but only to a fourth (obviative/distal third) person subject +# here, we use multiple grammatical meanings to express different agreement patterns + +# Lexinfo extensions (as for number/person agreement, there may be better ways to do that) +:fourthPerson a lexinfo:Person; + rdfs:comment "third person argument marked for switch reference (*different* third person)"@en. +:sbjPerson rdfs:subPropertyOf lexinfo:person; + rdfs:comment "person of subject argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. +:objPerson rdfs:subPropertyOf lexinfo:person; + rdfs:comment "person of direct object argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. +:sbjNumber rdfs:subPropertyOf lexinfo:number; + rdfs:comment "number of subject argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. +:objNumber rdfs:subPropertyOf lexinfo:number; + rdfs:comment "number of direct object argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. + +# grammatical meanings for inflection morphemes +:tv_1d_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_1s_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_1p_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular.
+ +:tv_2d_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_2s_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_2p_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4d_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4s_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4p_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:verbal_participle a lexinfo:Mood; + rdfs:comment "Inuktitut verbal participles are finite verbs (sic!) in subordinate clauses. They tend to be translated by present participles in English, hence the name."@en. + +# for -lugu +# {lugu:lugu/tv-part-1d-3s-fut} {part. future: while we (two) \...him/her/it} +# {lugu:lugu/tv-part-1p-3s-fut} {part. future: while we (many) \...him/her/it} +# {lugu:lugu/tv-part-1s-3s-fut} {part. future: while I \...him/her/it} +# {lugu:lugu/tv-part-2d-3s-fut} {part. future: while you (two) \...him/her/it} +# {lugu:lugu/tv-part-2p-3s-fut} {part. future: while you (many) \...him/her/it} +# {lugu:lugu/tv-part-2s-3s-fut} {part. future: while you \...him/her/it} +# {lugu:lugu/tv-part-4d-3s-fut} {part. future: while they (two) \...him/her/it} +# {lugu:lugu/tv-part-4p-3s-fut} {part. future: while they (many) \...him/her/it} +# {lugu:lugu/tv-part-4s-3s-fut} {part. 
future: while he/she/it \...him/her/it} +:lugu_tv_le a ontolex:Affix; + ontolex:canonicalForm :lugu_lugu_f; + ontolex:sense :lugu_tv_part_fut; + morph:baseConstraint :verb; # we're doing verbal inflection here, so we can attach to a verbal base, only + morph:grammaticalMeaning + :tv_1d_3s, :tv_1p_3s, :tv_1s_3s, # these are alternative meanings + :tv_2d_3s, :tv_2p_3s, :tv_2s_3s, + :tv_4d_3s, :tv_4p_3s, :tv_4s_3s; + # Note: In this way, we cannot disambiguate forms for their different grammatical meanings + # If that would be intended, we would need to create one lexical entry per feature combination. + # the following features are shared across all forms (meanings), so we encode them directly at the entry level + lexinfo:mood :verbal_participle; + lexinfo:tense :future. + # Note: Inuktitut does not inflect for grammatical tense, but only for mood. Some moods have future readings, though. + # Explicit tense (temporal) information is provided in derivation morphemes -- which is optional, though. + +:lugu_tv_part_fut a ontolex:LexicalSense; + skos:definition "part. future: while s.o. does something to s.t. (object)"@en. +:lugu_lugu_f a ontolex:Form; + ontolex:writtenRep "lugu"@iu-Latn. + +# the following is for the analysis -lu=gu +# {gu:guk/tv-imp-2s-3s} {order: you \...him/her/it} , canonical form +:guk_tv_le a ontolex:Affix; + ontolex:canonicalForm :guk_guk_f; + ontolex:sense :guk_tv_imp; + morph:baseConstraint :verb; + morph:grammaticalMeaning :tv_2s_3s; + lexinfo:mood lexinfo:imperative. # yes, that's there, already + +:guk_guk_f a ontolex:Form; + ontolex:writtenRep "guk"@iu-Latn. +:guk_tv_imp a ontolex:LexicalSense; + skos:definition "order: you \...him/her/it"@en. + +# {gu:guk/tv-imp-2s-3s} {order: you \...him/her/it}, contextual variant +:gu_for_guk_tv_le a ontolex:Affix; + ontolex:canonicalForm :guk_gu_f; + ontolex:sense :gu_for_guk_tv_imp; + morph:baseConstraint :verb; + morph:grammaticalMeaning :tv_2s_3s; + lexinfo:mood lexinfo:imperative.
# yes, that's there, already + +:guk_gu_f a ontolex:Form; + ontolex:writtenRep "gu"@iu-Latn. +:gu_for_guk_tv_imp a ontolex:LexicalSense; + skos:definition "order: you \...him/her/it"@en. + diff --git a/data/polysynthetic/every-form-variant-one-morph/atausiulugu.spalding.ttl b/data/polysynthetic/every-form-variant-one-morph/atausiulugu.spalding.ttl new file mode 100644 index 0000000..4fd7a88 --- /dev/null +++ b/data/polysynthetic/every-form-variant-one-morph/atausiulugu.spalding.ttl @@ -0,0 +1,66 @@ +# Addenda from Spalding +# no changes here + +:spalding1998 a lime:Lexicon; + rdfs:comment """Source Spalding, Alex, "Inuktitut - A Multi-Dialectal Outline Dictionary". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998."""@en; # NEW + lime:entry :ata_1_le, :atausiq_1_le, :luk_3_le. + +# we keep original URIs, but limit ourselves to information provided by Spalding +# Uqailaut is very closely based on Spalding, so these could be put into the same graph/lime:Lexicon, but here, we assume that we have distinct graphs to reflect the provenance differences. + +:ata_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; + ontolex:canonicalForm :ata_ata_f; + ontolex:sense :ata_1n; + lexinfo:partOfSpeech lexinfo:noun. # SAME: "type: nominal root" +:ata_ata_f a ontolex:Form; + ontolex:writtenRep + "ata"@iu-Latn, # SAME: latin transliteration + "ᐊᑕ"@iu-Cans. # NEW: Unified Canadian Aboriginal Syllabics +:ata_1n a ontolex:LexicalSense; # SAME: "meaning: bottom" + skos:definition "bottom"@en. + +:atausiq_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; + ontolex:canonicalForm :atausiq_atausiq_f; + ontolex:sense :atausiq_1n; + lexinfo:partOfSpeech lexinfo:noun. # SAME: "type: nominal root" +:atausiq_atausiq_f a ontolex:Form; + ontolex:writtenRep "atausiq"@iu-Latn, "ᐊᑕᐅᓯᖅ"@iu-Cans. +:atausiq_1n a ontolex:LexicalSense; + skos:definition "one"@en; # SAME + ontolex:concept :number_quantity.
# NEW + +:number_quantity a ontolex:LexicalConcept; # NEW + rdfs:comment "Semantic category number; quantity"@en. # NEW + # Note: this is more like a domain, rather than a concept, may be controversial ... + +# URI from Uqailaut ... +:luk_3_le a ontolex:Affix; + ontolex:canonicalForm :luk_luk_f; + ontolex:sense :luk_3vv; + morph:grammaticalMeaning :verb; # SAME + morph:baseConstraint :verb; # SAME + rdfs:comment "Mobility this suffix is mobile: it can be used at will with all roots and stems of the proper type"@en; + rdfs:comment "Position this suffix must be followed by another suffix, i.e. it cannot occur in word-final position"@en. + +:luk_3vv a ontolex:LexicalSense; + skos:definition "to perform an action in a poor or bad manner"@en. # SAME + +:luk_luk_f a ontolex:Form; + ontolex:writtenRep "luk"@iu-Latn, "ᓗᒃ"@iu-Cans; + rdfs:comment # NEW, also encoded as DerivationRule and Replacement + "After 'a', 'i', 'u' When the stem ends with 'a', 'i' or 'u', this affix takes the form luk ; it has no effect on the stem"@en, + "After 't' When the stem ends with 't', this affix takes the form luk; it deletes the end character of the stem [_t + luk → _luk]."@en, + "After 'k' When the stem ends with 'k', this affix takes the form luk; it deletes the end character of the stem [_k + luk → _luk]."@en, + "After 'q' When the stem ends with 'q', this affix takes the form luk; it deletes the end character of the stem [_q + luk → _luk]."@en . + +# implementation of the assimilation rule for :luk_luk_f + +:luk_3_rule a morph:DerivationRule; + morph:involves :luk_3_le; + morph:replacement :luk_3_luk_replacement. +:luk_3_luk_replacement a morph:Replacement; + morph:source "(([aiu])|[tqk])-$"; + morph:target "\\2luk-". +# Note: In that replacement, the final - is used to mark incomplete forms +# We use a capturing group to keep [aiu] (= second group), we delete [qtk] if present. If neither is found the source condition is not applicable +# and, thus, the rule is not.
\ No newline at end of file diff --git a/data/polysynthetic/every-form-variant-one-morph/atausiulugu.ttl b/data/polysynthetic/every-form-variant-one-morph/atausiulugu.ttl new file mode 100644 index 0000000..c821a43 --- /dev/null +++ b/data/polysynthetic/every-form-variant-one-morph/atausiulugu.ttl @@ -0,0 +1,10 @@ +# combinatorics of atausiulugu, modelling as a rdf:Seq, only + +:atausiulugu_le a ontolex:Word; + ontolex:canonicalForm :atausiulugu_atausiulugu_f. +:atausiulugu_atausiulugu_f a ontolex:Form; + ontolex:writtenRep "atausiulugu"@iu-Latn. +:atausiulugu_atausiulugu_f a rdf:Seq; + rdf:_1 :atausi_for_atausiq_1_le; # this is a contextual variant of :atausiq_1_le + rdf:_2 :u_1_le; # happens to be same as the canonical form + rdf:_3 :lugu_tv_le. # happens to be same as the canonical form \ No newline at end of file diff --git a/data/polysynthetic/one-morph-with-multiple-forms/0-prefixes.ttl b/data/polysynthetic/one-morph-with-multiple-forms/0-prefixes.ttl new file mode 100644 index 0000000..bafa812 --- /dev/null +++ b/data/polysynthetic/one-morph-with-multiple-forms/0-prefixes.ttl @@ -0,0 +1,8 @@ +PREFIX : +PREFIX ontolex: +PREFIX rdfs: +PREFIX lexinfo: +PREFIX morph: +PREFIX skos: +PREFIX lime: +PREFIX rdf: diff --git a/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.morphs.ttl b/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.morphs.ttl new file mode 100644 index 0000000..febc59f --- /dev/null +++ b/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.morphs.ttl @@ -0,0 +1,280 @@ +# Naming conventions + +# as morph:grammaticalMeaning is a morph:Morph-level feature, the following two lines must be put into different entries +# that means that we need to mint distinct URIs for them. I take the number (.../1...)
to identify senses for all homographic morphemes +# in an unambiguous fashion, so a URI of the type CANONICAL_FORM + "_" + NUMBER + "_le" should be unambiguous + +# Root morphemes (nothing really morphy here, except for lexinfo:RootMorph) + +# {ata:ata/1n} {bottom} +:ata_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; # .../1n + ontolex:canonicalForm :ata_ata_f; # ata:, no :otherForm, because :ata is identical + ontolex:sense :ata_1n; # {bottom}, .../1n is the sense number + lexinfo:partOfSpeech lexinfo:noun. # .../1n +:ata_ata_f a ontolex:Form; # for forms, we provide both the canonical and the current form in the URI, + # so that assimilation patterns are respected. + # Form URIs created in this way are not unique, but this is not an OntoLex requirement. + ontolex:writtenRep "ata"@iu-Latn. # latin transliteration +:ata_1n a ontolex:LexicalSense; # we encode the sense number (.../1n) only in URI + skos:definition "bottom"@en. + +# {atausi:atausiq/1n} {one} +# this is formed by ata/1n u/1nv siq/2vn "the thing which is at the bottom", lit. "bottom is (its) habit" +# but the analyzer doesn't tell us +:atausiq_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; # .../1n + ontolex:canonicalForm :atausiq_atausiq_f; # :atausiq + ontolex:otherForm :atausiq_atausi_f; # atausi:, assimilated form + ontolex:sense :atausiq_1n; # {one}, .../1n + lexinfo:partOfSpeech lexinfo:noun. # .../1n +:atausiq_atausi_f a ontolex:Form; + ontolex:writtenRep "atausi"@iu-Latn. +:atausiq_atausiq_f a ontolex:Form; + ontolex:writtenRep "atausiq"@iu-Latn. +:atausiq_1n a ontolex:LexicalSense; + skos:definition "one"@en. + +# Derivation + +:verb a morph:GrammaticalMeaning; + rdfs:comment "for derivation morphemes that produce or require verbs"@en; + lexinfo:partOfSpeech lexinfo:verb. +:noun a morph:GrammaticalMeaning; + rdfs:comment "for derivation morphemes that produce or require nouns"@en; + lexinfo:partOfSpeech lexinfo:noun.
+ +# {u:u/1nv} {existence; is} +:u_1_le a ontolex:Affix; # morph:Morph is redundant here, but ok ... + ontolex:canonicalForm :u_u_f; # we have more than one u-form, so the ids have to be more specific, + # as these forms differ in their phonological context + ontolex:sense :u_1nv; + morph:grammaticalMeaning :verb; # ../1n*v* + morph:baseConstraint :noun. # ../1*n*v +:u_1nv a ontolex:LexicalSense; + skos:definition "existence; is"@en. +:u_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +# BTW: ata=u- is an example of incorporation, so this is covered here, as well + +# {u:uq/3vv} {frequentative: many subjects; many objects} +:uq_3_le a ontolex:Affix; + ontolex:canonicalForm :uq_uq_f; + ontolex:otherForm :uq_u_f; + ontolex:sense :uq_3vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:uq_uq_f a ontolex:Form; + ontolex:writtenRep "uq"@iu-Latn. +:uq_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +:uq_3vv a ontolex:LexicalSense; + skos:definition "frequentative: many subjects; many objects"@en. + +# {u:ut/2nn} {bag, container for; s.t. which has ...} +:ut_2_le a ontolex:Affix; + ontolex:canonicalForm :ut_ut_f; + ontolex:otherForm :ut_u_f; + ontolex:sense :ut_2nn; + morph:grammaticalMeaning :noun; + morph:baseConstraint :noun. +:ut_ut_f a ontolex:Form; + ontolex:writtenRep "ut"@iu-Latn. +:ut_u_f a ontolex:Form; + ontolex:writtenRep "u"@iu-Latn. +:ut_2nn a ontolex:LexicalSense; + skos:definition "bag, container for; s.t. which has ..."@en. + +# {si:liq/2nv} {to provide, supply; to put s.t. (trans.: to, on s.o.)} +:liq_2_le a ontolex:Affix; + ontolex:canonicalForm :liq_liq_f; + ontolex:otherForm :liq_si_f; + ontolex:sense :liq_2nv; + morph:grammaticalMeaning :verb; # ../2n*v* + morph:baseConstraint :noun. # ../2*n*v +:liq_liq_f a ontolex:Form; + ontolex:writtenRep "liq"@iu-Latn. +:liq_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. +:liq_2nv a ontolex:LexicalSense; + skos:definition "to provide, supply; to put s.t. (trans.: to, on s.o.)"@en.
+ +# {si:si/2vv} {the action is being done now, where it was not the case before; readiness, commencement of action or motion} +:si_2_le a ontolex:Affix; + ontolex:canonicalForm :si_si_f; + ontolex:sense :si_2vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:si_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. +:si_2vv a ontolex:LexicalSense; + skos:definition "the action is being done now, where it was not the case before; readiness, commencement of action or motion"@en. + +# {si:siq/1vv} {to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed} +:siq_1_le a ontolex:Affix; + ontolex:canonicalForm :siq_siq_f; + ontolex:otherForm :siq_si_f; + ontolex:sense :siq_1vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:siq_1vv a ontolex:LexicalSense; + skos:definition "to put or bring out, to be put or brought up for some natural process; to be waiting for an action to be performed or completed"@en. +:siq_siq_f a ontolex:Form; + ontolex:writtenRep "siq"@iu-Latn. +:siq_si_f a ontolex:Form; + ontolex:writtenRep "si"@iu-Latn. + +# {si:siq/2vn} {custom; way; habit; manner of doing s.t.} +# Note: must be a different entry because it has a different grammaticalMeaning, coupled with a different sense +:siq_2_le a ontolex:Affix; + ontolex:canonicalForm :siq_siq_f; + ontolex:otherForm :siq_si_f; + ontolex:sense :siq_2vn; + morph:grammaticalMeaning :noun; + morph:baseConstraint :verb. +:siq_2vn a ontolex:LexicalSense; + skos:definition "custom; way; habit; manner of doing s.t."@en.
+# :siq_siq_f and :siq_si_f are the same as for si:siq/1vv, and this is ok, because they have the same assimilation patterns + +# {lu:luk/3vv} {to perform an action in a poor or bad manner} +# Note: ...lugu can be either -lu=gu or -lugu +:luk_3_le a ontolex:Affix; + ontolex:canonicalForm :luk_luk_f; + ontolex:otherForm :luk_lu_f; + ontolex:sense :luk_3vv; + morph:grammaticalMeaning :verb; + morph:baseConstraint :verb. +:luk_luk_f a ontolex:Form; + ontolex:writtenRep "luk"@iu-Latn. +:luk_lu_f a ontolex:Form; + ontolex:writtenRep "lu"@iu-Latn. +:luk_3vv a ontolex:LexicalSense; + skos:definition "to perform an action in a poor or bad manner"@en. + +# Note: by using grammatical meaning and base constraint, we can ensure proper combinatoric semantics, +# BUT we cannot indicate whether a form is complete or not: all the verbal forms *require* inflection morphemes + +# Inflection morphemes + +# for -lugu: conventionally, Inuktitut inflection morphemes for transitive +# verbs indicate subject and object agreement. -lugu is different in +# indicating only object agreement (i.e., with absolutive argument). +# However, it is also a switch-reference marker, so it cannot refer to a +# third person subject, but only to a fourth (obviative/distal third) person subject +# here, we use multiple grammatical meanings to express different agreement patterns + +# Lexinfo extensions (as for number/person agreement, there may be better ways to do that) +:fourthPerson a lexinfo:Person; + rdfs:comment "third person argument marked for switch reference (*different* third person)"@en. +:sbjPerson rdfs:subPropertyOf lexinfo:person; + rdfs:comment "person of subject argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. +:objPerson rdfs:subPropertyOf lexinfo:person; + rdfs:comment "person of direct object argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en.
+:sbjNumber rdfs:subPropertyOf lexinfo:number; + rdfs:comment "number of subject argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. +:objNumber rdfs:subPropertyOf lexinfo:number; + rdfs:comment "number of direct object argument in transitive or ditransitive verbs in languages with polypersonal agreement"@en. + +# grammatical meanings for inflection morphemes +:tv_1d_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_1s_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_1p_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:firstPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_2d_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_2s_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_2p_3s a morph:GrammaticalMeaning; + :sbjPerson lexinfo:secondPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4d_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:dual; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4s_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:singular; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular. + +:tv_4p_3s a morph:GrammaticalMeaning; + :sbjPerson :fourthPerson; + :sbjNumber lexinfo:plural; + :objPerson lexinfo:thirdPerson; + :objNumber lexinfo:singular.
+ +:verbal_participle a lexinfo:Mood; + rdfs:comment "Inuktitut verbal participles are finite verbs (sic!) in subordinate clauses. They tend to be translated by present participles in English, hence the name."@en. + +# for -lugu +# {lugu:lugu/tv-part-1d-3s-fut} {part. future: while we (two) ... him/her/it} +# {lugu:lugu/tv-part-1p-3s-fut} {part. future: while we (many) ... him/her/it} +# {lugu:lugu/tv-part-1s-3s-fut} {part. future: while I ... him/her/it} +# {lugu:lugu/tv-part-2d-3s-fut} {part. future: while you (two) ... him/her/it} +# {lugu:lugu/tv-part-2p-3s-fut} {part. future: while you (many) ... him/her/it} +# {lugu:lugu/tv-part-2s-3s-fut} {part. future: while you ... him/her/it} +# {lugu:lugu/tv-part-4d-3s-fut} {part. future: while they (two) ... him/her/it} +# {lugu:lugu/tv-part-4p-3s-fut} {part. future: while they (many) ... him/her/it} +# {lugu:lugu/tv-part-4s-3s-fut} {part. future: while he/she/it ... him/her/it} +:lugu_tv_le a ontolex:Affix; + ontolex:canonicalForm :lugu_lugu_f; + ontolex:sense :lugu_tv_part_fut; + morph:baseConstraint :verb; # we're doing verbal inflection here, so we can attach to a verbal base, only + morph:grammaticalMeaning + :tv_1d_3s, :tv_1p_3s, :tv_1s_3s, # these are alternative meanings + :tv_2d_3s, :tv_2p_3s, :tv_2s_3s, + :tv_4d_3s, :tv_4p_3s, :tv_4s_3s; + # Note: In this way, we cannot disambiguate forms for their different grammatical meanings + # If that were intended, we would need to create one lexical entry per feature combination. + # the following features are shared across all forms (meanings), so we encode them directly at the entry level + lexinfo:mood :verbal_participle; + lexinfo:tense :future. + # Note: Inuktitut does not inflect for grammatical tense, but only for mood. Some moods have future readings, though. + # Explicit tense (temporal) information is provided in derivation morphemes -- which is optional, though. + +:lugu_tv_part_fut a ontolex:LexicalSense; + skos:definition "part. future: while s.o.
does something to s.t. (object)"@en. +:lugu_lugu_f a ontolex:Form; + ontolex:writtenRep "lugu"@iu-Latn. + +# the following is for the analysis -lu=gu +# {gu:guk/tv-imp-2s-3s} {order: you ... him/her/it} +:guk_tv_le a ontolex:Affix; + ontolex:canonicalForm :guk_guk_f; + ontolex:otherForm :guk_gu_f; + ontolex:sense :guk_tv_imp; + morph:baseConstraint :verb; + morph:grammaticalMeaning :tv_2s_3s; + lexinfo:mood lexinfo:imperative. # yes, that's there, already + +:guk_gu_f a ontolex:Form; + ontolex:writtenRep "gu"@iu-Latn. +:guk_guk_f a ontolex:Form; + ontolex:writtenRep "guk"@iu-Latn. +:guk_tv_imp a ontolex:LexicalSense; + skos:definition "order: you ... him/her/it"@en. diff --git a/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.spalding.ttl b/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.spalding.ttl new file mode 100644 index 0000000..c57db4a --- /dev/null +++ b/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.spalding.ttl @@ -0,0 +1,65 @@ +# Addenda from Spalding + +:spalding1998 a lime:Lexicon; + rdfs:comment """Source: Spalding, Alex, \"Inuktitut - A Multi-Dialectal Outline Dictionary\". Nunavut Arctic College, Iqaluit, Nunavut, Canada, 1998."""@en; # NEW + lime:entry :ata_1_le, :atausiq_1_le, :luk_3_le. + +# we keep original URIs, but limit ourselves to information provided by Spalding +# Uqailaut is very closely based on Spalding, so these could be put into the same graph/lime:Lexicon, but here, we assume that we have distinct graphs to reflect the provenance differences. + +:ata_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; + ontolex:canonicalForm :ata_ata_f; + ontolex:sense :ata_1n; + lexinfo:partOfSpeech lexinfo:noun. # SAME: "type: nominal root" +:ata_ata_f a ontolex:Form; + ontolex:writtenRep + "ata"@iu-Latn, # SAME: Latin transliteration + "ᐊᑕ"@iu-Cans. # NEW: Unified Canadian Aboriginal Syllabics +:ata_1n a ontolex:LexicalSense; # SAME: "meaning: bottom" + skos:definition "bottom"@en.
+ +:atausiq_1_le a lexinfo:RootMorph, ontolex:LexicalEntry; + ontolex:canonicalForm :atausiq_atausiq_f; + ontolex:sense :atausiq_1n; + lexinfo:partOfSpeech lexinfo:noun. # SAME: "type: nominal root" +:atausiq_atausiq_f a ontolex:Form; + ontolex:writtenRep "atausiq"@iu-Latn, "ᐊᑕᐅᓯᖅ"@iu-Cans. +:atausiq_1n a ontolex:LexicalSense; + skos:definition "one"@en; # SAME + ontolex:concept :number_quantity. # NEW + +:number_quantity a ontolex:LexicalConcept; # NEW + rdfs:comment "Semantic category number; quantity"@en. # NEW + # Note: this is more like a domain, rather than a concept, may be controversial ... + +# URI from Uqailaut ... +:luk_3_le a ontolex:Affix; + ontolex:canonicalForm :luk_luk_f; + ontolex:sense :luk_3vv; + morph:grammaticalMeaning :verb; # SAME + morph:baseConstraint :verb; # SAME + rdfs:comment "Mobility this suffix is mobile: it can be used at will with all roots and stems of the proper type"@en; + rdfs:comment "Position this suffix must be followed by another suffix, i.e. it cannot occur in word-final position"@en. + +:luk_3vv a ontolex:LexicalSense; + skos:definition "to perform an action in a poor or bad manner"@en. # SAME + +:luk_luk_f a ontolex:Form; + ontolex:writtenRep "luk"@iu-Latn, "ᓗᒃ"@iu-Cans; + rdfs:comment # NEW, also encoded as DerivationRule and Replacement + "After 'a', 'i', 'u' When the stem ends with 'a', 'i' or 'u', this affix takes the form luk ; it has no effect on the stem"@en, + "After 't' When the stem ends with 't', this affix takes the form luk; it deletes the end character of the stem [_t + luk → _luk]."@en, + "After 'k' When the stem ends with 'k', this affix takes the form luk; it deletes the end character of the stem [_k + luk → _luk]."@en, + "After 'q' When the stem ends with 'q', this affix takes the form luk; it deletes the end character of the stem [_q + luk → _luk]."@en . 
+ +# implementation of the assimilation rule for :luk_luk_f + +:luk_3_rule a morph:DerivationRule; + morph:involves :luk_3_le; + morph:replacement :luk_3_luk_replacement. +:luk_3_luk_replacement a morph:Replacement; + morph:source "(([aiu])|[tqk])-$"; + morph:target "\\2luk-". +# Note: In that replacement, the final - is used to mark incomplete forms. +# We use a capturing group to keep [aiu] (= second group) and delete [qtk] if present. If neither is found, the source condition is not applicable +# and, thus, neither is the rule. \ No newline at end of file diff --git a/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.ttl b/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.ttl new file mode 100644 index 0000000..c2c2b66 --- /dev/null +++ b/data/polysynthetic/one-morph-with-multiple-forms/atausiulugu.ttl @@ -0,0 +1,13 @@ +# combinatorics of atausiulugu + +# 1. modelling as a rdf:Seq, revised + +# REVISION Seq with forms +:atausiulugu_le a ontolex:Word; + ontolex:canonicalForm :atausiulugu_atausiulugu_f. +:atausiulugu_atausiulugu_f a ontolex:Form; + ontolex:writtenRep "atausiulugu"@iu-Latn. +:atausiulugu_atausiulugu_f a rdf:Seq; + rdf:_1 :atausiq_atausi_f; + rdf:_2 :u_u_f; + rdf:_3 :lugu_lugu_f.