POS tagging for imperative sentences is inconsistent #139

moskaliukua · 2024-08-07T21:21:36Z

Hi,
I ran into a corner case in POS tagging for imperative sentences such as:
Suppose I tell you that it is true.
If I run this sentence on its own, it works as expected:

import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
const nlp = winkNLP(model);
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB
you 1 yo you xxx 1 0 word you / PRON
that 1 th hat xxxx 1 0 word that / SCONJ
it 1 it it xx 1 0 word it / PRON
is 1 is is xx 1 0 word is / AUX
true 1 tr rue xxxx 1 0 word true / ADJ
. 0 . . . 0 0 punctuat . / PUNCT

If I first run another text through readDoc(), the POS of "Suppose" changes to PROPN:

nlp.readDoc('I watch TV every day.').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
I 0 I I X 2 0 word i / PRON
watch 1 wa tch xxxx 1 0 word watch / VERB
TV 1 TV TV XX 2 0 word tv / NOUN
every 1 ev ery xxxx 1 0 word every / DET
day 1 da day xxx 1 0 word day / NOUN
. 0 . . . 0 0 punctuat . / PUNCT

total number of tokens: 6

token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose 0 Su ose Xxxxx 3 0 word suppose / PROPN
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB
you 1 yo you xxx 1 0 word you / PRON
that 1 th hat xxxx 1 0 word that / SCONJ
it 1 it it xx 1 0 word it / PRON
is 1 is is xx 1 0 word is / AUX
true 1 tr rue xxxx 1 0 word true / ADJ
. 0 . . . 0 0 punctuat . / PUNCT
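To pin down exactly which tag changes between the two runs, the printTokens() rows above can be compared programmatically. The sketch below is a minimal, hypothetical helper (`parseRows` and `diffPos` are illustrative names, not winkNLP APIs); it assumes only the printed layout shown above, where the last field of each row is the POS tag.

```javascript
// Hypothetical helper, not part of winkNLP: turn printTokens()-style rows
// into { token, pos } pairs. Assumes the POS tag is the last field of a row.
function parseRows(text) {
  return text
    .trim()
    .split('\n')
    .map((line) => line.trim().split(/\s+/))
    .map((fields) => ({ token: fields[0], pos: fields[fields.length - 1] }));
}

// Report tokens whose POS tag differs between two runs over the same sentence.
function diffPos(before, after) {
  const a = parseRows(before);
  const b = parseRows(after);
  const changes = [];
  for (let i = 0; i < Math.min(a.length, b.length); i += 1) {
    if (a[i].token === b[i].token && a[i].pos !== b[i].pos) {
      changes.push({ token: a[i].token, before: a[i].pos, after: b[i].pos });
    }
  }
  return changes;
}

// Rows copied from the two outputs above (first three tokens only).
const isolated = `
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB`;

const afterOtherDoc = `
Suppose 0 Su ose Xxxxx 3 0 word suppose / PROPN
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB`;

console.log(diffPos(isolated, afterOtherDoc));
// [ { token: 'Suppose', before: 'VERB', after: 'PROPN' } ]
```

Only "Suppose" changes; the tags of all other tokens are identical across the two runs.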

The problem occurs only with certain preceding sentences or words; I haven't figured out the pattern yet. For example:

nlp.readDoc('I like playing football').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

produces the correct tag:
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB

Could this be related to caching? Also, is there an easy way to disable the cache, or to make the library parse a sentence in isolation without loading the model again?

versions of packages:
"wink-eng-lite-web-model": "^1.8.0",
"wink-nlp": "^2.3.0",


rachnachakraborty · 2024-08-09T07:02:27Z

Hi @moskaliukua

We appreciate your time and effort in describing the inconsistency in POS tagging.

We were able to reproduce the issue. It needs a deeper dive on our end.

We will get back to you on this shortly.

Thank you once again.

Best,
Rachna

moskaliukua · 2024-08-09T08:09:26Z

Hi, I have an update on this issue. I found where the problem is: it is in the wink-eng-lite-web-model repo, https://github.com/winkjs/wink-eng-lite-web-model/blob/0cfed33874bb7675621d58db53ddb8f37db3c1ef/src/feature.js#L192
It is related to the isFirstToken variable, which causes every capitalized word that is not the first token to be tagged PROPN, regardless of whether that word is in the same sentence or the next one.
So text like "TV. Suppose I tell you that it is true." is enough to reproduce the error. For now, I have just changed the logic to return the original POS.
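The mechanism described above, a first-token flag that is not reset at sentence or document boundaries, can be illustrated with a small self-contained sketch. It is not the actual feature.js code: `tagBuggy`, `tagFixed`, `isTitleCase` and the simplified rule are hypothetical. The rule assumes only title-case words (shape Xxxxx, like "Suppose") are affected, which is consistent with "TV" staying NOUN in the output above, but that is an assumption.

```javascript
// Hypothetical simplification of the reported bug; NOT the actual feature.js code.
// Assumed rule: a title-case token (e.g. "Suppose") that is not the first
// token is retagged as PROPN. "I" and "TV" do not match this pattern.
const isTitleCase = (text) => /^[A-Z][a-z]+$/.test(text);

// Buggy variant: the flag is module-level state that is never reset,
// so it leaks from one call (sentence or document) into the next.
let isFirstToken = true;
function tagBuggy(sentence) {
  return sentence.map(({ text, pos }) => {
    const tag = !isFirstToken && isTitleCase(text) ? 'PROPN' : pos;
    isFirstToken = false;
    return tag;
  });
}

// Fixed variant: the flag is local, so every sentence starts fresh.
function tagFixed(sentence) {
  let isFirst = true;
  return sentence.map(({ text, pos }) => {
    const tag = !isFirst && isTitleCase(text) ? 'PROPN' : pos;
    isFirst = false;
    return tag;
  });
}

// Base tags taken from the printTokens() output earlier in this thread.
const tvSentence = [
  { text: 'TV', pos: 'NOUN' },
  { text: '.', pos: 'PUNCT' },
];
const supposeSentence = [
  { text: 'Suppose', pos: 'VERB' },
  { text: 'I', pos: 'PRON' },
  { text: 'tell', pos: 'VERB' },
];

tagBuggy(tvSentence);
console.log(tagBuggy(supposeSentence)); // [ 'PROPN', 'PRON', 'VERB' ]

tagFixed(tvSentence);
console.log(tagFixed(supposeSentence)); // [ 'VERB', 'PRON', 'VERB' ]
```

In the buggy variant, calling tagBuggy(supposeSentence) before anything else would leave "Suppose" as VERB, matching the isolated run; only a preceding call flips it. Keeping the flag local to each sentence removes the dependence on earlier input.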
