Hi,
I ran into a corner case with POS tagging for imperative sentences like:
Suppose I tell you that it is true.
If I run this sentence on its own, it works as expected:
import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
const nlp = winkNLP(model);
nlp.readDoc('Suppose I tell you that it is true.').printTokens();
token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB
you 1 yo you xxx 1 0 word you / PRON
that 1 th hat xxxx 1 0 word that / SCONJ
it 1 it it xx 1 0 word it / PRON
is 1 is is xx 1 0 word is / AUX
true 1 tr rue xxxx 1 0 word true / ADJ
. 0 . . . 0 0 punctuat . / PUNCT
If I run it after a text that contains one sentence,
the POS of "Suppose" changes to PROPN:
nlp.readDoc('I watch TV every day.').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();
token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
I 0 I I X 2 0 word i / PRON
watch 1 wa tch xxxx 1 0 word watch / VERB
TV 1 TV TV XX 2 0 word tv / NOUN
every 1 ev ery xxxx 1 0 word every / DET
day 1 da day xxx 1 0 word day / NOUN
. 0 . . . 0 0 punctuat . / PUNCT
total number of tokens: 6
token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose 0 Su ose Xxxxx 3 0 word suppose / PROPN
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB
you 1 yo you xxx 1 0 word you / PRON
that 1 th hat xxxx 1 0 word that / SCONJ
it 1 it it xx 1 0 word it / PRON
is 1 is is xx 1 0 word is / AUX
true 1 tr rue xxxx 1 0 word true / ADJ
. 0 . . . 0 0 punctuat . / PUNCT
The problem occurs only with some specific sentences or specific words; I haven't figured out which ones yet. For example:
nlp.readDoc('I like playing football').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();
produces the correct result:
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB
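To narrow down which preceding sentences trigger the change, a small harness along these lines might help. This is only a sketch: findTriggers is a hypothetical helper, not part of wink-nlp, and it assumes that tag is any function that maps a string to an array of POS tags. It follows the same call order as the examples above (preceding text first, then the target sentence).

```javascript
// Hypothetical helper, not part of wink-nlp.
// `tag(text)` is assumed to return an array of POS tags, one per token.
function findTriggers(tag, target, candidates, expectedFirstPos) {
  const triggers = [];
  for (const before of candidates) {
    tag(before);                     // tag the preceding text first, as in the report
    const [firstPos] = tag(target);  // POS of the first token of the target sentence
    if (firstPos !== expectedFirstPos) {
      triggers.push({ before, firstPos });
    }
  }
  return triggers;
}

// Possible wiring with wink-nlp (assumes `nlp` was created as shown earlier):
// const tag = (text) => nlp.readDoc(text).tokens().out(nlp.its.pos);
// console.log(findTriggers(
//   tag,
//   'Suppose I tell you that it is true.',
//   ['I watch TV every day.', 'I like playing football'],
//   'VERB'
// ));
```

Each entry in the returned array names a preceding text after which the first token of the target was not tagged as expected, together with the tag it received instead.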
Could it be related to caching? Also, is there an easy way to disable the cache, or to make the library parse a sentence in isolation without loading the model again?
versions of packages:
"wink-eng-lite-web-model": "^1.8.0",
"wink-nlp": "^2.3.0",
Hi, I have an update on this issue: I found where the problem is located.
It's in the wink-eng-lite-web-model repo: https://github.com/winkjs/wink-eng-lite-web-model/blob/0cfed33874bb7675621d58db53ddb8f37db3c1ef/src/feature.js#L192
It's related to the isFirstToken variable: every upper-case word that is not the first token is set to PROPN, regardless of whether it is in the same sentence or the next one.
So a text like "TV. Suppose I tell you that it is true." is enough to reproduce the error. For now, I have just changed the logic to return the original POS.