Multiword token range sometimes being saved as HEAD #475

nschneid · 2022-03-20T14:24:45Z

For words in a multiword token, when I export (download) a .conllu file, sometimes their dependents have the entire MWT in the HEAD column, e.g. 1-2 instead of 1. This breaks the viewer when I reopen the sentence.

kmurphy4 · 2022-03-20T14:26:59Z

Could you post an example sentence or screenshot that breaks like this?

nschneid · 2022-03-20T14:36:42Z

Now my setup is borked and I can't get the viewer to work for any sentence even with a new upload. :( But here is the sentence that was giving me grief:

1-2	Here's	_	_	_	_	_	_	_	_
1	Here	here	ADV	RB	PronType=Dem	0	root	_	start_char=89|end_char=93
2	's	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	1	cop	_	start_char=93|end_char=95
3	the	the	DET	DT	Definite=Def|PronType=Art	4	det	_	start_char=96|end_char=99
4	paper	paper	NOUN	NN	Number=Sing	1	nsubj	_	start_char=100|end_char=105
5	that	that	SCONJ	WDT	PronType=Rel	11	mark	_	start_char=106|end_char=110
6	people	people	NOUN	NNS	Number=Plur	11	nsubj	_	start_char=111|end_char=117
7	who	who	PRON	WP	PronType=Rel	8	nsubj	_	start_char=118|end_char=121
8	read	read	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	6	acl:relcl	_	start_char=122|end_char=126
9	it	it	PRON	PRP	Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs	8	obj	_	start_char=127|end_char=129
10	will	will	AUX	MD	VerbForm=Fin	11	aux	_	start_char=130|end_char=134
11	find	find	VERB	VB	VerbForm=Inf	4	acl:relcl	_	start_char=135|end_char=139
12	out	out	ADP	RP	_	11	compound:prt	_	start_char=140|end_char=143
13	about	about	SCONJ	IN	_	17	mark	_	start_char=144|end_char=149
14	how	how	ADV	WRB	PronType=Int	17	advmod	_	start_char=150|end_char=153
15	resumptive	resumptive	ADJ	JJ	Degree=Pos	16	amod	_	start_char=154|end_char=164
16	pronouns	pronoun	NOUN	NNS	Number=Plur	17	nsubj	_	start_char=165|end_char=173
17	help	help	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	11	advcl	_	start_char=174|end_char=178
18	islands	island	NOUN	NNS	Number=Plur	17	obj	_	start_char=179|end_char=186
19	go	go	VERB	VB	VerbForm=Inf	17	xcomp	_	start_char=187|end_char=189
20	down	down	ADP	IN	_	19	compound:prt	_	start_char=190|end_char=194
21	a	a	DET	DT	Definite=Ind|PronType=Art	22	det	_	start_char=195|end_char=196
22	little	little	ADJ	JJ	Degree=Pos	23	obl:npmod	_	start_char=197|end_char=203
23	easier	easier	ADJ	JJR	Degree=Cmp	19	advmod	_	start_char=204|end_char=210
24	.	.	PUNCT	.	_	1	punct	_	start_char=210|end_char=211

Somehow token 2 was becoming unattached and token 24 was showing up with 1-2 as its head.
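A quick way to check whether an exported file has this problem is to scan the HEAD column (the seventh tab-separated field in CoNLL-U) for range values. The sketch below is only an illustration: `findRangeHeads` is a hypothetical helper, not something that exists in ud-annotatrix or notatrix, and it assumes plain tab-separated CoNLL-U text.

```javascript
// Hypothetical helper (not part of notatrix): report CoNLL-U rows whose HEAD
// column holds a multiword-token range such as "1-2" instead of a single ID.
function findRangeHeads(conllu) {
  const problems = [];
  for (const line of conllu.split("\n")) {
    // Skip blank lines and sentence-level comment lines.
    if (line.trim() === "" || line.startsWith("#")) continue;
    const cols = line.split("\t");
    // Ignore rows too short to have a HEAD column.
    if (cols.length < 7) continue;
    const id = cols[0];
    const head = cols[6];
    // A range row like "1-2" normally has "_" as HEAD, so it is not flagged;
    // only rows whose HEAD value is itself a range are reported.
    if (/^\d+-\d+$/.test(head)) problems.push({ id, head });
  }
  return problems;
}
```

Run on the broken export described here, it would be expected to flag token 24 with HEAD `1-2`; on the correct sentence above it returns an empty list.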

nschneid · 2022-03-20T14:44:12Z

The viewer is not working anymore in Firefox but I'm able to view and download it in Chrome just fine. And it downloads the correct parse. Do I need to clear my Firefox cache or something?

nschneid · 2022-03-20T18:56:59Z

> Do I need to clear my Firefox cache or something?

And it works in Firefox Private Browsing mode, so something got messed up in my browser session.

kmurphy4 · 2022-03-20T22:22:09Z

Hm, if I copy-paste that sentence into the textbox, it seems to work?

What else do I need to do to repro your issue?

nschneid · 2022-03-20T22:36:48Z

Not sure. In a new browser session I can't reproduce. Must have something to do with corrupted local storage or whatever in my original session.

nschneid · 2022-03-20T22:43:44Z

Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

nschneid · 2022-03-20T22:49:51Z

Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:

1-2	'sHere	_	_	_	_	_	_	_	_

nschneid · 2022-03-20T23:38:44Z

This should check the indices to see which token is first (combining two tokens into a multiword token/supertoken):

ud-annotatrix/notatrix/src/nx/sentence.js, line 381 in e951f72:

superToken.form = (src.form || "") + (tar.form || "") || null;

also here (merging two tokens into one regular token):

ud-annotatrix/notatrix/src/nx/sentence.js, line 324 in e951f72:

src.form = (src.form || "") + (tar.form || "") || null;
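The index check suggested above could look something like the following. This is a minimal sketch, not the actual fix: `concatForms` is a hypothetical helper, and `index` is an assumed field holding each token's position in the sentence, which may not match the real notatrix token objects.

```javascript
// Hypothetical sketch: concatenate two token forms in sentence order,
// regardless of which token was clicked first.
// `index` is an assumed property giving the token's position in the sentence.
function concatForms(src, tar) {
  // Put whichever token comes earlier in the sentence first.
  const [first, second] = src.index <= tar.index ? [src, tar] : [tar, src];
  // Same fallback as the original lines: empty forms count as "", and an
  // entirely empty result becomes null.
  return (first.form || "") + (second.form || "") || null;
}
```

With this ordering, merging left from the second word (`src` is `'s`, `tar` is `Here`) gives `Here's` rather than `'sHere`.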

keggsmurph21 · 2022-03-21T01:51:53Z

Oops, I didn't mean to close the whole issue ... but a3828a4 should fix this part:

> Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:
> 1-2	'sHere	_	_	_	_	_	_	_	_

Thanks for the hint 😁

#475 (comment)

keggsmurph21 · 2022-03-21T01:59:09Z

> Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

f6865f1

nschneid · 2022-03-21T02:41:12Z

Thanks, pulled the update. Now I find that if I create several multiword tokens and then select one of them to split ("s"), it may split the wrong one.

keggsmurph21 closed this as completed in a3828a4 Mar 21, 2022

keggsmurph21 reopened this Mar 21, 2022

keggsmurph21 added a commit that referenced this issue Mar 21, 2022

notatrix: Prevent combining *into* subtokens

f6865f1

#475 (comment)
