Multiword token range sometimes being saved as HEAD #475

Open
nschneid opened this issue Mar 20, 2022 · 12 comments

Comments

@nschneid

For words in a multiword token, when I export (download) a .conllu file, sometimes their dependents have the entire MWT in the HEAD column, e.g. 1-2 instead of 1. This breaks the viewer when I reopen the sentence.

@kmurphy4
Contributor

Could you post an example sentence or screenshot that breaks like this?

@nschneid
Author

Now my setup is borked and I can't get the viewer to work for any sentence even with a new upload. :( But here is the sentence that was giving me grief:

1-2	Here's	_	_	_	_	_	_	_	_
1	Here	here	ADV	RB	PronType=Dem	0	root	_	start_char=89|end_char=93
2	's	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	1	cop	_	start_char=93|end_char=95
3	the	the	DET	DT	Definite=Def|PronType=Art	4	det	_	start_char=96|end_char=99
4	paper	paper	NOUN	NN	Number=Sing	1	nsubj	_	start_char=100|end_char=105
5	that	that	SCONJ	WDT	PronType=Rel	11	mark	_	start_char=106|end_char=110
6	people	people	NOUN	NNS	Number=Plur	11	nsubj	_	start_char=111|end_char=117
7	who	who	PRON	WP	PronType=Rel	8	nsubj	_	start_char=118|end_char=121
8	read	read	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	6	acl:relcl	_	start_char=122|end_char=126
9	it	it	PRON	PRP	Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs	8	obj	_	start_char=127|end_char=129
10	will	will	AUX	MD	VerbForm=Fin	11	aux	_	start_char=130|end_char=134
11	find	find	VERB	VB	VerbForm=Inf	4	acl:relcl	_	start_char=135|end_char=139
12	out	out	ADP	RP	_	11	compound:prt	_	start_char=140|end_char=143
13	about	about	SCONJ	IN	_	17	mark	_	start_char=144|end_char=149
14	how	how	ADV	WRB	PronType=Int	17	advmod	_	start_char=150|end_char=153
15	resumptive	resumptive	ADJ	JJ	Degree=Pos	16	amod	_	start_char=154|end_char=164
16	pronouns	pronoun	NOUN	NNS	Number=Plur	17	nsubj	_	start_char=165|end_char=173
17	help	help	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	11	advcl	_	start_char=174|end_char=178
18	islands	island	NOUN	NNS	Number=Plur	17	obj	_	start_char=179|end_char=186
19	go	go	VERB	VB	VerbForm=Inf	17	xcomp	_	start_char=187|end_char=189
20	down	down	ADP	IN	_	19	compound:prt	_	start_char=190|end_char=194
21	a	a	DET	DT	Definite=Ind|PronType=Art	22	det	_	start_char=195|end_char=196
22	little	little	ADJ	JJ	Degree=Pos	23	obl:npmod	_	start_char=197|end_char=203
23	easier	easier	ADJ	JJR	Degree=Cmp	19	advmod	_	start_char=204|end_char=210
24	.	.	PUNCT	.	_	1	punct	_	start_char=210|end_char=211

Somehow token 2 was becoming unattached and token 24 was showing up with 1-2 as its head.
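A quick way to catch this kind of corruption in an exported file is to scan the HEAD column (the seventh tab-separated field) for anything that is not a plain integer or `_`. The sketch below is only illustrative: `findInvalidHeads` is a hypothetical helper, not part of the project's code, and it assumes standard 10-column CoNLL-U input.

```javascript
// Hypothetical helper (not from the project): report rows whose HEAD value
// is neither a plain integer nor "_", e.g. a multiword token range like "1-2".
function findInvalidHeads(conllu) {
  const problems = [];
  for (const line of conllu.split("\n")) {
    // Skip blank lines and sentence-level comments.
    if (line.trim() === "" || line.startsWith("#")) continue;
    const cols = line.split("\t");
    if (cols.length < 7) continue; // not a full token row
    const id = cols[0];
    const head = cols[6];
    // Range rows such as "1-2" carry "_" in HEAD, so they pass this check.
    if (head !== "_" && !/^\d+$/.test(head)) {
      problems.push({ id, head });
    }
  }
  return problems;
}

// Usage: a row like the broken token 24 described above.
const sample = [
  "1-2\tHere's\t_\t_\t_\t_\t_\t_\t_\t_",
  "1\tHere\there\tADV\tRB\tPronType=Dem\t0\troot\t_\t_",
  "24\t.\t.\tPUNCT\t.\t_\t1-2\tpunct\t_\t_",
].join("\n");
console.log(findInvalidHeads(sample));
```

Running a check like this on each download would flag the bad row before the file is reopened in the viewer.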

@nschneid
Author

The viewer is not working anymore in Firefox but I'm able to view and download it in Chrome just fine. And it downloads the correct parse. Do I need to clear my Firefox cache or something?

@nschneid
Author

Do I need to clear my Firefox cache or something?

And it works in Firefox Private Browsing mode, so something got messed up in my browser session.

@kmurphy4
Contributor

Hm, if I copy-paste that sentence into the textbox, it seems to work?
[image]

What else do I need to do to repro your issue?

@nschneid
Author

Not sure. In a new browser session I can't reproduce. Must have something to do with corrupted local storage or whatever in my original session.

@nschneid
Author

Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

@nschneid
Author

Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:

1-2	'sHere	_	_	_	_	_	_	_	_

@nschneid
Author

nschneid commented Mar 20, 2022

This should check the indices to see which token is first (combining two tokens into a multiword token/supertoken):

superToken.form = (src.form || "") + (tar.form || "") || null;

also here (merging two tokens into one regular token):

src.form = (src.form || "") + (tar.form || "") || null;
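For illustration, an order-aware version of that concatenation might look roughly like the sketch below. It assumes each token exposes its position in the sentence as a numeric `index` property; that property name and the `mergedForm` helper are hypothetical, and this is not necessarily how the project resolves it.

```javascript
// Hypothetical helper: concatenate two token forms in sentence order,
// regardless of which token was clicked (src) and which is the merge target (tar).
// Assumes `index` holds each token's position in the sentence.
function mergedForm(src, tar) {
  const [first, second] = src.index <= tar.index ? [src, tar] : [tar, src];
  // Same fallback as the original lines: empty result becomes null.
  return (first.form || "") + (second.form || "") || null;
}

// Usage: clicking the 2nd word and merging left.
const here = { index: 0, form: "Here" };
const clitic = { index: 1, form: "'s" };
console.log(mergedForm(clitic, here)); // "Here's"
```

The same helper could back both call sites, since the supertoken case and the regular merge case differ only in where the result is stored.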

@keggsmurph21
Collaborator

keggsmurph21 commented Mar 21, 2022

Oops, I didn't mean to close the whole issue ... but a3828a4 should fix this part:

Another bug: if you initialize the tree without the multiword token, click the 2nd word and merge left, it concatenates the words in the wrong order:

1-2	'sHere	_	_	_	_	_	_	_	_

Thanks for the hint 😁

@keggsmurph21
Collaborator

Oh but this is interesting: if I click the third word and enter "c" to merge left, it deletes the first word. So the merging functionality must be buggy.

f6865f1

@nschneid
Author

Thanks, pulled the update. Now I find that if I create several multiword tokens and then select one of them to split ("s"), it may split the wrong one.
