"at least", "in general", and related expressions: fixed? ExtPos? and validator rule prohibiting det(X, Y) & nmod(Y, Z) #553
Comments
Wouldn't that mean that we are starting to treat all "at least"s as fixed expressions? |
It would still be saying they are PPs internally, which I think is fair. Though TBH I don't understand the syntactic reason for distinguishing the quantitative and non-quantitative ones—isn't it just a matter of idiomatic meaning? |
Wait, are you saying you want to keep obl+case for non quantitative "at least", but you want to put ExtPos=ADV on the head? I'm not sure that makes sense to me - obl dependents are already adverbial in the sense that any adverbial PP is, so I don't see what this adds. It makes more sense for the fixed version (with quantities). |
Right now it's the nonquantity ones that are fixed. Which I find confusing. What about dispensing with fixed entirely:
at/case least/ADJ[ExtPos=ADV]/advmod 3 books
at/case least/ADJ[ExtPos=ADV]/advmod some homework
at/case least/ADJ/obl you are having fun |
I agree it's confusing, and I'd be for making them all be the same. But I don't think we should have a compositional looking subtree with case and then use advmod + ExtPos. I'd do either or:
I feel like mixing the two strategies is confusing. |
The problem is we can't do nmod+case for "at least some (books)", because the validator now prohibits it. |
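The configuration at issue, an nmod dependent attached to a word that is itself a det dependent, can be sketched as a small check. This is only an illustration, not the actual UD validator code: the helper names `parse_conllu` and `nmod_under_det` and the sample annotation of "at least some books" are assumptions made for this example.

```python
def parse_conllu(block):
    """Parse one CoNLL-U sentence into token dicts (basic dependency columns only)."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword-token ranges and empty nodes
            continue
        tokens.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens


def nmod_under_det(tokens):
    """Return (dependent, parent) form pairs where an nmod attaches to a det dependent."""
    by_id = {t["id"]: t for t in tokens}
    hits = []
    for t in tokens:
        parent = by_id.get(t["head"])  # None for the root (head 0)
        if (parent is not None
                and t["deprel"].split(":")[0] == "nmod"
                and parent["deprel"].split(":")[0] == "det"):
            hits.append((t["form"], parent["form"]))
    return hits


# Hypothetical annotation of "at least some books" with nmod+case under the determiner.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "at", "at", "ADP", "IN", "_", "2", "case", "_", "_"),
    ("2", "least", "least", "ADJ", "JJS", "Degree=Sup", "3", "nmod", "_", "_"),
    ("3", "some", "some", "DET", "DT", "_", "4", "det", "_", "_"),
    ("4", "books", "book", "NOUN", "NNS", "Number=Plur", "0", "root", "_", "_"),
])

print(nmod_under_det(parse_conllu(SAMPLE)))
```

On the sample, the check reports the pair ("least", "some"), which is the kind of attachment the comment above says is no longer accepted.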
That suggests we should prefer fixed+advmod, which is fine by me, though I will note there are variants ("at the least" comes to mind, or "at the very least") |
Ruling from the Core Group: under a word functioning as |
I was able to easily update the EWT tokens of at least/at most + nummod/det to be
A broader query for this structure, https://universal.grew.fr/?custom=677d861f9252b, also surfaces many tokens that are ranges, e.g. "2-4 days". Our convention is to treat "- 4" as a PP equivalent to "to 4". Should these also be changed to |
I can't say I find it super intuitive (I really don't think of numbers as adjectives at all), but if that's the ruling then that's what we have to do. Or to put it differently, if we don't like this for the number ranges, I think we have to take it back for a second round of discussions with the Core Group, but otherwise, yeah, this falls under that decision IMO. |
You know I think "2-3 days" resembles "2 or 3 days" and would be open to treating ranges as coordination, but I don't expect to win that fight. :) Let's change to obl then. |
OK here's a case to consider: "80-120 million barrels". It uses |
I took @jnivre 's position to be that numbers fundamentally take obl rather than nmod, unless they are standing in as a nominal head. A |
I am not completely sure what the question is here, but if it's about an "at least/most" that attaches to "80", then I would say it's still |
OK, so as the head of a full nominal. I thought I understood @jnivre & @dan-zeman to be saying that we should be guided by parent deprels rather than UPOS in formulating these rules. But results of this query suggest a strict interpretation of this rule beyond determiners/numbers will run into problems. In attributive position, we find proper names like "World's Fair" ("World's Fair museum": So, let's just say that in English, DETs and NUMs are modifiers by default, and PP modifiers of them should be |
Multiword number names are tricky and we had a section exploring them in https://arxiv.org/abs/2108.12928. What would it be in Swedish? |
I agree that multiword numbers are tricky, but for the specific case of "80 miljoner" (lit. "80 millions"), we just treat "miljoner" as an ordinary noun and attach "80" as |
Hmm. Semantically, in "80 miljoner fat" / "80 million barrels", there is a complex number serving as a quantity modifier of an entity-referring noun. I guess the question is whether nummod means
(a) a simple numeral word modifying some other word (possibly another quantity), or
(b) a number expression modifying a count noun, following the language's ordinary construction for expressing counts of entities, or
(c) both.
(N.B. By "number expression" for present purposes I mean a cardinal numeral that can be expressed with a series of digits, even if verbalized as multiple words. By "ordinary construction" I mean to rule out other morphosyntactic forms being occasionally recruited for this purpose; I can't think of a great English example for small counts but something like "books numbering 3" meaning '3 books'. Or "books in the millions", "millions of books", etc. for large counts. None of these should be |
That is indeed the question. But since (unless I'm mistaken) the validator only allows "nummod" with UPOS NUM, we cannot make "miljoner" a "nummod". |
@jnivre in English we tag words like "million" as NUM, and attach them as nummod in expressing dollar amounts, for instance: https://universal.grew.fr/?custom=6782b41c66e8f |
I see. In Swedish the corresponding word is tagged as a noun, because it behaves like a noun morphosyntactically, so I guess it is yet another case where one has to choose whether to go with morphosyntax or with semantics. |
The validator currently permits |
I always thought NUM was a semantically-based cross between NOUN and DET. Though if there's plural marking (tens of books, the 1990s), we tag as NOUN in English. |
I think that's the issue with Swedish, isn't it? Isn't "miljoner" morphologically plural?
It's sort of an inheritance from SD, but I think it makes sense in its own right, because in English "million" is not pluralized, similarly to how compounds with a number don't trigger pluralization ("a two day trip", not "a two days trip") |
Well "million" is not attaching as compound; "80" is attaching as compound. So I don't think the singular status is relevant. |
Yeah, and in "two day plan" I think "two" is also compound, not nummod, right? |
Oh, now I see your point. (I am always confused by the overloading of the term "compound" to mean both the attributive relation of a word or phrase with respect to a noun, and the internal structure of that phrase.) With a standard N+N combination, like "egg carton", it is the dependent of the With "80 million dollars", it is the dependent of the Your point is that if "80" were the |
Exactly! |
…general" - "at least" is now never fixed (previously there was a semantic distinction) - "in general" is documented as non-fixed
@amir-zeldes and I have decided that, despite a strong presumption of stare decisis for entries in the
Old guidelines & New guidelines
A technical issue
It is not immediately obvious whether the superlative words after "at" should be ADJ or ADV. The guidelines previously had ADV for at best/worst. Empirically, the situation in EWT and in GUM is a strong preference for the ADJ/JJS tags. I presume we should standardize that and update the RBS tokens.
in general
I have added "in general", which was not previously documented, as non-fixed. It is another ADP+ADJ combination with idiomatic meaning but can be analyzed as a regular PP. |
Some instances of "at least" attaching as nmod to a det-dependent ("at least some...") are now triggering validator errors. See UniversalDependencies/docs#1059 (comment). We might as well change them all to specify ExtPos=ADV and attach as advmod rather than nmod.
Note: non-quantitative "at least" and "at most" are considered fixed expressions, so they are already taken care of.
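The proposed change can be sketched as a rewrite over CoNLL-U rows. This is a minimal illustration under stated assumptions, not the script actually used on EWT: the function name `retag_at_least`, the sample rows, and the choice to match only "least"/"most" with an "at" case child under a det or nummod parent are all hypothetical.

```python
def retag_at_least(rows):
    """Given 10-column CoNLL-U token rows (lists of str), return new rows in which
    quantitative "at least/most" is attached as advmod with ExtPos=ADV."""
    by_id = {r[0]: r for r in rows}
    out = []
    for r in rows:
        r = list(r)  # copy so the input rows are left untouched
        parent = by_id.get(r[6])
        # Does this token have a case child "at"?
        has_at = any(c[6] == r[0] and c[7] == "case" and c[1].lower() == "at"
                     for c in rows)
        if (r[1].lower() in {"least", "most"} and r[7] == "nmod" and has_at
                and parent is not None and parent[7] in {"det", "nummod"}):
            r[7] = "advmod"
            feats = [] if r[5] == "_" else r[5].split("|")
            feats.append("ExtPos=ADV")
            # Keep features in case-insensitive alphabetical order.
            r[5] = "|".join(sorted(feats, key=str.lower))
        out.append(r)
    return out


# Hypothetical pre-change annotation of "at least some books".
ROWS = [
    ["1", "at", "at", "ADP", "IN", "_", "2", "case", "_", "_"],
    ["2", "least", "least", "ADJ", "JJS", "Degree=Sup", "3", "nmod", "_", "_"],
    ["3", "some", "some", "DET", "DT", "_", "4", "det", "_", "_"],
    ["4", "books", "book", "NOUN", "NNS", "Number=Plur", "0", "root", "_", "_"],
]

for row in retag_at_least(ROWS):
    print("\t".join(row))
```

Only the row for "least" changes: its relation becomes advmod and its features gain ExtPos=ADV, while the internal case attachment of "at" is left as it was.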