Add to English validator: Style should almost always mean the form is distinct from the lemma #554
Comments
I'm not sure I understand it the same way: for example, English "because" has stylistic variants "cause" and "cuz", but I'm not sure I want to say the lemma of "cause" is "because". Using "cause" is a stylistic choice, but there's no inflection going on or anything - it's just a stylistically marked alternative to "because". |
Isn't it an abbreviation? Like "'fraid" for "afraid" etc. |
Not sure, I feel like it's its own word by now, no? dictionary.com lists both cause and cuz: |
"a shortened form of because" |
Right, but "lab" is also a shortened form of "laboratory", and I don't think we've been lemmatizing the first to the second - it's now an independent form, no? I think this is different with sort of ad hoc normalization of abbreviations that take on different forms, like e.g., eg, e.g -> e.g. |
Historically yes, but "lab" and "laboratory" are both mainstream rather than marked—I don't think any of us in normal parlance would say we hold laboratory meetings, though in writing "laboratory" is common enough. I guess I am thinking that if there is an obvious/standard form of the word and a minor stylistically marked form, it makes sense to call that a stylistic variant with a shared lemma. But if they are treated as fully independent words with distinct lemmas, I would not use the Style feature. (That would make it a pure meaning feature that could apply to all sorts of formal or informal words; I take it the Style feature is "morphological" because it implies a contrast with another form.) |
(I agree that "'cause" involves a degree of conventionalization that goes beyond spelling variation—it's also part of the spoken language. I think this is similar to "'em" for "them", or "gonna" and "wanna", which we normalize to the non-colloquial lemma in combination with the Style feature. Also "c'mon" in EWT.) |
If this annotation mandates that form != lemma, then I think it shouldn't be called Style - it would really be something like "NonCanonical" or similar. Lots of words have stylistic implications but are probably their own lemmas (for example, if I speak in pirate style and say "avast" or "ahoy", I don't think there are other lemmas for those). Is the intention here to point out non-canonical language? Or just informal/spoken language features? |
There can be exceptions for clearly archaic forms (e.g. we have "thou" as archaic) but formal/informal seems very hard to apply throughout the lexicon. Should we be comparing lexical frequencies in speech vs. edited writing in order to mark some words ("expectorate", "deceased") as formal and others ("busted", "yucky") as informal? I bet agreement would not be high if we left it to annotators' intuitions. |
I agree - but all this makes me think we should maybe not be using Style as a feature? At the moment it's only used for very few items anyway. |
For words in a paradigm, like pronouns, it is useful for explaining the variants I think! And likewise if there is an alternative spelling that can be understood as colloquial/expressive ("walkin'", "looooong"). As it is a "morphological" feature I understand it as an explanation of why a variant form of a word exists, with the canonical form as lemma. |
My understanding of `Style` is that it indicates a stylistically marked form of a word, and should normally have a more canonical lemma. An exception could be `Style=Arch` on archaic pronouns thou, ye, etc., which have no precise modern equivalent to use as the lemma.

Case-sensitive equivalence: I don't know if Grew permits searching for case-insensitive equality between two attributes of a word (@bguil?). But I know GUM has "Hmm" with lemma "hmm" and `Style=Expr`.
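For illustration only, here is a minimal Python sketch of the kind of check proposed in this issue: flag any token that carries a `Style` feature while its form equals its lemma, ignoring case, and exempt `Style=Arch`. It assumes plain CoNLL-U input. The function names (`parse_feats`, `suspicious_style_tokens`) and the `EXEMPT_STYLES` set are hypothetical and do not come from the actual validator, and whether a hit should be an error or a warning is left open.

```python
# Hypothetical sketch: not the validator's real code.
EXEMPT_STYLES = {"Arch"}  # assumption: archaic forms may keep form == lemma


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Number=Sing|Style=Expr' into a dict."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def suspicious_style_tokens(conllu_text):
    """Return (id, form, lemma, style) for tokens whose Style feature is set
    but whose form matches the lemma case-insensitively."""
    hits = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        form, lemma = cols[1], cols[2]
        style = parse_feats(cols[5]).get("Style")
        if style is None or style in EXEMPT_STYLES:
            continue
        # casefold() makes the comparison case-insensitive ("Hmm" vs "hmm")
        if form.casefold() == lemma.casefold():
            hits.append((cols[0], form, lemma, style))
    return hits
```

Under these assumptions, a token like "Hmm" with lemma "hmm" and `Style=Expr` would be reported, while "thou" with `Style=Arch` would not. A case-sensitive comparison would miss the "Hmm" case, which is why the sketch folds case before comparing.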