Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump languagetool to 6.4 #492

Merged
merged 4 commits into from
Feb 4, 2025
Merged

Bump languagetool to 6.4 #492

merged 4 commits into from
Feb 4, 2025

Conversation

rhystmills
Copy link
Contributor

@rhystmills rhystmills commented Jan 23, 2025

What does this change?

This bumps languagetool to version 6.4. There are a few high priority vulnerabilities in our sbt dependency, and some of them are sub-dependencies of languagetool.

The newest languagetool version according to maven is 6.6, but sbt wasn't able to find it. I've only bumped to Version 6.4 because 6.5 introduced a fiddly-looking error (though it compiles successfully) and I'm hoping that 6.4 will be sufficient to solve some of our dependency woes.

languagetool does not appear to be semantically versioned as there were breaking changes in 6.1 and 6.3 as well as 6.5. I've had to make a couple of small changes to address the changes in the 6.1 and 6.3, namely:

  • Updating the grammar used in a test
  • Providing a language to getRules

It's definitely worth testing this PR in CODE Composer and seeing if checks from our rule manage show up successfully!

I'd be especially keen to check some dictionary rules, because they rely on some slightly arcane workarounds, and undocumented changes in the structure of languagetool could affect them.

How to test

  1. Deploy this branch to CODE.
  2. Find a couple of rules representing each rule type in the CODE rule manager(i.e. Dictionary, LT Built-In, Regex).
  3. Test those rules in CODE Composer. Is the Checker service working as intended, and surfacing them in Composer? E.g. do you see a red or orange underline where you would expect to?

I can advise on any aspects of testing Typerighter if you're not familiar with the tool, please reach out if you require some help

@rhystmills rhystmills requested a review from a team as a code owner January 23, 2025 16:18
@rhystmills rhystmills marked this pull request as draft January 23, 2025 16:18
@rhystmills rhystmills force-pushed the rm/bump-languagetool branch from b5c6434 to b784560 Compare January 23, 2025 16:39
@rhystmills rhystmills force-pushed the rm/bump-languagetool branch from b784560 to eb960f4 Compare January 23, 2025 16:45
…uble quotes around result word)

message aborts the commit. # # On branch rm/bump-languagetool # Your branch is up to date with
'origin/rm/bump-languagetool'. # # Changes to be committed: # modified:
apps/checker/test/scala/matchers/LanguageToolMatcherTest.scala # # Changes not staged for commit: #
modified:  project/plugins.sbt # modified:  script/citest # modified:  script/start-manager # # Untracked
files: # cdk/package-lock.json # output.dict #
@rhystmills rhystmills force-pushed the rm/bump-languagetool branch from 169f8ed to 55636d2 Compare January 28, 2025 13:43
@rhystmills rhystmills changed the title Bump languagetool to 6.3.14 Bump languagetool to 6.4 Jan 30, 2025
@rhystmills rhystmills marked this pull request as ready for review January 30, 2025 10:24
@emdash-ie
Copy link

The newest languagetool version according to maven is 6.6, but sbt wasn't able to find it.

It looks like this is because the 6.6 versions are in a new dedicated repository, based on the mvnrepository page for 6.6.9. Maven Central agrees with sbt that the newest version there is 6.5.

Do we want to use languagetool’s apparent dedicated repository? Is that a thing we generally do? (I suppose it means we don’t get the same third-party checking we get from Maven Central, but I don’t really know the details of what we get from them.) I tried looking at their developer docs, but didn’t find any explanation there about the new repository.

@@ -218,7 +218,7 @@ class LanguageToolMatcherTest extends AsyncFlatSpec with Matchers {
val eventuallyMatches = instance.check(request)

val expectedMatchMessages =
List("Did you mean <suggestion>fewer</suggestion>? The noun tests is countable.")
List("Did you mean <suggestion>fewer</suggestion>? The noun \"tests\" is countable.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m surprised this match text requires quotes but the one below (The noun mistakes…) doesn’t. What’s the difference?

@jonathonherbert
Copy link
Contributor

jonathonherbert commented Feb 3, 2025

A quick note to say that we rely on some languagetool rules which may have changed behaviour between releases. You can see the full list by filtering rules by 'LT Built-in' in the rule manager —

Screenshot 2025-02-03 at 16 28 51

There's a script here to check the difference between old and new rules at compare-rule-xml which might be worth running to see if any of the rule definition XML has changed. The script should document itself, but I don't think anyone's used it yet, so super happy to pair on this if it's not clear what to do with it.

I don't think this is necessarily blocking, as our users will let us know if something's obviously not right, but it might be nice to get ahead of any changes before they land in PROD.

@emdash-ie
Copy link

There's a script here to check the difference between old and new rules at compare-rule-xml which might be worth running to see if any of the rule definition XML has changed. The script should document itself, but I don't think anyone's used it yet, so super happy to pair on this if it's not clear what to do with it.

Thanks, will test this

@emdash-ie
Copy link

I have deployed this to CODE and tested it in this article. I've checked a number of built-in, regex, and dictionary rules, which appear to work correctly (though it would be good to take a look with someone who's more familiar with typerighter).

I have had a hard time triggering some specific dictionary rules, e.g. 1,2-dihydroxypropane. But several appear to have been triggered correctly, so maybe this is fine.

@emdash-ie
Copy link

emdash-ie commented Feb 3, 2025

I ran the script and got a slightly inconclusive answer. I downloaded the english grammar file the script points to from v6.0 and from v6.4, wrote a rules.txt based on the LT built-in rules listed in typerighter-CODE, and then ran the script:

  yarn
  node compare-rule-xml.js grammar-6.0.xml grammar-6.4.xml rules.txt

Which gave this output:

Checking PUBIC_X ...
PUBIC_X is unchanged
Checking IN_PRINCIPAL ...
IN_PRINCIPAL is unchanged
Checking UPPERCASE_SENTENCE_START ...
UPPERCASE_SENTENCE_START not found in either file
Checking WHITESPACE_RULE ...
WHITESPACE_RULE not found in either file
Checking SENTENCE_WHITESPACE ...
SENTENCE_WHITESPACE not found in either file
Checking NO_SPACE_CLOSING_QUOTE ...
NO_SPACE_CLOSING_QUOTE is unchanged
Checking CURRENCY_SPACE ...
CURRENCY_SPACE is unchanged
Checking DOUBLE_PUNCTUATION ...
DOUBLE_PUNCTUATION not found in either file
Checking ...
not found in either file

A few of the rules (UPPERCASE_SENTENCE_START, WHITESPACE_RULE, SENTENCE_WHITESPACE, and DOUBLE_PUNCTUATION) were not found in either file, which suggests they might be somewhere else? I haven't been able to locate a couple of them in xml files by searching the languagetool repository (they do appear in java files, which I assume this script can't check?).

@emdash-ie
Copy link

I think I have been able to test each of those rules in my test article and have seen them correctly flagged.

Copy link

@emdash-ie emdash-ie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but it would be good to discuss on a call before merging, to make sure I have tested it well enough.

@jonathonherbert
Copy link
Contributor

jonathonherbert commented Feb 4, 2025

A few of the rules (UPPERCASE_SENTENCE_START, WHITESPACE_RULE, SENTENCE_WHITESPACE, and DOUBLE_PUNCTUATION) were not found in either file, which suggests they might be somewhere else? I haven't been able to locate a couple of them in xml files by searching the languagetool repository (they do appear in java files, which I assume this script can't check?).

That's spot on, @emdash-ie, some rules are defined as Java classes in the library. Thanks for checking the others — that gives us some confidence they're working correctly. If the tool is parsing existing user-defined rules and the builtins are working as expected, or unchanged, I think this is fine to merge, but very happy to join a call if you'd like to go over this in more detail.

@rhystmills rhystmills merged commit fd70638 into main Feb 4, 2025
4 checks passed
@rhystmills rhystmills deleted the rm/bump-languagetool branch February 4, 2025 09:33
@prout-bot
Copy link

Seen on Rule Manager (merged by @rhystmills 9 minutes and 43 seconds ago) Please check your changes!

@prout-bot
Copy link

Overdue on Checker (merged by @rhystmills 15 minutes and 3 seconds ago) What's gone wrong?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants