Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce/api/rules/csv-import endpoint #477

Merged
merged 9 commits into from
Jul 31, 2024
Merged

Introduce/api/rules/csv-import endpoint #477

merged 9 commits into from
Jul 31, 2024

Conversation

simonbyford
Copy link
Contributor

@simonbyford simonbyford commented Jul 23, 2024

What does this change?

Introduces a new endpoint /api/rules/csv-import which enables the bulk import of regex rules via a CSV file. In particular, this will provide a useful method to quickly populate typerighter with the names of incoming MPs after an election.

The endpoint accepts three parameters encoded as form-data:

  • file (required): the CSV file containing the rules data
  • tag (optional): a tag to be applied to all imported rules. The tag must already exist in typerighter.
  • category (optional): a category to be applied to all imported rules. Category is also known as "source" in the UI.

For example:

curl --location 'https://manager.typerighter.gutools.co.uk/api/rules/csv-import' \
--header 'accept: */*' \
--header 'content-type: application/json' \
--form 'file=@"/Users/Simon_Byford/Downloads/mps.csv"' \
--form 'tag="MP"'
--form 'category="Style guide and names"'

Note: this won't actually work for reasons I'll get to..

The CSV file should not contain headers, and the columns must appear in the following order: pattern,replacement,description

For example, the following CSV file is valid:

Diann?e Abbott?,Diane Abbott,"MP last elected in 2024: Labour, Hackney North and Stoke Newington"
Debbie Abrahams,Debbie Abrahams,"MP last elected in 2024: Labour, Oldham East and Saddleworth"
Rebecca Long-? ?Bailey,Rebecca Long-Bailey,"MP last elected in 2024: Labour, Salford"

If the operation is successful, the API will return the number of rules added.

How to test

For the first iteration, we decided not to expose this in the UI, instead the endpoint must be queried manually. This is a bit fiddly because it requires a cookie for authentication. The best way to get around this is to interact with the manager interface (manager.typerighter.gutools.co.uk) - for example, start creating a rule - and inspect any network request to /api/rules. You can then copy the request as cURL:

Screenshot 2024-07-23 at 16 11 56

You can then either make the necessary edits and run it directly on the command line (scary), or import it into a client like Postman:

Screenshot 2024-07-23 at 16 30 28

Screenshot 2024-07-24 at 14 28 14

Images

Using the above method, I tested this on CODE with Max's spreadsheet of 2024 MPs. I used the tag "CSV import" and category "Imported from CSV".

Screenshot 2024-07-23 at 14 52 01

@simonbyford simonbyford requested a review from a team as a code owner July 23, 2024 15:39
Copy link
Contributor

@jonathonherbert jonathonherbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant to see this! One drive-by comment.

@jonathonherbert
Copy link
Contributor

One other thing to mention while I'm here – there's a good testing story for DB operations in RuleManagerSpec that should let us test CSV data is written correctly to DB. Could we add a test? (See CAPIFixtures for an example of reading a file from our resources folder, we could add a proper CSV file to our test/resources folder, and be really sure this is doing the right thing ✨)

Very happy to pair!

@simonbyford
Copy link
Contributor Author

One other thing to mention while I'm here – there's a good testing story for DB operations in RuleManagerSpec that should let us test CSV data is written correctly to DB. Could we add a test? (See CAPIFixtures for an example of reading a file from our resources folder, we could add a proper CSV file to our test/resources folder, and be really sure this is doing the right thing ✨)

Very happy to pair!

Hey @jonathonherbert, I've added a test (with your help 😄), does it look okay?

@@ -279,4 +279,29 @@ class RulesController(
case Left(error) => BadRequest(s"Invalid request: $error")
}
}

def csvImport() = Action { implicit request =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be APIAuthAction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! 😅

Copy link
Contributor

@jonathonherbert jonathonherbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that looks great!

Tested locally with the endpoint, all works as expected. Yes, please don't merge before the Action is an APIAuthAction 😀

Two very non-blocking things I spotted whilst re-reviewing.

One broader point – it looks like we can enter identical rules, e.g. spamming the endpoint with repeated requests results in duplicates, which might be an easy mistake to make. We could make this safer with a smoke test on import that checks to see that no rules exactly match the description or pattern of a rule. Perhaps one to follow up with.

(To be fair, arguably the pattern of a rule should have a unique constraint at the DB level. I can't think of a scenario where we'd want two rules to have the same pattern.)

)

rules match {
case Right(int) => Ok(Json.toJson(int))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case Right(int) => Ok(Json.toJson(int))
case Right(noOfRulesAdded) => Ok(Json.toJson(noOfRulesAdded))

@@ -279,4 +279,29 @@ class RulesController(
case Left(error) => BadRequest(s"Invalid request: $error")
}
}

def csvImport() = Action { implicit request =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! 😅

Comment on lines 285 to 294
formData <- request.body.asMultipartFormData.toRight("No form data found in request")
file <- formData.file("file").toRight("No file found in request")
tag = formData.dataParts.get("tag") match {
case Some(tag) => Some(tag.head)
case None => None
}
category = formData.dataParts.get("category") match {
case Some(category) => Some(category.head)
case None => None
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be even neater:

Suggested change
formData <- request.body.asMultipartFormData.toRight("No form data found in request")
file <- formData.file("file").toRight("No file found in request")
tag = formData.dataParts.get("tag") match {
case Some(tag) => Some(tag.head)
case None => None
}
category = formData.dataParts.get("category") match {
case Some(category) => Some(category.head)
case None => None
}
formData <- request.body.asMultipartFormData.toRight("No form data found in request")
file <- formData.file("file").toRight("No file found in request")
tag = formData.dataParts.get("tag").flatMap(_.headOption)
category = formData.dataParts.get("category").flatMap(_.headOption)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely - thank you!

@simonbyford
Copy link
Contributor Author

One broader point – it looks like we can enter identical rules, e.g. spamming the endpoint with repeated requests results in duplicates, which might be an easy mistake to make. We could make this safer with a smoke test on import that checks to see that no rules exactly match the description or pattern of a rule. Perhaps one to follow up with.

Good point, thank you for raising it. Some further work is needed before this can actually be used, namely the introduction of another endpoint to bulk-archive existing rules (based on a tag), so I'll leave this for now

@simonbyford simonbyford merged commit cfd0bbf into main Jul 31, 2024
1 check passed
@simonbyford simonbyford deleted the csv-import branch July 31, 2024 09:57
@prout-bot
Copy link

Seen on Rule Manager (merged by @simonbyford 9 minutes and 30 seconds ago) Please check your changes!

@prout-bot
Copy link

Overdue on Checker (merged by @simonbyford 15 minutes and 3 seconds ago) What's gone wrong?

@jonathonherbert
Copy link
Contributor

jonathonherbert commented Jul 31, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants