Improve text part ingestion, tokenizers and text annotations #68

Merged
merged 27 commits into main on Jan 3, 2024

jacobwegner
Contributor

@jacobwegner jacobwegner commented Jul 18, 2023

We have to work around some of the "bulkification" present in CTSImporter.

Bulkification is really important for Brill (where there are textgroups with ~2.4k to 104k works).

I hope to circle back and test this branch out with Brill's content to see if there are more gains to be had, but this _should_ help within Beyond Translation, which was the original goal.
Merges CSV values with the text_part_id to fit into the parallel tokenizer workflow
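A minimal sketch of what that commit message describes, not the code in this PR: each CSV row is merged with the `text_part_id` it belongs to, so every work item is self-contained and can be handed to a parallel tokenizer. The function names, the `ref` and `text_content` columns, the whitespace tokenizer, and the thread pool are all assumptions made for illustration.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def merge_text_part_id(csv_text, text_part_id):
    """Return one dict per CSV row, each carrying the text_part_id it belongs to."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{**row, "text_part_id": text_part_id} for row in reader]


def tokenize(item):
    """Hypothetical worker: split the row's text on whitespace into token records."""
    return [
        {"text_part_id": item["text_part_id"], "position": position, "value": value}
        for position, value in enumerate(item["text_content"].split(), start=1)
    ]


def tokenize_in_parallel(items, max_workers=2):
    """Tokenize self-contained items concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(tokenize, items))
```

Because each item already carries its `text_part_id`, a worker never needs to look up its parent text part, which is what lets the rows be distributed independently.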
@jacobwegner jacobwegner changed the base branch from atlas/partial-ingestion to main November 7, 2023 10:51
@jacobwegner jacobwegner changed the title WIP: Add get_prepared_tokens to hookset WIP: Improve text part ingestion, tokenizers and text annotations Nov 7, 2023
jacobwegner added a commit to scaife-viewer/beyond-translation-site that referenced this pull request Nov 7, 2023
- Follow ups to ROI / Scholia, v2 #77 (scaife-viewer/frontend#78)
- Improve text part ingestion, tokenizers and text annotations #68 (scaife-viewer/backend#68)
Something in implicit setup allowed SyntaxTreeNode to be registered instead
@jacobwegner jacobwegner changed the title WIP: Improve text part ingestion, tokenizers and text annotations Improve text part ingestion, tokenizers and text annotations Jan 3, 2024
@jacobwegner jacobwegner merged commit 57b4c73 into main Jan 3, 2024
@jacobwegner jacobwegner deleted the atlas/tokenizer-hookset branch January 3, 2024 15:53