-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates to support custom tokenizer and improved RTL display #166
base: feature/ibnsaid
Are you sure you want to change the base?
Conversation
Backport review apps fixes from feature/cambridge-greek-lexicon
Allows us to build and deploy to review apps without overwriting the "latest" tag. Heroku deployments will remain isolated within the Heroku registry, so there is no chance of "overwriting" a production or staging deploy.
Conditionally apply latest tag
Add hookset and support for overriding tokenizer via get_prepared_tokens
# Conflicts: # backend/requirements.txt
@jchill-git I've done this as a PR so you can verify the changes before we merge into your feature branch. If things are working here, please go ahead and approve and merge this branch. If things aren't working, please let me know...happy to try and troubleshoot with you! |
# NOTE: Sample override for The life of Omar ben Saeed below | ||
if version_urn == "urn:cts:arabicLit:amedsaid1831.dw042.perseus-ara1:": | ||
from scaife_viewer.atlas.tokenizers import get_tokens_from_csv | ||
csv_path = Path("data/token-overrides/amedsaid1831.dw042.perseus-ara1.csv") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jchill-git This my "draft" placement for these files; I think we will likely establish a more formal convention in the future.
You can replace my stub file with your actual tokens once you've got them.
from scaife_viewer.atlas.tokenizers import get_tokens_from_csv | ||
csv_path = Path("data/token-overrides/amedsaid1831.dw042.perseus-ara1.csv") | ||
return get_tokens_from_csv(version_urn, csv_path) | ||
return super().get_prepared_tokens(version_urn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This line will be the fallback, which allows us to use the "default" tokenizer for other texts)
This PR lands a few things back to the
feature/ibnsaid
branch:Verified locally and in Codespaces:
/reader/urn:cts:arabicLit:amedsaid1831.dw042.perseus-ara1:1?mode=interlinear
To review these changes, you'll want to check out the
ibnsaid-tokenizer
branch in Codespaces and then: