Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to support custom tokenizer and improved RTL display #166

Open
wants to merge 13 commits into
base: feature/ibnsaid
Choose a base branch
from

Conversation

jacobwegner
Copy link
Contributor

This PR lands a few things back to the feature/ibnsaid branch:

  • Heroku / GitHub Actions fixes
  • Improved RTL support in the interlinear widget
  • Support for custom tokenizers and a stub file demonstrating them

Verified locally and in Codespaces:

/reader/urn:cts:arabicLit:amedsaid1831.dw042.perseus-ara1:1?mode=interlinear

image

To review these changes, you'll want to check out the ibnsaid-tokenizer branch in Codespaces and then:

  • Update frontend and backend dependencies:
# from backend shell
pip uninstall -y scaife-viewer-atlas
pip install "scaife-viewer-atlas @https://github.com/scaife-viewer/backend/archive/552ee49c150439f96bab9f40b46e822ac452f4d0.zip#subdirectory=atlas"
# from frontend shell
yarn install
  • Re-ingest content:
# from backend shell
python manage.py prepare_atlas_db --force

jacobwegner and others added 13 commits June 27, 2023 07:40
Backport review apps fixes from feature/cambridge-greek-lexicon
Allows us to build and deploy to review apps without overwriting the "latest" tag.

Heroku deployments will remain isolated within the Heroku registry, so there is no chance of "overwriting" a production  or staging deploy.
Add hookset and support for overriding tokenizer via get_prepared_tokens
# Conflicts:
#	backend/requirements.txt
@jacobwegner jacobwegner requested a review from jchill-git July 19, 2023 22:05
@jacobwegner
Copy link
Contributor Author

@jchill-git I've done this as a PR so you can verify the changes before we merge into your feature branch.

If things are working here, please go ahead and approve and merge this branch.

If things aren't working, please let me know...happy to try and troubleshoot with you!

# NOTE: Sample override for The life of Omar ben Saeed below
if version_urn == "urn:cts:arabicLit:amedsaid1831.dw042.perseus-ara1:":
from scaife_viewer.atlas.tokenizers import get_tokens_from_csv
csv_path = Path("data/token-overrides/amedsaid1831.dw042.perseus-ara1.csv")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jchill-git This my "draft" placement for these files; I think we will likely establish a more formal convention in the future.

You can replace my stub file with your actual tokens once you've got them.

from scaife_viewer.atlas.tokenizers import get_tokens_from_csv
csv_path = Path("data/token-overrides/amedsaid1831.dw042.perseus-ara1.csv")
return get_tokens_from_csv(version_urn, csv_path)
return super().get_prepared_tokens(version_urn)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This line will be the fallback, which allows us to use the "default" tokenizer for other texts)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant