SILE vs. TeX hyphenation patterns #2104

Open
Omikhleia opened this issue Sep 9, 2024 · 3 comments
Labels: bug (Software bug issue), enhancement (Software improvement or feature request)
Milestone: v0.16.0


@Omikhleia (Member) commented Sep 9, 2024

Things not addressed in #2102:

  • Things where we have our own upgrades, so they likely won't change?
    • Turkish (tr): many changes by @alerque
    • Esperanto (eo): solution by @ctrlcctrlv differing from what TeX currently has
  • Things that might have to wait for a better implementation of BCP47...
    • en-GB / en-US (but see also just below, last item)
    • zh-Latn-pinyin
    • mul-ethi (Ethiopic)
    • nb/nn variants (Norwegian; we do have them, though a refactor could be neat eventually)
    • several la variants (liturgical/classical Latin)
    • mn (Mongolian in Cyrl script, or Cyrl-x-lmc for the Xalx Mongolian variant)
    • sh (Serbo-Croatian, Cyrl vs. Latn; the tag is now deprecated?)
  • Things I am not sure of...
    • our "el" vs. TeX's "grc" is problematic, and the situation is messy = 0.16.x ?
      • It seems SILE's "el" patterns were added by Simon in c6e8d0b ("Oops, forgot to include these") directly on master in Feb. 2014. The TeX "grc" patterns were updated in May 2016 ("added support for curly beta"), and after further checking, I can confirm that this update accounts for our differences with them.
      • So my assumption is that Simon added the "grc" patterns of that time as "el", and this was likely wrong. They should have been kept as "grc", for Ancient Greek (→ added to my first comment above; I first thought we could safely add them under that name too, but no, see the boustrophedon caveat below).
      • As for "el", we should likely alias it to "el-monoton": it seems to me that Modern Greek has been written in the monotonic system in most contexts since the 1982 reform. (We should state this in the documentation too...) = Strictly speaking, should we postpone it to 0.16.x, as it could make rendered documents different?
      • However, the boustrophedon package would clash with any standard "grc"...
  • Things where we differ but may change for the better
    • pt (Portuguese): We have @jodros's updates, but an incoming PR might improve upon them: additional set of rules to enhancing TeX hyphenation rules for Portug… hyphenation/tex-hyphen#62. Should we wait or go for an early adoption? = 0.16.x or earlier ?
    • es (Spanish): Lots of differences from our patterns = 0.16.x ?
    • th (Thai): Lots of differences from our patterns, and no idea where the latter come from. = (?)
    • bg (Bulgarian): Lots of differences from our patterns, and no idea where the latter come from. = 0.16.x ? Note that the TeX patterns were updated in 2017, so whatever we had might be outdated (?)
    • de (German): We have something; TeX has several variants depending on the orthography reforms. Frankly, I don't know.
    • en (English, see also above): Currently we are based on en-US, but we lack a whole set of "additional patterns" that were not present in the original TeX file and were added later (though at an earlier stage than our import). My understanding is that, "old" TeX being memory-constrained, these were absent from the original implementation. But it's pretty unclear why we don't have the extra patterns... = 0.16.x ? as it could make rendered documents different...
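Several of the BCP47 items listed above (en-GB vs. en-US, mn-Cyrl-x-lmc, the sh script variants, and the proposed "el" → "el-monoton" alias) come down to picking the best available pattern set for a given language tag. A minimal sketch, assuming a plain RFC 4647 "lookup" fallback (progressively truncating the tag until something matches), could look like this. The `AVAILABLE` set, the `ALIASES` table and the `lookup()` function are hypothetical, written in Python for illustration only; they are not SILE's actual API or pattern inventory.

```python
# Hypothetical sketch (not SILE's API): choose a hyphenation pattern set for a
# BCP 47 tag using the RFC 4647 section 3.4 "lookup" fallback.
from typing import Optional

# Assumed inventory of pattern sets, keyed by lowercased tag (illustrative only).
AVAILABLE = {
    "en-us", "en-gb", "el-monoton", "grc",
    "mn-cyrl", "mn-cyrl-x-lmc", "sh-latn", "sh-cyrl", "pt",
}

# Assumed aliases: bare "en" resolves to en-US (current behaviour per the issue),
# and bare "el" to "el-monoton" (the proposal discussed above).
ALIASES = {"en": "en-us", "el": "el-monoton"}


def lookup(tag: str, default: Optional[str] = None) -> Optional[str]:
    """Return the most specific available pattern set for `tag`, or `default`."""
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in AVAILABLE:
            return candidate
        alias = ALIASES.get(candidate)
        if alias in AVAILABLE:
            return alias
        # Truncate the last subtag...
        subtags.pop()
        # ...and, per RFC 4647, also drop a singleton (e.g. "x") left dangling.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


for t in ("en-GB", "en-AU", "el-GR", "mn-Cyrl-x-lmc", "mn-Cyrl-x-foo", "tlh"):
    print(t, "->", lookup(t))
```

With this approach, a tag such as mn-Cyrl-x-foo falls back to mn-cyrl (the dangling "x" singleton is dropped together with its subtag, as RFC 4647 prescribes), while el-GR resolves through the alias to el-monoton. Checking the alias only after a failed direct match keeps the loop from cycling between "el" and "el-monoton".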

Originally posted by @Omikhleia in #2102 (comment)

@Omikhleia added the bug (Software bug issue) and enhancement (Software improvement or feature request) labels Sep 9, 2024
@Omikhleia added this to the v0.16.0 milestone Sep 9, 2024
@jodros (Contributor) commented Sep 9, 2024

pt (Portuguese): We have @jodros 's updates, but an incoming PR might improve upon it: hyphenation/tex-hyphen#62 Should we wait or go for an early adoption? = 0.16.x or earlier ?

Glad to know about @leolca's work! I can't see reasons to wait.

@Omikhleia (Member, Author) commented Sep 9, 2024

Glad to know about @leolca's work! I can't see reasons to wait.

Yep, it really made my day when I read their (proposed) TUGboat article. It's scholarly, with plenty of attempts and plenty of tables of numbers... It's very interesting (even the things they tried and judged inefficient, etc.).

Projects
Status: In Progress
Development

No branches or pull requests

4 participants
@jodros @Omikhleia and others