Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for iso-639-2t/b #46

Merged
merged 5 commits into from
Jun 7, 2024
Merged

Conversation

Cannedfood
Copy link
Contributor

Adds support for is iso-639-2t and iso-639-2b

This implementation avoids adding extra tables for 2t/b, with the tradeoff, that the functions for 639-2 will incorrectly accept 639-3 codes, and will also generate 639-3 codes if a Language doesn't have a code in that standard.

So for example, even though there is no 639-2 code for "Ghotuo" (iso 639-3 code "aaa") the following works:

assert_eq(Language::Aaa.to_639_2t(), "aaa"); // Works!
assert_eq(Language::from_639_2b("aaa"), Language::Aaa); // Works!

I can rewrite it to use phf tables if this behavior is a problem.

@humenda
Copy link
Owner

humenda commented May 19, 2024

Could you change iso_639_3_to_2t and friends to be located in a separate or
one of the existing tables? I would like to avoid to have a huge function.
Instead, the data should reside in a separate section, if the compiler wishes.
I also wonder whether 639-2 codes should be feature gated, for binary size
reasons.

Also, would you please update the README?
Thanks

@Cannedfood
Copy link
Contributor Author

Readme is updated. I hope the wording changes are ok.

Regarding the binary size: I don't think a feature gate is necessary.

The generated functions for converting between iso 639-3 and iso 639-2 are pretty small - only 64 LOC in total. In release mode this seems to come out at a slightly less than 2KiB binary size increase.

Considering they're so small I'm also not sure if it makes sense to move them to another file - I'm also not very keen on duplicating the code generation logic.

@humenda
Copy link
Owner

humenda commented May 20, 2024 via email

@Cannedfood
Copy link
Contributor Author

What is the purpose of iso_639_3_to_2t? It is effectively the identity map and could be removed, right? Same for the reverse.

True, I just wasn't sure if the standard guarantees iso 639-3 to be backward compatible (also in the future), so I generated these functions as well. Looking at it again though, I don't see how it would break compatibility, so I think it's safe to not generate the _2t functions at all.
=> Removed the _2t functions.

Regarding the functions: I would like to have the data autogenerated

The _2b functions are autogenerated

Where do you see the code duplication?

I thought you indicated that iso_639_3_to_2b/iso_639_2b_to_3 should go into another file - in which case the codegen/formatting would need a refactor to work with another file. I think there might have been a misunderstanding here?

Let me know if you want me to change anything in particular, in general I think the PR is in a pretty mergable state now.

@humenda humenda merged commit 6d85cf9 into humenda:master Jun 7, 2024
4 checks passed
@humenda
Copy link
Owner

humenda commented Jun 7, 2024

What are the data sources for your changes? I.e. could we automate the import of
the codes for next time?

@Cannedfood
Copy link
Contributor Author

The data was already part of the iso-639-3.tab file that is being used for the iso-639-3 language codes.
The functions are also automatically generated the same way as the iso-639-3 language code tables are. I think this is already as automated as it can be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants