-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v2] Remove multilingual task #1926
Conversation
I have also made some changes on this over on main. I would probably do a merge before (my fix removed |
Co-authored-by: Kenneth Enevoldsen <[email protected]>
No, Line 65 in dba7a95
|
(we could also consider merging this and resolving it in a merge/partial merge) |
Yes, but it is not in So yea, no merge is needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, looking a bit more into this. Are there cases where we can't replace .is_multilingual
?
No, we can totally remove |
b74cbf3
to
d0cdc39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something: how do we determine which subsets to load? Does is_multilingual
play a role or do we just check for the presence of a default
subset?
Co-authored-by: Isaac Chung <[email protected]>
By Line 315 in dba7a95
Previously yes, but now we can get subsets with |
Sorry, I didn't phrase the question right. What I'm curious about is the evidence supporting this statement:
|
ef07074
to
d0eb782
Compare
Yes, for now, hf_subsets = list(self.dataset) if self.is_multilingual else ["default"] |
Hmm it seems like there are a few more references, like
Do they seem easy to refactor? |
I think we can make is_multilingual just property of TaskMetadata that would be calculated based on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just got one small suggestion.
Co-authored-by: Isaac Chung <[email protected]>
Closes #1832
I've removed
MultilingualTask
and explicitly addedis_multilingual
to tasks, which was previously handled byMultilingualTask
. I also moved thefast_loading
method fromMultilingualTask
toAbsTask
, but I think we should reupload these datasets in the correct format.Additionally, we might want to determine
is_multilingual
automatically based oneval_langs
, but I’m not entirely in favor of this since it feels a bit like magic. Another option is to integrateis_multilingual
intoTaskMetadata
.Code Quality
make lint
to maintain consistent style.Documentation
Testing
make test-with-coverage
.make test
ormake test-with-coverage
to ensure no existing functionality is broken.