-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: adds support for reading mosaic mds written dataset #210
Feat: adds support for reading mosaic mds written dataset #210
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #210 +/- ##
=====================================
Coverage ? 77%
=====================================
Files ? 33
Lines ? 4704
Branches ? 0
=====================================
Hits ? 3625
Misses ? 1079
Partials ? 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work. Few suggestions from my side.
And update the wrong spelling of Parallelize
in readme, which is causing pre-commit-ci to fail.
README.md:55: Paralellize ==> Parallelize
README.md:653: Paralellize ==> Parallelize
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Added tests for checking if the mds dataset supports Litdata streamingDataset features
like train_test_split
, subsample
, dataloader (last_drop=True | False)
.
One test fails on mac (timeout), even though I've not changed anything in any function.
Thank you @deependujha for adding these additional tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work ! It would be good to benchmark this once more.
What does this PR do?
Fixes #195.