2.18.0
Dataset features
- Make JSON builder support an array of strings by @albertvillanova in #6696
- Base parquet batch_size on parquet row group size by @lhoestq in #6701
  - Faster cold start for streaming
- Change default compression argument for JsonDatasetWriter by @Rexhaif in #6659
- Automatic Conversion for uint16/uint32 to Compatible PyTorch Dtypes by @mohalisad in #6660
- fsspec: support fsspec>=2023.12.0 glob changes by @pmrowla in #6687
  - Support latest fsspec up to 2024.2.0
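
As a rough illustration of the "array of strings" case from #6696, the sketch below turns a JSON document whose top level is a list of strings into one row per string, using only the standard library. This is not the JSON builder's code: the helper name `strings_to_rows` and the column name `text` are assumptions made for the example.

```python
import json


def strings_to_rows(raw_json, column="text"):
    """Turn a top-level JSON array of strings into a list of single-column rows.

    Hypothetical helper for illustration; the column name is an assumption.
    """
    data = json.loads(raw_json)
    # Only handle the specific shape discussed here: a list made up of strings.
    if not isinstance(data, list) or not all(isinstance(item, str) for item in data):
        raise TypeError("expected a top-level JSON array of strings")
    return [{column: item} for item in data]


rows = strings_to_rows('["first sentence", "second sentence"]')
for row in rows:
    print(row)
```

Each string becomes its own example, which is the natural tabular reading of such a file.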
General improvements and bug fixes
- Fix for Incorrect ex_iterable used with multi num_worker by @kq-chen in #6582
  - Previously, using PyTorch DDP and `num_workers` could lead to incorrect shard assignments to workers and cause errors
- Fix imagefolder dataset url by @mariosasko in #6683
- Improve error message for gated datasets on load by @lewtun in #6684
- Updated Quickstart Notebook link by @Codeblockz in #6685
- Update the print message for chunked_dataset in process.mdx by @gzbfgjf2 in #6693
- Faster `xlistdir` by @mariosasko in #6698
- Update GitHub Actions to Node 20 by @albertvillanova in #6682
- Update release instructions by @albertvillanova in #6681
- Pass through information about location of cache directory. by @stridge-cruxml in #6677
- Allow SplitDict setitem to replace existing SplitInfo by @lhoestq in #6665
- Update ruff by @lhoestq in #6706
- Silence ruff deprecation messages by @mariosasko in #6707
- fix: show correct package name to install biopython by @BioGeek in #6662
- Fix data_files when passing data_dir by @lhoestq in #6705
- Release: 2.18.0 by @lhoestq in #6708
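
To make the DDP and `num_workers` fix from #6582 more concrete, here is a minimal sketch of what a consistent shard assignment looks like: every shard goes to exactly one (rank, worker) pair. It is only a simplified model under assumed round-robin splitting, not the library's implementation, and `assign_shards` is a hypothetical name.

```python
def assign_shards(num_shards, world_size, num_workers):
    """Map each (rank, worker_id) pair to a disjoint list of shard indices.

    Hypothetical illustration: round-robin over ranks first, then over workers.
    """
    assignment = {}
    for rank in range(world_size):
        # Shards owned by this rank: every world_size-th shard, starting at rank.
        rank_shards = list(range(rank, num_shards, world_size))
        for worker_id in range(num_workers):
            # Within a rank: every num_workers-th shard, starting at worker_id.
            assignment[(rank, worker_id)] = rank_shards[worker_id::num_workers]
    return assignment


assignment = assign_shards(num_shards=8, world_size=2, num_workers=2)
for key in sorted(assignment):
    print(key, assignment[key])

# Sanity check: no shard is skipped and none is read twice.
all_shards = sorted(s for shards in assignment.values() for s in shards)
assert all_shards == list(range(8))
```

If the worker-level split were applied to the wrong set of shards (for example the full list instead of the rank's subset), some shards would be read by several processes and others by none, which is the kind of error the entry above describes.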
New Contributors
- @Codeblockz made their first contribution in #6685
- @gzbfgjf2 made their first contribution in #6693
- @stridge-cruxml made their first contribution in #6677
- @pmrowla made their first contribution in #6687
- @BioGeek made their first contribution in #6662
- @Rexhaif made their first contribution in #6659
- @mohalisad made their first contribution in #6660
- @kq-chen made their first contribution in #6582
Full Changelog: 2.17.1...2.18.0