1.8.0

@lhoestq released this 08 Jun 18:23

Datasets Changes

Datasets Features

  • Add desc parameter in map for DatasetDict object #2423 (@bhavitvyamalik)
  • Support sliced list arrays in cast #2461 (@lhoestq)
    • Dataset.cast can now change the feature types of Sequence fields
  • Revert default in-memory for small datasets #2460 (@albertvillanova) Breaking:
    • the datasets IN_MEMORY_MAX_SIZE setting used to be 250MB
    • it is now zero: by default, datasets are loaded from disk with memory mapping and are not copied into memory
    • users can still pass keep_in_memory=True when loading a dataset to load it into memory

Datasets Cards

General improvements and bug fixes

Experimental and work in progress: Format a dataset for specific tasks

  • Update text classification template labels in DatasetInfo post_init #2392 (@lewtun)
  • Insert task templates for text classification #2389 (@lewtun)
  • Rename QuestionAnswering template to QuestionAnsweringExtractive #2429 (@lewtun)
  • Insert Extractive QA templates for SQuAD-like datasets #2435 (@lewtun)