
Update dependency datasets to v3.3.2 #652

Open. Wants to merge 1 commit into base: main


renovate[bot]
Contributor

@renovate renovate bot commented Jan 5, 2025

This PR contains the following updates:

Package  | Change
datasets | 3.1.0 -> 3.3.2

Release Notes

huggingface/datasets (datasets)

v3.3.2

Compare Source

Bug fixes

Other general improvements

New Contributors

Full Changelog: huggingface/datasets@3.3.1...3.3.2

v3.3.1

Compare Source

Bug fixes

Full Changelog: huggingface/datasets@3.3.0...3.3.1

v3.3.0

Compare Source

Dataset Features

  • Support async functions in map() by @lhoestq in https://github.com/huggingface/datasets/pull/7384

    • Especially useful for downloading content such as images or calling inference APIs, e.g.
    prompt = "Answer the following question: {question}. You should think step by step."
    async def ask_llm(example):
        return await query_model(prompt.format(question=example["question"]))
    ds = ds.map(ask_llm)
  • Add repeat method to datasets by @alex-hh in https://github.com/huggingface/datasets/pull/7198

    ds = ds.repeat(10)
  • Support faster processing using pandas or polars functions in IterableDataset.map() by @lhoestq in https://github.com/huggingface/datasets/pull/7370

    • Add support for "pandas" and "polars" formats in IterableDatasets
    • This enables optimized data processing using pandas or polars functions with zero-copy, e.g.
    import polars as pl
    from datasets import load_dataset

    ds = load_dataset("ServiceNow-AI/R1-Distill-SFT", "v0", split="train", streaming=True)
    ds = ds.with_format("polars")
    expr = pl.col("solution").str.extract("boxed\\{(.*)\\}").alias("value_solution")
    ds = ds.map(lambda df: df.with_columns(expr), batched=True)
  • Apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets by @alex-hh in https://github.com/huggingface/datasets/pull/7207

    • IterableDatasets with "numpy" format are now much faster
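
To illustrate why async functions in map() help with I/O-bound work such as inference calls, here is a minimal sketch using only the standard library. It is not how datasets implements the feature: `query_model` is a hypothetical stub standing in for a real inference API, and `map_async` and `max_concurrency` are illustrative names, not part of the library.

```python
import asyncio

PROMPT = "Answer the following question: {question}. You should think step by step."


async def query_model(text: str) -> str:
    # Hypothetical stand-in for a network call to an inference API.
    await asyncio.sleep(0.01)
    return f"echo: {text}"


async def ask_llm(example: dict) -> dict:
    # Per-example async function, in the spirit of the release-note snippet.
    answer = await query_model(PROMPT.format(question=example["question"]))
    return {**example, "answer": answer}


async def map_async(examples, fn, max_concurrency: int = 8) -> list:
    # Run fn over all examples concurrently, capped by a semaphore.
    # asyncio.gather returns results in the same order as its inputs.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(example):
        async with semaphore:
            return await fn(example)

    return await asyncio.gather(*(run_one(ex) for ex in examples))


def run_demo() -> list:
    examples = [{"question": f"What is {i} + {i}?"} for i in range(5)]
    return asyncio.run(map_async(examples, ask_llm))


if __name__ == "__main__":
    for row in run_demo():
        print(row["question"], "->", row["answer"])
```

Because the waits overlap, the total time is closer to one call's latency than to the sum of all calls, which is the benefit the release note points to.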

What's Changed

New Contributors

Full Changelog: huggingface/datasets@3.2.0...3.3.0

v3.2.0

Compare Source

Dataset Features

  • Faster parquet streaming + filters with predicate pushdown by @lhoestq in https://github.com/huggingface/datasets/pull/7309
    • Up to 2x streaming speed (+100%)
    • Fast filtering via predicate pushdown (skip files/row groups based on predicate instead of downloading the full data), e.g.
      from datasets import load_dataset
      filters = [('date', '>=', '2023')]
      ds = load_dataset("HuggingFaceFW/fineweb-2", "fra_Latn", streaming=True, filters=filters)
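
The idea behind predicate pushdown can be sketched without any Parquet library. The example below is a simplified model, not the datasets or Parquet implementation: `RowGroup`, `may_match`, and `scan` are hypothetical names, and each row group is assumed to carry min/max statistics per column, which lets the reader skip groups that cannot satisfy the filter.

```python
import operator
from dataclasses import dataclass

# Comparison operators supported by this sketch.
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


@dataclass
class RowGroup:
    stats: dict  # column name -> (min value, max value)
    rows: list   # list of row dicts


def may_match(stats: dict, column: str, op: str, value) -> bool:
    # Decide from min/max statistics alone whether any row could match.
    low, high = stats[column]
    if op == ">=":
        return high >= value
    if op == ">":
        return high > value
    if op == "<=":
        return low <= value
    if op == "<":
        return low < value
    if op == "==":
        return low <= value <= high
    return True  # unknown operator: never skip, to stay correct


def scan(row_groups: list, filters: list):
    # Return (matching rows, number of row groups actually read).
    matched, groups_read = [], 0
    for group in row_groups:
        if not all(may_match(group.stats, c, op, v) for c, op, v in filters):
            continue  # skipped without reading any rows
        groups_read += 1
        matched.extend(
            row for row in group.rows
            if all(OPS[op](row[c], v) for c, op, v in filters)
        )
    return matched, groups_read


ROW_GROUPS = [
    RowGroup({"date": ("2021-03-01", "2021-11-20")},
             [{"date": "2021-03-01"}, {"date": "2021-11-20"}]),
    RowGroup({"date": ("2022-06-15", "2023-01-05")},
             [{"date": "2022-06-15"}, {"date": "2023-01-05"}]),
    RowGroup({"date": ("2023-07-01", "2024-02-10")},
             [{"date": "2023-07-01"}, {"date": "2024-02-10"}]),
]

if __name__ == "__main__":
    rows, groups_read = scan(ROW_GROUPS, [("date", ">=", "2023")])
    print(rows, groups_read)
```

With the filter `('date', '>=', '2023')`, the first row group is skipped entirely because its maximum date sorts before "2023"; in a streaming setting that translates to data that never has to be downloaded.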

Other improvements and bug fixes

New Contributors

Full Changelog: huggingface/datasets@3.1.0...3.2.0


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot requested a review from rashley-iqt as a code owner January 5, 2025 22:51
@renovate renovate bot force-pushed the renovate/datasets-3.x-lockfile branch from 86dba46 to 94a1ad5 Compare January 14, 2025 19:37
@renovate renovate bot changed the title Update dependency datasets to v3.2.0 Update dependency datasets to v3.3.0 Feb 14, 2025
@renovate renovate bot force-pushed the renovate/datasets-3.x-lockfile branch from 94a1ad5 to e0cb3f4 Compare February 17, 2025 16:54
@renovate renovate bot changed the title Update dependency datasets to v3.3.0 Update dependency datasets to v3.3.1 Feb 17, 2025
@renovate renovate bot force-pushed the renovate/datasets-3.x-lockfile branch from e0cb3f4 to 48fbf3c Compare February 20, 2025 17:58
@renovate renovate bot changed the title Update dependency datasets to v3.3.1 Update dependency datasets to v3.3.2 Feb 20, 2025