This PR contains the following updates:
==3.0.6 -> ==3.1.3
==v1.0.1 -> ==1.2.1 (pending: 1.3.0)
==0.44.1 -> ==0.45.0
==8.1.7 -> ==8.1.8
==43.0.1 -> ==43.0.3
==3.1.0 -> ==3.2.0
==8.15.0 -> ==8.17.0
==0.114.0 -> ==0.115.6
==2024.9.0 -> ==2024.12.0
~=2.34.0 -> ~=2.37.0
==1.65.0 -> ==1.76.0 (pending: 1.77.0)
~=2.23.0 -> ~=2.27.2
==2.18.2 -> ==2.19.0
==5.11.0 -> ==5.12.0
==0.2.16 -> ==0.3.14
==0.0.2 -> ==0.0.3
==2.17.2 -> ==2.19.0
==2.1.1 -> ==2.2.1 (pending: 2.2.2)
==2.2.2 -> ==2.2.3
==v0.13.2 -> ==0.14.0
==0.3.2 -> ==0.3.6
==6.1.0 -> ==6.1.1
==2.9.9 -> ==2.9.10
==1.24.10 -> ==1.25.1 (pending: 1.25.2)
==1.11.1 -> ==1.12.2 (pending: 1.13.0)
==5.0.8 -> ==5.2.1
>=0.6,<=0.6.4 -> >=0.9,<=0.9.1 (pending: 0.9.2)
==1.5.1 -> ==1.6.1
==1.14.1 -> ==1.15.1
==1.16.0 -> ==1.17.0
==1.38.0 -> ==1.41.1
==2.17.0 -> ==2.18.0
==0.0.3 -> ==0.0.5
==4.44.2 -> ==4.48.0
==v4.46.1 -> ==4.48.0
==v0.11.4 -> ==0.13.0
==2024.1 -> ==2024.2
==0.30.6 -> ==0.34.0
==4.7.1 -> ==4.10.2 (pending: 4.10.4 (+1))
Warning
Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
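When reviewing a batch update like this one, it can help to see which updates are patch-level and which move the minor or major component. The following is a minimal sketch, not part of Renovate: the helper names `parse_version` and `classify_bump` are hypothetical, and the sketch only handles the simple `==` and `~=` specifiers that appear in the update list, not ranges such as `>=0.6,<=0.6.4`.

```python
import re


def parse_version(spec: str) -> tuple[int, ...]:
    """Strip a leading ==/~= operator and an optional 'v', then split on dots."""
    match = re.fullmatch(r"(?:==|~=)?v?(\d+(?:\.\d+)*)", spec.strip())
    if match is None:
        # Ranges and pre-release tags are out of scope for this sketch.
        raise ValueError(f"unsupported specifier: {spec!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def classify_bump(old: str, new: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for the first differing component."""
    old_v, new_v = parse_version(old), parse_version(new)
    # Pad short versions such as 2024.1 with zeros so all three components exist.
    width = max(len(old_v), len(new_v), 3)
    old_v += (0,) * (width - len(old_v))
    new_v += (0,) * (width - len(new_v))
    for label, a, b in zip(("major", "minor", "patch"), old_v, new_v):
        if a != b:
            return label
    return "none"


# Specifier pairs copied from the update list in this PR.
for old, new in [("==3.0.6", "==3.1.3"), ("==8.1.7", "==8.1.8"), ("==v1.0.1", "==1.2.1")]:
    print(old, "->", new, ":", classify_bump(old, new))
```

Note that for `0.x` releases a minor bump may still carry breaking changes, so the label is only a first-pass triage aid.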
Release Notes
huggingface/accelerate (accelerate)
v1.2.1: Patchfix
Compare Source
Full Changelog: huggingface/accelerate@v1.2.0...v1.2.1
v1.2.0: Bug Squashing & Fixes across the board
Compare Source
Core
find_executable_batch_size on XPU by @faaany in https://github.com/huggingface/accelerate/pull/3236
numpy._core instead of numpy.core by @qgallouedec in https://github.com/huggingface/accelerate/pull/3247
[data_loader] Optionally also propagate set_epoch to batch sampler by @tomaarsen in https://github.com/huggingface/accelerate/pull/3246
accelerate config prompt text by @faaany in https://github.com/huggingface/accelerate/pull/3268
Big Modeling
align_module_device, ensure only cpu tensors for get_state_dict_offloaded_model by @kylesayrs in https://github.com/huggingface/accelerate/pull/3217
get_state_dict_from_offload by @kylesayrs in https://github.com/huggingface/accelerate/pull/3253
preload_module_classes is lost for nested modules by @wejoncy in https://github.com/huggingface/accelerate/pull/3248
DeepSpeed
Documentation
Update code in tracking documentation by @faaany in https://github.com/huggingface/accelerate/pull/3235
Replaced set/check breakpoint with set/check trigger in the troubleshooting documentation by @relh in https://github.com/huggingface/accelerate/pull/3259
Update set-seed by @faaany in https://github.com/huggingface/accelerate/pull/3228
Fix typo by @faaany in https://github.com/huggingface/accelerate/pull/3221
Use real path for checkpoint by @faaany in https://github.com/huggingface/accelerate/pull/3220
Fixed multiple typos for Tutorials and Guides docs by @henryhmko in https://github.com/huggingface/accelerate/pull/3274
New Contributors
Full Changelog
align_module_device, ensure only cpu tensors for get_state_dict_offloaded_model by @kylesayrs in https://github.com/huggingface/accelerate/pull/3217
find_executable_batch_size on XPU by @faaany in https://github.com/huggingface/accelerate/pull/3236
[data_loader] Optionally also propagate set_epoch to batch sampler by @tomaarsen in https://github.com/huggingface/accelerate/pull/3246
numpy._core instead of numpy.core by @qgallouedec in https://github.com/huggingface/accelerate/pull/3247
accelerate config prompt text by @faaany in https://github.com/huggingface/accelerate/pull/3268
get_state_dict_from_offload by @kylesayrs in https://github.com/huggingface/accelerate/pull/3253
preload_module_classes is lost for nested modules by @wejoncy in https://github.com/huggingface/accelerate/pull/3248
checkpoint by @faaany in https://github.com/huggingface/accelerate/pull/3220
Code Diff
Release diff: huggingface/accelerate@v1.1.1...v1.2.0
v1.1.1
Compare Source
v1.1.0: Python 3.9 minimum, torch dynamo deepspeed support, and bug fixes
Compare Source
Internals:
data_seed argument in https://github.com/huggingface/accelerate/pull/3150
weights_only=True by default for all compatible objects when checkpointing and saving with torch.save in https://github.com/huggingface/accelerate/pull/3036
dim input in pad_across_processes in https://github.com/huggingface/accelerate/pull/3114
DeepSpeed
Megatron
Big Model Inference
has_offloaded_params utility added in https://github.com/huggingface/accelerate/pull/3188
Examples
Full Changelog
dim input in pad_across_processes by @mariusarvinte in https://github.com/huggingface/accelerate/pull/3114
data_seed by @muellerzr in https://github.com/huggingface/accelerate/pull/3150
save_model by @muellerzr in https://github.com/huggingface/accelerate/pull/3146
weights_only=True by default for all compatible objects by @muellerzr in https://github.com/huggingface/accelerate/pull/3036
get_xpu_available_memory by @faaany in https://github.com/huggingface/accelerate/pull/3165
has_offloaded_params by @kylesayrs in https://github.com/huggingface/accelerate/pull/3188
torch.nn.Module model into account when moving to device by @faaany in https://github.com/huggingface/accelerate/pull/3167
torchrun by @faaany in https://github.com/huggingface/accelerate/pull/3166
align_module_device by @kylesayrs in https://github.com/huggingface/accelerate/pull/3204
New Contributors
Full Changelog: huggingface/accelerate@v1.0.1...v1.1.0
bitsandbytes-foundation/bitsandbytes (bitsandbytes)
v0.45.0: LLM.int8() support for H100; faster 4-bit/8-bit inference
Compare Source
Highlights
H100 Support for LLM.int8()
PR #1401 brings full LLM.int8() support for NVIDIA Hopper GPUs such as the H100, H200, and H800!
As part of the compatibility enhancements, we've rebuilt much of the LLM.int8() code to simplify future compatibility and maintenance. We no longer use the col32 or architecture-specific tensor layout formats, while maintaining backwards compatibility. We additionally bring performance improvements targeted at inference scenarios.
Performance Improvements
This release includes broad performance improvements for a wide variety of inference scenarios. See this X thread for a detailed explanation.
The improvements were measured using the 🤗optimum-benchmark tool.
For more benchmark results, see benchmarking/README.md.
LLM.int8()
Example throughput improvement for Qwen 2.5 14B Instruct on RTX 4090:
Example throughput improvement for Qwen 2.5 3B Instruct on T4:
NF4/FP4
Example throughput improvement for Qwen 2.5 14B Instruct on RTX 4090:
Example throughput improvement for Qwen 2.5 3B Instruct on T4:
Changes
Packaging Changes
The size of our wheel has been reduced by ~43.5% from 122.4 MB to 69.1 MB! This results in an on-disk size decrease from ~396MB to ~224MB.
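The percentages quoted above can be checked directly from the stated sizes. This is a small arithmetic sketch; the function name `percent_reduction` is a hypothetical helper, and the inputs are the figures given in the release note.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, expressed as a percentage."""
    return (before - after) / before * 100


# Figures from the release note: wheel size and on-disk size, both in MB.
wheel = percent_reduction(122.4, 69.1)
on_disk = percent_reduction(396, 224)

print(f"wheel: {wheel:.1f}%")      # wheel: 43.5%
print(f"on disk: {on_disk:.1f}%")  # on disk: 43.4%
```

Both ratios come out close to the stated ~43.5%, so the wheel and on-disk figures are consistent with each other.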
CUDA Toolkit Versions
Breaking
🤗PEFT users wishing to merge adapters with 8-bit weights will need to upgrade to peft>=0.14.0.
New
bitsandbytes.functional.int8_vectorwise_dequant(). This functionality is being integrated into 🤗PEFT and 🤗transformers.
bitsandbytes.functional module now has an API documentation page.
Deprecations
A number of public API functions have been marked for deprecation and will emit FutureWarning when used. These functions will become unavailable in future releases. This should have minimal impact on most end-users.
k-bit quantization
The k-bit quantization features are deprecated in favor of blockwise quantization. For all optimizers, using block_wise=False is not recommended and support will be removed in a future release.
LLM.int8() deprecations:
As part of the refactoring process, we've implemented many new 8bit operations. These operations no longer use specialized data layouts.
The following relevant functions from bitsandbytes.functional are now deprecated:
General Deprecations
Additionally the following functions from bitsandbytes.functional are deprecated:
What's Changed
Configuration
📅 Schedule: Branch creation - "* 0-3 * * 1" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.
This PR was generated by Mend Renovate. View the repository job log.
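For reference, the branch-creation schedule `* 0-3 * * 1` in the configuration above is a standard five-field cron expression: any minute, hours 0 through 3, any day of month, any month, day-of-week 1 (Monday). A minimal sketch of that window check, assuming timezone-aware inputs; the function name `in_branch_creation_window` is hypothetical and is not part of Renovate.

```python
from datetime import datetime, timezone


def in_branch_creation_window(moment: datetime) -> bool:
    """True if `moment` falls inside the cron window '* 0-3 * * 1' evaluated in UTC."""
    utc = moment.astimezone(timezone.utc)
    # Cron day-of-week 1 is Monday; datetime.weekday() uses 0 for Monday.
    return utc.weekday() == 0 and 0 <= utc.hour <= 3


# 2025-01-06 is a Monday.
print(in_branch_creation_window(datetime(2025, 1, 6, 2, 30, tzinfo=timezone.utc)))  # True
print(in_branch_creation_window(datetime(2025, 1, 6, 4, 0, tzinfo=timezone.utc)))   # False
```

In other words, new branches are created only between 00:00 and 03:59 UTC on Mondays, while automerge is not restricted by any schedule.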