
0.43.3: enabling Llama 405B with 8x H100/A100 + 256GB RAM

@Titus-von-Koeller released this 30 Jul 20:48

Improvements:

  • FSDP: Enable loading prequantized weights with bf16/fp16/fp32 quant_storage
    • Background: This update, linked to Transformers PR #32276, allows loading prequantized weights stored in alternative dtypes (bf16/fp16/fp32). Quantization metadata is tracked in the same way as in Params4bit.__new__ following PR #970, so models exported with a non-default quant_storage, such as an NF4-quantized model with BF16 storage, can now be loaded directly (see the sketch after this list).
    • Special thanks to @winglian and @matthewdouglas for enabling FSDP+QLoRA finetuning of Llama 3.1 405B on a single 8xH100 or 8xA100 node with as little as 256GB system RAM.
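
A minimal sketch of the workflow this enables, assuming the Hugging Face Transformers 4-bit integration; the model ID and output path below are placeholders, not artifacts referenced by this release:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize to NF4 with a non-default quant_storage dtype (the default is uint8)
# and export the prequantized weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # store quantized weights as bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",         # placeholder model ID
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
model.save_pretrained("llama-nf4-bf16-storage")

# With 0.43.3 and Transformers PR #32276, the prequantized checkpoint can be
# loaded back directly; the quant_storage metadata is restored from the saved
# quantization config.
reloaded = AutoModelForCausalLM.from_pretrained(
    "llama-nf4-bf16-storage",
    torch_dtype=torch.bfloat16,
)
```

Matching quant_storage to the dtype of the unquantized parameters (e.g. bf16) is what allows FSDP to flat-shard the 4-bit weights alongside the rest of the model for QLoRA finetuning.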