
0.43.3: enabling Llama 405B with 8x H100/A100 + 256GB RAM

@Titus-von-Koeller released this 30 Jul 20:48

Improvements:

  • FSDP: Enable loading prequantized weights with bf16/fp16/fp32 quant_storage
    • Background: This update, linked to Transformers PR #32276, allows loading prequantized weights stored in alternative dtypes (bf16/fp16/fp32). Quantization metadata is tracked in the same way as in Params4bit.__new__ following PR #970, so models exported with a non-default quant_storage, such as an NF4-quantized model with BF16 storage, can now be loaded directly (see the sketch after this list).
    • Special thanks to @winglian and @matthewdouglas for enabling FSDP+QLoRA finetuning of Llama 3.1 405B on a single 8xH100 or 8xA100 node with as little as 256GB system RAM.
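
A minimal sketch of the workflow this enables, assuming the Hugging Face Transformers 4-bit integration; the model ID and output path below are placeholders, not artifacts referenced by this release:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize to NF4 with a non-default quant_storage dtype (the default is uint8)
# and export the prequantized weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # store quantized weights as bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",         # placeholder model ID
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
model.save_pretrained("llama-nf4-bf16-storage")

# With 0.43.3 and Transformers PR #32276, the prequantized checkpoint can be
# loaded back directly; the quant_storage metadata is restored from the saved
# quantization config.
reloaded = AutoModelForCausalLM.from_pretrained(
    "llama-nf4-bf16-storage",
    torch_dtype=torch.bfloat16,
)
```

Matching quant_storage to the dtype of the unquantized parameters (e.g. bf16) is what allows FSDP to flat-shard the 4-bit weights alongside the rest of the model for QLoRA finetuning.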