[Inference PagedAttention] Integrate initial paged attention implementation into maxengine (2/N)

This change is based on a branch from Pate and Rupeng, with code refactoring and modifications.

What:
* This PR integrates initial paged attention components into maxengine, guarded behind the `attention=paged` config setting.

Impact of this change:
* This PR is a NOOP. Paged attention is not enabled unless `attention=paged` is set in the config. The default `attention=autoselected` will NOT trigger paged attention.

Key changes:
* MaxText/layers/attentions.py: Use the paged attention op when `attention=paged`, for every model mode other than MODEL_MODE_TRAIN.
* MaxText/layers/models.py: Initialize paged attention components when `attention=paged`.

Why:
* Paged attention should improve inference performance.

Testing:
* python -m unittest tests/inference/paged_attention_test.py
* python MaxText/decode.py MaxText/configs/base.yml tokenizer_path=assets/tokenizer.llama2 \
    load_parameters_path=gs://msingh-bkt/checkpoints/quant_llama2-7b-chat/20241120034012/int8_ \
    max_prefill_predict_length=16 max_target_length=32 model_name=llama2-7b \
    ici_fsdp_parallelism=1 ici_autoregressive_parallelism=1 ici_tensor_parallelism=-1 \
    scan_layers=false weight_dtype=bfloat16 per_device_batch_size=1 \
    checkpoint_is_quantized=true quantization=int8 \
    attention=paged pagedattn_num_pages=64 pagedattn_tokens_per_page=8 pagedattn_pages_per_compute_block=4
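The `pagedattn_num_pages` and `pagedattn_tokens_per_page` knobs in the test command imply a page-table style KV cache, where each sequence's keys and values are stored in fixed-size pages drawn from a shared pool. The sketch below illustrates that bookkeeping only; the `PageTable` class, its method names, and its structure are hypothetical and are not MaxText's actual implementation.

```python
# Hypothetical sketch of page-table bookkeeping for a paged KV cache.
# All names here are illustrative, not taken from MaxText.

class PageTable:
    def __init__(self, num_pages: int, tokens_per_page: int):
        self.tokens_per_page = tokens_per_page
        self.free_pages = list(range(num_pages))      # pool of free KV-cache page ids
        self.pages_per_seq: dict[int, list[int]] = {}  # sequence id -> allocated page ids
        self.lengths: dict[int, int] = {}              # sequence id -> tokens written

    def reserve(self, seq_id: int, num_tokens: int) -> list[int]:
        """Allocate enough pages to hold num_tokens tokens of one sequence."""
        needed = -(-num_tokens // self.tokens_per_page)  # ceiling division
        if needed > len(self.free_pages):
            raise RuntimeError("out of KV-cache pages")
        pages = [self.free_pages.pop() for _ in range(needed)]
        self.pages_per_seq[seq_id] = pages
        self.lengths[seq_id] = num_tokens
        return pages

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's pages to the shared free pool."""
        self.free_pages.extend(self.pages_per_seq.pop(seq_id))
        del self.lengths[seq_id]

# Using the config values from the test command above:
table = PageTable(num_pages=64, tokens_per_page=8)
pages = table.reserve(seq_id=0, num_tokens=16)  # prefill length 16 -> 2 pages
```

Because pages are allocated on demand and recycled when a sequence finishes, memory is not reserved up front for `max_target_length` per sequence, which is the usual motivation for paged attention in inference serving.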
Showing 6 changed files with 138 additions and 26 deletions.