set paged attn block size as a env parameter #1109

Merged on Jan 16, 2025 (4 commits)

@jiqing-feng (Collaborator) commented Jan 15, 2025

Based on our latest tests, #1065 gives a 20%+ speed-up for the 1st token but causes a 2nd-token latency regression, especially on short prompt inputs. We can expose the block size as an environment variable with a default of 16, which eliminates most of the 2nd-token latency regression while keeping most of the 1st-token acceleration.

Overall, with #1065 and this PR, we get the following speed-ups:
input len: 512; output len: 128; bs: 2; num_beams: 2

| speed-up  | gpt2-xl | llama3.1-8B |
|-----------|---------|-------------|
| 1st token | 1.37x   | 1.29x       |
| 2nd token | 0.98x   | 0.97x       |
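
For readers following along, here is a minimal sketch of what reading the block size from the environment with a default of 16 could look like. The variable name `PAGED_ATTN_BLOCK_SIZE` and both helper functions are hypothetical and chosen for illustration only. This thread does not state the actual name used in the merged code, and the block count assumes a cache that allocates whole fixed-size blocks per sequence.

```python
import math
import os

# Hypothetical variable name, for illustration only; not taken from the merged code.
ENV_VAR = "PAGED_ATTN_BLOCK_SIZE"
DEFAULT_BLOCK_SIZE = 16  # default proposed in this PR


def get_block_size(env=None):
    """Return the paged-attention block size, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        return DEFAULT_BLOCK_SIZE
    value = int(raw)  # raises ValueError on non-integer input
    if value <= 0:
        raise ValueError(f"{ENV_VAR} must be a positive integer, got {value}")
    return value


def num_blocks(seq_len, block_size):
    """Number of fixed-size cache blocks needed to hold seq_len tokens."""
    return math.ceil(seq_len / block_size)
```

Under these assumptions, a sequence from the benchmark above (512 input tokens plus 128 output tokens, 640 in total) would occupy `num_blocks(640, 16)` = 40 blocks at the default size.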

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng marked this pull request as draft on January 16, 2025 02:17
@jiqing-feng changed the title from "set default block size" to "set paged attn block size as a env parameter" on Jan 16, 2025
@jiqing-feng marked this pull request as ready for review on January 16, 2025 05:46
@IlyasMoutawwakil (Member) left a comment

LGTM, thanks for the perf numbers!

@IlyasMoutawwakil merged commit 7b4044d into huggingface:main on Jan 16, 2025
22 checks passed
@jiqing-feng deleted the block_size branch on January 17, 2025 02:10