
Reduce latency in fetch phase for large shard counts #120419

Open

wants to merge 1 commit into base: main

Conversation

original-brownbear (Member)

We can reduce the latency of the fetch response by a non-trivial amount by moving the context-freeing for irrelevant shards to the end of the forked action.

For large shard counts (and/or with security in the mix) the old comment is not entirely correct: waking up the selector many times over and doing authorization work for each free request can consume measurable time.
Moving the freeing to the end of the task lets the fetches go out potentially much sooner, reduces contention on the counter, and reduces the impact of potential head-of-line blocking on connections where free-context requests could otherwise needlessly queue ahead of actual fetches.
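A minimal sketch of the reordering described above (hypothetical class and method names, not the actual Elasticsearch fetch-phase code): irrelevant shards no longer get their free-context request sent inline, it is remembered and issued only after every fetch has been dispatched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "free contexts at the end of the forked
// action"; shard ids and operation strings are invented for the sketch.
class FetchDispatchSketch {
    // Returns the order in which network operations are issued.
    static List<String> dispatch(int shardCount, List<Integer> relevantShards) {
        List<String> wireOps = new ArrayList<>();
        List<Integer> contextsToFree = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            if (relevantShards.contains(shard)) {
                // Relevant shard: send the fetch request right away.
                wireOps.add("fetch:" + shard);
            } else {
                // Irrelevant shard: do NOT send the free-context request
                // inline; defer it to the end of the task instead.
                contextsToFree.add(shard);
            }
        }
        // Every fetch is on the wire before any free-context request can
        // queue ahead of one on a shared connection.
        for (int shard : contextsToFree) {
            wireOps.add("free:" + shard);
        }
        return wireOps;
    }
}
```

With 5 shards of which only shards 1 and 3 are relevant, all fetches precede all frees in the issued order.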

@original-brownbear original-brownbear added >non-issue :Search Foundations/Search Catch all for Search Foundations labels Jan 17, 2025
@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v9.0.0 labels Jan 17, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@original-brownbear (Member, Author)

Just to illustrate how much of a concern this is:

[flamegraph: CPU profile of the fetch phase]

For a request across ~5k shards that only has matches in 10 of them after the final reduce step, sending out these free-context requests uses close to 100% of the CPU time in fetch (that includes merging results! ... albeit without aggregations in this flamegraph). This translates into O(100ms) of latency for 5k shards and is highly variable as well. In fact, after #118490 the sending of the free requests makes up the majority of the end-to-end latency for many queries. Counting down early and moving the freeing to the end of the action solves the problem just fine in practice, and there's no need to fork this long action again.
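The "counting down early" idea above can be sketched as follows (hypothetical names, not the real `SearchPhase` code): irrelevant shards decrement the phase-completion counter immediately, so completion only ever waits on real fetch responses, and the fire-and-forget frees run after the phase is already done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of counting down the per-shard completion counter
// early for irrelevant shards; all names are invented for illustration.
class CountDownEarlySketch {
    static List<String> run(int shardCount, List<Integer> relevantShards) {
        List<String> log = new ArrayList<>();
        AtomicInteger remaining = new AtomicInteger(shardCount);
        List<Integer> toFree = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            if (relevantShards.contains(shard)) {
                log.add("fetch:" + shard);
                // In reality the fetch-response listener counts down;
                // simulated synchronously here.
                if (remaining.decrementAndGet() == 0) log.add("phase-done");
            } else {
                toFree.add(shard);
                // Count down immediately: phase completion never waits on
                // a free-context round trip.
                if (remaining.decrementAndGet() == 0) log.add("phase-done");
            }
        }
        // Frees are sent only after the phase has completed.
        for (int shard : toFree) log.add("free:" + shard);
        return log;
    }
}
```

In the 5-shard example the phase completes before a single free-context request goes out, which is exactly why the frees no longer contribute to end-to-end latency.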
