Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process kafka requests in a separate scheduling group #24973

Merged
merged 4 commits into from
Feb 5, 2025

Conversation

mmaslankaprv
Copy link
Member

@mmaslankaprv mmaslankaprv commented Jan 29, 2025

Added code that allows using separate scheduling group for Kafka
requests processing. The scheduling group can be switched based on the
API type. By default processing is done in the group returned by
kafka::server::get_request_handler_sg

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@mmaslankaprv mmaslankaprv requested a review from a team as a code owner January 29, 2025 13:15
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Jan 29, 2025

CI test results

test results on build#61350
test_id test_kind job_url test_status passed
rptest.tests.cloud_storage_timing_stress_test.CloudStorageTimingStressTest.test_cloud_storage.cleanup_policy=compact.delete ducktape https://buildkite.com/redpanda/redpanda/builds/61350#0194b27d-3fa4-4236-9700-4026364e9e88 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61350#0194b27d-3fa6-4560-9166-e42bc00c4ad4 FLAKY 1/3
rptest.tests.datalake.simple_connect_test.RedpandaConnectIcebergTest.test_translating_avro_serialized_records.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61350#0194b27d-3fa4-4236-9700-4026364e9e88 FLAKY 1/2
storage_single_thread_rpunit.storage_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61350#0194b234-9348-4a86-8544-18742a9bcac9 FLAKY 1/2
test results on build#61495
test_id test_kind job_url test_status passed
gtest_raft_rpunit.gtest_raft_rpunit unit https://buildkite.com/redpanda/redpanda/builds/61495#0194cae6-8b05-415e-9a88-16dd9862d27f FLAKY 1/2
rptest.tests.cloud_storage_timing_stress_test.CloudStorageTimingStressTest.test_cloud_storage_with_partition_moves.cleanup_policy=compact.delete ducktape https://buildkite.com/redpanda/redpanda/builds/61495#0194cb2f-65f7-4b48-9249-1c6490663826 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61495#0194cb41-4747-44ce-b0b0-bd6ac40fd5f2 FLAKY 1/3
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61495#0194cb41-4748-4c9d-8925-dc384ea5af02 FLAKY 1/2
rptest.tests.datalake.compaction_test.CompactionGapsTest.test_translation_no_gaps.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/61495#0194cb2f-65f6-459f-ab3f-955fbacc41b9 FLAKY 1/2

@@ -682,6 +684,10 @@ connection_context::dispatch_method_once(request_header hdr, size_t size) {
? std::make_optional<ss::sstring>(*hdr.client_id)
: std::nullopt,
};

co_await ss::coroutine::switch_to(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@travisdowns related to the discussion from yesterday, does this even allocate the coroutine frame if await_ready returns true (because we are already in the right group - ignoring for a second whether we actually are here or not)? I thought not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which coroutine do you refer to? In any case there shouldn't be a coroutine frame allocated per switch_to, whether the switch happens or not. What await_ready returning true does is avoids an unnecessary task switch (which is probably more expensive even than a coro frame in "micro" costs, and certainly in macro costs as it breaks out of the current executing request, etc).

/**
* Scheduling group to process requests received via the REST API of
* admin server.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love to see these comments, thanks!

travisdowns
travisdowns previously approved these changes Jan 31, 2025
Copy link
Member

@travisdowns travisdowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good pending moving the SG definition into the handler.

Added property allowing us to control if `kafka` scheduling group should
be used to handle parsing and handling Kafka API requests.

Signed-off-by: Michał Maślanka <[email protected]>
Added code that allows handler to control the scheduling group they are
processed in. If a handler returns no override then the handler runs in
a default `kafka` scheduling group.

Signed-off-by: Michał Maślanka <[email protected]>
@mmaslankaprv mmaslankaprv merged commit 7561344 into redpanda-data:dev Feb 5, 2025
17 checks passed
@mmaslankaprv mmaslankaprv deleted the handler-sg branch February 5, 2025 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants