Improve invoker observability #2779

Open: wants to merge 3 commits into `main`

@muhamadazmy (Contributor) commented Feb 25, 2025

Improve invoker observability

Add a few extra metrics to measure the number of queued commands and the number of in-flight invocation tasks.
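A minimal, std-only sketch of how these two quantities could be tracked: a gauge mirrors the command queue length, and an RAII guard keeps a task counted as in flight for exactly as long as it runs. `Gauge`, `CommandQueue` and `InFlightGuard` are hypothetical names, not the invoker's actual types; the real change presumably reports through the project's metrics library rather than a hand-rolled atomic.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a metrics-library gauge.
#[derive(Default)]
pub struct Gauge {
    value: AtomicUsize,
}

impl Gauge {
    pub fn set(&self, v: usize) {
        self.value.store(v, Ordering::Relaxed);
    }
    pub fn increment(&self) {
        self.value.fetch_add(1, Ordering::Relaxed);
    }
    pub fn decrement(&self) {
        self.value.fetch_sub(1, Ordering::Relaxed);
    }
    pub fn get(&self) -> usize {
        self.value.load(Ordering::Relaxed)
    }
}

/// Hypothetical guard: counts one in-flight invocation task while it is alive.
pub struct InFlightGuard {
    gauge: Arc<Gauge>,
}

impl InFlightGuard {
    pub fn new(gauge: Arc<Gauge>) -> Self {
        gauge.increment();
        InFlightGuard { gauge }
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        // Runs on normal completion, early return, or an unwinding panic.
        self.gauge.decrement();
    }
}

/// Hypothetical command queue that mirrors its length into a gauge.
pub struct CommandQueue<T> {
    items: VecDeque<T>,
    queued: Arc<Gauge>,
}

impl<T> CommandQueue<T> {
    pub fn new(queued: Arc<Gauge>) -> Self {
        CommandQueue { items: VecDeque::new(), queued }
    }

    pub fn push(&mut self, cmd: T) {
        self.items.push_back(cmd);
        self.queued.set(self.items.len());
    }

    pub fn pop(&mut self) -> Option<T> {
        let cmd = self.items.pop_front();
        self.queued.set(self.items.len());
        cmd
    }
}
```

Decrementing in `Drop` is the main design choice here: the in-flight count is corrected even if the task returns early or unwinds, so the gauge cannot drift upward.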


Stack created with Sapling. Best reviewed with ReviewStack.

@tillrohrmann (Contributor) left a comment

Thanks for this improvement @muhamadazmy. LGTM. +1 for merging :-)

Comment on lines +35 to +37
```rust
INVOKER_SEG_QUEUE_LEN,
Unit::Count,
"Number of invocations in the queue"
```

Maybe we should call it "pending invoker tasks", to align it with `INVOKER_INVOCATION_TASK`?

@muhamadazmy force-pushed the pr2779 branch 3 times, most recently from 075fd92 to 04caf76 on March 3, 2025 14:59
Summary:
`FindTailAttr` lets `find_tail` callers adjust durability and accuracy. Two modes are currently supported:

- Approximate: a fast check that usually returns immediately with the last known tail.
- Durable (default): ensures the tail is found reliably.

Loglet implementations may choose to always run a Durable `find_tail`.
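A sketch of the two modes, assuming a simplified synchronous loglet interface with LSNs as plain `u64`. Only the `FindTailAttr` variant names and the Durable default come from the summary above; `Loglet`, `last_known_tail`, `durable_tail`, `AlwaysDurable` and `HonoursHint` are hypothetical, and the real `find_tail` signature is not shown here.

```rust
/// Mode names follow the summary; the exact shape of the real type is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum FindTailAttr {
    /// Fast check that usually returns the last known tail immediately.
    Approximate,
    /// Reliable tail lookup; the default mode.
    #[default]
    Durable,
}

/// Hypothetical, synchronous stand-in for a loglet interface.
pub trait Loglet {
    fn last_known_tail(&self) -> u64;
    fn durable_tail(&self) -> u64;

    /// Default behaviour: honour the caller's requested mode.
    fn find_tail(&self, attr: FindTailAttr) -> u64 {
        match attr {
            FindTailAttr::Approximate => self.last_known_tail(),
            FindTailAttr::Durable => self.durable_tail(),
        }
    }
}

/// Hypothetical loglet that honours the requested mode.
pub struct HonoursHint {
    pub cached: u64,
    pub actual: u64,
}

impl Loglet for HonoursHint {
    fn last_known_tail(&self) -> u64 {
        self.cached
    }
    fn durable_tail(&self) -> u64 {
        self.actual
    }
}

/// Hypothetical loglet that ignores the hint and always runs the durable
/// lookup, which the summary explicitly permits.
pub struct AlwaysDurable {
    pub cached: u64,
    pub actual: u64,
}

impl Loglet for AlwaysDurable {
    fn last_known_tail(&self) -> u64 {
        self.cached
    }
    fn durable_tail(&self) -> u64 {
        self.actual
    }
    fn find_tail(&self, _attr: FindTailAttr) -> u64 {
        self.durable_tail()
    }
}
```

Because Approximate is only a hint, a caller must tolerate a stale answer either way, while an implementation that always runs Durable stays correct at the cost of speed.
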
This PR only improves observability between the PP (partition processor) and Bifrost.
We already collect metrics for both the PP and Bifrost, but I think
these extra metrics are useful:

- Utilization of the PP request queue. Once the queue is full,
the PP starts rejecting requests with 'Busy', so it's important
to measure how utilized this queue is.

- Commit-to-read latency of records: the latency between
the moment a record is committed and the moment it's
read by the PP.

- LSN lag (applied LSN relative to the log tail). This is also shown in `restatectl partition list`.
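The three metrics above reduce to simple computations. This sketch assumes a bounded request queue that rejects with `Busy` when full, records that carry a commit timestamp, and a tail LSN that denotes the next LSN to be written; `RequestQueue`, `Busy`, `commit_to_read_latency` and `lsn_lag` are hypothetical names, not the PP's or Bifrost's actual API.

```rust
use std::collections::VecDeque;
use std::time::{Duration, SystemTime};

/// Hypothetical rejection returned when the queue is full ('Busy').
#[derive(Debug, PartialEq, Eq)]
pub struct Busy;

/// Hypothetical bounded request queue for the PP.
pub struct RequestQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> RequestQueue<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        RequestQueue { items: VecDeque::with_capacity(capacity), capacity }
    }

    /// Rejects with `Busy` instead of growing past capacity.
    pub fn try_push(&mut self, req: T) -> Result<(), Busy> {
        if self.items.len() >= self.capacity {
            return Err(Busy);
        }
        self.items.push_back(req);
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        self.items.pop_front()
    }

    /// Fraction of the queue in use: 0.0 when empty, 1.0 when full.
    pub fn utilization(&self) -> f64 {
        self.items.len() as f64 / self.capacity as f64
    }
}

/// Time from commit to read; clamped to zero if the commit timestamp is
/// later than the read timestamp (e.g. due to clock skew).
pub fn commit_to_read_latency(committed_at: SystemTime, read_at: SystemTime) -> Duration {
    read_at.duration_since(committed_at).unwrap_or(Duration::ZERO)
}

/// Number of committed but not yet applied records, ASSUMING `tail` is the
/// next LSN to be written (exclusive) and `applied` is the last applied LSN.
pub fn lsn_lag(tail: u64, applied: u64) -> u64 {
    tail.saturating_sub(applied).saturating_sub(1)
}
```

Commit-to-read latency compares timestamps that may come from different clocks, so skew can make it appear negative; the sketch clamps that case to zero rather than reporting a bogus value.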

Fixes restatedev#2756