refactor handling of orphaned resources in the coordinator #1580
Check if a place acquire request tries to lock a resource that is marked as orphaned. This means we no longer need to reacquire orphaned resources before trying to acquire a place's resources, since we now refuse to acquire them. This avoids long delays on acquire calls if the exporter responsible for an (unrelated) orphaned resource doesn't process commands quickly.

Signed-off-by: Rouven Czerwinski <[email protected]>
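A minimal sketch of the fail-fast check this commit describes. It assumes simplified, hypothetical `Place`/`Resource` classes, an `orphaned` flag, and a `PlaceAcquireError` exception; these are illustrative names, not labgrid's actual coordinator API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class PlaceAcquireError(Exception):
    """Hypothetical error returned to the client when acquisition is refused."""


@dataclass
class Resource:
    name: str
    orphaned: bool = False              # set while the owning exporter is offline
    acquired_by: Optional[str] = None   # name of the place holding this resource


@dataclass
class Place:
    name: str
    resources: List[Resource] = field(default_factory=list)
    acquired: Optional[str] = None      # user holding the place, if any


def acquire_place(place: Place, user: str) -> None:
    """Acquire all resources of a place for a user, refusing orphaned ones."""
    if place.acquired is not None:
        raise PlaceAcquireError(f"{place.name} is already acquired by {place.acquired}")
    orphaned = [r.name for r in place.resources if r.orphaned]
    if orphaned:
        # Refuse immediately instead of first trying to reacquire the orphaned
        # resources, which could block on a slow exporter.
        raise PlaceAcquireError(f"{place.name} has orphaned resources: {', '.join(orphaned)}")
    for resource in place.resources:
        resource.acquired_by = place.name
    place.acquired = user
```

With this check the acquire path never waits on an exporter: a place that matches an orphaned resource is rejected at once, and orphans belonging to other places are not touched at all.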
Labgrid currently uses bidirectional streams between the coordinator and clients/exporters. For the client side this is a good fit, since the client sends requests and the coordinator can answer directly. For the exporter, however, the old crossbar infrastructure used nested calls in one case: re-acquiring a resource after the exporter was offline while the place was kept acquired. We call these orphaned resources. They replace the real resource on the coordinator side until the resource can be reacquired on the respective exporter after it has restarted.

With crossbar, when seeing the resource update, the coordinator could directly call the exporter to acquire the resource for the specific place. This was possible because crossbar handled the RPC routing, and arbitrary services connected to crossbar could provide RPC calls to other services. With gRPC, we are more constrained. Since we only have a single input/output stream which needs to multiplex different objects, nested calls are not directly supported: the exporter side would still be waiting for the coordinator to answer its own request.

A different approach to orphaned resource handling is required. The coordinator now uses a loop where it checks the orphaned resources and tries to reacquire them if the exporter reappears. This, however, introduces another problem: the exporter can be under high load, so the acquire request from the coordinator can time out. In this case, we need to abort the acquisition during a regular lock and, for an orphaned resource, replace the orphaned resource with the resource the exporter eventually acquires.

We also need to handle the case where the exporter has an acquired resource, but the place has been released in the meantime (perhaps due to a timeout on a normal place acquire). The same poll loop in the coordinator handles this as well.
All in all, this means that the acquired state of each place's resources is not necessarily consistent on the coordinator at all times, but it will eventually reach a consistent state. This should be sufficient, since exporter restarts with orphaned resources should be relatively rare.

Signed-off-by: Jan Luebbe <[email protected]>
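The coordinator poll loop described in this commit message can be sketched as a single-process `asyncio` model. This is not labgrid's coordinator code: `FakeExporter`, `Coordinator`, `reconcile_once`, and the `delay`/`timeout` parameters are hypothetical. The coordinator stops waiting after `timeout`, but `asyncio.shield` keeps the exporter-side work running, mimicking an exporter that completes an acquisition late. A later pass then either adopts that result for the orphan or releases it if the place is gone.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FakeExporter:
    """Hypothetical stand-in for an exporter; `delay` models one under high load."""
    delay: float = 0.0
    acquired: Dict[str, str] = field(default_factory=dict)  # resource -> place

    async def acquire(self, resource: str, place: str) -> None:
        await asyncio.sleep(self.delay)  # slow to process the command
        self.acquired[resource] = place

    async def release(self, resource: str) -> None:
        self.acquired.pop(resource, None)


@dataclass
class Coordinator:
    exporter: FakeExporter
    timeout: float  # how long one pass waits for an acquire reply
    acquired_places: Set[str] = field(default_factory=set)
    orphans: Dict[str, str] = field(default_factory=dict)  # resource -> place holding it
    pending: Dict[str, asyncio.Task] = field(default_factory=dict)

    def _drop_orphan(self, resource: str) -> None:
        del self.orphans[resource]
        self.pending.pop(resource, None)

    async def reconcile_once(self) -> None:
        """One pass of the poll loop; repeated passes converge."""
        # Release exporter-side locks whose place was released in the meantime,
        # e.g. after a timed-out acquire that the exporter completed late.
        for resource, place in list(self.exporter.acquired.items()):
            if place not in self.acquired_places:
                await self.exporter.release(resource)

        for resource, place in list(self.orphans.items()):
            if place not in self.acquired_places:
                self._drop_orphan(resource)  # nobody holds the place any more
            elif self.exporter.acquired.get(resource) == place:
                self._drop_orphan(resource)  # replace orphan with the real lock
            else:
                if resource not in self.pending:
                    # Send the request once; the exporter keeps working on it
                    # even after we stop waiting for the reply.
                    self.pending[resource] = asyncio.create_task(
                        self.exporter.acquire(resource, place))
                try:
                    await asyncio.wait_for(
                        asyncio.shield(self.pending[resource]), self.timeout)
                except asyncio.TimeoutError:
                    continue  # keep the orphan; a later pass checks again
                self._drop_orphan(resource)
```

In this model a timed-out request leaves the orphan in place, the next pass adopts the late acquisition, and once the place is released the stale exporter-side lock is dropped, which is the eventual consistency the commit message relies on.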
Branch updated from 8ca0da8 to 109ef0d.
I've fixed the error detected by pylint.
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files

@@           Coverage Diff            @@
##           master   #1580     +/-   ##
========================================
- Coverage    56.0%   55.8%   -0.3%
========================================
  Files         170     170
  Lines       13307   13365     +58
========================================
  Hits         7458    7458
- Misses       5849    5907     +58
Flags with carried forward coverage won't be shown.