Migrate established targets from one node to another in cluster mode #573

Open
Bapths opened this issue Dec 24, 2024 · 1 comment
Labels
enhancement New feature or request

Comments


Bapths commented Dec 24, 2024

Hello,

Thanks a lot for developing and maintaining such a useful tool!

I may be missing something but here is my concern:

Let's say I have a cluster of 3 gNMIc nodes and I want to update the subscriptions (or perform an upgrade). When I shut down one node, I lose every target connected to that node for ~30s to 1min. Is there any way to force all active connections to close and reopen quickly on another node before doing anything that may have an impact?

Maybe such a feature could be included in the new graceful shutdown API endpoint? Or maybe there are already solutions to this issue that I am still missing?

Thanks to anyone who can help me solve this issue 😄

Collaborator

karimra commented Jan 20, 2025

In #579 I added 3 REST endpoints. All requests must be made to the cluster leader; other instances will return an error.

  1. Switch the cluster leader:
    DELETE /api/v1/cluster/leader will make the leader release its lock to allow another instance to grab the leader lock.

  2. Drain an instance:
    POST /api/v1/members/{id}/drain, where id is the name of the instance to be drained. The leader will move all the targets that instance is subscribed to onto the other instances in the cluster. This is an async call: if that instance has a huge number of targets, draining might take some time, and the API call will return while the targets are still being moved to other instances.

  3. Rebalance the load between instances:
    POST /api/v1/cluster/rebalance will rebalance the number of targets between the cluster instances if it's not balanced. This is also an async call, so it might take some time to complete.

So for your case, you can run a drain on the instance you want to shut down. Once that instance is back up, rebalance the cluster.
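
For anyone scripting this, here is a rough sketch of the drain-then-rebalance flow in Python, using only the standard library. The three paths are copied as written in this comment, so check them against the API docs for your gNMIc version. The leader address `http://gnmic1:7890`, the instance name `gnmic2`, and the helper names (`build_request`, `drain`, `rebalance`, `switch_leader`, `send`) are assumptions made up for illustration, not part of gNMIc.

```python
import urllib.request
from urllib.parse import quote

# Assumption: address of the current cluster leader's API server.
# Replace with the address configured for your own leader instance.
LEADER_API = "http://gnmic1:7890"


def build_request(method: str, path: str, base: str = LEADER_API) -> urllib.request.Request:
    """Build (but do not send) a request against the leader's REST API."""
    return urllib.request.Request(base.rstrip("/") + path, method=method)


def switch_leader(base: str = LEADER_API) -> urllib.request.Request:
    # DELETE /api/v1/cluster/leader: the leader releases its lock.
    return build_request("DELETE", "/api/v1/cluster/leader", base)


def drain(instance: str, base: str = LEADER_API) -> urllib.request.Request:
    # POST /api/v1/members/{id}/drain: async, targets keep moving after the call returns.
    return build_request("POST", f"/api/v1/members/{quote(instance, safe='')}/drain", base)


def rebalance(base: str = LEADER_API) -> urllib.request.Request:
    # POST /api/v1/cluster/rebalance: async as well.
    return build_request("POST", "/api/v1/cluster/rebalance", base)


def send(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

With these helpers, the maintenance flow would be `send(drain("gnmic2"))` before stopping `gnmic2`, then `send(rebalance())` once it is back up. Since both calls are async, leave some time for the targets to move before shutting the instance down. If the instance being drained is itself the leader, it probably makes sense to call `send(switch_leader())` first and then point the helpers at the new leader.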
