You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When operating a multi-cluster configuration in http://Backend.AI , there seems to be an issue with how cluster statuses are reported. Specifically, in an 8-cluster setup, if one cluster shuts down, the overall status still displays as running instead of updating to degraded. This makes it difficult for users to identify issues since the actual state of the clusters does not reflect accurately. Users have to deduce on their own that a problem exists due to not all kernels being active as expected.
Expected Behavior
The expected behavior in such scenarios is twofold:
The status should automatically update to degraded if any of the clusters within a multi-cluster setup shuts down or becomes unresponsive.
Implement a healing mechanism that either restarts all containers or only the ones that are missing, to ensure the integrity and the expected functionality of the multi-cluster setup.
Steps to Reproduce
Set up a multi-cluster configuration.
Shut down or disconnect one of the clusters.
Observe that the overall status of the multi-cluster setup remains as running.
Possible Solution
Implementing a monitoring and healing mechanism that can accurately detect the status of each cluster and take necessary actions such as updating the status to degraded and initiating a restart of the containers (either all or the ones that are missing) could be a potential solution.
This issue is critical for maintaining the reliability and usability of http://Backend.AI in multi-cluster environments, and addressing it would greatly enhance user experience by providing a more accurate system state and automating recovery processes.
The text was updated successfully, but these errors were encountered:
Description
When operating a multi-cluster configuration in http://Backend.AI , there seems to be an issue with how cluster statuses are reported. Specifically, in an 8-cluster setup, if one cluster shuts down, the overall status still displays as
running
instead of updating todegraded
. This makes it difficult for users to identify issues since the actual state of the clusters does not reflect accurately. Users have to deduce on their own that a problem exists due to not all kernels being active as expected.Expected Behavior
The expected behavior in such scenarios is twofold:
degraded
if any of the clusters within a multi-cluster setup shuts down or becomes unresponsive.Steps to Reproduce
running
.Possible Solution
Implementing a monitoring and healing mechanism that can accurately detect the status of each cluster and take necessary actions such as updating the status to
degraded
and initiating a restart of the containers (either all or the ones that are missing) could be a potential solution.This issue is critical for maintaining the reliability and usability of http://Backend.AI in multi-cluster environments, and addressing it would greatly enhance user experience by providing a more accurate system state and automating recovery processes.
The text was updated successfully, but these errors were encountered: