Clusters autodiscovery #629
Conversation
Resolve the issue of leaking keeper watches.
@@ -110,6 +111,54 @@ class ClusterDiscovery::ConcurrentFlags
    bool stop_flag = false;
};
Please explain why Flags and ConcurrentFlags coexist.
Good point. I kept the existing ConcurrentFlags for "static" clusters: it is immutable and more optimized, because it can return a reference to its map and outside code can iterate over it without taking a lock. Flags can be changed, so it returns a copy of the map. But that looks like a negligible saving for the current case.
I'll remove ConcurrentFlags and use only one struct, thanks!
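For illustration, a minimal sketch (class names and members are made up, not the PR's actual code) of the trade-off described above: an immutable flag map can be handed out by reference and iterated without locking, while a mutable one has to return a copy taken under its mutex.

#include <map>
#include <mutex>
#include <string>

// Immutable variant: built once and never modified, so callers may keep a
// reference and iterate without any locking.
class ImmutableFlags
{
public:
    explicit ImmutableFlags(std::map<std::string, bool> flags_) : flags(std::move(flags_)) {}

    const std::map<std::string, bool> & get() const { return flags; }

private:
    const std::map<std::string, bool> flags;
};

// Mutable variant: contents can change concurrently, so get() copies the map
// under the mutex instead of exposing a reference.
class MutableFlags
{
public:
    void set(const std::string & name, bool value)
    {
        std::lock_guard lock(mutex);
        flags[name] = value;
    }

    std::map<std::string, bool> get() const
    {
        std::lock_guard lock(mutex);
        return flags;
    }

private:
    mutable std::mutex mutex;
    std::map<std::string, bool> flags;
};

Merging the two structs means every reader pays for the copy, which is the negligible saving being given up above.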
    /* cluster_secret */ cluster_secret
);

multicluster_discovery_paths->push_back(
To me, we overuse shared_ptr in this code (not only in this particular place).
Maybe it is worth getting rid of this extra layer, or at least introducing types like FlagsPtr? I think that emplace_back should work effortlessly.
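A minimal sketch of the suggestion, assuming the reviewed container stores std::shared_ptr elements (the element type and names here are invented for illustration):

#include <memory>
#include <string>
#include <vector>

// Hypothetical element type; the real one in the PR differs.
struct DiscoveryPath
{
    std::string zk_path;
    bool observer;

    DiscoveryPath(std::string zk_path_, bool observer_)
        : zk_path(std::move(zk_path_)), observer(observer_) {}
};

int main()
{
    // Style being questioned: an extra shared_ptr layer per element.
    std::vector<std::shared_ptr<DiscoveryPath>> paths_with_ptr;
    paths_with_ptr.push_back(std::make_shared<DiscoveryPath>("/clickhouse/discovery", true));

    // Suggested style: store values directly and construct them in place.
    std::vector<DiscoveryPath> paths;
    paths.emplace_back("/clickhouse/discovery", true);
}

If the indirection has to stay, the lighter alternative mentioned above would be an alias along the lines of using FlagsPtr = std::shared_ptr<Flags>;.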
for (auto & path : (*multicluster_discovery_paths))
{
    ++zk_root_index;
Please explain why we start the index from 1.
0 is reserved for "static" clusters.
Dynamic clusters come from Keeper; the records in Keeper are created by other nodes during self-registration.
Static clusters come from the local XML config.
Maybe reflect it as a comment?
There is a comment in ClusterDiscovery.h near zk_root_index.
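For readers of this thread, the convention could be documented roughly like this (a sketch only; the actual declaration and comment live in ClusterDiscovery.h and may be worded differently):

#include <cstddef>

struct ClusterInfo
{
    // 0 is reserved for "static" clusters defined in the local XML config.
    // Values >= 1 refer to dynamic clusters discovered under the corresponding
    // Keeper root path; those records are created by other nodes when they
    // register themselves.
    size_t zk_root_index = 0;
};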
{
    LOG_ERROR(log, "Unknown cluster '{}'", cluster_name);
    continue;
}

if (!info.zk_root_index)
Is it actually possible? In which circumstances?
"Static" clusters from local xml-config.
This (existed before) ability allows to discover nodes for cluster static_cluster_name
:
<clickhouse>
<remote_servers>
<static_cluster_name>
<discovery>
<path>/clickhouse/discovery/static_cluster_name</path>
This (new from this PR) ability allows to discover clustres, and these clusters can be created and removed dynamically:
<clickhouse>
<remote_servers>
<dynamic_clusters> # Actually this name is not used, can be any
<discovery>
<observer />
<multicluster_root_path>/clickhouse/discovery</multicluster_root_path>
Right, but does not !multicluster_discovery_paths->empty() guarantee that it is a dynamic cluster?
No, there can be both static and dynamic clusters at the same time.
continue;

auto p = new_dynamic_clusters_info.find(cluster_name);
if (p != new_dynamic_clusters_info.end())
    new_dynamic_clusters_info.erase(p);
new_dynamic_clusters_info.erase(cluster_name) is enough, though I am not sure; maybe it is just less verbose.
Agree. I also used p for logging during development and forgot to simplify it. Thanks!
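A small sketch of the simplification (container contents are illustrative): std::map::erase(key) already handles the missing-key case, so the find/check/erase sequence collapses to one call.

#include <map>
#include <string>

int main()
{
    std::map<std::string, int> new_dynamic_clusters_info{{"cluster_a", 1}, {"cluster_b", 2}};
    const std::string cluster_name = "cluster_a";

    // Pattern under review: find, check, erase by iterator.
    auto p = new_dynamic_clusters_info.find(cluster_name);
    if (p != new_dynamic_clusters_info.end())
        new_dynamic_clusters_info.erase(p);

    // Equivalent one-liner: erase(key) is a no-op when the key is absent
    // and returns the number of erased elements (0 or 1 for std::map).
    new_dynamic_clusters_info.erase(cluster_name);
}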
if (!need_update.exchange(false))

auto clusters = dynamic_clusters_to_update->wait(5s, finished);
'5s' is probably worth moving to the header.
Are you sure? It is not used outside this method.
I am not sure
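If it were extracted, a named constant is all it would take; a sketch (the constant's name is illustrative, and whether it belongs in the header or next to the method is exactly the open question above):

#include <chrono>

using namespace std::chrono_literals;

// Illustrative name for the magic '5s' used when waiting for clusters to update.
static constexpr auto dynamic_clusters_update_wait_timeout = 5s;

// The call site would then read roughly:
//     auto clusters = dynamic_clusters_to_update->wait(dynamic_clusters_update_wait_timeout, finished);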
auto cluster_info_it = clusters_info.find(cluster_name);
if (cluster_info_it == clusters_info.end())
{
    LOG_ERROR(log, "Unknown cluster '{}'", cluster_name);
Maybe a bit more verbose message?
It's from the old code. Technically it can happen if a removed cluster is re-created and the callback from the Keeper watch triggers at the same moment, between getting the nodes in findDynamicClusters and this line. I think the error logging can be removed; the cluster should be added at the next iteration in this case.
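If the message were kept rather than removed, a sketch of a more descriptive, lower-severity variant of the LOG_ERROR line shown in the diff (wording is illustrative; LOG_WARNING follows the same pattern as the existing LOG_ERROR call):

if (cluster_info_it == clusters_info.end())
{
    // The name can legitimately be missing because of the race described above,
    // so a warning is enough; the cluster is picked up on the next update iteration.
    LOG_WARNING(log, "Cluster '{}' is not known yet (possibly re-created concurrently), "
                     "it will be added on the next update iteration", cluster_name);
    continue;
}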
LGTM
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Clusters autodiscovery
Cherry-pick of bugfix ClickHouse#74521
Documentation entry for user-facing changes
ClickHouse has a feature to discover nodes in a cluster (see the setting allow_experimental_cluster_discovery). This PR adds the ability to discover new clusters.
Nodes in new clusters register themselves in Keeper the same way as before.
Added the ability to discover all clusters under a Keeper node.