Clusters autodiscovery #629
Conversation
Resolve the issue of leaking keeper watches.
@@ -110,6 +111,54 @@ class ClusterDiscovery::ConcurrentFlags
    bool stop_flag = false;
};
Please explain why Flags and ConcurrentFlags coexist.
Good point. I kept the existing ConcurrentFlags for "static" clusters: it is immutable and more optimized, because it can return a reference to its map and outside code can iterate over it without taking a lock. Flags can be changed, so it returns a copy of the map. But that looks like a negligible saving for the current case.
I'll remove ConcurrentFlags and use only one struct, thanks!
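For illustration, a minimal sketch (class names and members are made up, not the PR's actual code) of the trade-off described above: an immutable flag map can be handed out by reference and iterated without locking, while a mutable one has to return a copy taken under its mutex.

#include <map>
#include <mutex>
#include <string>

// Immutable variant: built once and never modified, so callers may keep a
// reference and iterate without any locking.
class ImmutableFlags
{
public:
    explicit ImmutableFlags(std::map<std::string, bool> flags_) : flags(std::move(flags_)) {}

    const std::map<std::string, bool> & get() const { return flags; }

private:
    const std::map<std::string, bool> flags;
};

// Mutable variant: contents can change concurrently, so get() copies the map
// under the mutex instead of exposing a reference.
class MutableFlags
{
public:
    void set(const std::string & name, bool value)
    {
        std::lock_guard lock(mutex);
        flags[name] = value;
    }

    std::map<std::string, bool> get() const
    {
        std::lock_guard lock(mutex);
        return flags;
    }

private:
    mutable std::mutex mutex;
    std::map<std::string, bool> flags;
};

Merging the two structs means every reader pays for the copy, which is the negligible saving being given up above.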
    /* cluster_secret */ cluster_secret
);

multicluster_discovery_paths->push_back(
To me, we overuse shared_ptr in this code (not only in this particular place).
Maybe it is worth getting rid of this extra layer, or at least introducing types like FlagsPtr? I think that emplace_back should work effortlessly.
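A minimal sketch of the suggestion, assuming the reviewed container stores std::shared_ptr elements (the element type and names here are invented for illustration):

#include <memory>
#include <string>
#include <vector>

// Hypothetical element type; the real one in the PR differs.
struct DiscoveryPath
{
    std::string zk_path;
    bool observer;

    DiscoveryPath(std::string zk_path_, bool observer_)
        : zk_path(std::move(zk_path_)), observer(observer_) {}
};

int main()
{
    // Style being questioned: an extra shared_ptr layer per element.
    std::vector<std::shared_ptr<DiscoveryPath>> paths_with_ptr;
    paths_with_ptr.push_back(std::make_shared<DiscoveryPath>("/clickhouse/discovery", true));

    // Suggested style: store values directly and construct them in place.
    std::vector<DiscoveryPath> paths;
    paths.emplace_back("/clickhouse/discovery", true);
}

If the indirection has to stay, the lighter alternative mentioned above would be an alias along the lines of using FlagsPtr = std::shared_ptr<Flags>;.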
for (auto & path : (*multicluster_discovery_paths))
{
    ++zk_root_index;
Please explain why we start the index from 1.
0 is reserved for "static" clusters.
Dynamic clusters come from Keeper; the records in Keeper are created by other nodes during self-registration.
Static clusters come from the local XML config.
Maybe reflect it as a comment?
There is a comment in ClusterDiscovery.h near zk_root_index.
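For readers of this thread, the convention could be documented roughly like this (a sketch only; the actual declaration and comment live in ClusterDiscovery.h and may be worded differently):

#include <cstddef>

struct ClusterInfo
{
    // 0 is reserved for "static" clusters defined in the local XML config.
    // Values >= 1 refer to dynamic clusters discovered under the corresponding
    // Keeper root path; those records are created by other nodes when they
    // register themselves.
    size_t zk_root_index = 0;
};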
{
    LOG_ERROR(log, "Unknown cluster '{}'", cluster_name);
    continue;
}

if (!info.zk_root_index)
Is it actually possible? In which circumstances?
"Static" clusters from local xml-config.
This (existed before) ability allows to discover nodes for cluster static_cluster_name
:
<clickhouse>
<remote_servers>
<static_cluster_name>
<discovery>
<path>/clickhouse/discovery/static_cluster_name</path>
This (new from this PR) ability allows to discover clustres, and these clusters can be created and removed dynamically:
<clickhouse>
<remote_servers>
<dynamic_clusters> # Actually this name is not used, can be any
<discovery>
<observer />
<multicluster_root_path>/clickhouse/discovery</multicluster_root_path>
Right, but does not !multicluster_discovery_paths->empty() guarantee that it is a dynamic cluster?
No, there can be both static and dynamic clusters at the same time.
continue;

auto p = new_dynamic_clusters_info.find(cluster_name);
if (p != new_dynamic_clusters_info.end())
    new_dynamic_clusters_info.erase(p);
new_dynamic_clusters_info.erase(cluster_name) is enough, though I am not sure; maybe it is just less verbose.
Agree. I also used p for logging during development and forgot to simplify it. Thanks!
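A small sketch of the simplification (container contents are illustrative): std::map::erase(key) already handles the missing-key case, so the find/check/erase sequence collapses to one call.

#include <map>
#include <string>

int main()
{
    std::map<std::string, int> new_dynamic_clusters_info{{"cluster_a", 1}, {"cluster_b", 2}};
    const std::string cluster_name = "cluster_a";

    // Pattern under review: find, check, erase by iterator.
    auto p = new_dynamic_clusters_info.find(cluster_name);
    if (p != new_dynamic_clusters_info.end())
        new_dynamic_clusters_info.erase(p);

    // Equivalent one-liner: erase(key) is a no-op when the key is absent
    // and returns the number of erased elements (0 or 1 for std::map).
    new_dynamic_clusters_info.erase(cluster_name);
}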
if (!need_update.exchange(false))

auto clusters = dynamic_clusters_to_update->wait(5s, finished);
'5s' is probably worth moving to the header.
Are you sure? It is not used outside this method.
I am not sure
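If it were extracted, a named constant is all it would take; a sketch (the constant's name is illustrative, and whether it belongs in the header or next to the method is exactly the open question above):

#include <chrono>

using namespace std::chrono_literals;

// Illustrative name for the magic '5s' used when waiting for clusters to update.
static constexpr auto dynamic_clusters_update_wait_timeout = 5s;

// The call site would then read roughly:
//     auto clusters = dynamic_clusters_to_update->wait(dynamic_clusters_update_wait_timeout, finished);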
auto cluster_info_it = clusters_info.find(cluster_name);
if (cluster_info_it == clusters_info.end())
{
    LOG_ERROR(log, "Unknown cluster '{}'", cluster_name);
Maybe a bit more verbose message?
It's from the old code. Technically it can happen if a removed cluster is re-created and the callback from the Keeper watch triggers at the same moment, between getting the nodes in findDynamicClusters and this line. I think the error logging can be removed; the cluster should be added at the next iteration in this case.
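If the message were kept rather than removed, a sketch of a more descriptive, lower-severity variant of the LOG_ERROR line shown in the diff (wording is illustrative; LOG_WARNING follows the same pattern as the existing LOG_ERROR call):

if (cluster_info_it == clusters_info.end())
{
    // The name can legitimately be missing because of the race described above,
    // so a warning is enough; the cluster is picked up on the next update iteration.
    LOG_WARNING(log, "Cluster '{}' is not known yet (possibly re-created concurrently), "
                     "it will be added on the next update iteration", cluster_name);
    continue;
}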
LGTM
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Clusters autodiscovery
Cherry-pick of bugfix ClickHouse#74521
Documentation entry for user-facing changes
ClickHouse has a feature to discover nodes in a cluster (see the setting allow_experimental_cluster_discovery). This PR adds the ability to discover new clusters.
Nodes in new clusters register themselves in Keeper the same way as before.
Added the ability to discover all clusters under a Keeper node.