From 6c69abc6c1818a1a62bf7bd4b9aac07ceb5f0f67 Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang <weichiu@apache.org>
Date: Wed, 8 Jan 2025 16:40:55 -0800
Subject: [PATCH 1/6] HDDS-12030. Update SCM-HA.zh.md

Change-Id: I0d7cf24c00aab38148e9ffd3e6e38d98e9beb318
---
 hadoop-hdds/docs/content/feature/SCM-HA.md    | 10 ++--------
 hadoop-hdds/docs/content/feature/SCM-HA.zh.md | 16 ++--------------
 2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.md b/hadoop-hdds/docs/content/feature/SCM-HA.md
index 333c908275d..1907fdb03b5 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.md
@@ -33,14 +33,8 @@ This document explains the HA setup of Storage Container Manager (SCM), please c
 
 ## Configuration
 
-HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
+As of Apache Ozone 2.0, SCM HA is always enabled. Test deployments can continue to use single node Ratis for SCM if needed.
 
-```XML
-<property>
-   <name>ozone.scm.ratis.enable</name>
-   <value>true</value>
-</property>
-```
 One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
 
 This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
@@ -234,7 +228,7 @@ bin/ozone debug ldb --db=/tmp/metadata/scm.db scan --column-family=containers
 
 ## Migrating from existing SCM
 
-SCM HA can be turned on on any Ozone cluster. First enable Ratis (`ozone.scm.ratis.enable`) and configure only one node for the Ratis ring (`ozone.scm.nodes.serviceId` should have one element).
+SCM HA is turned on for any Ozone cluster after upgrade to 2.0.0 or later. First configure only one node for the Ratis ring (`ozone.scm.nodes.serviceId` should have one element).
 
 Start the cluster and test if it works well.
 
diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.zh.md b/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
index a5382735b7a..c8eff4397a5 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
@@ -33,20 +33,8 @@ Ozone Manager 和 Storage Container Manager 都支持 HA。在这种模式下,
 
 ## 配置
 
-> ⚠️ **注意** ⚠️
->
-> SCM HA 目前仅支持新初始化的集群。
-> SCM HA 必须在 Ozone 服务首次启动前开启。
-> 当某个 SCM 以非 HA 的模式启动后,不支持将其改为 HA 模式。
+从 Apache Ozone 2.0 开始,SCM HA 默认启用。测试部署如果需要,仍然可以继续使用单节点 Ratis 来运行 SCM。
 
-Storage Container Manager 的 HA 模式可以在 `ozone-site.xml` 中进行以下设置:
-
-```XML
-<property>
-   <name>ozone.scm.ratis.enable</name>
-   <value>true</value>
-</property>
-```
 一个 Ozone 配置(`ozone-site.xml`)可以支持多个SCM HA节点集,多个 Ozone 集群。要在可用的 SCM 节点之间进行选择,每个集群都需要一个逻辑名称,可以将其解析为 Storage Container Manage 的 IP 地址(和域名)。
 
 这个逻辑名称称为 `serviceId`,可以在 `ozone-site.xml` 中配置。
@@ -218,7 +206,7 @@ bin/ozone debug ldb --db=/tmp/metadata/scm.db scan --column-family=containers
 
 ## 从现有的SCM迁移
 
-可以在任何 Ozone 集群上打开 SCM HA。 首先启用 Ratis(`ozone.scm.ratis.enable`)并为 Ratis ring 配置一个节点(`ozone.scm.nodes.serviceId` 应该有一个元素)。
+在升级到 2.0.0 或更高版本后,任何 Ozone 集群的 SCM HA都会启用。 首先为 Ratis ring 配置一个节点(`ozone.scm.nodes.serviceId` 应该有一个元素)。
 
 启动集群并测试它是否正常工作。
 

From 635730733876fbed651b0366917eba9451e0954d Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang <jojochuang@gmail.com>
Date: Thu, 9 Jan 2025 13:52:52 -0800
Subject: [PATCH 2/6] Update hadoop-hdds/docs/content/feature/SCM-HA.md

Co-authored-by: Nandakumar Vadivelu <nanda@apache.org>
---
 hadoop-hdds/docs/content/feature/SCM-HA.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.md b/hadoop-hdds/docs/content/feature/SCM-HA.md
index 1907fdb03b5..4e96588828f 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.md
@@ -228,8 +228,8 @@ bin/ozone debug ldb --db=/tmp/metadata/scm.db scan --column-family=containers
 
 ## Migrating from existing SCM
 
-SCM HA is turned on for any Ozone cluster after upgrade to 2.0.0 or later. First configure only one node for the Ratis ring (`ozone.scm.nodes.serviceId` should have one element).
-
-Start the cluster and test if it works well.
+Add additional SCM nodes and extend the cluster configuration to reflect the newly added nodes.
+Bootstrap the newly added SCM nodes with `scm --bootstrap` command and start the SCM service.
+Note: Make sure that the `ozone.scm.primordial.node.id` property is pointed to the existing SCM before you run the `bootstrap` command on the newly added SCM nodes.
 
 If everything is fine, you can extend the cluster configuration with multiple nodes, restart SCM node, and initialize the additional nodes with `scm --bootstrap` command.

From 5b64416d8687df8e8bec4cf5e567796a3d8a5975 Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang <jojochuang@gmail.com>
Date: Thu, 9 Jan 2025 13:53:00 -0800
Subject: [PATCH 3/6] Update hadoop-hdds/docs/content/feature/SCM-HA.md

Co-authored-by: Nandakumar Vadivelu <nanda@apache.org>
---
 hadoop-hdds/docs/content/feature/SCM-HA.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.md b/hadoop-hdds/docs/content/feature/SCM-HA.md
index 4e96588828f..b9b3c7e2e23 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.md
@@ -226,7 +226,7 @@ bin/ozone debug ldb --db=/tmp/metadata/scm.db ls
 bin/ozone debug ldb --db=/tmp/metadata/scm.db scan --column-family=containers
 ```
 
-## Migrating from existing SCM
+## Migrating from Non-HA to HA SCM
 
 Add additional SCM nodes and extend the cluster configuration to reflect the newly added nodes.
 Bootstrap the newly added SCM nodes with `scm --bootstrap` command and start the SCM service.

From 2f94bc75e4d7a59623c1c2b501f0838bf3b50cdf Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang <weichiu@apache.org>
Date: Thu, 9 Jan 2025 14:01:55 -0800
Subject: [PATCH 4/6] Update Chiinese doc.

Change-Id: Iaec7a7ada558ea251d362c6d3b687a252469ec93
---
 hadoop-hdds/docs/content/feature/SCM-HA.md    |  4 ----
 hadoop-hdds/docs/content/feature/SCM-HA.zh.md | 12 ++++--------
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.md b/hadoop-hdds/docs/content/feature/SCM-HA.md
index b9b3c7e2e23..54c5c66ca94 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.md
@@ -33,8 +33,6 @@ This document explains the HA setup of Storage Container Manager (SCM), please c
 
 ## Configuration
 
-As of Apache Ozone 2.0, SCM HA is always enabled. Test deployments can continue to use single node Ratis for SCM if needed.
-
 One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
 
 This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
@@ -231,5 +229,3 @@ bin/ozone debug ldb --db=/tmp/metadata/scm.db scan --column-family=containers
 Add additional SCM nodes and extend the cluster configuration to reflect the newly added nodes.
 Bootstrap the newly added SCM nodes with `scm --bootstrap` command and start the SCM service.
 Note: Make sure that the `ozone.scm.primordial.node.id` property is pointed to the existing SCM before you run the `bootstrap` command on the newly added SCM nodes.
-
-If everything is fine, you can extend the cluster configuration with multiple nodes, restart SCM node, and initialize the additional nodes with `scm --bootstrap` command.
diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.zh.md b/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
index c8eff4397a5..05fa3f79a65 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
@@ -33,8 +33,6 @@ Ozone Manager 和 Storage Container Manager 都支持 HA。在这种模式下,
 
 ## 配置
 
-从 Apache Ozone 2.0 开始,SCM HA 默认启用。测试部署如果需要,仍然可以继续使用单节点 Ratis 来运行 SCM。
-
 一个 Ozone 配置(`ozone-site.xml`)可以支持多个SCM HA节点集,多个 Ozone 集群。要在可用的 SCM 节点之间进行选择,每个集群都需要一个逻辑名称,可以将其解析为 Storage Container Manage 的 IP 地址(和域名)。
 
 这个逻辑名称称为 `serviceId`,可以在 `ozone-site.xml` 中配置。
@@ -204,10 +202,8 @@ bin/ozone debug ldb --db=/tmp/metadata/scm.db ls
 bin/ozone debug ldb --db=/tmp/metadata/scm.db scan --column-family=containers
 ```
 
-## 从现有的SCM迁移
-
-在升级到 2.0.0 或更高版本后,任何 Ozone 集群的 SCM HA都会启用。 首先为 Ratis ring 配置一个节点(`ozone.scm.nodes.serviceId` 应该有一个元素)。
-
-启动集群并测试它是否正常工作。
+## 从非HA SCM迁移到SCM HA
 
-如果一切正常,您可以用多个节点扩展集群配置,重新启动 SCM 节点,并使用 `scm --bootstrap` 命令初始化其他节点。
+添加额外的 SCM 节点,并扩展集群配置以包含新添加的节点。
+使用 `scm --bootstrap` 命令为新添加的 SCM 节点引导启动,然后启动 SCM 服务。
+注意:在新添加的 SCM 节点上运行 bootstrap 命令之前,请确保 `ozone.scm.primordial.node.id` 属性指向现有的 SCM。

From 5191cbcac01d3d27ad3b682de9443d36910a71b1 Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang <weichiu@apache.org>
Date: Wed, 15 Jan 2025 17:55:05 +0800
Subject: [PATCH 5/6] Fix typo

Change-Id: I6d91479a181942d48837ea1c0d5a887f22469947
---
 hadoop-hdds/docs/content/feature/SCM-HA.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.md b/hadoop-hdds/docs/content/feature/SCM-HA.md
index 54c5c66ca94..722f0ea3392 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.md
@@ -188,7 +188,7 @@ SCM HA uses Apache Ratis to replicate state between the members of the SCM HA qu
 
 This replication process is a simpler version of OM HA replication process as it doesn't use any double buffer (as the overall db thourghput of SCM requests are lower)
 
-Datanodes are sending all the reports (Container reports, Pipeline reports...) to *all* the Datanodes parallel. Only the leader node can assign/create new containers, and only the leader node sends command back to the Datanodes.
+Datanodes are sending all the reports (Container reports, Pipeline reports...) to *all* the Datanodes in parallel. Only the leader node can assign/create new containers, and only the leader node sends commands back to the Datanodes.
 
 ## Verify SCM HA setup
 

From f21fd0a643e3249248046275abb1380ddf794f64 Mon Sep 17 00:00:00 2001
From: Wei-Chiu Chuang <weichiu@apache.org>
Date: Sun, 19 Jan 2025 07:54:30 +0800
Subject: [PATCH 6/6] Fix typos and remove outdated descriptions

Change-Id: Ib8297f80bf2e35bab65151128c0b999c817edf42
---
 hadoop-hdds/docs/content/feature/SCM-HA.md    | 6 ++----
 hadoop-hdds/docs/content/feature/SCM-HA.zh.md | 3 +--
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.md b/hadoop-hdds/docs/content/feature/SCM-HA.md
index 722f0ea3392..2b6ee72b7cf 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.md
@@ -177,9 +177,7 @@ signed certificate for sub-CA from root CA.
 primordial SCM is not defined. Bring up other SCM's using **--bootstrap**.
 
 ### Current SCM HA Security limitation:
-1. When primordial SCM is down, new SCM’s cannot be bootstrapped and join the
-quorum.
-2. Secure cluster upgrade to ratis-enable secure cluster is not supported.
+* Unsecure HA cluster upgrade to secure HA cluster is not supported.
 
 ## Implementation details
 
@@ -188,7 +186,7 @@ SCM HA uses Apache Ratis to replicate state between the members of the SCM HA qu
 
 This replication process is a simpler version of OM HA replication process as it doesn't use any double buffer (as the overall db thourghput of SCM requests are lower)
 
-Datanodes are sending all the reports (Container reports, Pipeline reports...) to *all* the Datanodes in parallel. Only the leader node can assign/create new containers, and only the leader node sends commands back to the Datanodes.
+Datanodes are sending all the reports (Container reports, Pipeline reports...) to *all* SCM nodes in parallel. Only the leader node can assign/create new containers, and only the leader node sends commands back to the Datanodes.
 
 ## Verify SCM HA setup
 
diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.zh.md b/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
index 05fa3f79a65..66d2b885fbe 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
@@ -157,8 +157,7 @@ bin/ozone scm --bootstrap
 
 ### 目前 SCM HA 安全的限制
 
-1. 当原始 SCM 失效时, 新的 SCM 不能被引导并添加到 HA 节点中。
-2. 尚未支持从非 HA 安全集群升级到 HA 安全集群。
+* 尚未支持从非 HA 安全集群升级到 HA 安全集群。
 
 ## 实现细节