diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.md b/hadoop-hdds/docs/content/feature/SCM-HA.md
index 333c908275d..2b6ee72b7cf 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.md
@@ -33,14 +33,6 @@ This document explains the HA setup of Storage Container Manager (SCM), please c
 
 ## Configuration
 
-HA mode of Storage Container Manager can be enabled with the following settings in `ozone-site.xml`:
-
-```XML
-<property>
-   <name>ozone.scm.ratis.enable</name>
-   <value>true</value>
-</property>
-```
 One Ozone configuration (`ozone-site.xml`) can support multiple SCM HA node set, multiple Ozone clusters. To select between the available SCM nodes a logical name is required for each of the clusters which can be resolved to the IP addresses (and domain names) of the Storage Container Managers.
 
 This logical name is called `serviceId` and can be configured in the `ozone-site.xml`
@@ -185,9 +177,7 @@ signed certificate for sub-CA from root CA.
 primordial SCM is not defined. Bring up other SCM's using **--bootstrap**.
 
 ### Current SCM HA Security limitation:
-1. When primordial SCM is down, new SCM’s cannot be bootstrapped and join the
-quorum.
-2. Secure cluster upgrade to ratis-enable secure cluster is not supported.
+* Unsecure HA cluster upgrade to secure HA cluster is not supported.
 
 ## Implementation details
 
@@ -196,7 +186,7 @@ SCM HA uses Apache Ratis to replicate state between the members of the SCM HA qu
 
 This replication process is a simpler version of OM HA replication process as it doesn't use any double buffer (as the overall db thourghput of SCM requests are lower)
 
-Datanodes are sending all the reports (Container reports, Pipeline reports...) to *all* the Datanodes parallel. Only the leader node can assign/create new containers, and only the leader node sends command back to the Datanodes.
+Datanodes are sending all the reports (Container reports, Pipeline reports...) to *all* SCM nodes in parallel. Only the leader node can assign/create new containers, and only the leader node sends commands back to the Datanodes.
 
 ## Verify SCM HA setup
 
@@ -232,10 +222,8 @@ bin/ozone debug ldb --db=/tmp/metadata/scm.db ls
 bin/ozone debug ldb --db=/tmp/metadata/scm.db scan --column-family=containers
 ```
 
-## Migrating from existing SCM
-
-SCM HA can be turned on on any Ozone cluster. First enable Ratis (`ozone.scm.ratis.enable`) and configure only one node for the Ratis ring (`ozone.scm.nodes.serviceId` should have one element).
-
-Start the cluster and test if it works well.
+## Migrating from Non-HA to HA SCM
 
-If everything is fine, you can extend the cluster configuration with multiple nodes, restart SCM node, and initialize the additional nodes with `scm --bootstrap` command.
+Add additional SCM nodes and extend the cluster configuration to reflect the newly added nodes.
+Bootstrap the newly added SCM nodes with `scm --bootstrap` command and start the SCM service.
+Note: Make sure that the `ozone.scm.primordial.node.id` property is pointed to the existing SCM before you run the `bootstrap` command on the newly added SCM nodes.
diff --git a/hadoop-hdds/docs/content/feature/SCM-HA.zh.md b/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
index a5382735b7a..66d2b885fbe 100644
--- a/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
+++ b/hadoop-hdds/docs/content/feature/SCM-HA.zh.md
@@ -33,20 +33,6 @@ Ozone Manager 和 Storage Container Manager 都支持 HA。在这种模式下,
 
 ## 配置
 
-> ⚠️ **注意** ⚠️
->
-> SCM HA 目前仅支持新初始化的集群。
-> SCM HA 必须在 Ozone 服务首次启动前开启。
-> 当某个 SCM 以非 HA 的模式启动后,不支持将其改为 HA 模式。
-
-Storage Container Manager 的 HA 模式可以在 `ozone-site.xml` 中进行以下设置:
-
-```XML
-<property>
-   <name>ozone.scm.ratis.enable</name>
-   <value>true</value>
-</property>
-```
 一个 Ozone 配置(`ozone-site.xml`)可以支持多个SCM HA节点集,多个 Ozone 集群。要在可用的 SCM 节点之间进行选择,每个集群都需要一个逻辑名称,可以将其解析为 Storage Container Manage 的 IP 地址(和域名)。
 
 这个逻辑名称称为 `serviceId`,可以在 `ozone-site.xml` 中配置。
@@ -171,8 +157,7 @@ bin/ozone scm --bootstrap
 
 ### 目前 SCM HA 安全的限制
 
-1. 当原始 SCM 失效时, 新的 SCM 不能被引导并添加到 HA 节点中。
-2. 尚未支持从非 HA 安全集群升级到 HA 安全集群。
+* 尚未支持从非 HA 安全集群升级到 HA 安全集群。
 
 ## 实现细节
 
@@ -216,10 +201,8 @@ bin/ozone debug ldb --db=/tmp/metadata/scm.db ls
 bin/ozone debug ldb --db=/tmp/metadata/scm.db scan --column-family=containers
 ```
 
-## 从现有的SCM迁移
-
-可以在任何 Ozone 集群上打开 SCM HA。 首先启用 Ratis(`ozone.scm.ratis.enable`)并为 Ratis ring 配置一个节点(`ozone.scm.nodes.serviceId` 应该有一个元素)。
-
-启动集群并测试它是否正常工作。
+## 从非HA SCM迁移到SCM HA
 
-如果一切正常,您可以用多个节点扩展集群配置,重新启动 SCM 节点,并使用 `scm --bootstrap` 命令初始化其他节点。
+添加额外的 SCM 节点,并扩展集群配置以包含新添加的节点。
+使用 `scm --bootstrap` 命令为新添加的 SCM 节点引导启动,然后启动 SCM 服务。
+注意:在新添加的 SCM 节点上运行 bootstrap 命令之前,请确保 `ozone.scm.primordial.node.id` 属性指向现有的 SCM。