From 28eba92b0c7bba76bd2198eb877e0be810713de8 Mon Sep 17 00:00:00 2001
From: alluxio-bot
Date: Tue, 14 May 2024 23:17:06 -0700
Subject: [PATCH] [DOCFIX] Add warning for distributedCp limitations

Cherry-pick of existing commit.
orig-pr: Alluxio/alluxio#18608
orig-commit: Alluxio/alluxio@f3cf05410c600c25c192006f5e3d67839cf0c7c4
orig-commit-author: Rico Chiu

pr-link: Alluxio/alluxio#18609
change-id: cid-be4807b887956808990a5094b4afee63780bfd84
---
 docs/en/operation/User-CLI.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/en/operation/User-CLI.md b/docs/en/operation/User-CLI.md
index c42558427d4b..171e69820418 100644
--- a/docs/en/operation/User-CLI.md
+++ b/docs/en/operation/User-CLI.md
@@ -847,6 +847,12 @@ Please wait for command submission to finish..
 Submitted migrate job successfully, jobControlId = JOB_CONTROL_ID_2
 ```
 
+Please note below are known limitations for the distributed copy command.
+- Limited Scalability: No more than 1 million total number of files should be moved concurrently. Note that a copy job may stay active for a short period after the last file is copied.
+- Manual Integrity Validation: Verification between source and destination files relies on the response code from the underlying data lake storage. In case the response code is unreliable, we recommend manual verification of source and destination checksums.
+- Manual Cleanup: In certain failure scenarios, a user may need to manually remove partially written contents in destination directories and restart the failed jobs.
+- Limited Observability: Status checks are limited to using the command line for each job individually.
+
 ### du
 The `du` command outputs the total size and amount stored in Alluxio of files and folders.