From 24cc00bd5021dbf7fbf6935b16b2bebf64399d1a Mon Sep 17 00:00:00 2001 From: Bala Harish Date: Thu, 7 Mar 2024 16:16:47 +0530 Subject: [PATCH 01/32] docs: merged all the mayastor topics/pages with the openebs topics/pages Signed-off-by: Bala Harish --- .../commercial.md => commercial-support.md} | 0 docs/main/{introduction => }/community.md | 0 ...storage.md => container-native-storage.md} | 0 .../{ => data-engines}/data-engines.md | 0 .../local-engine.md} | 0 .../replicated-engine.md} | 0 .../_category_.json | 0 .../benefits.mdx | 0 .../features.mdx | 0 .../introduction-to-openebs.md} | 0 .../use-cases-and-examples.mdx} | 0 .../deploy-a-test-application.md | 247 +++++++++ .../installation.md | 0 .../quickstart.md | 0 docs/main/{introduction => }/releases.md | 0 ...ing.md => troubleshooting-local-engine.md} | 0 .../troubleshooting-replicated-engine.md | 314 +++++++++++ .../additional-information}/alphafeatures.md | 0 .../additional-information}/faq.md | 0 .../additional-information}/k8supgrades.md | 0 .../additional-information}/kb.md | 0 .../additional-information}/performance.md | 0 .../prerequisites.mdx | 0 .../additional-information/call-home.md | 72 +++ .../i-o-path-description.md | 167 ++++++ .../additional-information/migrate-etcd.md | 139 +++++ .../distributeddb-backup.md | 115 ++++ .../distributeddb-overview.md | 12 + .../distributeddb-restore.md | 345 ++++++++++++ .../replicateddb-backup.md | 204 +++++++ .../replicateddb-overview.md | 16 + .../replicateddb-restore.md | 513 ++++++++++++++++++ .../performance-tips.md | 108 ++++ .../replica-operations.md | 70 +++ .../additional-information/scale-etcd.md | 167 ++++++ .../tested-third-party-software.md | 16 + .../additional-information/tips-and-tricks.md | 94 ++++ .../advanced-operations/HA.md | 17 + .../advanced-operations/kubectl-plugin.md | 218 ++++++++ .../advanced-operations/monitoring.md | 110 ++++ .../advanced-operations/node-cordon.md | 38 ++ .../advanced-operations/node-drain.md | 27 + .../advanced-operations/replica-rebuild.md | 86 +++ .../advanced-operations/snapshot-restore.md | 110 ++++ .../advanced-operations/snapshot.md | 265 +++++++++ .../advanced-operations/supportability.md | 334 ++++++++++++ .../advanced-operations/upgrade.md | 97 ++++ .../platform-support/microk8s-installation.md | 58 ++ .../prerequisites.md | 287 ++++++++++ .../{uninstall.md => uninstallation.md} | 0 .../user-guides/{upgrade.md => upgrades.md} | 0 51 files changed, 4246 insertions(+) rename docs/main/{introduction/commercial.md => commercial-support.md} (100%) rename docs/main/{introduction => }/community.md (100%) rename docs/main/concepts/{container-attached-storage.md => container-native-storage.md} (100%) rename docs/main/concepts/{ => data-engines}/data-engines.md (100%) rename docs/main/concepts/{localpv.md => data-engines/local-engine.md} (100%) rename docs/main/concepts/{mayastor.md => data-engines/replicated-engine.md} (100%) rename docs/main/{introduction => introduction-to-openebs}/_category_.json (100%) rename docs/main/{introduction => introduction-to-openebs}/benefits.mdx (100%) rename docs/main/{introduction => introduction-to-openebs}/features.mdx (100%) rename docs/main/{introduction/overview.md => introduction-to-openebs/introduction-to-openebs.md} (100%) rename docs/main/{introduction/use-cases.mdx => introduction-to-openebs/use-cases-and-examples.mdx} (100%) create mode 100644 docs/main/quickstart-guide/deploy-a-test-application.md rename docs/main/{user-guides => quickstart-guide}/installation.md (100%) rename 
docs/main/{user-guides => quickstart-guide}/quickstart.md (100%) rename docs/main/{introduction => }/releases.md (100%) rename docs/main/troubleshooting/{troubleshooting.md => troubleshooting-local-engine.md} (100%) create mode 100644 docs/main/troubleshooting/troubleshooting-replicated-engine.md rename docs/main/{additional-info => user-guides/local-engine-user-guide/additional-information}/alphafeatures.md (100%) rename docs/main/{additional-info => user-guides/local-engine-user-guide/additional-information}/faq.md (100%) rename docs/main/{additional-info => user-guides/local-engine-user-guide/additional-information}/k8supgrades.md (100%) rename docs/main/{additional-info => user-guides/local-engine-user-guide/additional-information}/kb.md (100%) rename docs/main/{additional-info => user-guides/local-engine-user-guide/additional-information}/performance.md (100%) rename docs/main/user-guides/{ => local-engine-user-guide}/prerequisites.mdx (100%) create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/call-home.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/i-o-path-description.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/migrate-etcd.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-backup.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-overview.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-restore.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-backup.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-overview.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-restore.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/performance-tips.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/replica-operations.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/scale-etcd.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/tested-third-party-software.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/additional-information/tips-and-tricks.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/HA.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/kubectl-plugin.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/monitoring.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/node-cordon.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/node-drain.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/replica-rebuild.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/snapshot-restore.md create mode 100644 
docs/main/user-guides/replicated-engine-user-guide/advanced-operations/snapshot.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/supportability.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/advanced-operations/upgrade.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/platform-support/microk8s-installation.md create mode 100644 docs/main/user-guides/replicated-engine-user-guide/prerequisites.md rename docs/main/user-guides/{uninstall.md => uninstallation.md} (100%) rename docs/main/user-guides/{upgrade.md => upgrades.md} (100%) diff --git a/docs/main/introduction/commercial.md b/docs/main/commercial-support.md similarity index 100% rename from docs/main/introduction/commercial.md rename to docs/main/commercial-support.md diff --git a/docs/main/introduction/community.md b/docs/main/community.md similarity index 100% rename from docs/main/introduction/community.md rename to docs/main/community.md diff --git a/docs/main/concepts/container-attached-storage.md b/docs/main/concepts/container-native-storage.md similarity index 100% rename from docs/main/concepts/container-attached-storage.md rename to docs/main/concepts/container-native-storage.md diff --git a/docs/main/concepts/data-engines.md b/docs/main/concepts/data-engines/data-engines.md similarity index 100% rename from docs/main/concepts/data-engines.md rename to docs/main/concepts/data-engines/data-engines.md diff --git a/docs/main/concepts/localpv.md b/docs/main/concepts/data-engines/local-engine.md similarity index 100% rename from docs/main/concepts/localpv.md rename to docs/main/concepts/data-engines/local-engine.md diff --git a/docs/main/concepts/mayastor.md b/docs/main/concepts/data-engines/replicated-engine.md similarity index 100% rename from docs/main/concepts/mayastor.md rename to docs/main/concepts/data-engines/replicated-engine.md diff --git a/docs/main/introduction/_category_.json b/docs/main/introduction-to-openebs/_category_.json similarity index 100% rename from docs/main/introduction/_category_.json rename to docs/main/introduction-to-openebs/_category_.json diff --git a/docs/main/introduction/benefits.mdx b/docs/main/introduction-to-openebs/benefits.mdx similarity index 100% rename from docs/main/introduction/benefits.mdx rename to docs/main/introduction-to-openebs/benefits.mdx diff --git a/docs/main/introduction/features.mdx b/docs/main/introduction-to-openebs/features.mdx similarity index 100% rename from docs/main/introduction/features.mdx rename to docs/main/introduction-to-openebs/features.mdx diff --git a/docs/main/introduction/overview.md b/docs/main/introduction-to-openebs/introduction-to-openebs.md similarity index 100% rename from docs/main/introduction/overview.md rename to docs/main/introduction-to-openebs/introduction-to-openebs.md diff --git a/docs/main/introduction/use-cases.mdx b/docs/main/introduction-to-openebs/use-cases-and-examples.mdx similarity index 100% rename from docs/main/introduction/use-cases.mdx rename to docs/main/introduction-to-openebs/use-cases-and-examples.mdx diff --git a/docs/main/quickstart-guide/deploy-a-test-application.md b/docs/main/quickstart-guide/deploy-a-test-application.md new file mode 100644 index 000000000..e11f582ac --- /dev/null +++ b/docs/main/quickstart-guide/deploy-a-test-application.md @@ -0,0 +1,247 @@ +# Deploy a Test Application + +## Objective + +If all verification steps in the preceding stages were satisfied, then Mayastor has been successfully deployed within the 
cluster. In order to verify basic functionality, we will now dynamically provision a Persistent Volume based on a Mayastor StorageClass, mount that volume within a small test pod which we'll create, and use the [**Flexible I/O Tester**](https://github.com/axboe/fio) utility within that pod to check that I/O to the volume is processed correctly. + +## Define the PVC + +Use `kubectl` to create a PVC based on a StorageClass that you created in the [previous stage](configure-mayastor.md#create-mayastor-storageclass-s). In the example shown below, we'll consider that StorageClass to have been named "mayastor-1". Replace the value of the field "storageClassName" with the name of your own Mayastor-based StorageClass. + +For the purposes of this quickstart guide, it is suggested to name the PVC "ms-volume-claim", as this is what will be illustrated in the example steps which follow. + +{% tabs %} +{% tab title="Command" %} +```text +cat <=64=0.0% + submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% + complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0% + issued rwts: total=40801,40696,0,0 short=0,0,0,0 dropped=0,0,0,0 + latency : target=0, window=0, percentile=100.00%, depth=16 + +Run status group 0 (all jobs): + READ: bw=2720KiB/s (2785kB/s), 2720KiB/s-2720KiB/s (2785kB/s-2785kB/s), io=159MiB (167MB), run=60011-60011msec + WRITE: bw=2713KiB/s (2778kB/s), 2713KiB/s-2713KiB/s (2778kB/s-2778kB/s), io=159MiB (167MB), run=60011-60011msec + +Disk stats (read/write): + sdd: ios=40795/40692, merge=0/9, ticks=375308/568708, in_queue=891648, util=99.53% +``` +{% endtab %} +{% endtabs %} + +If no errors are reported in the output then Mayastor has been correctly configured and is operating as expected. You may create and consume additional Persistent Volumes with your own test applications. + diff --git a/docs/main/user-guides/installation.md b/docs/main/quickstart-guide/installation.md similarity index 100% rename from docs/main/user-guides/installation.md rename to docs/main/quickstart-guide/installation.md diff --git a/docs/main/user-guides/quickstart.md b/docs/main/quickstart-guide/quickstart.md similarity index 100% rename from docs/main/user-guides/quickstart.md rename to docs/main/quickstart-guide/quickstart.md diff --git a/docs/main/introduction/releases.md b/docs/main/releases.md similarity index 100% rename from docs/main/introduction/releases.md rename to docs/main/releases.md diff --git a/docs/main/troubleshooting/troubleshooting.md b/docs/main/troubleshooting/troubleshooting-local-engine.md similarity index 100% rename from docs/main/troubleshooting/troubleshooting.md rename to docs/main/troubleshooting/troubleshooting-local-engine.md diff --git a/docs/main/troubleshooting/troubleshooting-replicated-engine.md b/docs/main/troubleshooting/troubleshooting-replicated-engine.md new file mode 100644 index 000000000..8bee2f587 --- /dev/null +++ b/docs/main/troubleshooting/troubleshooting-replicated-engine.md @@ -0,0 +1,314 @@ +# Basic Troubleshooting + +## Logs + +The correct set of log file to collect depends on the nature of the problem. If unsure, then it is best to collect log files for all Mayastor containers. 
In nearly every case, the logs of all of the control plane component pods will be needed; + +* csi-controller +* core-agent +* rest +* msp-operator + +{% tabs %} +{% tab title="List all Mayastor pods" %} +```bash +kubectl -n mayastor get pods -o wide +``` +{% endtab %} + +{% tab title="Output example" %} +```text +NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES +mayastor-csi-7pg82 2/2 Running 0 15m 10.0.84.131 worker-2 +mayastor-csi-gmpq6 2/2 Running 0 15m 10.0.239.174 worker-1 +mayastor-csi-xrmxx 2/2 Running 0 15m 10.0.85.71 worker-0 +mayastor-qgpw6 1/1 Running 0 14m 10.0.85.71 worker-0 +mayastor-qr84q 1/1 Running 0 14m 10.0.239.174 worker-1 +mayastor-xhmj5 1/1 Running 0 14m 10.0.84.131 worker-2 +... etc (output truncated for brevity) +``` +{% endtab %} +{% endtabs %} + + +### Mayastor pod log file + +Mayastor containers form the data plane of a Mayastor deployment. A cluster should schedule as many mayastor container instances as required storage nodes have been defined. This log file is most useful when troubleshooting I/O errors however, provisioning and management operations might also fail because of a problem on a storage node. + +{% tabs %} +{% tab title="Example obtaining mayastor\'s log" %} +```bash +kubectl -n mayastor logs mayastor-qgpw6 mayastor +``` +{% endtab %} +{% endtabs %} + +### CSI agent pod log file + +If experiencing problems with \(un\)mounting a volume on an application node, this log file can be useful. Generally all worker nodes in the cluster will be configured to schedule a mayastor CSI agent pod, so it's good to know which specific node is experiencing the issue and inspect the log file only for that node. + +{% tabs %} +{% tab title="Example obtaining mayastor CSI driver\'s log" %} +```bash +kubectl -n mayastor logs mayastor-csi-7pg82 mayastor-csi +``` +{% endtab %} +{% endtabs %} + +### CSI sidecars + +These containers implement the CSI spec for Kubernetes and run within the same pods as the csi-controller and mayastor-csi (node plugin) containers. Whilst they are not part of Mayastor's code, they can contain useful information when a Mayastor CSI controller/node plugin fails to register with k8s cluster. + +{% tabs %} +{% tab title="Obtaining CSI control containers logs" %} +```bash +kubectl -n mayastor logs $(kubectl -n mayastor get pod -l app=moac -o jsonpath="{.items[0].metadata.name}") csi-attacher +kubectl -n mayastor logs $(kubectl -n mayastor get pod -l app=moac -o jsonpath="{.items[0].metadata.name}") csi-provisioner +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Example obtaining CSI node container log" %} +```bash +kubectl -n mayastor logs mayastor-csi-7pg82 csi-driver-registrar +``` +{% endtab %} +{% endtabs %} + +## Coredumps + +A coredump is a snapshot of process' memory combined with auxiliary information \(PID, state of registers, etc.\) and saved to a file. It is used for post-mortem analysis and it is generated automatically by the operating system in case of a severe, unrecoverable error \(i.e. memory corruption\) causing the process to panic. Using a coredump for a problem analysis requires deep knowledge of program internals and is usually done only by developers. However, there is a very useful piece of information that users can retrieve from it and this information alone can often identify the root cause of the problem. That is the stack \(backtrace\) - a record of the last action that the program was performing at the time when it crashed. Here we describe how to get it. 
The steps as shown apply specifically to Ubuntu, other linux distros might employ variations. + +We rely on systemd-coredump that saves and manages coredumps on the system, `coredumpctl` utility that is part of the same package and finally the `gdb` debugger. + +{% tabs %} +{% tab title="Install systemd-coredump and gdb" %} +```bash +sudo apt-get install -y systemd-coredump gdb lz4 +``` +{% endtab %} +{% endtabs %} + +If installed correctly then the global core pattern will be set so that all generated coredumps will be piped to the `systemd-coredump` binary. + +{% tabs %} +{% tab title="Verify coredump configuration" %} +```bash +cat /proc/sys/kernel/core_pattern +``` +{% endtab %} + +{% tab title="Output example" %} +```text +|/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="List coredumps" %} +```bash +coredumpctl list +``` +{% endtab %} + +{% tab title="Output example" %} +```text +TIME PID UID GID SIG COREFILE EXE +Tue 2021-03-09 17:43:46 UTC 206366 0 0 6 present /bin/mayastor +``` +{% endtab %} +{% endtabs %} + +If there is a new coredump from the mayastor container, the coredump alone won't be that useful. GDB needs to access the binary of crashed process in order to be able to print at least some information in the backtrace. For that, we need to copy the contents of the container's filesystem to the host. + +{% tabs %} +{% tab title="Get ID of the mayastor container" %} +```bash +docker ps | grep mayadata/mayastor +``` +{% endtab %} + +{% tab title="Output example" %} +```text +b3db4615d5e1 mayadata/mayastor "sleep 100000" 27 minutes ago Up 27 minutes k8s_mayastor_mayastor-n682s_mayastor_51d26ee0-1a96-44c7-85ba-6e50767cd5ce_0 +d72afea5bcc2 mayadata/mayastor-csi "/bin/mayastor-csi -…" 7 hours ago Up 7 hours k8s_mayastor-csi_mayastor-csi-xrmxx_mayastor_d24017f2-5268-44a0-9fcd-84a593d7acb2_0 +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Copy relevant parts of the container\'s fs" %} +```bash +mkdir -p /tmp/rootdir +docker cp b3db4615d5e1:/bin /tmp/rootdir +docker cp b3db4615d5e1:/nix /tmp/rootdir +``` +{% endtab %} +{% endtabs %} + +Now we can start GDB. _Don't_ use the `coredumpctl` command for starting the debugger. It invokes GDB with invalid path to the debugged binary hence stack unwinding fails for Rust functions. At first we extract the compressed coredump. + +{% tabs %} +{% tab title="Find location of the compressed coredump" %} +```bash +coredumpctl info | grep Storage | awk '{ print $2 }' +``` +{% endtab %} + +{% tab title="Output example" %} +```text +/var/lib/systemd/coredump/core.mayastor.0.6a5e550e77ee4e77a19bd67436ce7a98.64074.1615374302000000000000.lz4 +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Extract the coredump" %} +```bash +sudo lz4cat /var/lib/systemd/coredump/core.mayastor.0.6a5e550e77ee4e77a19bd67436ce7a98.64074.1615374302000000000000.lz4 >core +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Open coredump in GDB" %} +```bash +gdb -c core /tmp/rootdir$(readlink /tmp/rootdir/bin/mayastor) +``` +{% endtab %} + +{% tab title="Output example" %} +```text +GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 +Copyright (C) 2020 Free Software Foundation, Inc. +License GPLv3+: GNU GPL version 3 or later +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. +Type "show copying" and "show warranty" for details. +This GDB was configured as "x86_64-linux-gnu". 
+Type "show configuration" for configuration details. +For bug reporting instructions, please see: +. +Find the GDB manual and other documentation resources online at: + . + +For help, type "help". +Type "apropos word" to search for commands related to "word"... +[New LWP 13] +[New LWP 17] +[New LWP 14] +[New LWP 16] +[New LWP 18] +Core was generated by `/bin/mayastor -l0 -nnats'. +Program terminated with signal SIGABRT, Aborted. +#0 0x00007ffdad99fb37 in clock_gettime () +[Current thread is 1 (LWP 13)] +``` +{% endtab %} +{% endtabs %} + +Once in GDB we need to set a sysroot so that GDB knows where to find the binary for the debugged program. + +{% tabs %} +{% tab title="Set sysroot in GDB" %} +```bash +set auto-load safe-path /tmp/rootdir +set sysroot /tmp/rootdir +``` +{% endtab %} + +{% tab title="Output example" %} +```text +Reading symbols from /tmp/rootdir/nix/store/f1gzfqq10dlha1qw10sqvgil34qh30af-systemd-246.6/lib/libudev.so.1... +(No debugging symbols found in /tmp/rootdir/nix/store/f1gzfqq10dlha1qw10sqvgil34qh30af-systemd-246.6/lib/libudev.so.1) +Reading symbols from /tmp/rootdir/nix/store/0kdiav729rrcdwbxws653zxz5kngx8aa-libspdk-dev-21.01/lib/libspdk.so... +Reading symbols from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libdl.so.2... +(No debugging symbols found in /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libdl.so.2) +Reading symbols from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libgcc_s.so.1... +(No debugging symbols found in /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libgcc_s.so.1) +Reading symbols from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libpthread.so.0... +... +``` +{% endtab %} +{% endtabs %} + +After that we can print backtrace\(s\). 
+ +{% tabs %} +{% tab title="Obtain backtraces for all threads in GDB" %} +```bash +thread apply all bt +``` +{% endtab %} + +{% tab title="Output example" %} +```text +Thread 5 (Thread 0x7f78248bb640 (LWP 59)): +#0 0x00007f7825ac0582 in __lll_lock_wait () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libpthread.so.0 +#1 0x00007f7825ab90c1 in pthread_mutex_lock () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libpthread.so.0 +#2 0x00005633ca2e287e in async_io::driver::main_loop () +#3 0x00005633ca2e27d9 in async_io::driver::UNPARKER::{{closure}}::{{closure}} () +#4 0x00005633ca2e27c9 in std::sys_common::backtrace::__rust_begin_short_backtrace () +#5 0x00005633ca2e27b9 in std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} () +#6 0x00005633ca2e27a9 in as core::ops::function::FnOnce<()>>::call_once () +#7 0x00005633ca2e26b4 in core::ops::function::FnOnce::call_once{{vtable-shim}} () +#8 0x00005633ca723cda in as core::ops::function::FnOnce>::call_once () at /rustc/d1206f950ffb76c76e1b74a19ae33c2b7d949454/library/alloc/src/boxed.rs:1546 +#9 as core::ops::function::FnOnce>::call_once () at /rustc/d1206f950ffb76c76e1b74a19ae33c2b7d949454/library/alloc/src/boxed.rs:1546 +#10 std::sys::unix::thread::Thread::new::thread_start () at /rustc/d1206f950ffb76c76e1b74a19ae33c2b7d949454//library/std/src/sys/unix/thread.rs:71 +#11 0x00007f7825ab6e9e in start_thread () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libpthread.so.0 +#12 0x00007f78259e566f in clone () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libc.so.6 + +Thread 4 (Thread 0x7f7824cbd640 (LWP 57)): +#0 0x00007f78259e598f in epoll_wait () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libc.so.6 +#1 0x00005633ca2e414b in async_io::reactor::ReactorLock::react () +#2 0x00005633ca583c11 in async_io::driver::block_on () +#3 0x00005633ca5810dd in std::sys_common::backtrace::__rust_begin_short_backtrace () +#4 0x00005633ca580e5c in core::ops::function::FnOnce::call_once{{vtable-shim}} () +#5 0x00005633ca723cda in as core::ops::function::FnOnce>::call_once () at /rustc/d1206f950ffb76c76e1b74a19ae33c2b7d949454/library/alloc/src/boxed.rs:1546 +#6 as core::ops::function::FnOnce>::call_once () at /rustc/d1206f950ffb76c76e1b74a19ae33c2b7d949454/library/alloc/src/boxed.rs:1546 +#7 std::sys::unix::thread::Thread::new::thread_start () at /rustc/d1206f950ffb76c76e1b74a19ae33c2b7d949454//library/std/src/sys/unix/thread.rs:71 +#8 0x00007f7825ab6e9e in start_thread () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libpthread.so.0 +#9 0x00007f78259e566f in clone () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libc.so.6 + +Thread 3 (Thread 0x7f78177fe640 (LWP 61)): +#0 0x00007f7825ac08b7 in accept () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libpthread.so.0 +#1 0x00007f7825c930bb in socket_listener () from /tmp/rootdir/nix/store/0kdiav729rrcdwbxws653zxz5kngx8aa-libspdk-dev-21.01/lib/libspdk.so +#2 0x00007f7825ab6e9e in start_thread () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libpthread.so.0 +#3 0x00007f78259e566f in clone () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libc.so.6 + +Thread 2 (Thread 0x7f7817fff640 (LWP 60)): +#0 0x00007f78259e598f in epoll_wait () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libc.so.6 
+#1 0x00007f7825c7f174 in eal_intr_thread_main () from /tmp/rootdir/nix/store/0kdiav729rrcdwbxws653zxz5kngx8aa-libspdk-dev-21.01/lib/libspdk.so +#2 0x00007f7825ab6e9e in start_thread () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libpthread.so.0 +#3 0x00007f78259e566f in clone () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libc.so.6 + +Thread 1 (Thread 0x7f782559f040 (LWP 56)): +#0 0x00007fff849bcb37 in clock_gettime () +#1 0x00007f78259af1d0 in clock_gettime@GLIBC_2.2.5 () from /tmp/rootdir/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libc.so.6 +#2 0x00005633ca23ebc5 in as tokio::park::Park>::park () +#3 0x00005633ca2c86dd in mayastor::main () +#4 0x00005633ca2000d6 in std::sys_common::backtrace::__rust_begin_short_backtrace () +#5 0x00005633ca2cad5f in main () +``` +{% endtab %} +{% endtabs %} + +------------- + +## Diskpool behaviour + +The below behaviour may be encountered while uprading from older releases to Mayastor 2.4 release and above. + +### Get Dsp + +Running `kubectl get dsp -n mayastor` could result in the error due to the `v1alpha1` schema in the discovery cache. To resolve this, run the command `kubectl get diskpools.openebs.io -n mayastor`. After this kubectl discovery cache will be updated with `v1beta1` object for dsp. + +### Create API + +When creating a Disk Pool with `kubectl create -f dsp.yaml`, you might encounter an error related to `v1alpha1` CR definitions. To resolve this, ensure your CR definition is updated to `v1beta1` in the YAML file (for example, `apiVersion: openebs.io/v1beta1`). + +{% hint style="note" %} +You can validate the schema changes by executing `kubectl get crd diskpools.openebs.io`. +{% endhint %} diff --git a/docs/main/additional-info/alphafeatures.md b/docs/main/user-guides/local-engine-user-guide/additional-information/alphafeatures.md similarity index 100% rename from docs/main/additional-info/alphafeatures.md rename to docs/main/user-guides/local-engine-user-guide/additional-information/alphafeatures.md diff --git a/docs/main/additional-info/faq.md b/docs/main/user-guides/local-engine-user-guide/additional-information/faq.md similarity index 100% rename from docs/main/additional-info/faq.md rename to docs/main/user-guides/local-engine-user-guide/additional-information/faq.md diff --git a/docs/main/additional-info/k8supgrades.md b/docs/main/user-guides/local-engine-user-guide/additional-information/k8supgrades.md similarity index 100% rename from docs/main/additional-info/k8supgrades.md rename to docs/main/user-guides/local-engine-user-guide/additional-information/k8supgrades.md diff --git a/docs/main/additional-info/kb.md b/docs/main/user-guides/local-engine-user-guide/additional-information/kb.md similarity index 100% rename from docs/main/additional-info/kb.md rename to docs/main/user-guides/local-engine-user-guide/additional-information/kb.md diff --git a/docs/main/additional-info/performance.md b/docs/main/user-guides/local-engine-user-guide/additional-information/performance.md similarity index 100% rename from docs/main/additional-info/performance.md rename to docs/main/user-guides/local-engine-user-guide/additional-information/performance.md diff --git a/docs/main/user-guides/prerequisites.mdx b/docs/main/user-guides/local-engine-user-guide/prerequisites.mdx similarity index 100% rename from docs/main/user-guides/prerequisites.mdx rename to docs/main/user-guides/local-engine-user-guide/prerequisites.mdx diff --git 
a/docs/main/user-guides/replicated-engine-user-guide/additional-information/call-home.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/call-home.md new file mode 100644 index 000000000..1c2616340 --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/call-home.md @@ -0,0 +1,72 @@ +# Call-home metrics + +## Mayastor default information collection + +By default, Mayastor collects basic information related to the number and scale of user-deployed instances. The collected data is anonymous and is encrypted at rest. This data is used to understand storage usage trends, which in turn helps maintainers prioritize their contributions to maximize the benefit to the community as a whole. + +{% hint style="info" %} +No user-identifiable information, hostnames, passwords, or volume data are collected. **ONLY** the below-mentioned information is collected from the cluster. +{% endhint %} + +A summary of the information collected is given below: + +| **Cluster information** | +| :--- | +|**K8s cluster ID**: This is a SHA-256 hashed value of the UID of your Kubernetes cluster's `kube-system` namespace.| +|**K8s node count**: This is the number of nodes in your Kubernetes cluster.| +|**Product name**: This field displays the name Mayastor | +|**Product version**: This is the deployed version of Mayastor.| +|**Deploy namespace**: This is a SHA-256 hashed value of the name of the Kubernetes namespace where Mayastor Helm chart is deployed.| +|**Storage node count**: This is the number of nodes on which the Mayastor I/O engine is scheduled.| + +|**Pool information**| +| :--- | +|**Pool count**: This is the number of Mayastor DiskPools in your cluster.| +|**Pool maximum size**: This is the capacity of the Mayastor DiskPool with the highest capacity.| +|**Pool minimum size**: This is the capacity of the Mayastor DiskPool with the lowest capacity.| +|**Pool mean size**: This is the average capacity of the Mayastor DiskPools in your cluster.| +|**Pool capacity percentiles**: This calculates and returns the capacity distribution of Mayastor DiskPools for the 50th, 75th and the 90th percentiles.| +| **Pools created**: This is the number of successful pool creation attempts.| +| **Pools deleted**: This is the number of successful pool deletion attempts.| + +|**Volume information**| +| :--- | +|**Volume count**: This is the number of Mayastor Volumes in your cluster.| +|**Volume minimum size**: This is the capacity of the Mayastor Volume with the lowest capacity.| +|**Volume mean size**: This is the average capacity of the Mayastor Volumes in your cluster.| +|**Volume capacity percentiles**: This calculates and returns the capacity distribution of Mayastor Volumes for the 50th, 75th and the 90th percentiles.| +| **Volumes created**: This is the number of successful volume creation attempts.| +| **Volumes deleted**: This is the number of successful volume deletion attempts. | + +|**Replica Information**| +| :--- | +|**Replica count**: This is the number of Mayastor Volume replicas in your cluster.| +|**Average replica count per volume**: This is the average number of replicas each Mayastor Volume has in your cluster.| + + +### Storage location of collected data + +The collected information is stored on behalf of the OpenEBS project by DataCore Software Inc. in data centers located in Texas, USA. 
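+
+For reference, the **K8s cluster ID** described above can be approximated from your own cluster. This is a minimal sketch, assuming the hash input is the raw UID string of the `kube-system` namespace; the exact encoding used by the call-home component may differ.
+
+```
+# Hypothetical reproduction of the anonymized cluster ID:
+# hash the kube-system namespace UID with SHA-256.
+kubectl get namespace kube-system -o jsonpath='{.metadata.uid}' | tr -d '\n' | sha256sum
+```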
+ +---- + +## Disable specific data collection + +To disable collection of **usage data** or generation of **events**, the following Helm command, along with the flag, can either be executed during installation or can be re-executed post-installation. + +### Disable collection of usage data + +To disable the collection of data metrics from the cluster, add the following flag to the Helm install command. + +``` +--set obs.callhome.enabled=false +``` + +### Disable generation of events data + +When eventing is enabled, NATS pods are created to gather various events from the cluster, including statistical metrics such as *pools created*. To deactivate eventing within the cluster, include the following flag in the Helm installation command. + +``` +--set eventing.enabled=false +``` + diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/i-o-path-description.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/i-o-path-description.md new file mode 100644 index 000000000..93b579562 --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/i-o-path-description.md @@ -0,0 +1,167 @@ +--- +description: >- + This section provides an overview of the topology and function of the Mayastor + data plane. Developer level documentation is maintained within the project's + GitHub repository. +--- + +# I/O Path Description + +## Glossary of Terms + +### Mayastor Instance + +An instance of the `mayastor` binary running inside a Mayastor container, which is encapsulated by a Mayastor Pod. + +### Nexus + +Mayastor terminology. A data structure instantiated within a Mayastor instance which performs I/O operations for a single Mayastor volume. Each nexus acts as an NVMe controller for the volume it exports. Logically it is composed chiefly of a 'static' function table which determines its base I/O handling behaviour \(held in common with all other nexus of the cluster\), combined with configuration information specific to the Mayastor volume it [_exports_](i-o-path-description.md#export), such as the identity of its [_children_](i-o-path-description.md#child). The function of a nexus is to route I/O requests for its exported volume which are received on its host container's target to the underlying persistence layer, via any applied transformations \("data services"\), and to return responses to the calling initiator back along that same I/O path. + +### Pool / Storage Pool / Mayastor Storage Pool \(MSP\) + +Mayastor's volume management abstraction. Block devices contributing storage capacity to a Mayastor deployment do so by their inclusion within configured storage pools. Each Mayastor node can host zero or more pools and each pool can "contain" a single base block device as a member. The total capacity of the pool is therefore determined by the size of that device. Pools can only be hosted on nodes running an instance of a mayastor pod. + +Multiple volumes can share the capacity of one pool but thin provisioning is not supported. Volumes cannot span multiple pools for the purposes of creating a volume larger in size than could be accommodated by the free capacity in any one pool. + +Internally a storage pool is an implementation of an SPDK [Logical Volume Store](https://spdk.io/doc/logical_volumes.html) + +### Bdev + +A code abstraction of a block-level device to which I/O requests may be sent, presenting a consistent device-independent interface. 
Mayastor's bdev abstraction layer is based upon that of Intel's [Storage Performance Development Kit](https://spdk.io/) \(SPDK\). + +* **base** bdev - Handles I/O directly, e.g. a representation of a physical SSD device +* **logical volume** - A bdev representing an [SPDK Logical Volume](https://spdk.io/doc/logical_volumes.html) \("lvol bdev"\) + +### Replica + +Mayastor terminology. An lvol bdev \(a "logical volume", created within a pool and consuming pool capacity\) which is being exported by a Mayastor instance, for consumption by a nexus \(local or remote to the exporting instance\) as a "child" + +### Child + +Mayastor terminology. A NVMe controller created and owned by a given Nexus and which handles I/O downstream from the nexus' target, by routing it to a replica associated with that child. + +A nexus has a minimum of one child, which must be local \(local: exported as a replica from a pool hosted by the same mayastor instance as hosts the nexus itself\). If the Mayastor volume being exported by the nexus is derived from a StorageClass with a replication factor greater than 1 \(i.e. synchronous N-way mirroring is enabled\), then the nexus will have additional children, up to the desired number of data copies. + +### Export + +To allow the discovery of, and acceptance of I/O for, a volume by a client initiator, over a Mayastor storage target. + +## Basics of I/O Flow + +### Non-Replicated Volume I/O Path + +```text + ____________________________________________________________ +| Front-end | +| NVMe-oF | +| (user space) | +|____________________________________________________________| + | + _____________________________v______________________________ +| [Nexus] | I/O path | +| | | +| ________V________ | +| | | | | +| | NexusChild | | +| | | | +| |________|_________| | +|_____________________________|______________________________| + | + + | + ______V________ + | Replica | + | (local) | + |==== pool =====| + | | + | +----+ | + | |lvol| | + | +----+ | + |_______________| + | + ______V________ + | base bdev | + |_______________| + | + V + DISK DEVICE + e.g. /dev/sda +``` + +For volumes based on a StorageClass defined as having a replication factor of 1, a single data copy is maintained by Mayastor. The I/O path is largely \(entirely, if using malloc:/// pool devices\) constrained to within the bounds of a single mayastor instance, which hosts both the volume's nexus and the storage pool in use as its persistence layer. + +Each mayastor instance presents a user-space storage target over NVMe-oF TCP. Worker nodes mounting a Mayastor volume for a scheduled application pod to consume are directed by Mayastor's CSI driver implementation to connect to the appropriate transport target for that volume and perform discovery, after which they are able to send I/O to it, directed at the volume in question. Regardless of how many volumes, and by extension how many nexus a mayastor instance hosts, all share the same target instances. + +Application I/O received on a target for a volume is passed to the virtual bdev at the front-end of the nexus hosting that volume. In the case of a non-replicated volume, the nexus is composed of a single child, to which the I/O is necessarily routed. As a virtual bdev itself, the child handles the I/O by routing it to the next device, in this case the replica that was created for this child. 
In non-replicated scenarios, both the volume's nexus and the pool which hosts its replica are co-located within the same mayastor instance, hence the I/O is passed from child to replica using SPDK bdev routines, rather than a network level transport. At the pool layer, a blobstore maps the lvol bdev exported as the replica concerned to the base bdev on which the pool was constructed. From there, other than for malloc:/// devices, the I/O passes to the host kernel via either aio or io\_uring, thence via the appropriate storage driver to the physical disk device. + +The disk devices' response to the I/O request is returns back along the same path to the caller's initiator. + +### Replicated Volume I/O Path + +```text + _______________________________________________________________ _ +| Front-end | | +| NVMe-oF | | +| (user space) | | +|_______________________________________________________________| | + | | + _______________________________|_______________________________ | +| [Nexus] | I/O path | | +| ____________________|____________________ | | +| | | | | | +| ________V________ ________V________ ________V________ | | +| |child 1 | |child 2 | |child 3 | | | +| | | | | | | | | +| | NVMe-oF | | NVMe-oF | | NVMe-oF | | | +| | | | | | | | | Mayastor +| |________|________| |________|________| |________|________| | | Instance +|__________|____________________|____________________|__________| | "A" + | | | | + | + | | | | + | ______V________ | | + | | Replica | | | + | | (local) | | | + | |==== pool =====| | | + | | | | | + | | +----+ | | | + | | |lvol| | | | + | | +----+ | | | + | |_______________| | | + | | | | + | ______V________ | | + | | base bdev | | | + | |_______________| | _| + | | | + | V | + | DISK DEVICE | ] Node "A" + | | + | | + | | + | | + ______V________ _ ______V________ _ + | Replica | | | Replica | | + | (remote) | | | (remote) | | + | nvmf target | | | nvmf target | | + | | | | | | + |==== pool =====| | |==== pool =====| | + | | | Mayastor | | | Mayastor + | +----+ | | Instance | +----+ | | Instance + | |lvol| | | "B" | |lvol| | | "C" + | +----+ | | | +----+ | | + |_______________| | |_______________| | + | | | | + ______V________ | ______V________ | + | base bdev | | | base bdev | | + |_______________| _| |_______________| _| + | | + V V + DISK DEVICE ] Node "B" DISK DEVICE ] Node "C" +``` + +If the StorageClass on which a volume is based specifies a replication factor of greater than one, then a synchronous mirroring scheme is employed to maintain multiple redundant data copies. For a replicated volume, creation and configuration of the volume's nexus requires additional orchestration steps. Prior to creating the nexus, not only must a local replica be created and exported as for the non-replicated case, but the requisite count of additional remote replicas required to meet the replication factor must be created and exported from Mayastor instances other than that hosting the nexus itself. The control plane core-agent component will select appropriate pool candidates, which includes ensuring sufficient available capacity and that no two replicas are sited on the same Mayastor instance \(which would compromise availability during co-incident failures\). Once suitable replicas have been successfully exported, the control plane completes the creation and configuration of the volume's nexus, with the replicas as its children. 
In contrast to their local counterparts, remote replicas are exported, and so connected to by the nexus, over NVMe-F using a user-mode initiator and target implementation from the SPDK. + +Write I/O requests to the nexus are handled synchronously; the I/O is dispatched to all \(healthy\) children and only when completion is acknowledged by all is the I/O acknowledged to the calling initiator via the nexus front-end. Read I/O requests are similarly issued to all children, with just the first response returned to the caller. + + + diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/migrate-etcd.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migrate-etcd.md new file mode 100644 index 000000000..6cb44f4ac --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migrate-etcd.md @@ -0,0 +1,139 @@ +--- +Title: Etcd Migration Procedure +--- + +By following the given steps, you can successfully migrate etcd from one node to another during maintenance activities like node drain etc., ensuring the continuity and integrity of the etcd data. + +{% hint style="note" %} +Take a snapshot of the etcd. Click [here](https://etcd.io/docs/v3.5/op-guide/recovery/) for the detailed documentation. +{% endhint %} + +## Step 1: Draining the etcd Node + +1. Assuming we have a three-node cluster with three etcd replicas, verify the etcd pods with the following commands: + +**Command to verify pods**: + +{% tabs %} +{% tab title="Command" %} + +```text +kubectl get pods -n mayastor -l app=etcd -o wide +``` +{% endtab %} +{% tab title="Output" %} + +```text +NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES +mayastor-etcd-0 1/1 Running 0 4m9s 10.244.1.212 worker-1 +mayastor-etcd-1 1/1 Running 0 5m16s 10.244.2.219 worker-2 +mayastor-etcd-2 1/1 Running 0 6m28s 10.244.3.203 worker-0 +``` +{% endtab %} +{% endtabs %} + +2. From etcd-0/1/2 we could see all the values are registered in database, once we migrated etcd to new node, all the key-value pairs should be available across all the pods. Run the following commands from any etcd pod. + +**Commands to get etcd data**: + +{% tabs %} +{% tab title="Command" %} + +```text +kubectl exec -it mayastor-etcd-0 -n mayastor -- bash +#ETCDCTL_API=3 +#etcdctl get --prefix "" +``` +{% endtab %} +{% endtabs %} + + +3. In this example, we drain the etcd node **worker-0** and migrate it to the next available node (in this case, the worker-4 node), use the following command: + +**Command to drain the node**: + +{% tabs %} +{% tab title="Command" %} + +```text +kubectl drain worker-0 --ignore-daemonsets --delete-emptydir-data +``` + +{% endtab %} +{% tab title="Output" %} + +```text +node/worker-0 cordoned +Warning: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-pbm7r, kube-system/kube-proxy-jgjs4, mayastor/mayastor-agent-ha-node-jkd4c, mayastor/mayastor-csi-node-mb89n, mayastor/mayastor-io-engine-q2n28, mayastor/mayastor-promethues-prometheus-node-exporter-v6mfs, mayastor/mayastor-promtail-6vgvm, monitoring/node-exporter-fz247 +evicting pod mayastor/mayastor-etcd-2 +evicting pod mayastor/mayastor-agent-core-7c594ff676-2ph69 +evicting pod mayastor/mayastor-operator-diskpool-c8ddb588-cgr29 +pod/mayastor-operator-diskpool-c8ddb588-cgr29 evicted +pod/mayastor-agent-core-7c594ff676-2ph69 evicted +pod/mayastor-etcd-2 evicted +node/worker-0 drained +``` +{% endtab %} +{% endtabs %} + +## Step 2: Migrating etcd to the New Node + +1. 
After draining the **worker-0** node, the etcd pod will be scheduled on the next available node, which is the worker-4 node. +2. The pod may end up in a **CrashLoopBackOff status** with specific errors in the logs. +3. When the pod is scheduled on the new node, it attempts to bootstrap the member again, but since the member is already registered in the cluster, it fails to start the etcd server with the error message **member already bootstrapped**. +4. To fix this issue, change the cluster's initial state from **new** to **existing** by editing the StatefulSet for etcd: + +**Command to check new etcd pod status** + +{% tabs %} +{% tab title="Command" %} + +```text +kubectl get pods -n mayastor -l app=etcd -o wide +``` +{% endtab %} +{% tab title="Output" %} + +```text +NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES +mayastor-etcd-0 1/1 Running 0 35m 10.244.1.212 worker-1 +mayastor-etcd-1 1/1 Running 0 36m 10.244.2.219 worker-2 +mayastor-etcd-2 0/1 CrashLoopBackOff 5 (44s ago) 10m 10.244.0.121 worker-4 + +``` +{% endtab %} +{% endtabs %} + +**Command to edit the StatefulSet**: + +{% tabs %} +{% tab title="Command" %} + +```text +kubectl edit sts mayastor-etcd -n mayastor +``` +{% endtab %} +{% tab title="Output" %} + +```text + - name: ETCD_INITIAL_CLUSTER_STATE + value: existing +``` +{% endtab %} +{% endtabs %} + +## Step 3: Validating etcd Key-Value Pairs + +Run the appropriate command from the migrated etcd pod to validate the key-value pairs and ensure they are the same as in the existing etcd. This step is crucial to avoid any data loss during the migration process. + + +{% tabs %} +{% tab title="Command" %} + +```text +kubectl exec -it mayastor-etcd-0 -n mayastor -- bash +#ETCDCTL_API=3 +#etcdctl get --prefix "" +``` +{% endtab %} +{% endtabs %} \ No newline at end of file diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-backup.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-backup.md new file mode 100644 index 000000000..cfe26388f --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-backup.md @@ -0,0 +1,115 @@ +# Steps to take a Backup from cStor for Distributed DBs (Cassandra) + +## Step 1: Backup from CStor Cluster + +In the current setup, we have a CStor cluster serving as the source, with Cassandra running as a StatefulSet, utilizing CStor volumes. 
+ +{% tabs %} +{% tab title="Command" %} +```text +kubectl get pods -n cassandra +``` +{% endtab %} + +{% tab title="Example Output" %} +```text +NAME READY STATUS RESTARTS AGE +cassandra-0 1/1 Running 0 6m22s +cassandra-1 1/1 Running 0 4m23s +cassandra-2 1/1 Running 0 2m15s +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get pvc -n cassandra +``` +{% endtab %} + +{% tab title="Example Output" %} +```text +NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE +data-cassandra-0 Bound pvc-05c464de-f273-4d04-b915-600bc434d762 3Gi RWO cstor-csi-disk 6m37s +data-cassandra-1 Bound pvc-a7ac4af9-6cc9-4722-aee1-b8c9e1c1f8c8 3Gi RWO cstor-csi-disk 4m38s +data-cassandra-2 Bound pvc-0980ea22-0b4b-4f02-bc57-81c4089cf55a 3Gi RWO cstor-csi-disk 2m30s +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get cvc -n openebs +``` +{% endtab %} + +{% tab title="Example Output" %} +```text +NAME CAPACITY STATUS AGE +pvc-05c464de-f273-4d04-b915-600bc434d762 3Gi Bound 6m47s +pvc-0980ea22-0b4b-4f02-bc57-81c4089cf55a 3Gi Bound 2m40s +pvc-a7ac4af9-6cc9-4722-aee1-b8c9e1c1f8c8 3Gi Bound 4m48s +``` +{% endtab %} +{% endtabs %} + + +## Step 2: Velero Installation + +To initiate Velero, execute the following command: + +``` +velero install --use-node-agent --provider gcp --plugins velero/velero-plugin-for-gcp:v1.6.0 --bucket velero-backup-datacore --secret-file ./credentials-velero --uploader-type restic +``` + +Verify the Velero namespace for Node Agent and Velero pods: + + +``` +kubectl get pods -n velero +``` + +## Step 3: Create a Sample Database + +In this example, we create a new database with sample data in Cassandra, a distributed database. + +![](https://hackmd.io/_uploads/ryvcoj-l6.png) + + +The data is distributed across all replication instances. + +![](https://hackmd.io/_uploads/ryzoojZgT.png) + +## Step 4: Taking Velero Backup + +Cassandra is a distributed wide-column store database running in clusters called **rings**. Each node in a Cassandra ring stores some data ranges and replicates others for scaling and fault tolerance. To back up Cassandra, we must back up all three volumes and restore them at the destination. + +Velero offers two approaches for discovering pod volumes to back up using File System Backup (FSB): +- **Opt-in Approach**: Annotate every pod containing a volume to be backed up with the volume's name. +- **Opt-out Approach**: Back up all pod volumes using FSB, with the option to exclude specific volumes. 
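+
+The steps below use the opt-in approach. For comparison, the following is a hedged sketch of the opt-out style: with `--default-volumes-to-fs-backup`, every pod volume is backed up by FSB unless it is explicitly excluded via the `backup.velero.io/backup-volumes-excludes` annotation. The volume name `tmp-data` and the backup name used here are only illustrations, not part of this Cassandra deployment.
+
+```
+# Opt-out sketch: skip a hypothetical "tmp-data" volume, back up everything else.
+kubectl -n cassandra annotate pod/cassandra-0 backup.velero.io/backup-volumes-excludes=tmp-data
+velero backup create cassandra-backup-optout --include-namespaces cassandra --default-volumes-to-fs-backup --wait
+```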
+
+**Opt-in**:
+
+In this case, we opt in all Cassandra pods and volumes for backup:
+
+```
+kubectl -n cassandra annotate pod/cassandra-0 backup.velero.io/backup-volumes=data
+kubectl -n cassandra annotate pod/cassandra-1 backup.velero.io/backup-volumes=data
+kubectl -n cassandra annotate pod/cassandra-2 backup.velero.io/backup-volumes=data
+```
+
+To perform the backup, run the following command:
+
+```
+velero backup create cassandra-backup-19-09-23 --include-namespaces cassandra --default-volumes-to-fs-backup --wait
+```
+
+To check the backup status, run the following command:
+
+```
+velero get backup | grep cassandra-backup-19-09-23
+```
+
+
diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-overview.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-overview.md
new file mode 100644
index 000000000..1b68bda7e
--- /dev/null
+++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-overview.md
@@ -0,0 +1,12 @@
+# Migration from CStor to Mayastor for Distributed Databases (Cassandra)
+
+
+This documentation outlines the process of migrating application volumes from CStor to Mayastor. We will leverage Velero for backup and restoration, facilitating the transition from a CStor cluster to a Mayastor cluster. This example specifically focuses on a Google Kubernetes Engine (GKE) cluster.
+
+**Velero Support**: Velero supports the backup and restoration of Kubernetes volumes attached to pods through File System Backup (FSB) or Pod Volume Backup. This process involves using modules from popular open-source backup tools like Restic (which we will utilize).
+
+- For **cloud provider plugins**, refer to the [Velero Docs - Providers section](https://velero.io/docs/main/supported-providers/).
+- **Velero GKE Configuration (Prerequisites)**: You can find the prerequisites and configuration details for Velero in a Google Kubernetes Engine (GKE) environment on GitHub [here](https://github.com/vmware-tanzu/velero-plugin-for-gcp#setup).
+- **Object Storage Requirement**: To store backups, Velero requires an object storage bucket. In our case, we use a Google Cloud Storage (GCS) bucket. Configuration details and setup can be found on GitHub [here](https://github.com/vmware-tanzu/velero-plugin-for-gcp#setup).
+- **Velero Basic Installation**: For a step-by-step guide on the basic installation of Velero, refer to the [Velero Docs - Basic Install section](https://velero.io/docs/v1.11/basic-install/).
+
diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-restore.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-restore.md
new file mode 100644
index 000000000..a53084a62
--- /dev/null
+++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-distributed-db/distributeddb-restore.md
@@ -0,0 +1,345 @@
+# Steps to Restore from cStor Backup to Mayastor for Distributed DBs (Cassandra)
+
+Cassandra is a popular NoSQL database used for handling large amounts of data with high availability and scalability. In Kubernetes environments, managing and restoring Cassandra backups efficiently is crucial.
+In this article, we'll walk you through the process of restoring a Cassandra database in a Kubernetes cluster using Velero, and we'll change the storage class to Mayastor for improved performance.
+
+{% hint style="info" %}
+Before you begin, make sure you have the following:
+- Access to a Kubernetes cluster with Velero installed.
+- A backup of your Cassandra database created using Velero.
+- Mayastor configured in your Kubernetes environment.
+{% endhint %}
+
+## Step 1: Set Up Kubernetes Credentials and Install Velero
+
+Set up your Kubernetes cluster credentials for the target cluster where you want to restore your Cassandra database. Use the gcloud command if you are using Google Kubernetes Engine (GKE), as shown below:
+
+```
+gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE --project PROJECT_NAME
+```
+
+Install Velero with the necessary plugins, specifying your backup bucket, secret file, and uploader type. Use the same values for the BUCKET-NAME and SECRET-FILENAME placeholders that you used during the initial Velero installation; this ensures that Velero has the correct credentials to access the previously saved backups:
+
+```
+velero install --use-node-agent --provider gcp --plugins velero/velero-plugin-for-gcp:v1.6.0 --bucket BUCKET-NAME --secret-file ./SECRET-FILENAME --uploader-type restic
+```
+
+## Step 2: Verify Backup Availability and Check BackupStorageLocation Status
+
+Confirm that your Cassandra backup is available in Velero. This step ensures that there are no credentials or bucket mismatches:
+
+```
+velero get backup | grep YOUR_BACKUP_NAME
+```
+
+Check the status of the BackupStorageLocation to ensure it's available:
+
+```
+kubectl get backupstoragelocation -n velero
+```
+
+## Step 3: Create a Restore Request
+
+Create a Velero restore request for your Cassandra backup:
+
+```
+velero restore create RESTORE_NAME --from-backup YOUR_BACKUP_NAME
+```
+
+## Step 4: Monitor Restore Progress
+
+Monitor the progress of the restore operation using the commands below. Velero initiates the restore process by creating an initialization container within the application pod; this container is responsible for restoring the volumes from the backup. As the restore operation proceeds, you can track its status, which typically transitions from **In Progress** to **Completed**.
+
+In this scenario, the storage class for the PVCs remains `cstor-csi-disk`, since these PVCs were originally imported from a cStor volume.
+
+{% hint style="note" %}
+Because the storage class was originally set to `cstor-csi-disk` (the PVCs were imported from a cStor volume), the restore status might temporarily stay as **In Progress** and your PVCs will be in **Pending** status.
+{% endhint %}
+
+```
+velero get restore | grep RESTORE_NAME
+```
+
+Inspect the status of the PVCs and pods in the cassandra namespace:
+
+```
+kubectl get pvc -n cassandra
+```
+
+```
+kubectl get pods -n cassandra
+```
+
+## Step 5: Back Up PVC YAML
+
+Create a backup of the Persistent Volume Claims (PVCs) and then modify their storage class to `mayastor-single-replica`.
+
+{% hint style="note" %}
+The StatefulSet for Cassandra will still have the `cstor-csi-disk` storage class at this point. This will be addressed in a later step.
+{% endhint %}
+
+```
+kubectl get pvc -n cassandra -o yaml > cassandra_pvc_19-09.yaml
+```
+
+```
+ls -lrt | grep cassandra_pvc_19-09.yaml
+```
+
+Edit the PVC YAML to change the storage class to `mayastor-single-replica`. You can script the change as sketched below, or use the example YAML snippet that follows and apply it to your PVCs.
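+
+A minimal scripted alternative for the edit, assuming the saved manifest contains the literal string `storageClassName: cstor-csi-disk` for every claim (adjust the file name if you saved it under a different one):
+
+```
+# Rewrite the storage class of all saved PVCs in place.
+sed -i 's/storageClassName: cstor-csi-disk/storageClassName: mayastor-single-replica/g' cassandra_pvc_19-09.yaml
+```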
+ +``` +apiVersion: v1 +items: +- apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + finalizers: + - kubernetes.io/pvc-protection + labels: + app.kubernetes.io/instance: cassandra + app.kubernetes.io/name: cassandra + velero.io/backup-name: cassandra-backup-19-09-23 + velero.io/restore-name: cassandra-restore-19-09-23 + name: data-cassandra-0 + namespace: cassandra + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 3Gi + storageClassName: mayastor-single-replica + volumeMode: Filesystem +- apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + finalizers: + - kubernetes.io/pvc-protection + labels: + app.kubernetes.io/instance: cassandra + app.kubernetes.io/name: cassandra + velero.io/backup-name: cassandra-backup-19-09-23 + velero.io/restore-name: cassandra-restore-19-09-23 + name: data-cassandra-1 + namespace: cassandra + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 3Gi + storageClassName: mayastor-single-replica + volumeMode: Filesystem +- apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + finalizers: + - kubernetes.io/pvc-protection + labels: + app.kubernetes.io/instance: cassandra + app.kubernetes.io/name: cassandra + velero.io/backup-name: cassandra-backup-19-09-23 + velero.io/restore-name: cassandra-restore-19-09-23 + name: data-cassandra-2 + namespace: cassandra + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 3Gi + storageClassName: mayastor-single-replica + volumeMode: Filesystem +kind: List +metadata: + resourceVersion: "" +``` + +## Step 6: Delete and Recreate PVCs + +Delete the pending PVCs and apply the modified PVC YAML to recreate them with the new storage class: + +``` +kubectl delete pvc PVC_NAMES -n cassandra +``` + +``` +kubectl apply -f cassandra_pvc.yaml -n cassandra +``` + +## Step 7: Observe Velero Init Container and Confirm Restore + +Observe the Velero init container as it restores the volumes for the Cassandra pods. This process ensures that your data is correctly recovered. + +``` +Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Warning FailedScheduling 8m37s default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. + Warning FailedScheduling 8m36s default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. + Warning FailedScheduling 83s default-scheduler 0/3 nodes are available: 3 persistentvolumeclaim "data-cassandra-0" not found. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. 
+ Warning FailedScheduling 65s default-scheduler running PreFilter plugin "VolumeBinding": %!!(MISSING)w() + Normal Scheduled 55s default-scheduler Successfully assigned cassandra/cassandra-0 to gke-mayastor-pool-2acd09ca-4v3z + Normal NotTriggerScaleUp 3m34s (x31 over 8m35s) cluster-autoscaler pod didn't trigger scale-up: + Normal SuccessfulAttachVolume 55s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-bf8a2fb7-8ddb-4e53-aa48-f8bbf2064b41" + Normal Pulled 47s kubelet Container image "velero/velero-restore-helper:v1.11.1" already present on machine + Normal Created 47s kubelet Created container restore-wait + Normal Started 47s kubelet Started container restore-wait + Normal Pulled 41s kubelet Container image "docker.io/bitnami/cassandra:4.1.3-debian-11-r37" already present on machine + Normal Created 41s kubelet Created container cassandra + Normal Started 41s kubelet Started container cassandra +``` + +Run this command to check the restore status: + +``` +velero get restore | grep cassandra-restore-19-09-23 +``` + +Run this command to check if all the pods are running: + +``` +kubectl get pods -n cassandra +``` + +## Step 8: Verify Cassandra Data and StatefulSet + +### Access a Cassandra pod using cqlsh and check the data + +- You can use the following command to access the Cassandra pods. This command establishes a connection to the Cassandra database running on pod `cassandra-1`: + +``` +cqlsh -u -p cassandra-1.cassandra-headless.cassandra.svc.cluster.local 9042 +``` +- The query results should display the data you backed up from cStor. In your output, you're expecting to see the data you backed up. + +``` +cassandra@cqlsh> USE openebs; +``` + +``` +cassandra@cqlsh:openebs> select * from openebs.data; +``` + +``` + replication | appname | volume +-------------+-----------+-------- + 3 | cassandra | cStor + +(1 rows) +``` + +- After verifying the data, you can exit the Cassandra shell by typing `exit`. + + + + + + + +### Modify your Cassandra StatefulSet YAML to use the mayastor-single-replica storage class + +- Before making changes to the Cassandra StatefulSet YAML, create a backup to preserve the existing configuration by running the following command: + +``` +kubectl get sts cassandra -n cassandra -o yaml > cassandra_sts_backup.yaml +``` + +- You can modify the Cassandra StatefulSet YAML to change the storage class to `mayastor-single-replica`. Here's the updated YAML: +``` +apiVersion: apps/v1 +kind: StatefulSet +metadata: + annotations: + meta.helm.sh/release-name: cassandra + meta.helm.sh/release-namespace: cassandra + labels: + app.kubernetes.io/instance: cassandra + app.kubernetes.io/managed-by: Helm + app.kubernetes.io/name: cassandra + helm.sh/chart: cassandra-10.5.3 + velero.io/backup-name: cassandra-backup-19-09-23 + velero.io/restore-name: cassandra-restore-19-09-23 + name: cassandra + namespace: cassandra +spec: + podManagementPolicy: OrderedReady + replicas: 3 + revisionHistoryLimit: 10 + selector: + matchLabels: + app.kubernetes.io/instance: cassandra + app.kubernetes.io/name: cassandra + serviceName: cassandra-headless + template: + # ... 
(rest of the configuration remains unchanged) + updateStrategy: + type: RollingUpdate + volumeClaimTemplates: + - apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + creationTimestamp: null + labels: + app.kubernetes.io/instance: cassandra + app.kubernetes.io/name: cassandra + name: data + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 3Gi + storageClassName: mayastor-single-replica # Change the storage class here + volumeMode: Filesystem +``` + +- Apply the modified YAML to make the changes take effect: + +``` +kubectl apply -f cassandra_sts_modified.yaml +``` + + + +### Delete the Cassandra StatefulSet with the --cascade=orphan flag + +Delete the Cassandra StatefulSet while keeping the pods running without controller management: + +``` +kubectl delete sts cassandra -n cassandra --cascade=orphan +``` + + + +### Recreate the Cassandra StatefulSet using the updated YAML + +- Use the kubectl apply command to apply the modified StatefulSet YAML configuration file, ensuring you are in the correct namespace where your Cassandra deployment resides. Replace with the actual path to your YAML file. + +``` +kubectl apply -f -n cassandra +``` + +- To check the status of the newly created StatefulSet, run: + +``` +kubectl get sts -n cassandra +``` + +- To confirm that the pods are running and managed by the controller, run: + +``` +kubectl get pods -n cassandra +``` + + + + + diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-backup.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-backup.md new file mode 100644 index 000000000..48712462e --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-backup.md @@ -0,0 +1,204 @@ +# Steps to take a Backup from cStor for Replicated DB (Mongo) + +{% hint style="note" %} +If you are deploying databases using operators, you need to find a way to actively modify the entire deployment through the operator. This ensures that you control and manage changes effectively within the operator-driven database deployment. +{% endhint %} + +## Step 1: Backup from cStor Cluster + +Currently, we have a cStor cluster as the source, with a clustered MongoDB running as a StatefulSet using cStor volumes. 
+ + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get pods +``` +{% endtab %} + +{% tab title="Output" %} +```text +NAME READY STATUS RESTARTS AGE +mongo-client-758ddd54cc-h2gwl 1/1 Running 0 47m +mongod-0 1/1 Running 0 47m +mongod-1 1/1 Running 0 44m +mongod-2 1/1 Running 0 42m +ycsb-775fc86c4b-kj5vv 1/1 Running 0 47m +``` +{% endtab %} +{% endtabs %} + + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get pvc +``` +{% endtab %} + +{% tab title="Output" %} +```text +NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE +mongodb-persistent-storage-claim-mongod-0 Bound pvc-cb115a0b-07f4-4912-b686-e160e8a0690d 3Gi RWO cstor-csi-disk 54m +mongodb-persistent-storage-claim-mongod-1 Bound pvc-c9214764-7670-4cda-87e3-82f0bc59d8c7 3Gi RWO cstor-csi-disk 52m +mongodb-persistent-storage-claim-mongod-2 Bound pvc-fc1f7ed7-d99e-40c7-a9b7-8d6244403a3e 3Gi RWO cstor-csi-disk 50m +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get cvc -n openebs +``` +{% endtab %} + +{% tab title="Output" %} +```text +NAME CAPACITY STATUS AGE +pvc-c9214764-7670-4cda-87e3-82f0bc59d8c7 3Gi Bound 53m +pvc-cb115a0b-07f4-4912-b686-e160e8a0690d 3Gi Bound 55m +pvc-fc1f7ed7-d99e-40c7-a9b7-8d6244403a3e 3Gi Bound 50m +``` +{% endtab %} +{% endtabs %} + + +## Step 2: Install Velero + +{% hint style="note" %} +For the prerequisites, refer to the [overview](migration-cstor-mayastor\replicateddb-overview.md) section. +{% endhint %} + +Run the following command to install Velero: + +{% tabs %} +{% tab title="Command" %} +```text +velero install --use-node-agent --provider gcp --plugins velero/velero-plugin-for-gcp:v1.6.0 --bucket velero-backup-datacore --secret-file ./credentials-velero --uploader-type restic +``` +{% endtab %} + +{% tab title="Output" %} +```text +[Installation progress output] +``` +{% endtab %} +{% endtabs %} + +Verify the Velero namespace for Node Agent and Velero pods: + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get pods -n velero +``` +{% endtab %} + +{% tab title="Output" %} +```text +NAME READY STATUS RESTARTS AGE +node-agent-cwkrn 1/1 Running 0 43s +node-agent-qg6hd 1/1 Running 0 43s +node-agent-v6xbk 1/1 Running 0 43s +velero-56c45f5c64-4hzn7 1/1 Running 0 43s +``` +{% endtab %} +{% endtabs %} + + + + +## Step 3: Data Validation + +On the Primary Database (mongo-0) you can see some sample data. + +![](https://hackmd.io/_uploads/HkNDm0CJa.png) + +You can also see the data available on the replicated secondary databases. + +![](https://hackmd.io/_uploads/H1aKmRCkT.png) + + +## Step 4: Take Velero Backup + +MongoDB uses replication, and data partitioning (sharding) for high availability and scalability. Taking a backup of the primary database is enough as the data gets replicated to the secondary databases. Restoring both primary and secondary at the same time can cause data corruption. + +For reference: [MongoDB Backup and Restore Error Using Velero](https://www.mongodb.com/community/forums/t/mongodb-backup-and-restore-error-using-velero-and-minio-on-premise-kubernetes-cluster/223683/3) + +Velero supports two approaches for discovering pod volumes to be backed up using FSB: + +1. **Opt-in approach**: Annotate pods containing volumes to be backed up. +2. **Opt-out approach**: Backup all pod volumes with the ability to opt-out specific volumes. 
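+
+For reference, here is a minimal sketch of how the two approaches surface in pod metadata once the commands in the following sections are applied (metadata fragments only; the volume name shown is the one used later in this guide):
+
+```
+# Opt-in: the pod is annotated with the volume(s) Velero should back up via FSB
+metadata:
+  annotations:
+    backup.velero.io/backup-volumes: mongodb-persistent-storage-claim
+---
+# Opt-out: the pod (or PVC) is labelled so Velero skips it entirely
+metadata:
+  labels:
+    velero.io/exclude-from-backup: "true"
+```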
+ +### Opt-In for Primary MongoDB Pod: + +To ensure that our primary MongoDB pod, which receives writes and replicates data to secondary pods, is included in the backup, we need to annotate it as follows: + +``` +kubectl annotate pod/mongod-0 backup.velero.io/backup-volumes=mongodb-persistent-storage-claim +``` + +### Opt-Out for Secondary MongoDB Pods and PVCs: + +To exclude secondary MongoDB pods and their associated Persistent Volume Claims (PVCs) from the backup, we can label them as follows: + +``` +kubectl label pod mongod-1 velero.io/exclude-from-backup=true +pod/mongod-1 labeled +``` + +``` +kubectl label pod mongod-2 velero.io/exclude-from-backup=true +pod/mongod-2 labeled +``` + +``` +kubectl label pvc mongodb-persistent-storage-claim-mongod-1 velero.io/exclude-from-backup=true +persistentvolumeclaim/mongodb-persistent-storage-claim-mongod-1 labeled +``` + +``` +kubectl label pvc mongodb-persistent-storage-claim-mongod-2 velero.io/exclude-from-backup=true +persistentvolumeclaim/mongodb-persistent-storage-claim-mongod-2 labeled +``` + +### Backup Execution: + +Create a backup of the entire namespace. If any other applications run in the same namespace as MongoDB, we can exclude them from the backup using labels or flags from the Velero CLI: + +{% tabs %} +{% tab title="Command" %} +```text +velero backup create mongo-backup-13-09-23 --include-namespaces default --default-volumes-to-fs-backup --wait +``` +{% endtab %} + +{% tab title="Output" %} +```text +Backup request "mongo-backup-13-09-23" submitted successfully. +Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background. +........... +Backup completed with status: Completed. You may check for more information using the commands `velero backup describe mongo-backup-13-09-23` and `velero backup logs mongo-backup-13-09-23`. +``` +{% endtab %} +{% endtabs %} + +### Backup Verification: + +To check the status of the backup using the Velero CLI, you can use the following command. If the backup fails for any reason, you can inspect the logs with the velero backup logs command: + + +{% tabs %} +{% tab title="Command" %} +```text +velero get backup | grep 13-09-23 +``` +{% endtab %} + +{% tab title="Output" %} +```text +mongo-backup-13-09-23 Completed 0 0 2023-09-13 13:15:32 +0000 UTC 29d default +``` +{% endtab %} +{% endtabs %} \ No newline at end of file diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-overview.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-overview.md new file mode 100644 index 000000000..6b2595dc1 --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-overview.md @@ -0,0 +1,16 @@ +# Migrating from CStor to Mayastor for Replicated Databases (MongoDB) + + +This documentation provides a comprehensive guide on migrating CStor application volumes to Mayastor. We utilize Velero for the backup and restoration process, enabling a seamless transition from a CStor cluster to Mayastor. This example specifically focuses on a GKE cluster. + +Velero offers support for the backup and restoration of Kubernetes volumes attached to pods directly from the volume's file system. This is known as File System Backup (FSB) or Pod Volume Backup. 
The data movement is facilitated through the use of modules from free, open-source backup tools such as Restic (which is the tool of choice in this guide).
+
+- For cloud providers, you can find the necessary plugins [here](https://velero.io/docs/main/supported-providers/).
+- For detailed Velero GKE configuration prerequisites, refer to [this resource](https://github.com/vmware-tanzu/velero-plugin-for-gcp#setup).
+- Velero requires an object storage bucket for storing backups; in our case, we use a [Google Cloud Storage (GCS) bucket](https://github.com/vmware-tanzu/velero-plugin-for-gcp#setup).
+- For detailed instructions on the basic installation of Velero, refer to the [Velero Docs - Basic Install section](https://velero.io/docs/v1.11/basic-install/).
+
+
diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-restore.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-restore.md
new file mode 100644
index 000000000..9854ddb27
--- /dev/null
+++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/migration-for-replicated-db/replicateddb-restore.md
@@ -0,0 +1,513 @@
+# Steps to Restore from cStor Backup to Mayastor for Replicated DBs (Mongo)
+
+{% hint style="info" %}
+Before you begin, make sure you have the following:
+- Access to a Kubernetes cluster with Velero installed.
+- A backup of your MongoDB database created using Velero.
+- Mayastor configured in your Kubernetes environment.
+{% endhint %}
+
+## Step 1: Install Velero with GCP Provider on Destination (Mayastor Cluster)
+
+Install Velero with the GCP provider, ensuring you use the same values for the `BUCKET-NAME` and `SECRET-FILENAME` placeholders that you used originally.
These placeholders should be replaced with your specific values: + +{% tabs %} +{% tab title="Command" %} +```text +velero install --use-node-agent --provider gcp --plugins velero/velero-plugin-for-gcp:v1.6.0 --bucket BUCKET-NAME --secret-file SECRET-FILENAME --uploader-type restic +``` +{% endtab %} + +{% tab title="Output" %} +```text +CustomResourceDefinition/backuprepositories.velero.io: attempting to create resource +CustomResourceDefinition/backuprepositories.velero.io: attempting to create resource client +CustomResourceDefinition/backuprepositories.velero.io: created +CustomResourceDefinition/backups.velero.io: attempting to create resource +CustomResourceDefinition/backups.velero.io: attempting to create resource client +CustomResourceDefinition/backups.velero.io: created +CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource +CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource client +CustomResourceDefinition/backupstoragelocations.velero.io: created +CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource +CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource client +CustomResourceDefinition/deletebackuprequests.velero.io: created +CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource +CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource client +CustomResourceDefinition/downloadrequests.velero.io: created +CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource +CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource client +CustomResourceDefinition/podvolumebackups.velero.io: created +CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource +CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource client +CustomResourceDefinition/podvolumerestores.velero.io: created +CustomResourceDefinition/restores.velero.io: attempting to create resource +CustomResourceDefinition/restores.velero.io: attempting to create resource client +CustomResourceDefinition/restores.velero.io: created +CustomResourceDefinition/schedules.velero.io: attempting to create resource +CustomResourceDefinition/schedules.velero.io: attempting to create resource client +CustomResourceDefinition/schedules.velero.io: created +CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource +CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource client +CustomResourceDefinition/serverstatusrequests.velero.io: created +CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource +CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource client +CustomResourceDefinition/volumesnapshotlocations.velero.io: created +Waiting for resources to be ready in cluster... 
+Namespace/velero: attempting to create resource +Namespace/velero: attempting to create resource client +Namespace/velero: created +ClusterRoleBinding/velero: attempting to create resource +ClusterRoleBinding/velero: attempting to create resource client +ClusterRoleBinding/velero: created +ServiceAccount/velero: attempting to create resource +ServiceAccount/velero: attempting to create resource client +ServiceAccount/velero: created +Secret/cloud-credentials: attempting to create resource +Secret/cloud-credentials: attempting to create resource client +Secret/cloud-credentials: created +BackupStorageLocation/default: attempting to create resource +BackupStorageLocation/default: attempting to create resource client +BackupStorageLocation/default: created +VolumeSnapshotLocation/default: attempting to create resource +VolumeSnapshotLocation/default: attempting to create resource client +VolumeSnapshotLocation/default: created +Deployment/velero: attempting to create resource +Deployment/velero: attempting to create resource client +Deployment/velero: created +DaemonSet/node-agent: attempting to create resource +DaemonSet/node-agent: attempting to create resource client +DaemonSet/node-agent: created +Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status. +thulasiraman_ilangovan@cloudshell:~$ +``` +{% endtab %} +{% endtabs %} + + + +## Step 2: Verify Backup Availability + +Check the availability of your previously-saved backups. If the credentials or bucket information doesn't match, you won't be able to see the backups: + + +{% tabs %} +{% tab title="Command" %} +```text +velero get backup | grep 13-09-23 +``` +{% endtab %} + +{% tab title="Output" %} +```text +mongo-backup-13-09-23 Completed 0 0 2023-09-13 13:15:32 +0000 UTC 29d default +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get backupstoragelocation -n velero +``` +{% endtab %} + +{% tab title="Output" %} +```text +NAME PHASE LAST VALIDATED AGE DEFAULT +default Available 23s 3m32s true +``` +{% endtab %} +{% endtabs %} + + + +## Step 3: Restore Using Velero CLI + +Initiate the restore process using Velero CLI with the following command: + +{% tabs %} +{% tab title="Command" %} +```text +velero restore create mongo-restore-13-09-23 --from-backup mongo-backup-13-09-23 +``` +{% endtab %} + +{% tab title="Output" %} +```text +Restore request "mongo-restore-13-09-23" submitted successfully. +Run `velero restore describe mongo-restore-13-09-23` or `velero restore logs mongo-restore-13-09-23` for more details. +``` +{% endtab %} +{% endtabs %} + + + + +## Step 4: Check Restore Status + +You can check the status of the restore process by using the `velero get restore` command. + +``` +velero get restore +``` + +When Velero performs a restore, it deploys an init container within the application pod, responsible for restoring the volume. Initially, the restore status will be `InProgress`. + +{% hint style="note" %} +Your storage class was originally set to `cstor-csi-disk` because you imported this PVC from a cStor volume, the status might temporarily stay as **In Progress** and your PVC will be in **Pending** status. 
+{% endhint %} + + + +## Step 5: Backup PVC and Change Storage Class + +- Retrieve the current configuration of the PVC which is in `Pending` status using the following command: + +``` +kubectl get pvc mongodb-persistent-storage-claim-mongod-0 -o yaml > pvc-mongo.yaml +``` + +- Confirm that the PVC configuration has been saved by checking its existence with this command: + +``` +ls -lrt | grep pvc-mongo.yaml +``` + +- Edit the `pvc-mongo.yaml` file to update its storage class. Below is the modified PVC configuration with `mayastor-single-replica` set as the new storage class: + +{% hint style="note" %} +The statefulset for Mongo will still have the `cstor-csi-disk` storage class at this point. This will be addressed in the further steps. +{% endhint %} + + +``` +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + finalizers: + - kubernetes.io/pvc-protection + labels: + role: mongo + velero.io/backup-name: mongo-backup-13-09-23 + velero.io/restore-name: mongo-restore-13-09-23 + name: mongodb-persistent-storage-claim-mongod-0 + namespace: default +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 3Gi + storageClassName: mayastor-single-replica + volumeMode: Filesystem +``` + +## Step 6: Resolve issue where PVC is in a Pending + +- Begin by deleting the problematic PVC with the following command: + +``` +kubectl delete pvc mongodb-persistent-storage-claim-mongod-0 +``` + +- Once the PVC has been successfully deleted, you can recreate it using the updated configuration from the `pvc-mongo.yaml` file. Apply the new configuration with the following command: + +``` +kubectl apply -f pvc-mongo.yaml +``` + +## Step 7: Check Velero init container + +After recreating the PVC with Mayastor storageClass, you will observe the presence of a Velero initialization container within the application pod. This container is responsible for restoring the required volumes. + +You can check the status of the restore operation by running the following command: + +``` +kubectl describe pod +``` + +![](https://hackmd.io/_uploads/rk1fbgJep.png) + + +The output will display the pods' status, including the Velero initialization container. Initially, the status might show as "Init:0/1," indicating that the restore process is in progress. + +You can track the progress of the restore by running: + +{% tabs %} +{% tab title="Command" %} +```text +velero get restore +``` +{% endtab %} + +{% tab title="Output" %} +```text +NAME BACKUP STATUS STARTED COMPLETED ERRORS WARNINGS CREATED SELECTOR +mongo-restore-13-09-23 mongo-backup-13-09-23 Completed 2023-09-13 13:56:19 +0000 UTC 2023-09-13 14:06:09 +0000 UTC 0 4 2023-09-13 13:56:19 +0000 UTC +``` +{% endtab %} +{% endtabs %} + +You can then verify the data restoration by accessing your MongoDB instance. In the provided example, we used the "mongosh" shell to connect to the MongoDB instance and check the databases and their content. The data should reflect what was previously backed up from the cStor storage. + +``` +mongosh mongodb://admin:admin@mongod-0.mongodb-service.default.svc.cluster.local:27017 +``` + +## Step 8: Monitor Pod Progress + +Due to the statefulset's configuration with three replicas, you will notice that the `mongo-1` pod is created but remains in a `Pending` status. This behavior is expected as we have the storage class set to cStor in statefulset configuration. 
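+
+A quick way to confirm this (a sketch; the exact output will vary with your cluster) is to check the secondary pod and its claim:
+
+```
+# The secondary pod stays Pending because its PVC still references the cStor storage class
+kubectl get pod mongod-1
+kubectl get pvc mongodb-persistent-storage-claim-mongod-1
+```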
+ +## Step 9: Capture the StatefulSet Configuration and Modify Storage Class + + +Capture the current configuration of the StatefulSet for MongoDB by running the following command: + +``` +kubectl get sts mongod -o yaml > sts-mongo-original.yaml +``` + +This command will save the existing StatefulSet configuration to a file named `sts-mongo-original.yaml`. Next, edit this YAML file to change the storage class to `mayastor-single-replica`. + +``` +apiVersion: apps/v1 +kind: StatefulSet +metadata: + annotations: + backup.velero.io/backup-volumes: mongodb-persistent-storage-claim + meta.helm.sh/release-name: mongo + meta.helm.sh/release-namespace: default + generation: 1 + labels: + app.kubernetes.io/managed-by: Helm + velero.io/backup-name: mongo-backup-13-09-23 + velero.io/restore-name: mongo-restore-13-09-23 + name: mongod + namespace: default +spec: + podManagementPolicy: OrderedReady + replicas: 3 + revisionHistoryLimit: 10 + selector: + matchLabels: + role: mongo + serviceName: mongodb-service + template: + metadata: + creationTimestamp: null + labels: + environment: test + replicaset: rs0 + role: mongo + spec: + containers: + - command: + - mongod + - --bind_ip + - 0.0.0.0 + - --replSet + - rs0 + env: + - name: MONGO_INITDB_ROOT_USERNAME + valueFrom: + secretKeyRef: + key: username + name: secrets + - name: MONGO_INITDB_ROOT_PASSWORD + valueFrom: + secretKeyRef: + key: password + name: secrets + image: mongo:latest + imagePullPolicy: Always + lifecycle: + postStart: + exec: + command: + - /bin/sh + - -c + - sleep 90 ; ./tmp/scripts/script.sh > /tmp/script-log + name: mongod-container + ports: + - containerPort: 27017 + protocol: TCP + resources: {} + terminationMessagePath: /dev/termination-log + terminationMessagePolicy: File + volumeMounts: + - mountPath: /data/db + name: mongodb-persistent-storage-claim + - mountPath: /tmp/scripts + name: mongo-scripts + dnsPolicy: ClusterFirst + restartPolicy: Always + schedulerName: default-scheduler + securityContext: {} + terminationGracePeriodSeconds: 10 + volumes: + - configMap: + defaultMode: 511 + name: mongo-replica + name: mongo-scripts + updateStrategy: + type: RollingUpdate + volumeClaimTemplates: + - apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + annotations: + volume.beta.kubernetes.io/storage-class: mayastor-single-replica #Make the change here + creationTimestamp: null + name: mongodb-persistent-storage-claim + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 3Gi + volumeMode: Filesystem + +``` + + + + + +## Step 10: Delete StatefulSet (Cascade=False) + +Delete the StatefulSet while preserving the pods with the following command: + +``` +kubectl delete sts mongod --cascade=false +``` + +You can run the following commands to verify the status: + +``` +kubectl get sts +``` + +``` +kubectl get pods +``` + +``` +kubectl get pvc +``` + +## Step 11: Deleting Pending Secondary Pods and PVCs + +Delete the MongoDB Pod `mongod-1`. + +``` +kubectl delete pod mongod-1 +``` + +Delete the Persistent Volume Claim (PVC) for `mongod-1`. + +``` +kubectl delete pvc mongodb-persistent-storage-claim-mongod-1 +``` + +## Step 12: Recreate StatefulSet + +Recreate the StatefulSet with the Yaml file. 
+ +{% tabs %} +{% tab title="Command" %} +```text +kubectl apply -f sts-mongo-mayastor.yaml +``` +{% endtab %} + +{% tab title="Output" %} +```text +statefulset.apps/mongod created +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get pods +``` +{% endtab %} + +{% tab title="Output" %} +```text +NAME READY STATUS RESTARTS AGE +mongo-client-758ddd54cc-h2gwl 1/1 Running 0 31m +mongod-0 1/1 Running 0 31m +mongod-1 1/1 Running 0 7m54s +mongod-2 1/1 Running 0 6m13s +ycsb-775fc86c4b-kj5vv 1/1 Running 0 31m +``` +{% endtab %} +{% endtabs %} + +{% tabs %} +{% tab title="Command" %} +```text +kubectl mayastor get volumes +``` +{% endtab %} + +{% tab title="Output" %} +```text +ID REPLICAS TARGET-NODE ACCESSIBILITY STATUS SIZE THIN-PROVISIONED ALLOCATED + f41c2cdc-5611-471e-b5eb-1cfb571b1b87 1 gke-mayastor-pool-2acd09ca-ppxw nvmf Online 3GiB false 3GiB + 113882e1-c270-4c72-9c1f-d9e09bfd66ad 1 gke-mayastor-pool-2acd09ca-4v3z nvmf Online 3GiB false 3GiB + fb4d6a4f-5982-4049-977b-9ae20b8162ad 1 gke-mayastor-pool-2acd09ca-q30r nvmf Online 3GiB false 3GiB +``` +{% endtab %} +{% endtabs %} + + + +## Step 13: Verify Data Replication on Secondary DB + +Verify data replication on the secondary database to ensure synchronization. + +``` +root@mongod-1:/# mongosh mongodb://admin:admin@mongod-1.mongodb-service.default.svc.cluster.local:27017 +Current Mongosh Log ID: 6501c744eb148521b3716af5 +Connecting to: mongodb://@mongod-1.mongodb-service.default.svc.cluster.local:27017/?directConnection=true&appName=mongosh+1.10.6 +Using MongoDB: 7.0.1 +Using Mongosh: 1.10.6 + +For mongosh info see: https://docs.mongodb.com/mongodb-shell/ + +------ + The server generated these startup warnings when booting + 2023-09-13T14:19:37.984+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem + 2023-09-13T14:19:38.679+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted + 2023-09-13T14:19:38.679+00:00: You are running this process as the root user, which is not recommended + 2023-09-13T14:19:38.679+00:00: vm.max_map_count is too low +------ + +rs0 [direct: secondary] test> use mydb +switched to db mydb +rs0 [direct: secondary] mydb> db.getMongo().setReadPref('secondary') + +rs0 [direct: secondary] mydb> db.accounts.find() +[ + { + _id: ObjectId("65019e2f183959fbdbd23f00"), + name: 'john', + total: '1058' + }, + { + _id: ObjectId("65019e2f183959fbdbd23f01"), + name: 'jane', + total: '6283' + }, + { + _id: ObjectId("65019e31183959fbdbd23f02"), + name: 'james', + total: '472' + } +] +rs0 [direct: secondary] mydb> +``` \ No newline at end of file diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/performance-tips.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/performance-tips.md new file mode 100644 index 000000000..f628d55c8 --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/performance-tips.md @@ -0,0 +1,108 @@ +# Performance Tips + +## CPU isolation + +Mayastor will fully utilize each CPU core that it was configured to run on. It will spawn a thread on each and the thread will run in an endless loop serving tasks dispatched to it without sleeping or blocking. There are also other Mayastor threads that are not bound to the CPU and those are allowed to block and sleep. 
However, the bound threads \(also called reactors\) rely on being interrupted by the kernel and other userspace processes as little as possible. Otherwise, the latency of IO may suffer. + +Ideally, the only thing that interrupts Mayastor's reactor would be only kernel time-based interrupts responsible for CPU accounting. However, that is far from trivial. `isolcpus` option that we will be using does not prevent: + +* kernel threads and +* other k8s pods to run on the isolated CPU + +However, it prevents system services including kubelet from interfering with Mayastor. + +### Set Linux kernel boot parameter + +Note that the best way to accomplish this step may differ, based on the Linux distro that you are using. + +Add the `isolcpus` kernel boot parameter to `GRUB_CMDLINE_LINUX_DEFAULT` in the grub configuration file, with a value which identifies the CPUs to be isolated \(indexing starts from zero here\). The location of the configuration file to change is typically `/etc/default/grub` but may vary. For example when running Ubuntu 20.04 in AWS EC2 Cloud boot parameters are in `/etc/default/grub.d/50-cloudimg-settings.cfg`. + +In the following example we assume a system with 4 CPU cores in total, and that the third and the fourth CPU cores are to be dedicated to Mayastor. + +```text +GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=2,3" +``` + +### Update grub + +{% tabs %} +{% tab title="Command" %} +```bash +sudo update-grub +``` +{% endtab %} + +{% tab title="Output example" %} +```text +Sourcing file `/etc/default/grub' +Sourcing file `/etc/default/grub.d/40-force-partuuid.cfg' +Sourcing file `/etc/default/grub.d/50-cloudimg-settings.cfg' +Sourcing file `/etc/default/grub.d/init-select.cfg' +Generating grub configuration file ... +Found linux image: /boot/vmlinuz-5.8.0-29-generic +Found initrd image: /boot/microcode.cpio /boot/initrd.img-5.8.0-29-generic +Found linux image: /boot/vmlinuz-5.4.0-1037-aws +Found initrd image: /boot/microcode.cpio /boot/initrd.img-5.4.0-1037-aws +Found Ubuntu 20.04.2 LTS (20.04) on /dev/xvda1 +done +``` +{% endtab %} +{% endtabs %} + +### Reboot the system + +{% tabs %} +{% tab title="Command" %} +```bash +sudo reboot +``` +{% endtab %} +{% endtabs %} + +### Verify isolcpus + +Basic verification is by outputting the boot parameters of the currently running kernel: + +{% tabs %} +{% tab title="Command" %} +```bash +cat /proc/cmdline +``` +{% endtab %} + +{% tab title="Example output" %} +```text +BOOT_IMAGE=/boot/vmlinuz-5.8.0-29-generic root=PARTUUID=7213a253-01 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 isolcpus=2,3 panic=-1 +``` +{% endtab %} +{% endtabs %} + +You can also print a list of isolated CPUs: + +{% tabs %} +{% tab title="Command" %} +```bash +cat /sys/devices/system/cpu/isolated +``` +{% endtab %} + +{% tab title="Example output" %} +```text +2-3 +``` +{% endtab %} +{% endtabs %} + +### Update mayastor helm chart for CPU core specification + +To allot specific CPU cores for Mayastor's reactors, follow these steps: + +1. Ensure that you have the Mayastor kubectl plugin installed, matching the version of your Mayastor Helm chart deployment ([releases](https://github.com/openebs/mayastor/releases)). You can find installation instructions in the [Mayastor kubectl plugin documentation]( https://mayastor.gitbook.io/introduction/advanced-operations/kubectl-plugin). + +2. Execute the following command to update Mayastor's configuration. Replace `` with the appropriate Kubernetes namespace where Mayastor is deployed. 
+ +``` +kubectl mayastor upgrade -n --set-args 'io_engine.coreList={3,4}' +``` + +In the above command, `io_engine.coreList={3,4}` specifies that Mayastor's reactors should operate on the third and fourth CPU cores. \ No newline at end of file diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/replica-operations.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/replica-operations.md new file mode 100644 index 000000000..b33a60acf --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/replica-operations.md @@ -0,0 +1,70 @@ +# Replica Operations + +## Basics + +When a Mayastor volume is provisioned based on a StorageClass which has a replication factor greater than one \(set by its `repl` parameter)\, the control plane will attempt to maintain through a 'Kubernetes-like' reconciliation loop that number of identical copies of the volume's data "replicas" (a replica is a nexus target "child"\) at any point in time. When a volume is first provisioned the control plane will attempt to create the required number of replicas, whilst adhering to its internal heuristics for their location within the cluster \(which will be discussed shortly\). If it succeeds, the volume will become available and will bind with the PVC. If the control plane cannot identify a sufficient number of eligble Mayastor Pools in which to create required replicas at the time of provisioning, the operation will fail; the Mayastor Volume will not be created and the associated PVC will not be bound. Kubernetes will periodically re-try the volume creation and if at any time the appropriate number of pools can be selected, the volume provisioning should succeed. + +Once a volume is processing I/O, each of its replicas will also receive I/O. Reads are round-robin distributed across replicas, whilst writes must be written to all. In a real world environment this is attended by the possibility that I/O to one or more replicas might fail at any time. Possible reasons include transient loss of network connectivity, node reboots, node or disk failure. If a volume's nexus \(NVMe controller\) encounters 'too many' failed I/Os for a replica, then that replica's child status will be marked `Faulted` and it will no longer receive I/O requests from the nexus. It will remain a member of the volume, whose departure from the desired state with respect to replica count will be reflected with a volume status of `Degraded`. How many I/O failures are considered "too many" in this context is outside the scope of this discussion. + +The control plane will first 'retire' the old, faulted one which will then no longer be associated to the volume. Once retired, a replica will become available for garbage collection (deletion from the Mayastor Pool containing it), assuming that the nature of the failure was such that the pool itself is still viable (i.e. the underlying disk device is still accessible). + Then it will attempt to restore the desired state \(replica count\) by creating a new replica, following its replica placement rules. If it succeeds, the nexus will "rebuild" that new replica - performing a full copy of all data from a healthy replica `Online`, i.e. the source. This process can proceed whilst the volume continues to process application I/Os although it will contend for disk throughput at both the source and destination disks. + +If a nexus is cleanly restarted, i.e. 
the Mayastor pod hosting it restarts gracefully, with the assistance of the control plane it will 'recompose' itself; all of the previous healthy member replicas will be re-attached to it. If previously faulted replicas are available to be re-connected (`Online`), then the control plane will attempt to reuse and rebuild them directly, rather than seek replacements for them first. This edge-case therefore does not result in the retirement of the affected replicas; they are simply reused. If the rebuild fails then we follow the above process of removing a `Faulted` replica and adding a new one. On an unclean restart (i.e. the Mayastor pod hosting the nexus crashes or is forcefully deleted) only one healthy replica will be re-attached and all other replicas will eventually be rebuilt. + +Once provisioned, the replica count of a volume can be changed using the kubectl-mayastor plugin `scale` subcommand. The value of the `num_replicas` field may be either increased or decreased by one and the control plane will attempt to satisfy the request by creating or destroying a replicas as appropriate, following the same replica placement rules as described herein. If the replica count is reduced, faulted replicas will be selected for removal in preference to healthy ones. + +## Replica Placement Heuristics + +Accurate predictions of the behaviour of Mayastor with respect to replica placement and management of replica faults can be made by reference to these 'rules', which are a simplified representation of the actual logic: + +* "Rule 1": A volume can only be provisioned if the replica count \(and capacity\) of its StorageClass can be satisfied at the time of creation +* "Rule 2": Every replica of a volume must be placed on a different Mayastor Node) +* "Rule 3": Children with the state `Faulted` are always selected for retirement in preference to those with state `Online` + +N.B.: By application of the 2nd rule, replicas of the same volume cannot exist within different pools on the same Mayastor Node. + +## Example Scenarios + +### Scenario One + +A cluster has two Mayastor nodes deployed, "Node-1" and "Node-2". Each Mayastor node hosts two Mayastor pools and currently, no Mayastor volumes have been defined. Node-1 hosts pools "Pool-1-A" and "Pool-1-B", whilst Node-2 hosts "Pool-2-A and "Pool-2-B". When a user creates a PVC from a StorageClass which defines a replica count of 2, the Mayastor control plane will seek to place one replica on each node (it 'follows' Rule 2). Since in this example it can find a suitable candidate pool with sufficient free capacity on each node, the volume is provisioned and becomes "healthy" (Rule 1). Pool-1-A is selected on Node-1, and Pool-2-A selected on Node-2 (all pools being of equal capacity and replica count, in this initial 'clean' state). + +Sometime later, the physical disk of Pool-2-A encounters a hardware failure and goes offline. The volume is in use at the time, so its nexus \(NVMe controller\) starts to receive I/O errors for the replica hosted in that Pool. The associated replica's child from Pool-2-A enters the `Faulted` state and the volume state becomes `Degraded` (as seen through the kubectl-mayastor plugin). + + +Expected Behaviour: The volume will maintain read/write access for the application via the remaining healthy replica. The faulty replica from Pool-2-A will be removed from the Nexus thus changing the nexus state to `Online` as the remaining is healthy. A new replica is created on either Pool-2-A or Pool-2-B and added to the nexus. 
The new replica child is rebuilt and eventually the state of the volume returns to `Online`. + +### Scenario Two + +A cluster has three Mayastor nodes deployed, "Node-1", "Node-2" and "Node-3". Each Mayastor node hosts one pool: "Pool-1" on Node-1, "Pool-2" on Node-2 and "Pool-3" on Node-3. No Mayastor volumes have yet been defined; the cluster is 'clean'. A user creates a PVC from a StorageClass which defines a replica count of 2. The control plane determines that it is possible to accommodate one replica within the available capacity of each of Pool-1 and Pool-2, and so the volume is created. An application is deployed on the cluster which uses the PVC, so the volume receives I/O. + +Unfortunately, due to user error the SAN LUN which is used to persist Pool-2 becomes detached from Node-2, causing I/O failures in the replica which it hosts for the volume. As with scenario one, the volume state becomes `Degraded` and the faulted child's becomes `Faulted`. + +Expected Behaviour: Since there is a Mayastor pool on Node-3 which has sufficient capacity to host a replacement replica, a new replica can be created (Rule 2: this 'third' incoming replica isn't located on either of the nodes that the two original ones are). The faulted replica in Pool-2 is retired from the nexus and a new replica is created on Pool-3 and added to the nexus. The new replica is rebuilt and eventually the state of the volume returns to `Online`. + +### Scenario Three + +In the cluster from Scenario three, sometime after the Mayastor volume has returned to the `Online` state, a user scales up the volume, increasing the `num_replicas` value from 2 to 3. Before doing so they corrected the SAN misconfiguration and ensured that the pool on Node-2 was `Online`. + +Expected Behaviour: The control plane will attempt to reconcile the difference in current (replicas = 2) and desired (replicas = 3) states. Since Node-2 no longer hosts a replica for the volume (the previously faulted replica was successfully retired and is no longer a member of the volume's nexus), the control plane will select it to host the new replica required (Rule 2 permits this). The volume state will become initially `Degraded` to reflect the difference in actual vs required redundant data copies but a rebuild of the new replica will be performed and eventually the volume state will be `Online` again. + +### Scenario Four + +A cluster has three Mayastor nodes deployed; "Node-1", "Node-2" and "Node-3". Each Mayastor node hosts two Mayastor pools and currently, no Mayastor volumes have been defined. Node-1 hosts pools "Pool-1-A" and "Pool-1-B", whilst Node-2 hosts "Pool-2-A and "Pool-2-B" and Node-3 hosts "Pool-3-A" and "Pool-3-B". A single volume exists in the cluster, which has a replica count of 3. The volume's replicas are all healthy and are located on Pool-1-A, Pool-2-A and Pool-3-A. An application is using the volume, so all replicas are receiving I/O. + +The host Node-3 goes down causing failure of all I/O sent to the replica it hosts from Pool-3-A. + +Expected Behaviour: The volume will enter and remain in the `Degraded` state. The associated child from the replica from Pool-3-A will be in the state `Faulted`, as observed in the volume through the kubectl-mayastor plugin. Said replica will be removed from the Nexus thus changing the nexus state to `Online` as the other replicas are healthy. The replica will then be disowned from the volume (it won't be possible to delete it since the host is down). 
Since Rule 2 dictates that every replica of a volume must be placed on a different Mayastor Node no new replica can be created at this point and the volume remains `Degraded` indefinitely. + + +### Scenario Five + +Given the post-host failure situation of Scenario four, the user scales down the volume, reducing the value of `num_replicas` from 3 to 2. + +Expected Behaviour: The control plane will reconcile the actual \(replicas=3\) vs desired \(replicas=2\) state of the volume. The volume state will become `Online` again. + +### Scenario Six + +In scenario Five, after scaling down the Mayastor volume the user waits for the volume state to become `Online` again. The desired and actual replica count are now 2. The volume's replicas are located in pools on both Node-1 and Node-2. The Node-3 is now back up and its pools Pool-3-A and Pool-3-B are `Online`. The user then scales the volume again, increasing the `num_replicas` from 2 to 3 again. + +Expected Behaviour: The volume's state will become `Degraded`, reflecting the difference in desired vs actual replica count. The control plane will select a pool on Node-3 as the location for the new replica required. Node-3 is therefore again a suitable candidate and has online pools with sufficient capacity. diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/scale-etcd.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/scale-etcd.md new file mode 100644 index 000000000..f1fe3245f --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/scale-etcd.md @@ -0,0 +1,167 @@ +--- +Title: Scaling up etcd members +--- + +By default, Mayastor allows the creation of three etcd members. If you wish to increase the number of etcd replicas, you will encounter an error. However, you can make the necessary configuration changes discussed in this guide to make it work. + +## Overview of StatefulSets + +StatefulSets are Kubernetes resources designed for managing stateful applications. They provide stable network identities and persistent storage for pods. StatefulSets ensure ordered deployment and scaling, support persistent volume claims, and manage the state of applications. They are commonly used for databases, messaging systems, and distributed file systems. Here's how StatefulSets function: +* For a StatefulSet with N replicas, when pods are deployed, they are created sequentially in order from {0..N-1}. +* When pods are deleted, they are terminated in reverse order from {N-1..0}. +* Before a scaling operation is applied to a pod, all of its predecessors must be running and ready. +* Before a pod is terminated, all of its successors must be completely shut down. +* Mayastor uses etcd database for persisting configuration and state information. Etcd is setup as a Kubernetes StatefulSet when Mayastor is installed. + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get dsp -n mayastor +``` +{% endtab %} +{% tab title="Output" %} +```text +NAME NODE STATE POOL_STATUS CAPACITY USED AVAILABLE +pool-0 worker-0 Online Online 374710730752 22561161216 352149569536 +pool-1 worker-1 Online Online 374710730752 21487419392 353223311360 +pool-2 worker-2 Online Online 374710730752 21793603584 352917127168 +``` +{% endtab %} +{% endtabs %} + +{% hint style="note" %} +Take a snapshot of the etcd. Click [here](https://etcd.io/docs/v3.5/op-guide/recovery/) for the detailed documentation. 
+{% endhint %} + +* From etcd-0/1/2, we can see that all the values are registered in the database. Once we scale up etcd with "n" replicas, all the key-value pairs should be available across all the pods. + +To scale up the etcd members, the following steps can be performed: + +1. Add a new etcd member +2. Add a peer URL +3. Create a PV (Persistent Volume) +4. Validate key-value pairs + +---------- + +## Step 1: Adding a New etcd Member (Scaling Up etcd Replica) + +To increase the number of replicas to 4, use the following `kubectl scale` command: + +{% tabs %} +{% tab title="Command" %} +```text +kubectl scale sts mayastor-etcd -n mayastor --replicas=4 +``` +{% endtab %} +{% tab title="Output" %} +```text +statefulset.apps/mayastor-etcd scaled +``` +{% endtab %} +{% endtabs %} + +> The new pod will be created on available nodes but will be in a **pending state** as there is no PV/PVC created to bind the volumes. + +{% tabs %} +{% tab title="Command" %} +```text +kubectl get pods -n mayastor -l app=etcd +``` +{% endtab %} +{% tab title="Output" %} +```text +NAME READY STATUS RESTARTS AGE +mayastor-etcd-0 1/1 Running 0 28d +mayastor-etcd-1 1/1 Running 0 28d +mayastor-etcd-2 1/1 Running 0 28d +mayastor-etcd-3 0/1 Pending 0 2m34s +``` +{% endtab %} +{% endtabs %} + +## Step 2: Add a New Peer URL + +Before creating a PV, we need to add the new peer URL (mayastor-etcd-3=http://mayastor-etcd-3.mayastor-etcd-headless.mayastor.svc.cluster.local:2380) and change the cluster's initial state from "new" to "existing" so that the new member will be added to the existing cluster when the pod comes up after creating the PV. Since the new pod is still in a pending state, the changes will not be applied to the other pods as they will be restarted in reverse order from {N-1..0}. It is expected that all of its predecessors must be running and ready. + +{% tabs %} +{% tab title="Command" %} +```text +kubectl edit sts mayastor-etcd -n mayastor +``` +{% endtab %} +{% tab title="Output" %} +```text + - name: ETCD_INITIAL_CLUSTER_STATE + value: existing + - name: ETCD_INITIAL_CLUSTER + value: mayastor-etcd-0=http://mayastor-etcd-0.mayastor-etcd-headless.mayastor.svc.cluster.local:2380,mayastor-etcd-1=http://mayastor-etcd-1.mayastor-etcd-headless.mayastor.svc.cluster.local:2380,mayastor-etcd-2=http://mayastor-etcd-2.mayastor-etcd-headless.mayastor.svc.cluster.local:2380,mayastor-etcd-3=http://mayastor-etcd-3.mayastor-etcd-headless.mayastor.svc.cluster.local:2380 +``` +{% endtab %} +{% endtabs %} + + + +## Step 3: Create a Persistent Volume + +Create a PV with the following YAML. Change the pod name/claim name based on the pod's unique identity. + +{% hint style="note" %} +This is only for the volumes created with "manual" storage class. 
+{% endhint %} + +{% tabs %} +{% tab title="YAML" %} +```text +apiVersion: v1 +kind: PersistentVolume +metadata: + annotations: + meta.helm.sh/release-name: mayastor + meta.helm.sh/release-namespace: mayastor + pv.kubernetes.io/bound-by-controller: "yes" + finalizers: + - kubernetes.io/pv-protection + labels: + app.kubernetes.io/managed-by: Helm + statefulset.kubernetes.io/pod-name: mayastor-etcd-3 + name: etcd-volume-3 +spec: + accessModes: + - ReadWriteOnce + capacity: + storage: 2Gi + claimRef: + apiVersion: v1 + kind: PersistentVolumeClaim + name: data-mayastor-etcd-3 + namespace: mayastor + hostPath: + path: /var/local/mayastor/etcd/pod-3 + type: "" + persistentVolumeReclaimPolicy: Delete + storageClassName: manual + volumeMode: Filesystem +``` +{% endtab %} +{% tab title="Output" %} +```text +kubectl apply -f pv-etcd.yaml -n mayastor +persistentvolume/etcd-volume-3 created +``` +{% endtab %} +{% endtabs %} + +## Step 4: Validate Key-Value Pairs + +Run the following command from the new etcd pod and ensure that the values are the same as those in etcd-0/1/2. Otherwise, it indicates a data loss issue. + +{% tabs %} +{% tab title="Command" %} +```text +kubectl exec -it mayastor-etcd-3 -n mayastor -- bash +#ETCDCTL_API=3 +#etcdctl get --prefix "" +``` +{% endtab %} +{% endtabs %} \ No newline at end of file diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/tested-third-party-software.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/tested-third-party-software.md new file mode 100644 index 000000000..d44357f50 --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/tested-third-party-software.md @@ -0,0 +1,16 @@ +# Tested Third Party Software + +## **Worker Node OS** + +| **Linux** | _Distribution:Ubuntu Version: 20.04 LTS Kernel version: 5.13.0-27-generic_ | +|-------------|----------------------------------------------------------------------------| +| **Windows** | _Not supported_ | + +## **Cluster Orchestration** + +#### Kubernetes versions: + + * v1.25.10 + * v1.23.7 + * v1.22.10 + * v1.21.13 \ No newline at end of file diff --git a/docs/main/user-guides/replicated-engine-user-guide/additional-information/tips-and-tricks.md b/docs/main/user-guides/replicated-engine-user-guide/additional-information/tips-and-tricks.md new file mode 100644 index 000000000..57311607a --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/additional-information/tips-and-tricks.md @@ -0,0 +1,94 @@ +# Tips and Tricks + +## What if I have no Disk Devices available that I can use for testing? + +For basic test and evaluation purposes it may not always be practical or possible to allocate physical disk devices on a cluster to Mayastor for use within its pools. As a convenience, Mayastor supports two disk device type emulations for this purpose: + +* Memory-Backed Disks \("RAM drive"\) +* File-Backed Disks + +Memory-backed Disks are the most readily provisioned if node resources permit, since Mayastor will automatically create and configure them as it creates the corresponding pool. However they are the least durable option - since the data is held entirely within memory allocated to a Mayastor pod, should that pod be terminated and rescheduled by Kubernetes, that data will be lost. Therefore it is strongly recommended that this type of disk emulation be used only for short duration, simple testing. It must not be considered for production use. 
+
+File-backed disks, as their name suggests, store pool data within a file held on a file system which is accessible to the Mayastor pod hosting that pool. Their durability depends on how they are configured; specifically, on which type of volume mount the backing file is located. If located on a path which uses Kubernetes ephemeral storage \(eg. EmptyDir\), they may be no more persistent than a RAM drive would be. However, if placed on their own Persistent Volume \(eg. a Kubernetes Host Path volume\), then they may be considered 'stable'. They are slightly less convenient to use than memory-backed disks, in that the backing files must be created by the user as a separate step preceding pool creation. However, file-backed disks can be significantly larger than RAM disks, as they consume considerably less memory resource within the hosting Mayastor pod.
+
+### Using Memory-backed Disks
+
+Creating a memory-backed disk emulation entails using the "malloc" URI scheme within the Mayastor pool resource definition.
+
+```text
+apiVersion: "openebs.io/v1alpha1"
+kind: DiskPool
+metadata:
+  name: mempool-1
+  namespace: mayastor
+spec:
+  node: worker-node-1
+  disks: ["malloc:///malloc0?size_mb=64"]
+```
+
+The example shown defines a pool named "mempool-1". The Mayastor pod hosted on "worker-node-1" automatically creates a 64MiB emulated disk for it to use, with the device identifier "malloc0", _provided that at least 64MiB of 2MiB-sized Huge Pages are available to that pod after the Mayastor container's own requirements have been satisfied_.
+
+#### The malloc:/// URI schema
+
+The pool definition accepts URIs matching the malloc:/// schema within its `disks` field for the purposes of provisioning memory-based disks. The general format is:
+
+`malloc:///malloc<DeviceId>?<parameters>`
+
+Where `<DeviceId>` is an integer value which uniquely identifies the device on that node, and where the parameter collection `<parameters>` may include the following:
+
+| Parameter | Function | Value Type | Notes |
+| :--- | :--- | :--- | :--- |
+| size\_mb | Specifies the requested size of the device in MiB | Integer | Mutually exclusive with "num\_blocks" |
+| num\_blocks | Specifies the requested size of the device in terms of the number of addressable blocks | Integer | Mutually exclusive with "size\_mb" |
+| blk\_size | Specifies the block size to be reported by the device in bytes | Integer \(512 or 4096\) | Optional. If not used, block size defaults to 512 bytes |
+
+{% hint style="warning" %}
+Memory-based disk devices are not over-provisioned, and the memory allocated to them is taken from the 2MiB-sized Huge Page resources available to the Mayastor pod. That is to say, creating a 64MiB device requires that at least 33 \(32+1\) 2MiB-sized pages are free for that Mayastor container instance to use. Satisfying the memory requirements of this disk type may require additional configuration on the worker node, as well as changes to the resource request and limit spec of the Mayastor daemonset, to ensure that sufficient resources are available.
+{% endhint %}
+
+### Using File-backed Disks
+
+Mayastor can use file-based disk emulation in place of physical pool disk devices, by employing the aio:/// URI schema within the pool's declaration to identify the location of the backing file.
+
+{% tabs %}
+{% tab title="512 Byte Sector Size" %}
+```text
+apiVersion: "openebs.io/v1alpha1"
+kind: DiskPool
+metadata:
+  name: filepool-1
+  namespace: mayastor
+spec:
+  node: worker-node-1
+  disks: ["aio:///var/tmp/disk1.img"]
+```
+{% endtab %}
+
+{% tab title="4kB Sector Size" %}
+```text
+apiVersion: "openebs.io/v1alpha1"
+kind: DiskPool
+metadata:
+  name: filepool-1
+  namespace: mayastor
+spec:
+  node: worker-node-1
+  disks: ["aio:///tmp/disk1.img?blk_size=4096"]
+```
+{% endtab %}
+{% endtabs %}
+
+The examples shown create a pool that uses a file named "disk1.img" as its member disk device \(located under /var/tmp in the first example and /tmp in the second\). For this operation to succeed, the file must already exist at the specified path \(which should be the full path to the file\), and that path must be accessible by the Mayastor pod instance running on the corresponding node.
+
+The aio:/// schema requires no other parameters, although "blk\_size" may optionally be specified. Block size accepts a value of either 512 or 4096, corresponding to the emulation of either a 512-byte or 4kB sector size device. If this parameter is omitted, the device defaults to using a 512-byte sector size.
+
+File-based disk devices are not over-provisioned; creating a 10GiB pool disk device requires that a 10GiB-sized backing file exists on a file system at an accessible path.
+
+The preferred method of creating a backing file is to use the Linux `truncate` command. The following example demonstrates the creation of a 1GiB-sized file named disk1.img within the directory /tmp.
+
+```text
+truncate -s 1G /tmp/disk1.img
+```
+
+
diff --git a/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/HA.md b/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/HA.md
new file mode 100644
index 000000000..ddbccf43a
--- /dev/null
+++ b/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/HA.md
@@ -0,0 +1,17 @@
+## High Availability
+
+Mayastor 2.0 enhances High Availability (HA) of the volume target with the nexus switch-over feature. In the event of a target failure, the switch-over feature quickly detects the failure and spawns a new nexus to ensure I/O continuity.
+The HA feature consists of two components: the HA node agent (which runs in each csi-node) and the cluster agent (which runs alongside the agent-core). The HA node agent watches for I/O path failures between applications and their corresponding targets. If any such broken path is encountered, the HA node agent informs the cluster agent. The cluster agent then creates a new target on a different (live) node. Once the target is created, the HA node agent establishes a new path between the application and its corresponding target. The HA feature restores the broken path within seconds, ensuring negligible downtime.
+
+{% hint style="warning" %}
+The volume's replica count must be higher than 1 for a new target to be established as part of switch-over.
+{% endhint %}
+
+
+### How do I disable this feature?
+
+{% hint style="info" %}
+We strongly recommend keeping this feature enabled.
+{% endhint %}
+
+The HA feature is enabled by default; to disable it, pass the parameter `--set=agents.ha.enabled=false` to the `helm install` command.
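+
+As a rough sketch of how that flag could be combined with a typical installation command, assuming the chart repository, release name, and namespace used elsewhere in this guide \(these are illustrative and may differ in your environment\):
+
+```text
+# Illustrative only: repo URL, release name, and namespace are assumptions
+helm repo add mayastor https://openebs.github.io/mayastor-extensions/ --force-update
+helm install mayastor mayastor/mayastor -n mayastor --create-namespace \
+  --set=agents.ha.enabled=false
+```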
\ No newline at end of file diff --git a/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/kubectl-plugin.md b/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/kubectl-plugin.md new file mode 100644 index 000000000..ef325ce49 --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/kubectl-plugin.md @@ -0,0 +1,218 @@ +# Mayastor kubectl plugin + +The **Mayastor kubectl plugin** can be used to view and manage Mayastor resources such as nodes, pools and volumes. It is also used for operations such as scaling the replica count of volumes. + +## Install kubectl plugin + +- The Mayastor kubectl plugin is available for the Linux platform. The binary for the plugin can be found [here](https://github.com/mayadata-io/mayastor-control-plane/releases). + +- Add the downloaded Mayastor kubectl plugin under $PATH. + +To verify the installation, execute: + +{% tabs %} +{% tab title="Command" %} +```text +kubectl mayastor -V +``` +{% endtab %} + +{% tab title="Expected Output" %} +```text +kubectl-plugin 1.0.0 +``` +{% endtab %} +{% endtabs %} + + +--------- + + +## Use kubectl plugin to retrieve data + +Sample command to use kubectl plugin: + +``` +USAGE: + kubectl-mayastor [OPTIONS] + +OPTIONS: + -h, --help + Print help information + -j, --jaeger + Trace rest requests to the Jaeger endpoint agent + -k, --kube-config-path + Path to kubeconfig file + -n, --namespace + Kubernetes namespace of mayastor service, defaults to mayastor [default: mayastor] + -o, --output + The Output, viz yaml, json [default: none] + -r, --rest + The rest endpoint to connect to + -t, --timeout + Timeout for the REST operations [default: 10s] + -V, --version + Print version information +SUBCOMMANDS: + cordon 'Cordon' resources + drain 'Drain' resources + dump `Dump` resources + get 'Get' resources + help Print this message or the help of the given subcommand(s) + scale 'Scale' resources + uncordon 'Uncordon' resources +``` + +You can use the plugin with the following options: + +### Get Mayastor Volumes + +{% tabs %} +{% tab title="Command" %} +```text +kubectl mayastor get volumes +``` +{% endtab %} + +{% tab title="Expected Output" %} +```text +ID REPLICAS TARGET-NODE ACCESSIBILITY STATUS SIZE + +18e30e83-b106-4e0d-9fb6-2b04e761e18a 4 mayastor-1 nvmf Online 10485761 +0c08667c-8b59-4d11-9192-b54e27e0ce0f 4 mayastor-2 Online 10485761 +``` +{% endtab %} +{% endtabs %} + + +### Get Mayastor Pools + +{% tabs %} +{% tab title="Command" %} +```text +kubectl mayastor get pools +``` +{% endtab %} + +{% tab title="Expected Output" %} +```text +ID TOTAL CAPACITY USED CAPACITY DISKS NODE STATUS MANAGED + +mayastor-pool-1 5360320512 1111490560 aio:///dev/vdb?uuid=d8a36b4b-0435-4fee-bf76-f2aef980b833 kworker1 Online true +mayastor-pool-2 5360320512 2172649472 aio:///dev/vdc?uuid=bb12ec7d-8fc3-4644-82cd-dee5b63fc8c5 kworker1 Online true +mayastor-pool-3 5360320512 3258974208 aio:///dev/vdb?uuid=f324edb7-1aca-41ec-954a-9614527f77e1 kworker2 Online false + +``` +{% endtab %} +{% endtabs %} + +### Get Mayastor Nodes + +{% tabs %} +{% tab title="Command" %} +```text +kubectl mayastor get nodes +``` +{% endtab %} + +{% tab title="Expected Output" %} +```text + +ID GRPC ENDPOINT STATUS +mayastor-2 10.1.0.7:10124 Online +mayastor-1 10.1.0.6:10124 Online +mayastor-3 10.1.0.8:10124 Online +``` +{% endtab %} +{% endtabs %} + + {% hint style="warning" %} + All the above resource information can be retrieved for a particular resource using its ID. 
The command to do so is as follows:
+ `kubectl mayastor get <resource_name> <resource_id>`
+ {% endhint %}
+
+
+### Scale the replica count of a volume
+
+{% tabs %}
+{% tab title="Command" %}
+```text
+kubectl mayastor scale volume <volume_id> <num_replicas>
+```
+{% endtab %}
+
+{% tab title="Expected Output" %}
+```text
+Volume 0c08667c-8b59-4d11-9192-b54e27e0ce0f Scaled Successfully 🚀
+```
+{% endtab %}
+{% endtabs %}
+
+
+### Retrieve resources in any of the output formats (table, JSON, or YAML)
+
+> Table is the default output format.
+
+{% tabs %}
+{% tab title="Command" %}
+```text
+kubectl mayastor -ojson get <resource_type>
+```
+{% endtab %}
+
+{% tab title="Expected Output" %}
+```text
+[{"spec":{"num_replicas":2,"size":67108864,"status":"Created","target":{"node":"ksnode-2","protocol":"nvmf"},"uuid":"5703e66a-e5e5-4c84-9dbe-e5a9a5c805db","topology":{"explicit":{"allowed_nodes":["ksnode-1","ksnode-3","ksnode-2"],"preferred_nodes":["ksnode-2","ksnode-3","ksnode-1"]}},"policy":{"self_heal":true}},"state":{"target":{"children":[{"state":"Online","uri":"bdev:///ac02cf9e-8f25-45f0-ab51-d2e80bd462f1?uuid=ac02cf9e-8f25-45f0-ab51-d2e80bd462f1"},{"state":"Online","uri":"nvmf://192.168.122.6:8420/nqn.2019-05.io.openebs:7b0519cb-8864-4017-85b6-edd45f6172d8?uuid=7b0519cb-8864-4017-85b6-edd45f6172d8"}],"deviceUri":"nvmf://192.168.122.234:8420/nqn.2019-05.io.openebs:nexus-140a1eb1-62b5-43c1-acef-9cc9ebb29425","node":"ksnode-2","rebuilds":0,"protocol":"nvmf","size":67108864,"state":"Online","uuid":"140a1eb1-62b5-43c1-acef-9cc9ebb29425"},"size":67108864,"status":"Online","uuid":"5703e66a-e5e5-4c84-9dbe-e5a9a5c805db"}}]
+```
+{% endtab %}
+{% endtabs %}
+
+
+### Retrieve replica topology for specific volumes
+
+{% tabs %}
+{% tab title="Command" %}
+```text
+kubectl mayastor get volume-replica-topology <volume_id>
+```
+{% endtab %}
+
+{% tab title="Expected Output" %}
+```text
+ ID                                    NODE         POOL    STATUS  CAPACITY  ALLOCATED  SNAPSHOTS  CHILD-STATUS  REASON  REBUILD
+ a34dbaf4-e81a-4091-b3f8-f425e5f3689b  io-engine-1  pool-1  Online  12MiB     0 B        12MiB
+```
+{% endtab %}
+{% endtabs %}
+
+
+{% hint style="warning" %}
+The plugin requires access to the Mayastor REST server for execution. It obtains the master node IP from the kubeconfig file. If this fails, the REST endpoint can be specified using the `--rest` flag.
+{% endhint %}
+
+### List available volume snapshots
+
+{% tabs %}
+{% tab title="Command" %}
+```text
+kubectl mayastor get volume-snapshots
+```
+{% endtab %}
+
+{% tab title="Expected Output" %}
+```text
+ ID                                     TIMESTAMP             SOURCE-SIZE  ALLOCATED-SIZE  TOTAL-ALLOCATED-SIZE  SOURCE-VOL
+ 25823425-41fa-434a-9efd-a356b70b5d7c   2023-07-07T13:20:17Z  10MiB        12MiB           12MiB                 ec4e66fd-3b33-4439-b504-d49aba53da26
+```
+{% endtab %}
+{% endtabs %}
+
+--------
+
+
+## Limitations of kubectl plugin
+
+- The plugin currently does not have authentication support.
+- The plugin can operate only over HTTP.
+
+_[Learn more](https://github.com/openebs/mayastor-extensions/blob/develop/k8s/plugin/README.md)_
\ No newline at end of file
diff --git a/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/monitoring.md b/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/monitoring.md
new file mode 100644
index 000000000..e0815a779
--- /dev/null
+++ b/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/monitoring.md
@@ -0,0 +1,110 @@
+# Monitoring
+
+## Pool metrics exporter
+
+The Mayastor pool metrics exporter runs as a sidecar container within every io-engine pod and exposes pool usage metrics in Prometheus format.
These metrics are exposed on port 9502 at the HTTP endpoint `/metrics` and are refreshed every five minutes.
+
+### Supported pool metrics
+
+| Name | Type | Unit | Description |
+| :--- | :--- | :--- | :--- |
+| disk_pool_total_size_bytes | Gauge | Integer | Total size of the pool |
+| disk_pool_used_size_bytes | Gauge | Integer | Used size of the pool |
+| disk_pool_status | Gauge | Integer | Status of the pool (0, 1, 2, 3) = {"Unknown", "Online", "Degraded", "Faulted"} |
+| disk_pool_committed_size_bytes | Gauge | Integer | Committed size of the pool in bytes |
+
+{% tab title="Example metrics" %}
+```text
+# HELP disk_pool_status disk-pool status
+# TYPE disk_pool_status gauge
+disk_pool_status{node="worker-0",name="mayastor-disk-pool"} 1
+# HELP disk_pool_total_size_bytes total size of the disk-pool in bytes
+# TYPE disk_pool_total_size_bytes gauge
+disk_pool_total_size_bytes{node="worker-0",name="mayastor-disk-pool"} 5.360320512e+09
+# HELP disk_pool_used_size_bytes used disk-pool size in bytes
+# TYPE disk_pool_used_size_bytes gauge
+disk_pool_used_size_bytes{node="worker-0",name="mayastor-disk-pool"} 2.147483648e+09
+# HELP disk_pool_committed_size_bytes Committed size of the pool in bytes
+# TYPE disk_pool_committed_size_bytes gauge
+disk_pool_committed_size_bytes{node="worker-0", name="mayastor-disk-pool"} 9663676416
+```
+{% endtab %}
+
+
+
+--------
+
+## Stats exporter metrics
+
+When [eventing](../additional-information/call-home.md) is activated, the stats exporter operates within the **obs-callhome-stats** container, located in the **callhome** pod. The statistics are made accessible through an HTTP endpoint at port `9090`, specifically using the `/stats` route.
+
+
+### Supported stats metrics
+
+| Name | Type | Unit | Description |
+| :--- | :--- | :--- | :--- |
+| pools_created | Gauge | Integer | Total successful pool creation attempts |
+| pools_deleted | Gauge | Integer | Total successful pool deletion attempts |
+| volumes_created | Gauge | Integer | Total successful volume creation attempts |
+| volumes_deleted | Gauge | Integer | Total successful volume deletion attempts |
+
+
+----
+
+## Integrating exporter with Prometheus monitoring stack
+
+1. To install, add the Prometheus-stack helm chart repository and update it.
+
+{% tab title="Command" %}
+```text
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm repo update
+```
+{% endtab %}
+
+Then, install the Prometheus monitoring stack and set `prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues` to false. This enables Prometheus to discover the custom ServiceMonitor for Mayastor.
+
+{% tab title="Command" %}
+```text
+helm install mayastor prometheus-community/kube-prometheus-stack -n mayastor --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
+```
+{% endtab %}
+
+2. Next, install the ServiceMonitor resource to select services and specify their underlying endpoint objects.
+
+{% tab title="ServiceMonitor YAML" %}
+```text
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: mayastor-monitoring
+  labels:
+    app: mayastor
+spec:
+  selector:
+    matchLabels:
+      app: mayastor
+  endpoints:
+  - port: metrics
+```
+{% endtab %}
+
+{% hint style="info" %}
+Upon successful integration of the exporter with the Prometheus stack, the metrics will be available on port 9090 at the HTTP endpoint `/metrics`.
+{% endhint %} + +--- + +## CSI metrics exporter + +| Name | Type | Unit | Description | +| :--- | :--- | :--- | :--- | +| kubelet_volume_stats_available_bytes | Gauge | Integer | Size of the available/usable volume (in bytes) | +| kubelet_volume_stats_capacity_bytes | Gauge | Integer | The total size of the volume (in bytes) | +| kubelet_volume_stats_used_bytes | Gauge | Integer | Used size of the volume (in bytes) | +| kubelet_volume_stats_inodes | Gauge | Integer | The total number of inodes | +| kubelet_volume_stats_inodes_free | Gauge | Integer | The total number of usable inodes. | +| kubelet_volume_stats_inodes_used | Gauge | Integer | The total number of inodes that have been utilized to store metadata. | + + +[Learn more](https://kubernetes.io/docs/concepts/storage/volume-health-monitoring/) \ No newline at end of file diff --git a/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/node-cordon.md b/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/node-cordon.md new file mode 100644 index 000000000..db647cbb5 --- /dev/null +++ b/docs/main/user-guides/replicated-engine-user-guide/advanced-operations/node-cordon.md @@ -0,0 +1,38 @@ +## Mayastor Node Cordon + +Cordoning a node marks or taints the node as unschedulable. This prevents the scheduler from deploying new resources on that node. However, the resources that were deployed prior to cordoning off the node will remain intact. + +This feature is in line with the node-cordon functionality of Kubernetes. + +To add a label and cordon a node, execute: +{% tab title="Command" %} +```text +kubectl-mayastor cordon node