diff --git a/docs/source/.internal/helm-crd-warning.md b/docs/source/.internal/helm-crd-warning.md new file mode 100644 index 00000000000..e22af94231a --- /dev/null +++ b/docs/source/.internal/helm-crd-warning.md @@ -0,0 +1,5 @@ +:::{warning} +Helm doesn't support managing CustomResourceDefinition resources ([#5871](https://github.com/helm/helm/issues/5871), [#7735](https://github.com/helm/helm/issues/7735)). +Helm only creates CRDs on the first install and never updates them afterwards, even though keeping the CRDs up to date with every update is essential. +Users have to update them manually every time. +::: diff --git a/docs/source/.internal/manager-license-note.md b/docs/source/.internal/manager-license-note.md new file mode 100644 index 00000000000..caa85bd1f7d --- /dev/null +++ b/docs/source/.internal/manager-license-note.md @@ -0,0 +1,5 @@ +:::{note} +ScyllaDB Manager is available for ScyllaDB Enterprise customers and ScyllaDB Open Source users. +With ScyllaDB Open Source, ScyllaDB Manager is limited to 5 nodes. +See the ScyllaDB Manager [Proprietary Software License Agreement](https://www.scylladb.com/scylla-manager-software-license-agreement/) for details. +::: diff --git a/docs/source/.internal/tuning-warning.md b/docs/source/.internal/tuning-warning.md new file mode 100644 index 00000000000..4401370bc05 --- /dev/null +++ b/docs/source/.internal/tuning-warning.md @@ -0,0 +1,4 @@ +:::{warning} +We recommend that you first try out the performance tuning on a pre-production instance. +Given the nature of the underlying tuning script, undoing the changes requires rebooting the Kubernetes node(s). +::: diff --git a/docs/source/architecture/index.md b/docs/source/architecture/index.md index 41c0dc9e452..0a55ce8f076 100644 --- a/docs/source/architecture/index.md +++ b/docs/source/architecture/index.md @@ -1,10 +1,10 @@ # Architecture -```{toctree} +:::{toctree} :maxdepth: 1 overview storage/index tuning manager -``` +::: diff --git a/docs/source/architecture/manager.md b/docs/source/architecture/manager.md index 9d2aa807adb..7584d3dc4ff 100644 --- a/docs/source/architecture/manager.md +++ b/docs/source/architecture/manager.md @@ -1 +1,25 @@ # ScyllaDB Manager + +{{productName}} has a basic integration with ScyllaDB Manager. At this point there is one global ScyllaDB Manager instance that manages all [ScyllaClusters](../resources/scyllaclusters/overview.md), and a corresponding controller that automatically configures ScyllaDB Manager to monitor the ScyllaDB instances and to sync repair and backup tasks based on the ScyllaCluster definition. Unfortunately, the rest of the functionality is not yet implemented in the ScyllaCluster APIs; for example, restoring a cluster from a backup has to be performed by an administrator who execs into the shared ScyllaDB Manager deployment and uses `sctool` directly. + +:::{caution} +Because the ScyllaDB Manager instance is shared by all users and their ScyllaClusters, only administrators should have privileges to access the `scylla-manager` namespace. +::: + + +:::{include} ../.internal/manager-license-note.md +::: + +## Accessing ScyllaDB Manager + +For the operations that are not yet supported on ScyllaClusters, you can access ScyllaDB Manager manually.
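+In practice, this means running `sctool` from inside the shared ScyllaDB Manager deployment. A minimal sketch, assuming the default installation where the Deployment is named `scylla-manager` and runs in the `scylla-manager` namespace, and where `<managerID>` stands for the ScyllaDB Manager ID you retrieve below (the exact `sctool` subcommands differ between ScyllaDB Manager versions):
+
+:::{code-block} bash
+# Show the cluster status as seen by ScyllaDB Manager.
+kubectl -n=scylla-manager exec -it deployment/scylla-manager -- sctool status --cluster=<managerID>
+:::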
+ +To find the ScyllaDB Manager ID for your cluster, run: + +:::{code-block} bash +kubectl -n=test get scyllacluster/scylla --template='{{ .status.managerId }}' +::: + +:::{note} +Note that some of the operations use the *ScyllaDB Manager Agent* that runs within the ScyllaCluster, which has to have access to the resources involved, e.g. the object storage buckets used for backups. +::: diff --git a/docs/source/architecture/storage/index.md b/docs/source/architecture/storage/index.md index 50a26c16616..f7042662e7f 100644 --- a/docs/source/architecture/storage/index.md +++ b/docs/source/architecture/storage/index.md @@ -1,8 +1,8 @@ # Storage -```{toctree} +:::{toctree} :maxdepth: 1 overview local-csi-driver -``` +::: diff --git a/docs/source/architecture/tuning.md b/docs/source/architecture/tuning.md index 0412d79cee4..b2a1cb51639 100644 --- a/docs/source/architecture/tuning.md +++ b/docs/source/architecture/tuning.md @@ -1,10 +1,36 @@ # Tuning -To get the best performance and latency {{productName}} implements performance tuning. +ScyllaDB works best when it's pinned to the CPUs and not interrupted. +To get the best performance and latency, {{productName}} implements performance tuning. -Performance tuning is enabled by default *when you create a corresponding [NodeConfig](../resources/nodeconfigs.md) for your nodes. +One of the most common causes of context switching is network interrupts. +Packets coming to a Kubernetes node need to be processed, which requires CPU shares. -Because some of the operations it needs to perform are not multitenant or priviledged, the tuning scripts are run in a dedicated system namespace called `scylla-operator-node-tuning`. +On Kubernetes there are always at least a few processes running on the node: kubelet, Kubernetes provider applications, daemons, etc. +These processes require CPU shares, so we cannot dedicate the entire node's processing power to ScyllaDB; we need to leave room for the others. +We take advantage of that and pin IRQs to the CPUs that are not exclusively used by any ScyllaDB Pod. + +Performance tuning is enabled by default **when you create a corresponding [NodeConfig](../resources/nodeconfigs.md) for your nodes**. + +Because some of the operations it needs to perform are not multitenant or require elevated privileges, the tuning scripts are run in a dedicated system namespace called `scylla-operator-node-tuning`. This namespace is created and entirely managed by {{productName}} and only administrators can access it. -When a ScyllaCluster Pod is created (and performance tuning is enabled), the Pod initializes but waits until {{productName}} runs an on-demand Job that will configure the host and the ScyllaDB process accordingly. Only after that it will actually start running ScyllaDB. +The tuning is based around the `perftune` script that comes from [scyllaDBUtilsImage](../api-reference/groups/scylla.scylladb.com/scyllaoperatorconfigs.rst#api-scylla-scylladb-com-scyllaoperatorconfigs-v1alpha1-status). `perftune` performs performance optimizations like tuning the kernel, network and disk devices, spreading IRQs across CPUs, and more. Conceptually, this runs in two parts: tuning the [Kubernetes nodes](#kubernetes-nodes) and tuning the [ScyllaDB Pods](#scylladb-pods). + +:::{include} ../.internal/tuning-warning.md +::: + +## Kubernetes nodes + +The `perftune` script is executed on the target nodes and tunes the kernel, network, disk devices, and more. +This is executed right after the tuning is enabled using a [NodeConfig](../resources/nodeconfigs.md) that selects those nodes.
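+A minimal sketch of such a NodeConfig, assuming the nodes meant for ScyllaDB carry the `scylla.scylladb.com/node-type=scylla` label used throughout the installation guides (see [NodeConfig](../resources/nodeconfigs.md) for the full API, including the disk setup options):
+
+:::{code-block} yaml
+# Hypothetical example: enable tuning on every node labeled as a ScyllaDB node.
+apiVersion: scylla.scylladb.com/v1alpha1
+kind: NodeConfig
+metadata:
+  name: scylladb-nodes
+spec:
+  placement:
+    nodeSelector:
+      scylla.scylladb.com/node-type: scylla
+:::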
+ +## ScyllaDB Pods + +When a [ScyllaCluster](../resources/scyllaclusters/overview.md) Pod is created (and performance tuning is enabled), the Pod initializes but waits until {{productName}} runs an on-demand Job that will configure the host and the ScyllaDB process accordingly (e.g. spreading IRQs across other CPUs). +Only after that does it actually start running ScyllaDB. + +:::{caution} +Only Pods with [`Guaranteed` QoS class](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#guaranteed) are eligible for tuning; otherwise, they would not have pinned CPUs. + +Always verify that your [ScyllaCluster](../resources/scyllaclusters/overview.md) resource specifications meet [all the criteria](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#criteria). +::: diff --git a/docs/source/conf.py b/docs/source/conf.py index c86145f250a..f46654dae76 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -55,7 +55,12 @@ myst_substitutions = { "productName": "Scylla Operator", "repository": "scylladb/scylla-operator", - "revision": "master" + "revision": "master", + "imageRepository": "docker.io/scylladb/scylla", + "imageTag": "6.2.0", + "enterpriseImageRepository": "docker.io/scylladb/scylla-enterprise", + "enterpriseImageTag": "2024.1.12", + "agentVersion": "3.3.3@sha256:40e31739e8fb1d48af87abaeaa8ee29f71607964daa8434fe2526dfc6f665920", } # -- Options for not found extension diff --git a/docs/source/contributing.md b/docs/source/contributing.md deleted file mode 100644 index da5fc078732..00000000000 --- a/docs/source/contributing.md +++ /dev/null @@ -1,155 +0,0 @@ -# Contributing to Scylla Operator - -## Prerequisites - -To develop on scylla-operator, your environment must have the following: - -1. [Go 1.13](https://golang.org/dl/) - * Make sure [GOPATH](https://github.com/golang/go/wiki/SettingGOPATH) is set to `GOPATH=$HOME/go`. -2. [Kustomize v3.1.0](https://github.com/kubernetes-sigs/kustomize/releases/tag/v3.1.0) -3. [kubebuilder v2.3.1](https://github.com/kubernetes-sigs/kubebuilder/releases/tag/v2.3.1) -4. [Docker](https://docs.docker.com/install/) -5. Git client installed -6. Github account - -To install all dependencies (Go, kustomize, kubebuilder, dep), simply run: -```bash -./install-dependencies.sh -``` - -## Initial Setup - -### Create a Fork - -From your browser navigate to [http://github.com/scylladb/scylla-operator](http://github.com/scylladb/scylla-operator) and click the "Fork" button. - -### Clone Your Fork - -Open a console window and do the following: - -```bash -# Create the scylla operator repo path -mkdir -p $GOPATH/src/github.com/scylladb - -# Navigate to the local repo path and clone your fork -cd $GOPATH/src/github.com/scylladb - -# Clone your fork, where is your GitHub account name -git clone https://github.com//scylla-operator.git -``` - -### Add Upstream Remote - -First you will need to add the upstream remote to your local git: -```bash -# Add 'upstream' to the list of remotes -git remote add upstream https://github.com/scylladb/scylla-operator.git - -# Verify the remote was added -git remote -v -``` -Now you should have at least `origin` and `upstream` remotes. You can also add other remotes to collaborate with other contributors. - -## Development - -To add a feature or to make a bug fix, you will need to create a branch in your fork and then submit a pull request (PR) from the branch.
- -### Building the project - -You can build the project using the Makefile commands: -* Open the Makefile and change the `IMG` environment variable to a repository you have access to. -* Run `make docker-push` and wait for the image to be built and uploaded in your repo. - -### Create a Branch - -From a console, create a new branch based on your fork and start working on it: - -```bash -# Ensure all your remotes are up to date with the latest -git fetch --all - -# Create a new branch that is based off upstream master. Give it a simple, but descriptive name. -# Generally it will be two to three words separated by dashes and without numbers. -git checkout -b feature-name upstream/master -``` - -Now you are ready to make the changes and commit to your branch. - -### Updating Your Fork - -During the development lifecycle, you will need to keep up-to-date with the latest upstream master. As others on the team push changes, you will need to `rebase` your commits on top of the latest. This avoids unnecessary merge commits and keeps the commit history clean. - -Whenever you need to update your local repository, you never want to merge. You **always** will rebase. Otherwise you will end up with merge commits in the git history. If you have any modified files, you will first have to stash them (`git stash save -u ""`). - -```bash -git fetch --all -git rebase upstream/master -``` - -Rebasing is a very powerful feature of Git. You need to understand how it works or else you will risk losing your work. Read about it in the [Git documentation](https://git-scm.com/docs/git-rebase), it will be well worth it. In a nutshell, rebasing does the following: -- "Unwinds" your local commits. Your local commits are removed temporarily from the history. -- The latest changes from upstream are added to the history -- Your local commits are re-applied one by one -- If there are merge conflicts, you will be prompted to fix them before continuing. Read the output closely. It will tell you how to complete the rebase. -- When done rebasing, you will see all of your commits in the history. - -## Submitting a Pull Request - -Once you have implemented the feature or bug fix in your branch, you will open a PR to the upstream repo. Before opening the PR ensure you have added unit tests, are passing the integration tests, cleaned your commit history, and have rebased on the latest upstream. - -In order to open a pull request (PR) it is required to be up to date with the latest changes upstream. If other commits are pushed upstream before your PR is merged, you will also need to rebase again before it will be merged. - -### Commit History - -To prepare your branch to open a PR, you will need to have the minimal number of logical commits so we can maintain -a clean commit history. Most commonly a PR will include a single commit where all changes are squashed, although -sometimes there will be multiple logical commits. - -```bash -# Inspect your commit history to determine if you need to squash commits -git log - -# Rebase the commits and edit, squash, or even reorder them as you determine will keep the history clean. -# In this example, the last 5 commits will be opened in the git rebase tool. -git rebase -i HEAD~5 -``` - -Once your commit history is clean, ensure you have based on the [latest upstream](#updating-your-fork) before you open the PR. 
- -### Commit messages - -Please make the first line of your commit message a summary of the change that a user (not a developer) of Operator would like to read, -and prefix it with the most relevant directory of the change followed by a colon. -The changelog gets made by looking at just these first lines so make it good! - -If you have more to say about the commit, then enter a blank line and carry on the description. -Remember to say why the change was needed - the commit itself shows what was changed. - -Writing more is better than less. Comparing the behaviour before the change to that after the change is very useful. -Imagine you are writing to yourself in 12 months time when you've forgotten everything about what you just did, and you need to get up to speed quickly. - -If the change fixes an issue then write Fixes #1234 in the commit message. -This can be on the subject line if it will fit. If you don't want to close the associated issue just put #1234 and the change will get linked into the issue. - -Here is an example of a short commit message: - -``` -sidecar: log on reconcile loop - fixes #1234 -``` - -And here is an example of a longer one: -``` - -api: now supports host networking (#1234) - -The operator CRD now has a "network" property that can be used to -select host networking as well as setting the apropriate DNS policy. - -Fixes #1234 -``` - -### Submitting - -Go to the [Scylla Operator github](https://www.github.com/scylladb/scylla-operator) to open the PR. If you have pushed recently, you should see an obvious link to open the PR. If you have not pushed recently, go to the Pull Request tab and select your fork and branch for the PR. - -After the PR is open, you can make changes simply by pushing new commits. Your PR will track the changes in your fork and update automatically. diff --git a/docs/source/getting-started/eks.md b/docs/source/getting-started/eks.md deleted file mode 100644 index a7f9b0e9b09..00000000000 --- a/docs/source/getting-started/eks.md +++ /dev/null @@ -1,128 +0,0 @@ -# Deploying ScyllaDB on EKS - -This guide is focused on deploying Scylla on EKS with improved performance. -Performance tricks used by the script won't work with different machine tiers. -It sets up the kubelets on EKS nodes to run with [static cpu policy](https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/) and uses [local sdd disks](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html) in RAID0 for maximum performance. - -Most of the commands used to setup the Scylla cluster are the same for all environments -As such we have tried to keep them separate in the [general guide](generic.md). - -## TL;DR; - -If you don't want to run the commands step-by-step, you can just run a script that will set everything up for you: -```bash -# Edit according to your preference -EKS_REGION=us-east-1 -EKS_ZONES=us-east-1a,us-east-1b,us-east-1c - -# From inside the examples/eks folder -cd examples/eks -./eks.sh -z "$EKS_ZONES" -r "$EKS_REGION" -``` - -After you deploy, see how you can [benchmark your cluster with cassandra-stress](generic.md#benchmark-with-cassandra-stress). - -## Walkthrough - -### EKS Setup - -#### Configure environment variables - -First of all, we export all the configuration options as environment variables. -Edit according to your own environment. 
- -``` -EKS_REGION=us-east-1 -EKS_ZONES=us-east-1a,us-east-1b,us-east-1c -CLUSTER_NAME=scylla-demo -``` - -#### Creating an EKS cluster - -For this guide, we'll create an EKS cluster with the following: - -* A NodeGroup of 3 `i3-2xlarge` Nodes, where the Scylla Pods will be deployed. These nodes will only accept pods having `scylla-clusters` toleration. - -``` - - name: scylla-pool - instanceType: i3.2xlarge - desiredCapacity: 3 - labels: - scylla.scylladb.com/node-type: scylla - taints: - role: "scylla-clusters:NoSchedule" - ssh: - allow: true - kubeletExtraConfig: - cpuManagerPolicy: static -``` - -* A NodeGroup of 4 `c4.2xlarge` Nodes to deploy `cassandra-stress` later on. These nodes will only accept pods having `cassandra-stress` toleration. - -``` - - name: cassandra-stress-pool - instanceType: c4.2xlarge - desiredCapacity: 4 - labels: - pool: "cassandra-stress-pool" - taints: - role: "cassandra-stress:NoSchedule" - ssh: - allow: true -``` - -* A NodeGroup of 1 `i3.large` Node, where the monitoring stack and operator will be deployed. -``` - - name: monitoring-pool - instanceType: i3.large - desiredCapacity: 1 - labels: - pool: "monitoring-pool" - ssh: - allow: true -``` - -### Prerequisites - -#### Installing script third party dependencies - -Script requires several dependencies: -- eksctl - See: https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html -- kubectl - See: https://kubernetes.io/docs/tasks/tools/install-kubectl/ - -### Deploying ScyllaDB Operator - -Refer to [Deploying Scylla on a Kubernetes Cluster](generic.md) in the ScyllaDB Operator documentation to deploy the ScyllaDB Operator and its prerequisites. - -#### Setting up nodes for ScyllaDB - -ScyllaDB, except when in developer mode, requires storage with XFS filesystem. The local NVMes from the cloud provider usually come as individual devices. To use their full capacity together, you'll first need to form a RAID array from those disks. -`NodeConfig` performs the necessary RAID configuration and XFS filesystem creation, as well as it optimizes the nodes. You can read more about it in [Performance tuning](performance.md) section of ScyllaDB Operator's documentation. - -Deploy `NodeConfig` to let it take care of the above operations: -``` -kubectl apply --server-side -f examples/eks/nodeconfig-alpha.yaml -``` - -#### Deploying Local Volume Provisioner - -Afterwards, deploy ScyllaDB's [Local Volume Provisioner](https://github.com/scylladb/k8s-local-volume-provisioner), capable of dynamically provisioning PersistentVolumes for your ScyllaDB clusters on mounted XFS filesystems, earlier created over the configured RAID0 arrays. -``` -kubectl -n local-csi-driver apply --server-side -f examples/common/local-volume-provisioner/local-csi-driver/ -``` - -### Deploying ScyllaDB - -Now you can follow the steps described in [Deploying Scylla on a Kubernetes Cluster](generic.md) to launch your ScyllaDB cluster in a highly performant environment. - -#### Accessing the database - -Instructions on how to access the database can also be found in the [generic guide](generic.md). 
- -### Deleting an EKS cluster - -Once you are done with your experiments delete your cluster using the following command: - -``` -eksctl delete cluster "${CLUSTER_NAME}" -``` diff --git a/docs/source/getting-started/gke.md b/docs/source/getting-started/gke.md deleted file mode 100644 index eef4eab0091..00000000000 --- a/docs/source/getting-started/gke.md +++ /dev/null @@ -1,173 +0,0 @@ -# Deploying ScyllaDB on GKE - -This guide is focused on deploying Scylla on GKE with maximum performance (without any persistence guarantees). -It sets up the kubelets on GKE nodes to run with [static cpu policy](https://kubernetes.io/blog/2018/07/24/feature-highlight-cpu-manager/) and uses [local sdd disks](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd) in RAID0 for maximum performance. - -Most of the commands used to setup the Scylla cluster are the same for all environments -As such we have tried to keep them separate in the [general guide](generic.md). - -## TL;DR; - -If you don't want to run the commands step-by-step, you can just run a script that will set everything up for you: -```bash -# Edit according to your preference -GCP_USER=$(gcloud config list account --format "value(core.account)") -GCP_PROJECT=$(gcloud config list project --format "value(core.project)") -GCP_ZONE=us-west1-b - -# From inside the examples/gke folder -cd examples/gke -./gke.sh -u "$GCP_USER" -p "$GCP_PROJECT" -z "$GCP_ZONE" - -# Example: -# ./gke.sh -u yanniszark@arrikto.com -p gke-demo-226716 -z us-west1-b -``` - -:::{warning} -Make sure to pass a ZONE (ex.: us-west1-b) and not a REGION (ex.: us-west1) or it will deploy nodes in each ZONE available in the region. -::: - -After you deploy, see how you can [benchmark your cluster with cassandra-stress](generic.md#benchmark-with-cassandra-stress). - -## Walkthrough - -### Google Kubernetes Engine Setup - -#### Configure environment variables - -First of all, we export all the configuration options as environment variables. -Edit according to your own environment. - -``` -GCP_USER=$( gcloud config list account --format "value(core.account)" ) -GCP_PROJECT=$( gcloud config list project --format "value(core.project)" ) -GCP_REGION=us-west1 -GCP_ZONE=us-west1-b -CLUSTER_NAME=scylla-demo -CLUSTER_VERSION=$( gcloud container get-server-config --zone ${GCP_ZONE} --format "value(validMasterVersions[0])" ) -``` - -#### Creating a GKE cluster - -First we need to change kubelet CPU Manager policy to static by providing a config file. Create file called `systemconfig.yaml` with the following content: -``` -kubeletConfig: - cpuManagerPolicy: static -``` - -Then we'll create a GKE cluster with the following: - -1. A NodePool of 2 `n1-standard-8` Nodes, where the operator and the monitoring stack will be deployed. These are generic Nodes and their free capacity can be used for other purposes. - ``` - gcloud container \ - clusters create "${CLUSTER_NAME}" \ - --cluster-version "${CLUSTER_VERSION}" \ - --node-version "${CLUSTER_VERSION}" \ - --machine-type "n1-standard-8" \ - --num-nodes "2" \ - --disk-type "pd-ssd" --disk-size "20" \ - --image-type "UBUNTU_CONTAINERD" \ - --enable-stackdriver-kubernetes \ - --no-enable-autoupgrade \ - --no-enable-autorepair - ``` - -2. A NodePool of 2 `n1-standard-32` Nodes to deploy `cassandra-stress` later on. 
- - ``` - gcloud container --project "${GCP_PROJECT}" \ - node-pools create "cassandra-stress-pool" \ - --cluster "${CLUSTER_NAME}" \ - --zone "${GCP_ZONE}" \ - --node-version "${CLUSTER_VERSION}" \ - --machine-type "n1-standard-32" \ - --num-nodes "2" \ - --disk-type "pd-ssd" --disk-size "20" \ - --node-taints role=cassandra-stress:NoSchedule \ - --image-type "UBUNTU_CONTAINERD" \ - --no-enable-autoupgrade \ - --no-enable-autorepair - ``` - -3. A NodePool of 4 `n1-standard-32` Nodes, where the Scylla Pods will be deployed. Each of these Nodes has 8 local NVMe SSDs attached, which are provided as [raw block devices](https://cloud.google.com/kubernetes-engine/docs/concepts/local-ssd#block). It is important to disable `autoupgrade` and `autorepair`. Automatic cluster upgrade or node repair has a hard timeout after which it no longer respect PDBs and force deletes the Compute Engine instances, which also deletes all data on the local SSDs. At this point, it's better to handle upgrades manually, with more control over the process and error handling. - ``` - gcloud container \ - node-pools create "scylla-pool" \ - --cluster "${CLUSTER_NAME}" \ - --node-version "${CLUSTER_VERSION}" \ - --machine-type "n1-standard-32" \ - --num-nodes "4" \ - --disk-type "pd-ssd" --disk-size "20" \ - --local-nvme-ssd-block count="8" \ - --node-taints role=scylla-clusters:NoSchedule \ - --node-labels scylla.scylladb.com/node-type=scylla \ - --image-type "UBUNTU_CONTAINERD" \ - --system-config-from-file=systemconfig.yaml \ - --no-enable-autoupgrade \ - --no-enable-autorepair - ``` - -#### Setting Yourself as `cluster-admin` -> (By default GKE doesn't give you the necessary RBAC permissions) - -Get the credentials for your new cluster -``` -gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone="${GCP_ZONE}" -``` - -Create a ClusterRoleBinding for your user. -In order for this to work you need to have at least permission `container.clusterRoleBindings.create`. -The easiest way to obtain this permission is to enable the `Kubernetes Engine Admin` role for your user in the GCP IAM web interface. -``` -kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user "${GCP_USER}" -``` - - -### Prerequisites - -### Deploying ScyllaDB Operator - -Refer to [Deploying Scylla on a Kubernetes Cluster](generic.md) in the ScyllaDB Operator documentation to deploy the ScyllaDB Operator and its prerequisites. - -#### Setting up nodes for ScyllaDB - -ScyllaDB, except when in developer mode, requires storage with XFS filesystem. The local NVMes from the cloud provider usually come as individual devices. To use their full capacity together, you'll first need to form a RAID array from those disks. -`NodeConfig` performs the necessary RAID configuration and XFS filesystem creation, as well as it optimizes the nodes. You can read more about it in [Performance tuning](performance.md) section of ScyllaDB Operator's documentation. - -Deploy `NodeConfig` to let it take care of the above operations: -``` -kubectl apply --server-side -f examples/gke/nodeconfig-alpha.yaml -``` - -#### Deploying Local Volume Provisioner - -Afterwards, deploy ScyllaDB's [Local Volume Provisioner](https://github.com/scylladb/k8s-local-volume-provisioner), capable of dynamically provisioning PersistentVolumes for your ScyllaDB clusters on mounted XFS filesystems, earlier created over the configured RAID0 arrays. 
-``` -kubectl -n local-csi-driver apply --server-side -f examples/common/local-volume-provisioner/local-csi-driver/ -kubectl apply --server-side -f examples/common/local-volume-provisioner/local-csi-driver/00_scylladb-local-xfs.storageclass.yaml -``` - -### Deploy Scylla cluster -In order for the example to work you need to modify the cluster definition in the following way: - -``` -sed -i "s//${GCP_REGION}/g;s//${GCP_ZONE}/g" examples/gke/cluster.yaml -``` - -This will inject your region and zone into the cluster definition so that it matches the kubernetes cluster you just created. - -### Deploying ScyllaDB - -Now you can follow the steps described in [Deploying Scylla on a Kubernetes Cluster](generic.md) to launch your ScyllaDB cluster in a highly performant environment. - -#### Accessing the database - -Instructions on how to access the database can also be found in the [generic guide](generic.md). - -### Deleting a GKE cluster - -Once you are done with your experiments delete your cluster using the following command: - -``` -gcloud container --project "${GCP_PROJECT}" clusters delete --zone "${GCP_ZONE}" "${CLUSTER_NAME}" -``` diff --git a/docs/source/getting-started/index.md b/docs/source/getting-started/index.md deleted file mode 100644 index a607fa4cf5d..00000000000 --- a/docs/source/getting-started/index.md +++ /dev/null @@ -1,9 +0,0 @@ -# Getting started - -```{toctree} -:maxdepth: 1 - -overview -gke -eks -``` diff --git a/docs/source/getting-started/overview.md b/docs/source/getting-started/overview.md deleted file mode 100644 index 07dd0c5c770..00000000000 --- a/docs/source/getting-started/overview.md +++ /dev/null @@ -1 +0,0 @@ -# Overview diff --git a/docs/source/index.md b/docs/source/index.md index 21932dfd4cb..948f731202f 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -4,26 +4,17 @@ sd_hide_title: true # Scylla Operator Documentation -```{toctree} +:::{toctree} :hidden: :maxdepth: 1 architecture/index installation/index -getting-started/index resources/index +quickstarts/index support/index api-reference/index -generic -manager -migration - -exposing -multidc/index -performance -upgrade -contributing -``` +::: ## Scylla Operator @@ -49,6 +40,8 @@ contributing {{productName}} project helps users to run ScyllaDB on Kubernetes. It extends the Kubernetes APIs using [CustomResourceDefinitions(CRDs)](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) and runs controllers that reconcile the desired state declared using these APIs. +{{productName}} works with both ScyllaDB Open Source and ScyllaDB Enterprise. You only have to [change the ScyllaCluster image repository]() and [adjust the ScyllaDB utils image using ScyllaOperatorConfig]() + Here is a subset of items to start with bellow. You can also navigate through the documentation using the menu. @@ -71,43 +64,46 @@ Learn about the components of Scylla Operator and how they fit together. :::{grid-item-card} {material-regular}`electric_bolt;2em` Installation :link: installation/overview +Configure your Kubernetes platform, install prerequisites and all components of {{productName}}. +++ [Learn more »](installation/overview) ::: -:::{grid-item-card} {material-regular}`explore;2em` Getting Started -:link: getting-started/overview - +:::{grid-item-card} {material-regular}`storage;2em` Working with Resources +:link: resources/overview +Learn about the APIs that {{productName}} provides. ScyllaClusters, ScyllaDBMonitorings and more. 
+++ -[Learn more »](getting-started/overview) +[Learn more »](resources/overview) ::: -:::{grid-item-card} {material-regular}`electric_bolt;2em` Working with Resources -:link: resources/overview -Creating ScyllaDB clusters, ... +:::{grid-item-card} {material-regular}`explore;2em` Quickstarts +:link: quickstarts/index + +Get it running right now. Simple GKE and EKS setups. + +++ -[Learn more »](resources/overview) +[Learn more »](quickstarts/index) ::: :::{grid-item-card} {material-regular}`fitness_center;2em` Performance Tuning -:link: installation/overview - +:link: architecture/tuning +Tuning your infra and ScyllaDB cluster for the best performance and latency. +++ -[Learn more »](installation/overview) +[Learn more »](architecture/tuning) ::: :::{grid-item-card} {material-regular}`build;2em` Support -:link: installation/overview -must-gather, FAQs, support matrix and more. +:link: support/overview +FAQs, support matrix, must-gather and more. +++ -[Learn more »](installation/overview) +[Learn more »](support/overview) ::: :::{grid-item-card} {material-regular}`menu_book;2em` API Rererence -:link: installation/overview +:link: api-reference/index Visit the automatically generated API reference for all our APIs. +++ -[Learn more »](installation/overview) +[Learn more »](api-reference/index) ::: :::: diff --git a/docs/source/installation/gitops.md b/docs/source/installation/gitops.md index b0738f3b11a..6ec8f58726b 100644 --- a/docs/source/installation/gitops.md +++ b/docs/source/installation/gitops.md @@ -2,11 +2,11 @@ ## Disclaimer -For the ease of use all the following commands reference manifests that come from the same repository as the source code is being built from. -This means we can't have a pinned reference to the latest z-stream as that is a [chicken-egg problem](https://en.wikipedia.org/wiki/Chicken_or_the_egg). Therefore, we use a rolling tag for the particular branch in our manifests. +The following commands reference manifests that come from the same repository as the source code is being built from. +This means we can't have a pinned reference to the latest release as that is a [chicken-egg problem](https://en.wikipedia.org/wiki/Chicken_or_the_egg). Therefore, we use a rolling tag for the particular branch in our manifests. :::{caution} For production deployment, you should always replace the {{productName}} image with a stable reference. -We'd encourage you to use a sha reference, although using full-version tags is also ok. +We'd encourage you to use a sha reference, although using full-version tags is also fine. ::: @@ -65,6 +65,8 @@ done ### {{productName}} +Once you have the dependencies installed and available in your cluster, it is the time to install the {{productName}}. 
+ +:::{code-block} shell :substitutions: @@ -173,7 +175,10 @@ kubectl -n=local-csi-driver apply --server-side -f=https://raw.githubusercontent kubectl -n=local-csi-driver rollout status --timeout=10m daemonset.apps/local-csi-driver ::: -### ScyllaDBManager +### ScyllaDB Manager + +:::{include} ../.internal/manager-license-note.md +::: :::::{tab-set} diff --git a/docs/source/installation/helm.md b/docs/source/installation/helm.md index 29e889f01d7..34125ce53f2 100644 --- a/docs/source/installation/helm.md +++ b/docs/source/installation/helm.md @@ -1,8 +1,6 @@ # Helm -:::{warning} -Helm doesn't support managing CustomResourceDefinition resources ([#5871](https://github.com/helm/helm/issues/5871), [#7735](https://github.com/helm/helm/issues/7735)) -These are only created on first install and never updated. In order to update them, users have to do it manually every time. +:::{include} ../.internal/helm-crd-warning.md ::: In this example we will install Scylla stack on Kubernetes. This includes the following components: @@ -334,6 +332,26 @@ helm upgrade --install scylla --namespace scylla scylla/scylla -f examples/helm/ Helm should notice the difference, install the ServiceMonitor, and then Prometheous will be able to scrape metrics. +## Upgrade via Helm + +Replace `<release_name>` with the name of your Helm release for Scylla Operator and replace `<version>` with the version number you want to install: +1. Make sure Helm chart repository is up-to-date: + ``` + helm repo add scylla-operator https://storage.googleapis.com/scylla-operator-charts/stable + helm repo update + ``` +2. Update CRD resources. We recommend using the `--server-side` flag for `kubectl apply`, if your version supports it. + ``` + tmpdir=$( mktemp -d ) \ + && helm pull scylla-operator/scylla-operator --version <version> --untar --untardir "${tmpdir}" \ + && find "${tmpdir}"/scylla-operator/crds/ -name '*.yaml' -printf '-f=%p ' \ + | xargs kubectl apply + ``` +3. Update Scylla Operator + ``` + helm upgrade --version <version> <release_name> scylla-operator/scylla-operator + ``` + ## Cleanup To remove these applications you can simply uninstall them using Helm CLI: diff --git a/docs/source/installation/index.md b/docs/source/installation/index.md index 0418f2fde71..50a02ed71e1 100644 --- a/docs/source/installation/index.md +++ b/docs/source/installation/index.md @@ -1,10 +1,10 @@ # Installation -```{toctree} +:::{toctree} :maxdepth: 1 overview -platform/index +kubernetes/index gitops helm -``` +::: diff --git a/docs/source/installation/platform/eks.md b/docs/source/installation/kubernetes/eks.md similarity index 100% rename from docs/source/installation/platform/eks.md rename to docs/source/installation/kubernetes/eks.md diff --git a/docs/source/installation/kubernetes/generic.md b/docs/source/installation/kubernetes/generic.md new file mode 100644 index 00000000000..11585ad38b6 --- /dev/null +++ b/docs/source/installation/kubernetes/generic.md @@ -0,0 +1,25 @@ +# Generic + +Because {{productName}} aims to leverage the best performance available, there are a few extra steps that need to be configured on your Kubernetes cluster. + +## Kubelet + +### Static CPU policy + +By default, *kubelet* uses the CFS quota to enforce pod CPU limits. +When the Kubernetes node runs a lot of CPU-bound Pods, processes can be moved across different CPU cores, depending on whether the Pod +is throttled and which CPU cores are available. +However, kubelet may be configured to assign CPUs exclusively by setting the CPU manager policy to static.
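+On platforms where you manage the kubelet yourself, this typically means setting the policy in the kubelet configuration file (often `/var/lib/kubelet/config.yaml`). A minimal sketch; note that changing the policy on an existing node usually requires draining it and removing the previous CPU manager state before restarting the kubelet:
+
+:::{code-block} yaml
+# Fragment of a KubeletConfiguration file.
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+cpuManagerPolicy: static
+:::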
+ +To get the best performance and latency, ScyllaDB Pods should run under a [static CPU policy](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy) to pin cores. + +:::{note} +Configuring kubelet options is provider specific. +We provide a few examples for the major platforms later in this section; otherwise, please consult the documentation for your Kubernetes platform. +::: + +## Nodes + +### Labels + +For the purposes of the installation guides, we assume that the nodes meant to run ScyllaDB (ScyllaClusters) have the label `scylla.scylladb.com/node-type=scylla`. diff --git a/docs/source/installation/platform/gke.md b/docs/source/installation/kubernetes/gke.md similarity index 93% rename from docs/source/installation/platform/gke.md rename to docs/source/installation/kubernetes/gke.md index 8f6ffe2381a..b1ba88d1fa4 100644 --- a/docs/source/installation/platform/gke.md +++ b/docs/source/installation/kubernetes/gke.md @@ -5,8 +5,8 @@ ### Static CPU policy GKE allows you to set [static CPU policy](./generic.md#static-cpu-policy) using a [node system configuration](https://cloud.google.com/kubernetes-engine/docs/how-to/node-system-config) -```{code} yaml +:::{code} yaml :number-lines: kubeletConfig: cpuManagerPolicy: static -``` +::: diff --git a/docs/source/installation/platform/index.md b/docs/source/installation/kubernetes/index.md similarity index 50% rename from docs/source/installation/platform/index.md rename to docs/source/installation/kubernetes/index.md index 505b4cc525b..fc560ee48e6 100644 --- a/docs/source/installation/platform/index.md +++ b/docs/source/installation/kubernetes/index.md @@ -1,9 +1,9 @@ -# Platform +# Kubernetes -```{toctree} +:::{toctree} :maxdepth: 1 generic eks gke -``` +::: diff --git a/docs/source/installation/overview.md b/docs/source/installation/overview.md index 235dabe168c..0dfc798155c 100644 --- a/docs/source/installation/overview.md +++ b/docs/source/installation/overview.md @@ -18,7 +18,7 @@ Issues on unsupported platforms are unlikely to be addressed. Before reporting and issue, please see our [support page](../support/overview.md) and [troubleshooting installation issues](../support/troubleshooting/installation) ::: -## Components +## {{productName}} components Scylla Operator consists of multiple components that need to be installed in your cluster. This is by no means a complete list of all resources, rather is aims to show the major components in one place. @@ -30,7 +30,7 @@ This is by no means a complete list of all resources, rather is aims to show the ``` :::{note} -Depending on [which storage provisioner you choose](../architecture/storage/overview.md), the `local-csi-driver` may be replaced by a different component. +Depending on [which storage provisioner you choose](../architecture/storage/overview.md), the `local-csi-driver` may be replaced or complemented by a different component. ::: ### {{productName}} @@ -68,12 +68,14 @@ Before reporting and issue, please see our [support page](../support/overview.md Depending on your preference, there is more than one way to install {{productName}} and there may be more to come / or provided by other parties or supply chains. +At this point, we provide two ways to install the operator - [GitOps/manifests](#gitops) and [Helm charts](#helm). Given that we provide Helm charts only for a subset of the main resources and because **Helm can't update CRDs**, you still have to resort to using the manifests or GitOps anyway.
For a consistent experience we'd recommend using the [GitOps flow](#gitops) which will also give you a better idea about what you actually deploy. + :::{caution} Do not use rolling tags (like `latest`, `1.14` with our manifests in production. The manifests and images for a particular release are tightly coupled and any update requires updating both of them, while the rolling tags may surprisingly update only the images. ::: :::{note} -To avoid races, when you create a CRD, you need to wait for it to be propagated to other instances of the kubernetes-apiserver, before you can relliably create the corresponding CRs. +To avoid races, when you create a CRD, you need to wait for it to be propagated to other instances of the kubernetes-apiserver, before you can reliably create the corresponding CRs. ::: :::{note} @@ -87,14 +89,18 @@ We provide a set of Kubernetes manifest that contain all necessary objects to ap Depending on your preference applying them may range from using Git+ArgoCD to Git+kubectl. To keep the instructions clear for everyone we'll demonstrate applying the manifests using `kubectl`, that everyone is familiar with and is able to translate it to the GitOps platform of his choosing. - +For details, please see the [dedicated section describing the deployment using GitOps (kubectl)](./gitops.md). ### Helm -:::{warning} -Helm doesn't support managing CustomResourceDefinition resources ([#5871](https://github.com/helm/helm/issues/5871), [#7735](https://github.com/helm/helm/issues/7735)) -These are only created on first install and never updated. In order to update them, users have to do it manually every time. + +:::{include} ../.internal/helm-crd-warning.md ::: +For details, please see the [dedicated section describing the deployment using Helm](./helm.md). + ## Upgrades -TODo: compatibility (N-1) warning + link the upgrade docs +{{productName}} supports N+1 upgrades only. +That means you can only update by one minor version at a time, wait for it to successfully roll out, and then update all ScyllaClusters that also run the image being updated. ({{productName}} injects it as a sidecar to help run and manage ScyllaDB.) + +We value the stability of our APIs and all API changes are backwards compatible. diff --git a/docs/source/installation/platform/generic.md b/docs/source/installation/platform/generic.md deleted file mode 100644 index 1215788429c..00000000000 --- a/docs/source/installation/platform/generic.md +++ /dev/null @@ -1,15 +0,0 @@ -# Generic - -Because {{productName}} aim to leverage the best performance available, there is a few extra step that need to be configured on your Kubernetes cluster. - -## Kubelet - -### Static CPU policy - -To get the best performance and latency ScyllaDB Pods should run under [static CPU policy](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy) to pin cores. - -## Nodes - -### Labels - -For the purposes of the installation guides, we assume that the nodes meant to run ScyllaDB (ScyllaClusters) have label `scylla.scylladb.com/node-type=scylla`. diff --git a/docs/source/upgrade.md b/docs/source/installation/upgrade.md similarity index 82% rename from docs/source/upgrade.md rename to docs/source/installation/upgrade.md index bc458be7c3d..ad1dc94dead 100644 --- a/docs/source/upgrade.md +++ b/docs/source/installation/upgrade.md @@ -1,31 +1,10 @@ -# Upgrade of Scylla Operator +# Upgrades This page describes Scylla Operator upgrade procedures.
There are two generic update procedures - via Helm and via kubectl. Before upgrading, please check this page to find out if your target version requires additional upgrade steps. -## Upgrade via Helm -Helm doesn't support managing CustomResourceDefinition resources ([#5871](https://github.com/helm/helm/issues/5871), [#7735](https://github.com/helm/helm/issues/7735)) -These are only created on first install and never updated. In order to update them, users have to do it manually. - -Replace `` with the name of your Helm release for Scylla Operator and replace `` with the version number you want to install: -1. Make sure Helm chart repository is up-to-date: - ``` - helm repo add scylla-operator https://storage.googleapis.com/scylla-operator-charts/stable - helm repo update - ``` -2. Update CRD resources. We recommend using `--server-side` flag for `kubectl apply`, if your version supports it. - ``` - tmpdir=$( mktemp -d ) \ - && helm pull scylla-operator/scylla-operator --version --untar --untardir "${tmpdir}" \ - && find "${tmpdir}"/scylla-operator/crds/ -name '*.yaml' -printf '-f=%p ' \ - | xargs kubectl apply - ``` -3. Update Scylla Operator - ``` - helm upgrade --version scylla-operator/scylla-operator - ``` ## Upgrade via kubectl diff --git a/docs/source/manager.md b/docs/source/manager.md deleted file mode 100644 index 9a8db2fd37a..00000000000 --- a/docs/source/manager.md +++ /dev/null @@ -1,258 +0,0 @@ -# Deploying Scylla Manager on a Kubernetes Cluster - -Scylla Manager is a product for database operations automation, -it can schedule tasks such as repairs and backups. -Scylla Manager can manage multiple Scylla clusters and run cluster-wide tasks -in a controlled and predictable way. - -Scylla Manager is available for Scylla Enterprise customers and Scylla Open Source users. -With Scylla Open Source, Scylla Manager is limited to 5 nodes. -See the Scylla Manager [Proprietary Software License Agreement](https://www.scylladb.com/scylla-manager-software-license-agreement/) for details. - -## Prerequisites - -* Kubernetes cluster -* Scylla Operator - see [generic guide](generic.md) - -## Architecture - -Scylla Manager in K8s consist of: -- Dedicated Scylla Cluster - - Scylla Manager persists its state to a Scylla cluster. -Additional small single node cluster is spawned in the Manager namespace. - -- Scylla Manager Controller - - Main mission of Controller is to watch changes of Scylla Clusters, and synchronize three states. - 1. What user wants - task definition in CRD. - 2. What Controller registered - Task name to Task ID mapping - CRD status. - 3. Scylla Manager task listing - internal state of Scylla Manager. - - When Scylla Cluster CRD is being deployed Controller will register it in Scylla Manager once cluster reaches desired node count. -Once Cluster is fully up and running it will schedule all tasks defined in Cluster CRD. -Controller also supports task updates and unscheduling. - -- Scylla Manager - - Regular Scylla Manager, the same used in cloud and bare metal deployments. - - - -## Deploy Scylla Manager - -Deploy the Scylla Manager using the following commands: - -```console -kubectl apply -f deploy/manager-prod.yaml -``` - -This will install the Scylla Manager in the `scylla-manager` namespace. 
-You can check if the Scylla Manager is up and running with: - -```console -kubectl -n scylla-manager get pods -NAME READY STATUS RESTARTS AGE -scylla-manager-cluster-manager-dc-manager-rack-0 2/2 Running 0 37m -scylla-manager-controller-0 1/1 Running 0 28m -scylla-manager-scylla-manager-7bd9f968b9-w25jw 1/1 Running 0 37m -``` - -As you can see there are three pods: -* `scylla-manager-cluster-manager-dc-manager-rack-0` - is a single node Scylla cluster. -* `scylla-manager-controller-0` - Scylla Manager Controller. -* `scylla-manager-scylla-manager-7bd9f968b9-w25jw` - Scylla Manager. - -To see if Scylla Manager is fully up and running we can check their logs. -To do this, execute following command: - - ```console -kubectl -n scylla-manager logs scylla-manager-controller-0 -``` - -The output should be something like: -```console -{"L":"INFO","T":"2020-09-23T11:25:27.882Z","M":"Scylla Manager Controller started","version":"","build_date":"","commit":"","built_by":"","go_version":"","options":{"Name":"scylla-manager-controller-0","Namespace":"scylla-manager","LogLevel":"debug","ApiAddress":"http://127.0.0.1:5080/api/v1"},"_trace_id":"LQEJV3kDR5Gx9M3XQ2YnnQ"} -{"L":"INFO","T":"2020-09-23T11:25:28.435Z","M":"Registering Components.","_trace_id":"LQEJV3kDR5Gx9M3XQ2YnnQ"} -``` - -To check logs of Scylla Manager itself, use following command: -```console -kubectl -n scylla-manager logs scylla-manager-scylla-manager-7bd9f968b9-w25jw -``` - -The output should be something like: - -```console -{"L":"INFO","T":"2020-09-23T11:26:53.238Z","M":"Scylla Manager Server","version":"2.1.2-0.20200816.76cc4dcc","pid":1,"_trace_id":"xQhkJ0OuR8e6iMDEpM62Hg"} -{"L":"INFO","T":"2020-09-23T11:26:54.519Z","M":"Using config","config":{"HTTP":"127.0.0.1:5080","HTTPS":"","TLSCertFile":"/var/lib/scylla-manager/scylla_manager.crt","TLSKeyFile":"/var/lib/scylla-manager/scylla_manager.key","TLSCAFile":"","Prometheus":":56090","PrometheusScrapeInterval":5000000000,"debug":"127.0.0.1:56112","Logger":{"Mode":"stderr","Level":"info","Development":false},"Database":{"Hosts":["scylla-manager-cluster-manager-dc-manager-rack-0.scylla-manager.svc"],"SSL":false,"User":"","Password":"","LocalDC":"","Keyspace":"scylla_manager","MigrateDir":"/etc/scylla-manager/cql","MigrateTimeout":30000000000,"MigrateMaxWaitSchemaAgreement":300000000000,"ReplicationFactor":1,"Timeout":600000000,"TokenAware":true},"SSL":{"CertFile":"","Validate":true,"UserCertFile":"","UserKeyFile":""},"Healthcheck":{"Timeout":250000000,"SSLTimeout":750000000},"Backup":{"DiskSpaceFreeMinPercent":10,"AgeMax":43200000000000},"Repair":{"SegmentsPerRepair":1,"ShardParallelMax":0,"ShardFailedSegmentsMax":100,"PollInterval":200000000,"ErrorBackoff":300000000000,"AgeMax":0,"ShardingIgnoreMsbBits":12}},"config_files":["/mnt/etc/scylla-manager/scylla-manager.yaml"],"_trace_id":"xQhkJ0OuR8e6iMDEpM62Hg"} -{"L":"INFO","T":"2020-09-23T11:26:54.519Z","M":"Checking database connectivity...","_trace_id":"xQhkJ0OuR8e6iMDEpM62Hg"} -``` - -If there are no errors in the logs, let's spin a Scylla Cluster. - -## Cluster registration - - -When the Scylla Manager is fully up and running, lets create a regular instance of Scylla cluster. - -See [generic tutorial](generic.md) to spawn your cluster. - -Note: If you already have some Scylla Clusters, after installing Manager they should be -automatically registered in Scylla Manager. - -Once cluster reaches desired node count, cluster status will be updated with ID under which it was registered in Manager. 
- - ```console -kubectl -n scylla describe Cluster - -[...] -Status: - Manager Id: d1d532cd-49f2-4c97-9263-25126532803b - Racks: - us-east-1a: - Members: 3 - Ready Members: 3 - Version: 4.0.0 -``` -You can use this ID to talk to Scylla Manager using `sctool` CLI installed in Scylla Manager Pod. -You can also use Cluster name in `namespace/cluster-name` format. - -```console -kubectl -n scylla-manager exec -ti scylla-manager-scylla-manager-7bd9f968b9-w25jw -- sctool task list - -Cluster: scylla/simple-cluster (d1d532cd-49f2-4c97-9263-25126532803b) -╭─────────────────────────────────────────────────────────────┬──────────────────────────────────────┬────────────────────────────────┬────────╮ -│ Task │ Arguments │ Next run │ Status │ -├─────────────────────────────────────────────────────────────┼──────────────────────────────────────┼────────────────────────────────┼────────┤ -│ healthcheck/400b2723-eec5-422a-b7f3-236a0e10575b │ │ 23 Sep 20 14:28:42 CEST (+15s) │ DONE │ -│ healthcheck_rest/28169610-a969-4c20-9d11-ab7568b8a1bd │ │ 23 Sep 20 14:29:57 CEST (+1m) │ NEW │ -╰─────────────────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────┴────────╯ - -``` - -Scylla Manager by default registers recurring healhcheck tasks for Agent and for each of the enabled frontends (CQL, Alternator). - -In this task listing we can see CQL and REST healthchecks. - -## Task scheduling - -You can either define tasks prior Cluster creation, or for existing Cluster. -Let's edit already running cluster definition to add repair and backup task. -```console -kubectl -n scylla edit Cluster simple-cluster -``` - -Add following task definition to Cluster spec: -``` - repairs: - - name: "users repair" - keyspace: ["users"] - interval: "1d" - backups: - - name: "weekly backup" - location: ["s3:cluster-backups"] - retention: 3 - interval: "7d" - - name: "daily backup" - location: ["s3:cluster-backups"] - retention: 7 - interval: "1d" -``` - -For full task definition configuration consult [ScyllaCluster CRD](api-reference/groups/scylla.scylladb.com/scyllaclusters.rst). - -**Note**: Scylla Manager Agent must have access to above bucket prior the update in order to schedule backup task. -Consult Scylla Manager documentation for details on how to set it up. - -Scylla Manager Controller will spot this change and will schedule tasks in Scylla Manager. 
- -```console -kubectl -n scylla-manager exec -ti scylla-manager-scylla-manager-7bd9f968b9-w25jw -- sctool task list - -Cluster: scylla/simple-cluster (d1d532cd-49f2-4c97-9263-25126532803b) -╭─────────────────────────────────────────────────────────────┬──────────────────────────────────────┬────────────────────────────────┬────────╮ -│ Task │ Arguments │ Next run │ Status │ -├─────────────────────────────────────────────────────────────┼──────────────────────────────────────┼────────────────────────────────┼────────┤ -│ healthcheck/400b2723-eec5-422a-b7f3-236a0e10575b │ │ 23 Sep 20 14:28:42 CEST (+15s) │ DONE │ -│ backup/275aae7f-c436-4fc8-bcec-479e65fb8372 │ -L s3:cluster-backups --retention 3 │ 23 Sep 20 14:28:58 CEST (+7d) │ NEW │ -│ healthcheck_rest/28169610-a969-4c20-9d11-ab7568b8a1bd │ │ 23 Sep 20 14:29:57 CEST (+1m) │ NEW │ -│ repair/d4946360-c29d-4bb4-8b9d-619ada495c2a │ │ 23 Sep 20 14:38:42 CEST │ NEW │ -╰─────────────────────────────────────────────────────────────┴──────────────────────────────────────┴────────────────────────────────┴────────╯ - -``` - -As you can see, we have two new tasks, weekly recurring backup, and one repair which should start shortly. - -To check progress of run you can use following command: - -```console -kubectl -n scylla-manager exec -ti scylla-manager-scylla-manager-7bd9f968b9-w25jw -- sctool task progress --cluster d1d532cd-49f2-4c97-9263-25126532803b repair/d4946360-c29d-4bb4-8b9d-619ada495c2a -Status: RUNNING -Start time: 23 Sep 20 14:38:42 UTC -Duration: 13s -Progress: 2.69% -Datacenters: - - us-east-1 -+--------------------+-------+ -| system_auth | 8.06% | -| system_distributed | 0.00% | -| system_traces | 0.00% | -+--------------------+-------+ - -``` -Other tasks can be also tracked using the same command, but using different task ID. -Task IDs are present in Cluster Status as well as in task listing. - -## Clean Up - -To clean up all resources associated with Scylla Manager, you can run the commands below. - -**NOTE:** this will destroy your Scylla Manager database and delete all of its associated data. - -```console -kubectl delete -f deploy/manager-prod.yaml -``` - -## Troubleshooting - -**Manager is not running** - -If the Scylla Manager does not come up, the first step would be to examine the Manager and Controller logs: - -```console -kubectl -n scylla-manager logs -f scylla-manager-controller-0 scylla-manager-controller -kubectl -n scylla-manager logs -f scylla-manager-controller-0 scylla-manager-scylla-manager-7bd9f968b9-w25jw -``` - - -**My task wasn't scheduled** - -If your task wasn't scheduled, Cluster status will be updated with error messages for each failed task. -You can also consult Scylla Manager logs. 
- -Example: - -Following status describes error when backup task cannot be scheduled, due to lack of access to bucket: -```console -Status: - Backups: - Error: create backup target: location is not accessible: 10.100.16.62: giving up after 2 attempts: after 15s: timeout - make sure the location is correct and credentials are set, to debug SSH to 10.100.16.62 and run "scylla-manager-agent check-location -L s3:manager-test --debug"; 10.107.193.33: giving up after 2 attempts: after 15s: timeout - make sure the location is correct and credentials are set, to debug SSH to 10.107.193.33 and run "scylla-manager-agent check-location -L s3:manager-test --debug"; 10.109.197.60: giving up after 2 attempts: after 15s: timeout - make sure the location is correct and credentials are set, to debug SSH to 10.109.197.60 and run "scylla-manager-agent check-location -L s3:manager-test --debug" - Id: 00000000-0000-0000-0000-000000000000 - Interval: 0 - Location: - s3:manager-test - Name: adhoc backup - Num Retries: 3 - Retention: 3 - Start Date: now - Manager Id: 2b9dbe8c-9daa-4703-a66d-c29f63a917c8 - Racks: - us-east-1a: - Members: 3 - Ready Members: 3 - Version: 4.0.0 -``` - -Because Controller is infinitely retrying to schedule each defined task, once permission issues will be resolved, -task should appear in task listing and Cluster status. diff --git a/docs/source/migration.md b/docs/source/migration.md deleted file mode 100644 index 6b450637a22..00000000000 --- a/docs/source/migration.md +++ /dev/null @@ -1,146 +0,0 @@ -# Version migrations - - -## `v0.3.0` -> `v1.0.0` migration - -`v0.3.0` used a very common name as a CRD kind (`Cluster`). In `v1.0.0` this issue was solved by using less common kind -which is easier to disambiguate (`ScyllaCluster`). -***This change is backward incompatible, which means manual migration is needed.*** - -This procedure involves having two CRDs registered at the same time. We will detach Scylla Pods -from Scylla Operator for a short period to ensure that nothing is garbage collected when Scylla Operator is upgraded. -Compared to the [upgrade guide](upgrade.md) where full deletion is requested, this procedure shouldn't cause downtimes. -Although detaching resources from their controller is considered hacky. This means that you shouldn't run procedure -out of the box on production. Make sure this procedure works well multiple times on your staging environment first. - -***Read the whole procedure and make sure you understand what is going on before executing any of the commands!*** - -In case of any issues or questions regarding this procedure, you're welcomed on our [Scylla Users Slack](http://slack.scylladb.com/) -on #kubernetes channel. - -## Procedure - -1. Execute this whole procedure for each cluster sequentially. To get a list of existing clusters execute the following - ``` - kubectl -n scylla get cluster.scylla.scylladb.com - - NAME AGE - simple-cluster 30m - ``` - All below commands will use `scylla` namespace and `simple-cluster` as a cluster name. -1. Make sure you're using v1.0.0 tag: - ``` - git checkout v1.0.0 - ``` -1. Upgrade your `cert-manager` to `v1.0.0`. If you installed it from a static file from this repo, simply execute the following: - ``` - kubectl apply -f examples/common/cert-manager.yaml - ``` - If your `cert-manager` was installed in another way, follow official instructions on `cert-manager` website. -1. `deploy/operator.yaml` file contains multiple resources. Extract **only** `CustomResourceDefinition` to separate file. -1. 
Install v1.0.0 CRD definition from file created in the previous step: - ``` - kubectl apply -f examples/common/crd.yaml - ``` -1. Save your existing `simple-cluster` Cluster definition to a file: - ``` - kubectl -n scylla get cluster.scylla.scylladb.com simple-cluster -o yaml > existing-cluster.yaml - ``` -1. Migrate `Kind` and `ApiVersion` to new values using: - ``` - sed -i 's/scylla.scylladb.com\/v1alpha1/scylla.scylladb.com\/v1/g' existing-cluster.yaml - sed -i 's/kind: Cluster/kind: ScyllaCluster/g' existing-cluster.yaml - ``` -1. Install migrated CRD instance - ``` - kubectl apply -f existing-cluster.yaml - ``` - At this point, we should have two CRDs describing your Scylla cluster, although the new one is not controlled by the Operator. -1. Get UUID of newly created ScyllaCluster resource: - ``` - kubectl -n scylla get ScyllaCluster simple-cluster --template="{{ .metadata.uid }}" - - 12a3678d-8511-4c9c-8a48-fa78d3992694 - ``` - Save output UUID somewhere, it will be referred as `` in commands below. - - ***Depending on your shell, you might get additional '%' sign at the end of UUID, make sure to remove it!*** - -1. Upgrade ClusterRole attached to each of the Scylla nodes to grant them permission to lookup Scylla clusters: - ``` - kubectl patch ClusterRole simple-cluster-member --type "json" -p '[{"op":"add","path":"/rules/-","value":{"apiGroups":["scylla.scylladb.com"],"resources":["scyllaclusters"],"verbs":["get"]}}]' - ``` - Amend role name according to your cluster name, it should look like `-member`. -1. Get a list of all Services associated with your cluster. First get list of all services: - ``` - kubectl -n scylla get svc -l "scylla/cluster=simple-cluster" - - NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE - simple-cluster-client ClusterIP None 9180/TCP 109m - simple-cluster-us-east-1-us-east-1a-0 ClusterIP 10.43.23.96 7000/TCP,7001/TCP,7199/TCP,10001/TCP,9042/TCP,9142/TCP,9160/TCP 109m - simple-cluster-us-east-1-us-east-1a-1 ClusterIP 10.43.66.22 7000/TCP,7001/TCP,7199/TCP,10001/TCP,9042/TCP,9142/TCP,9160/TCP 108m - simple-cluster-us-east-1-us-east-1a-2 ClusterIP 10.43.246.25 7000/TCP,7001/TCP,7199/TCP,10001/TCP,9042/TCP,9142/TCP,9160/TCP 106m - - ``` -1. For each service, change its `ownerReference` to point to new CRD instance: - ``` - kubectl -n scylla patch svc --type='json' -p='[{"op": "replace", "path": "/metadata/ownerReferences/0/apiVersion", "value":"scylla.scylladb.com/v1"}, {"op": "replace", "path": "/metadata/ownerReferences/0/kind", "value":"ScyllaCluster"}, {"op": "replace", "path": "/metadata/ownerReferences/0/uid", "value":""}]' - ``` - Replace `` with Service name, and `` with saved UUID from one of the previous steps. -1. Get a list of all Services again to see if none was deleted. Check also "Age" column, it shouldn't be lower than previous result. - ``` - kubectl -n scylla get svc -l "scylla/cluster=simple-cluster" - - NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE - simple-cluster-client ClusterIP None 9180/TCP 110m - simple-cluster-us-east-1-us-east-1a-0 ClusterIP 10.43.23.96 7000/TCP,7001/TCP,7199/TCP,10001/TCP,9042/TCP,9142/TCP,9160/TCP 110m - simple-cluster-us-east-1-us-east-1a-1 ClusterIP 10.43.66.22 7000/TCP,7001/TCP,7199/TCP,10001/TCP,9042/TCP,9142/TCP,9160/TCP 109m - simple-cluster-us-east-1-us-east-1a-2 ClusterIP 10.43.246.25 7000/TCP,7001/TCP,7199/TCP,10001/TCP,9042/TCP,9142/TCP,9160/TCP 107m - - ``` -1. 
Get a list of StatefulSets associated with your cluster: - ``` - kubectl -n scylla get sts -l "scylla/cluster=simple-cluster" - - NAME READY AGE - simple-cluster-us-east-1-us-east-1a 3/3 104m - ``` -1. For each StatefulSet from previous step, change its `ownerReference` to point to new CRD instance. - - ``` - kubectl -n scylla patch sts --type='json' -p='[{"op": "replace", "path": "/metadata/ownerReferences/0/apiVersion", "value":"scylla.scylladb.com/v1"}, {"op": "replace", "path": "/metadata/ownerReferences/0/kind", "value":"ScyllaCluster"}, {"op": "replace", "path": "/metadata/ownerReferences/0/uid", "value":""}]' - ``` - Replace `` with StatefulSet name, and `` with saved UUID from one of the previous steps. - -1. Now when all k8s resources bound to Scylla are attached to new CRD, we can remove 0.3.0 Operator and old CRD definition. - Checkout `v0.3.0` version, and remove Scylla Operator, and old CRD: - ``` - git checkout v0.3.0 - kubectl delete -f examples/generic/operator.yaml - ``` -1. Checkout `v1.0.0`, and install upgraded Scylla Operator: - ``` - git checkout v1.0.0 - kubectl apply -f deploy/operator.yaml - ``` -1. Wait until Scylla Operator boots up: - ``` - kubectl -n scylla-operator-system wait --for=condition=ready pod --all --timeout=600s - ``` -1. Get a list of StatefulSets associated with your cluster: - ``` - kubectl -n scylla get sts -l "scylla/cluster=simple-cluster" - - NAME READY AGE - simple-cluster-us-east-1-us-east-1a 3/3 104m -1. For each StatefulSet from previous step, change its sidecar container image to `v1.0.0`, and wait until change will be propagated. This step will initiate a rolling restart of pods one by one. - ``` - kubectl -n scylla patch sts --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/initContainers/0/image", "value":"scylladb/scylla-operator:v1.0.0"}]' - kubectl -n scylla rollout status sts - ``` - Replace `` with StatefulSet name. -1. If you're using Scylla Manager, bump Scylla Manager Controller image to `v1.0.0` - ``` - kubectl -n scylla-manager-system patch sts scylla-manager-controller --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"scylladb/scylla-operator:v1.0.0"}]' - ``` -1. Your Scylla cluster is now migrated to `v1.0.0`. diff --git a/docs/source/multidc/index.rst b/docs/source/multidc/index.rst deleted file mode 100644 index a2f1eae7709..00000000000 --- a/docs/source/multidc/index.rst +++ /dev/null @@ -1,18 +0,0 @@ -========================================================== -Deploying multi-datacenter ScyllaDB clusters in Kubernetes -========================================================== - -Prepare a platform for a multi datacenter ScyllaDB cluster deployment: - -.. toctree:: - :maxdepth: 1 - - eks - gke - -Deploy a multi-datacenter ScyllaDB cluster in Kubernetes: - -.. toctree:: - :maxdepth: 1 - - multidc diff --git a/docs/source/performance.md b/docs/source/performance.md deleted file mode 100644 index 1d5fd8b2796..00000000000 --- a/docs/source/performance.md +++ /dev/null @@ -1,100 +0,0 @@ -# Performance tuning - -Scylla Operator 1.6 introduces a new experimental feature allowing users to optimize Kubernetes nodes. - -:::{warning} -We recommend that you first try out the performance tuning on a pre-production instance. -Given the nature of the underlying tuning script, undoing the changes requires rebooting the Kubernetes node(s). 
-::: - -## Node tuning - -Starting from Operator 1.6, a new CRD called NodeConfig is available, allowing users to target Nodes which should be tuned. -When a Node is supposed to be optimized, the Scylla Operator creates a DaemonSet covering these Nodes. -Nodes matching the provided placement conditions will be subject to tuning. - -Below example NodeConfig tunes nodes having `scylla.scylladb.com/node-type=scylla` label: -``` -apiVersion: scylla.scylladb.com/v1alpha1 -kind: NodeConfig -metadata: - name: cluster -spec: - placement: - nodeSelector: - scylla.scylladb.com/node-type: scylla -``` -For more details about new CRD use: -``` -kubectl explain nodeconfigs.scylla.scylladb.com/v1alpha1 -``` - -For all optimizations we use a Python script available in the Scylla image called perftune. -Perftune executes the performance optmizations like tuning the kernel, network, disk devices, spreading IRQs across CPUs and more. - -Tuning consists of two separate optimizations: common node tuning, and tuning based on Scylla Pods and their resource assignment. -Node tuning is executed immediately. Pod tuning is executed when Scylla Pod lands on the same Node. - -Scylla works most efficently when it's pinned to CPU and not interrupted. -One of the most common causes of context-switching are network interrupts. Packets coming to a node need to be processed, -and this requires CPU shares. - -On K8s we always have at least a couple of processes running on the node: kubelet, kubernetes provider applications, daemons etc. -These processes require CPU shares, so we cannot dedicate entire node processing power to Scylla, we need to leave space for others. -We take advantage of it, and we pin IRQs to CPUs not used by any Scylla Pods exclusively. - -Tuning resources are created in a special namespace called `scylla-operator-node-tuning`. - -The tuning is applied only to pods with `Guaranteed` QoS class. Please double check your ScyllaCluster resource specification -to see if it meets all conditions. - -## Kubernetes tuning - -By default, the kubelet uses the CFS quota to enforce pod CPU limits. -When the node runs many CPU-bound pods, the workload can move around different CPU cores depending on whether the pod -is throttled and which CPU cores are available. -However, kubelet may be configured to assign CPUs exclusively, by setting the CPU manager policy to static. - -Setting up kubelet configuration is provider specific. Please check the docs for your distribution or talk to your -provider. - -Only pods within the [Guaranteed QoS class](https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed)) can take advantage of this option. -When such pod lands on a Node, kubelet will pin them to specific CPUs, and those won't be part of the shared pool. 
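For illustration only, a kubelet configuration that enables the static CPU manager policy might look like the sketch below. The file name and the way the configuration is delivered to kubelet are assumptions made for this example; the supported mechanism differs between providers and distributions, so follow their documentation.

```bash
# Sketch, not a drop-in recipe: the file path and delivery mechanism are assumptions.
cat > kubelet-cpu-manager-config.yaml <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
# The static policy requires a non-zero CPU reservation for system daemons,
# which keeps those CPUs out of the exclusive pool handed to Guaranteed pods.
systemReserved:
  cpu: "1"
kubeReserved:
  cpu: "1"
EOF
# The file would then typically be passed to kubelet via its --config flag,
# or merged into whatever node configuration API your provider exposes.
```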
- -In our case there are two requirements each ScyllaCluster must fulfill to receive a Guaranteed QoS class: -* resource request and limits must be equal or only limits have to be provided -* agentResources must be provided and their requests and limits must be equal, or only limits have to be provided - -An example of such a ScyllaCluster that receives a Guaranteed QoS class is below: - -``` -apiVersion: scylla.scylladb.com/v1 -kind: ScyllaCluster -metadata: - name: guaranteed-cluster - namespace: scylla -spec: - agentVersion: 3.3.3 - version: 6.2.0 - datacenter: - name: us-east-1 - racks: - - name: us-east-1a - members: 3 - storage: - capacity: 500Gi - agentResources: - requests: - cpu: 1 - memory: 1G - limits: - cpu: 1 - memory: 1G - resources: - requests: - cpu: 4 - memory: 16G - limits: - cpu: 4 - memory: 16G -``` diff --git a/docs/source/quickstarts/eks.md b/docs/source/quickstarts/eks.md new file mode 100644 index 00000000000..b24de89edaf --- /dev/null +++ b/docs/source/quickstarts/eks.md @@ -0,0 +1,134 @@ +# Deploying ScyllaDB on EKS + +This is a quickstart guide to help you set up a basic EKS cluster with local NVMes and solid performance. + +This is by no means a complete guide, and you should always consult your provider's documentation. + +## Prerequisites + +In this guide we'll be using `eksctl` to set up the cluster, and you'll need `kubectl` to talk to it. + +If you don't have them already, or they aren't available through your package manager, you can try these links to learn more about installing them: +- [eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) +- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) + +## Creating an EKS cluster + +First, let's create a declarative config to use with eksctl: +:::{code} bash +:linenos: +:emphasize-lines: 10 + +cat > clusterconfig.eksctl.yaml < systemconfig.yaml < (By default GKE doesn't give you the necessary RBAC permissions) + +Get the credentials for your new cluster: +``` +gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone="${GCP_ZONE}" +``` + +Create a ClusterRoleBinding for your user. +In order for this to work, you need to have at least the `container.clusterRoleBindings.create` permission. +The easiest way to obtain this permission is to enable the `Kubernetes Engine Admin` role for your user in the GCP IAM web interface. +``` +kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user "${GCP_USER}" +``` + + +## Setting up storage and tuning + +:::{code-block} bash +:linenos: + +kubectl apply --server-side -f=<
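# A sketch of how you might verify the result, assuming the manifests applied
# above created the NodeConfig and the tuning resources described earlier:
kubectl get nodeconfigs.scylla.scylladb.com
kubectl -n=scylla-operator-node-tuning get daemonsets,jobs,pods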