Commit

sync
tnozicka committed Nov 13, 2024
1 parent c3e7d4d commit c17cd84
Showing 10 changed files with 337 additions and 162 deletions.
227 changes: 114 additions & 113 deletions docs/poetry.lock

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion docs/pyproject.toml
@@ -10,8 +10,9 @@ package-mode = false
python = "^3.10"
pygments = "^2.18.0"
sphinx-scylladb-theme = "^1.8.1"
sphinx-substitution-extensions = "=2024.10.17"
#sphinx-substitution-extensions = "=2024.10.17"
sphinx-sitemap = "^2.6.0"
beartype = ">0.0.0"
sphinx-autobuild = "^2024.4.19"
Sphinx = "^8.1.3"
sphinx-multiversion-scylla = "^0.3.1"
8 changes: 4 additions & 4 deletions docs/source/architecture/overview.md
@@ -2,17 +2,17 @@

## Foreword

{{ productName }} is a set of controllers and API extensions that need to be installed in your cluster.
{{productName}} is a set of controllers and API extensions that need to be installed in your cluster.
The Kubernetes API is extended using [CustomResourceDefinitions (CRDs)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) and [dynamic admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) to provide new resources ([API reference](/api-reference/index)).
These resources are reconciled by controllers embedded within {{ productName }} deployment.
These resources are reconciled by controllers embedded within {{productName}} deployment.

ScyllaDB is a stateful application and {{ productName }} requires you to have a storage provisioner installed in your cluster.
ScyllaDB is a stateful application and {{productName}} requires you to have a storage provisioner installed in your cluster.
To achieve the best performance, we recommend using a storage provisioner based on local NVMe disks.
You can learn more about different setups in [a dedicated storage section](./storage/overview.md).
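
For illustration only (field names follow the ScyllaCluster API, the values are placeholders), a rack requesting local storage could look like this:

:::{code-block} yaml
# Hypothetical ScyllaCluster excerpt requesting storage from a
# local-disk-backed storage class (values are placeholders).
spec:
  datacenter:
    name: us-east-1
    racks:
    - name: a
      members: 3
      storage:
        capacity: 500Gi
        storageClassName: scylladb-local-xfs
:::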

## Components

{{ productName }} deployment consists of several components that needs to be installed / present in you Kubernetes cluster.
{{productName}} deployment consists of several components that need to be installed / present in your Kubernetes cluster.
By design, some of the components need elevated permissions, but they are only accessible to the administrators.


6 changes: 3 additions & 3 deletions docs/source/architecture/tuning.md
@@ -1,10 +1,10 @@
# Tuning

To get the best performance and latency {{ productName }} implements performance tuning.
To get the best performance and latency, {{productName}} implements performance tuning.

Performance tuning is enabled by default when you create a corresponding [NodeConfig](../resources/nodeconfigs.md) for your nodes.

Because some of the operations it needs to perform are not multitenant or are privileged, the tuning scripts are run in a dedicated system namespace called `scylla-operator-node-tuning`.
This namespace is created and entirely managed by {{ productName }} and only administrators can access it.
This namespace is created and entirely managed by {{productName}} and only administrators can access it.

When a ScyllaCluster Pod is created (and performance tuning is enabled), the Pod initializes but waits until {{ productName }} runs an on-demand Job that will configure the host and the ScyllaDB process accordingly. Only after that it will actually start running ScyllaDB.
When a ScyllaCluster Pod is created (and performance tuning is enabled), the Pod initializes but waits until {{productName}} runs an on-demand Job that configures the host and the ScyllaDB process accordingly. Only then does it actually start running ScyllaDB.
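
If you need to inspect what tuning has been applied, one option (a sketch, assuming administrator access) is to look at the objects {{productName}} created in that namespace:

:::{code-block} shell
# Inspect the tuning Jobs and Pods the operator created (requires admin access).
kubectl -n=scylla-operator-node-tuning get jobs,pods
:::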
3 changes: 1 addition & 2 deletions docs/source/conf.py
@@ -17,7 +17,6 @@
    'sphinx_multiversion',
    "sphinx_sitemap",
    "sphinx_design",
    # "sphinx_substitution_extensions",
    "myst_parser",
]

@@ -55,7 +54,7 @@
myst_heading_anchors = 6
myst_substitutions = {
"productName": "Scylla Operator",
"repositoryURL": "github.com/scylladb/scylla-operator",
"repository": "scylladb/scylla-operator",
"revision": "master"
}

Expand Down
2 changes: 1 addition & 1 deletion docs/source/index.md
@@ -46,7 +46,7 @@ contributing
:columns: 12 12 12 8
:class: sd-d-flex-column

{{ productName }} project helps users to run ScyllaDB on Kubernetes.
{{productName}} project helps users run ScyllaDB on Kubernetes.
It extends the Kubernetes APIs using [CustomResourceDefinitions(CRDs)](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) and runs controllers that reconcile the desired state declared using these APIs.

Here is a subset of items to start with.
Expand Down
205 changes: 182 additions & 23 deletions docs/source/installation/gitops.md
@@ -1,44 +1,203 @@
# GitOps (kubectl)

## Disclaimer

For ease of use, all the following commands reference manifests that come from the same repository that the source code is built from.
This means we can't have a pinned reference to the latest z-stream, as that is a [chicken-and-egg problem](https://en.wikipedia.org/wiki/Chicken_or_the_egg). Therefore, we use a rolling tag for the particular branch in our manifests.

:::{caution}
For production deployments, you should always replace the {{productName}} image with a stable reference.
We encourage you to use a SHA reference, although using full-version tags is also acceptable.
:::
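
For example, you can resolve a version tag into an immutable digest with standard container tooling. This is only a sketch and the tag is an example:

:::{code-block} shell
# Resolve a version tag to a sha256 digest that can be pinned in the manifests.
docker pull docker.io/scylladb/scylla-operator:1.14.0
docker inspect --format='{{index .RepoDigests 0}}' docker.io/scylladb/scylla-operator:1.14.0
:::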


## Installation

### Prerequisites

Scylla Operator has a few dependencies that you need to install in your cluster first.

In case you already have a supported version of each of these dependencies installed in your cluster, you can skip this part.

#### Cert Manager

:::{code-block} shell
:substitutions:

# Deploy cert-manager.
kubectl -n=cert-manager apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/examples/third-party/cert-manager.yaml
:::

:::{code-block} shell
# Wait for CRDs to propagate to all apiservers.
kubectl wait --for condition=established --timeout=60s crd/certificates.cert-manager.io crd/issuers.cert-manager.io

# Wait for components that other steps depend on.
for deploy in cert-manager{,-cainjector,-webhook}; do
    kubectl -n=cert-manager rollout status --timeout=10m deployment.apps/"${deploy}"
done

# Wait for webhook CA secret to be created.
for i in {1..30}; do
    { kubectl -n=cert-manager get secret/cert-manager-webhook-ca && break; } || sleep 1
done
:::

#### Prometheus Operator

:::{code-block} shell
:substitutions:

kubectl -n=prometheus-operator apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/examples/third-party/prometheus-operator.yaml
:::

:::{code-block} shell
# Wait for CRDs to propagate to all apiservers.
kubectl wait --for='condition=established' crd/prometheuses.monitoring.coreos.com crd/prometheusrules.monitoring.coreos.com crd/servicemonitors.monitoring.coreos.com

# Wait for prometheus operator deployment.
kubectl -n=prometheus-operator rollout status --timeout=10m deployment.apps/prometheus-operator

# Wait for webhook CA secret to be created.
for i in {1..30}; do
    { kubectl -n=cert-manager get secret/cert-manager-webhook-ca && break; } || sleep 1
done
:::

### {{productName}}

:::{code-block} shell
:substitutions:

kubectl -n=scylla-operator apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/deploy/operator/operator.yaml
:::

::::{caution}
{{productName}} deployment references its own image that it later runs alongside each ScyllaDB instance. Therefore, you have to also replace the image in the environment variable called `SCYLLA_OPERATOR_IMAGE`:
:::{code-block} yaml
:linenos:
:emphasize-lines: 16,19,20

apiVersion: apps/v1
kind: Deployment
metadata:
  name: scylla-operator
  namespace: scylla-operator
  # ...
spec:
  # ...
  template:
    # ...
    spec:
      # ...
      containers:
      - name: scylla-operator
        # ...
        image: docker.io/scylladb/scylla-operator:1.14.0@sha256:8c75c5780e2283f0a8f9734925352716f37e0e7f41007e50ce9b1d9924046fa1
        env:
        # ...
        - name: SCYLLA_OPERATOR_IMAGE
          value: docker.io/scylladb/scylla-operator:1.14.0@sha256:8c75c5780e2283f0a8f9734925352716f37e0e7f41007e50ce9b1d9924046fa1
:::
The {{productName}} image value and the `SCYLLA_OPERATOR_IMAGE` value must always match.
Be careful not to use a rolling tag for any of them to avoid an accidental skew!
::::
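
If you script this step, a sketch like the following keeps both references in lockstep (the digest shown is only an example):

:::{code-block} shell
# Illustrative only: pin the operator image and SCYLLA_OPERATOR_IMAGE to the same reference.
SCYLLA_OPERATOR_REF='docker.io/scylladb/scylla-operator:1.14.0@sha256:8c75c5780e2283f0a8f9734925352716f37e0e7f41007e50ce9b1d9924046fa1'
kubectl -n=scylla-operator set image deployment.apps/scylla-operator scylla-operator="${SCYLLA_OPERATOR_REF}"
kubectl -n=scylla-operator set env deployment.apps/scylla-operator SCYLLA_OPERATOR_IMAGE="${SCYLLA_OPERATOR_REF}"
:::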

:::{code-block} shell
# Wait for CRDs to propagate to all apiservers.
kubectl wait --for='condition=established' crd/scyllaclusters.scylla.scylladb.com crd/nodeconfigs.scylla.scylladb.com crd/scyllaoperatorconfigs.scylla.scylladb.com crd/scylladbmonitorings.scylla.scylladb.com

# Wait for the components to deploy.
kubectl -n=scylla-operator rollout status --timeout=10m deployment.apps/{scylla-operator,webhook-server}

# Wait for webhook CA secret to be created.
for i in {1..30}; do
    { kubectl -n=cert-manager get secret/cert-manager-webhook-ca && break; } || sleep 1
done
:::
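
Optionally, you can sanity-check the deployment before moving on:

:::{code-block} shell
# All scylla-operator and webhook-server Pods should become Ready.
kubectl -n=scylla-operator get pods
:::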

### Setting up local storage on nodes and enabling tuning

:::{caution}
The following step heavily depends on the platform that you use, the machine type, or the options chosen when creating a node pool.

Please review the [NodeConfig](../resources/nodeconfigs.md) and adjust it for your platform!
:::
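
For orientation (a sketch only, the label is a placeholder), a minimal NodeConfig selects the nodes to set up and tune via a node selector:

:::{code-block} yaml
apiVersion: scylla.scylladb.com/v1alpha1
kind: NodeConfig
metadata:
  name: cluster
spec:
  placement:
    nodeSelector:
      scylla.scylladb.com/node-type: scylla
:::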

:::::{tab-set}

::::{tab-item} GKE (NVMe)
:::{code-block} shell
:substitutions:

kubectl -n=scylla-operator apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/examples/gke/nodeconfig-alpha.yaml
:::
::::

::::{tab-item} EKS (NVMe)
:::{code-block} shell
:substitutions:

kubectl -n=scylla-operator apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/examples/eks/nodeconfig-alpha.yaml
:::
::::

::::{tab-item} Any platform (Loop devices)
:::{caution}
This NodeConfig sets up loop devices instead of NVMe disks and is only intended for development purposes when you don't have the NVMe disks available.
Do not expect meaningful performance with this setup.
:::
:::{code-block} shell
:substitutions:

kubectl -n=scylla-operator apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/examples/generic/nodeconfig-alpha.yaml
:::
::::

:::::

:::{note}
Performance tuning is enabled by default for all nodes selected by the [NodeConfig](../resources/nodeconfigs.md), unless they are opted out.
:::

:::{code-block} shell
# Wait for the NodeConfig to apply changes to the Kubernetes nodes.
kubectl wait --for='condition=Reconciled' --timeout=10m nodeconfigs.scylla.scylladb.com/cluster
:::

### Local CSI driver

:::{code-block} shell
:substitutions:

kubectl -n=local-csi-driver apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/examples/common/local-volume-provisioner/local-csi-driver/{00_namespace,00_scylladb-local-xfs.storageclass,10_csidriver,10_driver.serviceaccount,10_provisioner_clusterrole,20_provisioner_clusterrolebinding,50_daemonset}.yaml
:::

:::{code-block} shell
# Wait for it to deploy.
kubectl -n=local-csi-driver rollout status --timeout=10m daemonset.apps/local-csi-driver
:::
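
To verify the driver end to end, you can create a throwaway PersistentVolumeClaim against the `scylladb-local-xfs` storage class (a sketch; the name and size are arbitrary, and with a `WaitForFirstConsumer` storage class the claim typically only binds once a Pod consumes it):

:::{code-block} yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-csi-driver-test
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: scylladb-local-xfs
  resources:
    requests:
      storage: 1Gi
:::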

### ScyllaDB Manager

:::::{tab-set}

::::{tab-item} Production (sized)
:::{code-block} shell
:substitutions:

kubectl -n=scylla-manager apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/deploy/manager-prod.yaml
:::
::::

::::{tab-item} Development (sized)
:::{code-block} shell
:substitutions:

kubectl -n=scylla-manager apply --server-side -f=https://raw.githubusercontent.com/{{repository}}/{{revision}}/deploy/manager-dev.yaml
:::
::::

:::::

:::{code-block} shell
# Wait for it to deploy.
kubectl -n=scylla-manager rollout status --timeout=10m deployment.apps/scylla-manager
:::

## Next steps

Now that you've successfully installed {{productName}}, it's time to look at [how to run ScyllaDB](../resources/scyllaclusters/overview.md).
37 changes: 23 additions & 14 deletions docs/source/installation/overview.md
@@ -2,11 +2,11 @@

## Kubernetes

{{ productName }} is a set of controllers and Kubernetes API extensions.
{{productName}} is a set of controllers and Kubernetes API extensions.
Therefore, we assume you have an existing (conformant) Kubernetes cluster and/or are already familiar with how a Kubernetes cluster is deployed and operated.

{{ productName }} controllers and API extensions may have dependencies on some of the newer Kubernetes features and APIs that need to be available.
More over, {{ productName }} implements additional features like performance tuning, some of which are platform/OS specific.
{{productName}} controllers and API extensions may have dependencies on some of the newer Kubernetes features and APIs that need to be available.
Moreover, {{productName}} implements additional features like performance tuning, some of which are platform/OS specific.
While we do our best to implement these routines as generically as possible, sometimes there isn't any low-level API to base them on, and they may work only on a subset of platforms.

:::{caution}
@@ -24,35 +24,35 @@
Scylla Operator consists of multiple components that need to be installed in your cluster.
This is by no means a complete list of all resources; rather, it aims to show the major components in one place.


:::{note}
Depending on [which storage provisioner you choose](../architecture/storage/overview.md), the `local-csi-driver` may be replaced by a different component.
:::

```{figure} deploy.svg
:class: sd-m-auto
:name: deploy-overview
```

### {{ productName }}
:::{note}
Depending on [which storage provisioner you choose](../architecture/storage/overview.md), the `local-csi-driver` may be replaced by a different component.
:::

{{ productName }} contains the Kubernetes API extensions and corresponding controllers and admission hooks that run inside `scylla-operator` namespace.
### {{productName}}

{{productName}} contains the Kubernetes API extensions and corresponding controllers and admission hooks that run inside `scylla-operator` namespace.

You can learn more about the APIs in [resources section](../resources/overview.md) and the [generated API reference](../api-reference/index.rst).

### ScyllaDB Manager

ScyllaDB Manager is a global deployment that is responsible for operating all [ScyllaClusters](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst) and runs inside `scylla-manager` namespace.
There is a corresponding controller running in [{{ productName }}](./#{{ productName }}) that syncs the [ScyllaCluster](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst) metadata, [backups](../api-reference/groups/scylla.scylladb.com/scyllaclusters/#spec-backups) and [repairs](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst#spec-repairs) tasks into the manager (and vice versa) and avoids accessing the shared instance by users. Unfortunately, at this point, other task like restoring from a backup require executing into the shared ScyllaDB Manager deployment which effectively needs administrator privileges.
There is a corresponding controller running in [{{productName}}](./#{{productName}}) that syncs the [ScyllaCluster](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst) metadata, [backups](../api-reference/groups/scylla.scylladb.com/scyllaclusters/#spec-backups) and [repairs](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst#spec-repairs) tasks into the manager (and vice versa) and avoids users accessing the shared instance directly. Unfortunately, at this point, other tasks like restoring from a backup require executing into the shared ScyllaDB Manager deployment, which effectively needs administrator privileges.

ScyllaDB Manager uses a small ScyllaCluster instance internally and thus depends on the {{ productName }} deployment.
ScyllaDB Manager uses a small ScyllaCluster instance internally and thus depends on the {{productName}} deployment.
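
As an example, a backup task declared on a ScyllaCluster (an illustrative excerpt; the bucket name is a placeholder) is what gets synced into the shared ScyllaDB Manager by that controller:

:::{code-block} yaml
# Hypothetical ScyllaCluster excerpt declaring a backup task.
spec:
  backups:
  - name: daily
    interval: 24h
    location:
    - s3:my-backup-bucket
:::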

### NodeConfig

[NodeConfig](../resources/nodeconfigs.md) is a cluster-coped custom resource provided by {{ productName }} that helps you set local disks on Kubernetes nodes, create and mount a file system, configure performance tuning and more.
[NodeConfig](../resources/nodeconfigs.md) is a cluster-scoped custom resource provided by {{productName}} that helps you set up local disks on Kubernetes nodes, create and mount a file system, configure performance tuning and more.

### ScyllaOperatorConfig

[ScyllaOperatorConfig](../resources/scyllaoperatorconfigs.md) is a cluster-coped custom resource provided by {{ productName }} to help you configure {{ productName }}. It helps you configure auxiliary images, see which ones are in use and more.
[ScyllaOperatorConfig](../resources/scyllaoperatorconfigs.md) is a cluster-scoped custom resource provided by {{productName}} to help you configure {{productName}}. It helps you configure auxiliary images, see which ones are in use and more.

### Local CSI driver

@@ -66,12 +66,21 @@
Before reporting an issue, please see our [support page](../support/overview.md).

## Installation modes

Depending on your preference, there is more than one way to install {{ productName }} and there may be more to come / or provided by other parties or supply chains.
Depending on your preference, there is more than one way to install {{productName}}, and more may come, or be provided by other parties or supply chains.

:::{caution}
Do not use rolling tags (like `latest` or `1.14`) with our manifests in production. The manifests and images for a particular release are tightly coupled, and any update requires updating both of them, while the rolling tags may surprisingly update only the images.
:::

:::{note}
To avoid races, when you create a CRD, you need to wait for it to be propagated to other instances of the kubernetes-apiserver before you can reliably create the corresponding CRs.
:::
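
In practice this means gating on the `Established` condition, for example:

:::{code-block} shell
# Wait until the CRD is established before creating the corresponding CRs.
kubectl wait --for='condition=established' crd/scyllaclusters.scylla.scylladb.com
:::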

:::{note}
When you create [ValidatingWebhookConfiguration](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#webhook-configuration) or [MutatingWebhookConfiguration](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#webhook-configuration) objects, you have to wait for the corresponding webhook deployments to be available, or the kubernetes-apiserver will fail all requests for resources affected by these webhook configurations.
Also note that some platforms have non-conformant networking setups by default that prevent the kube-apiserver from talking to the webhooks - [see our troubleshooting guide for more info](../support/troubleshooting/installation.md#webhooks).
:::
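
A minimal gate, assuming the deployment names used by our manifests, could look like:

:::{code-block} shell
# Wait for the webhook server backing the webhook configurations to become available.
kubectl -n=scylla-operator rollout status --timeout=10m deployment.apps/webhook-server
:::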

### GitOps

We provide a set of Kubernetes manifests that contain all the necessary objects to apply to your Kubernetes cluster.