docs: detail decoupled versus unified deployment models
henrybell committed Jan 3, 2025
1 parent 060f546 commit 5acc35e
Showing 3 changed files with 91 additions and 49 deletions.
File renamed without changes
Binary file added resources/architecture-gke-unified.png
terraform/gke/README.md: 140 changes (91 additions, 49 deletions)
@@ -30,9 +30,12 @@

* [Table of Contents](#table-of-contents)
* [Overview](#overview)
  * [Pros](#pros)
  * [Cons](#cons)
* [Options for GKE deployment](#options-for-gke-deployment)
* [Architecture](#architecture)
  * [Decoupled Model](#decoupled-model)
  * [Unified Model](#unified-model)
* [Before you begin](#before-you-begin)
* [Preparing the Autoscaler Project](#preparing-the-autoscaler-project)
* [Using Firestore for Autoscaler state](#using-firestore-for-autoscaler-state)
@@ -52,22 +55,67 @@ This directory contains Terraform configuration files to quickly set up the
infrastructure for deploying your Autoscaler to
[Google Kubernetes Engine (GKE)][gke].

In this deployment option, all the components of the Autoscaler reside in the
same project as your [Spanner][spanner] instances. A future enhancement may
enable the Autoscaler to operate cross-project when running in GKE.

This deployment is ideal for independent teams who want to self-manage the
infrastructure and configuration of their own Autoscalers on Kubernetes.

The GKE deployment has the following pros and cons:

### Pros

* **Kubernetes-based**: For teams that may not be able to use Google Cloud
services such as [Cloud Run functions][cloud-functions], this design enables
the use of the Autoscaler.
* **Configuration**: The control over scheduler parameters belongs to the team
that owns the Spanner instance; therefore, the team has the highest degree of
freedom to adapt the Autoscaler to its needs.
* **Infrastructure**: This design establishes a clear boundary of
responsibility and security over the Autoscaler infrastructure, because the
team that owns the Spanner instances also owns the Autoscaler
infrastructure.

### Cons

* **Infrastructure**: In contrast to the [Cloud Run functions][cloud-functions]
design, some long-lived infrastructure and services are required.
* **Maintenance**: With each team responsible for its own Autoscaler
configuration and infrastructure, it may become difficult to ensure that
all Autoscalers across the company follow the same update guidelines.
* **Audit**: Because of the high level of control by each team, a centralized
audit may become more complex.

## Options for GKE deployment

For deployment to GKE, there are two options to choose from:

1. Deployment of decoupled Poller and Scaler components, running in separate pods.

2. Deployment of a unified Autoscaler, with Poller and Scaler components
combined.

The decoupled deployment model has the advantage that the Poller and Scaler
components can be assigned individual permissions (i.e., run as separate
service accounts), and the two components can be managed and scaled
independently to suit your needs. However, this deployment model requires the
Scaler component to be deployed as a long-running service, which consumes
resources continuously.

In contrast, the unified deployment model combines the Poller and Scaler
components in a single pod that runs as a Kubernetes CronJob, so there are no
long-running components. In addition, because the two components are combined,
only a single service account is required.

For most use cases, the unified deployment model is recommended.
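
The sketch below illustrates the shape of the unified model: one CronJob whose pod runs the combined Poller and Scaler under a single service account, on the default two-minute schedule. It is a hypothetical manifest, not the one shipped in this repository; the job name, service account name, and container image are placeholders.

```shell
# Hypothetical sketch of a unified-model CronJob manifest.
# The names and image below are placeholders, not this repository's values.
cat > autoscaler-unified-cronjob.yaml <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: autoscaler-unified                  # placeholder name
spec:
  schedule: "*/2 * * * *"                   # every two minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: autoscaler-sa # placeholder; one account for both components
          restartPolicy: Never              # Job pods must use Never or OnFailure
          containers:
            - name: autoscaler
              image: example.com/autoscaler-unified:latest  # placeholder image
EOF
```

With access to a cluster, `kubectl apply -f autoscaler-unified-cronjob.yaml` would create the CronJob. In the decoupled model, the Scaler would instead run as a separate long-running service next to the scheduled Poller.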

## Architecture

### Decoupled Model

![architecture-gke-decoupled][architecture-gke-decoupled]

1. Using a [Kubernetes ConfigMap][kubernetes-configmap], you define which
Spanner instances you would like to be managed by the Autoscaler.

2. Using a [Kubernetes CronJob][kubernetes-cronjob], the Autoscaler is
configured to run on a schedule. By default, it runs every two minutes,
though the schedule is configurable.

@@ -99,53 +147,45 @@ infrastructure and configuration of their own Autoscalers on Kubernetes.
[Google Cloud Monitoring][gcm-docs]. See the section
[Metrics in GKE deployment](#metrics-in-gke-deployment).
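
In both models, step 1 lists the managed Spanner instances in a ConfigMap. A minimal sketch of such a manifest follows. The ConfigMap name, the `autoscaler-config.yaml` data key, and all values are placeholders, and the field names are assumed from the Poller's [configuration parameters][autoscaler-config-params]; check that reference for the authoritative list before relying on them.

```shell
# Hypothetical sketch of a ConfigMap that lists one Spanner instance to manage.
# Field names are assumed from the Poller configuration parameters;
# the ConfigMap name and all values are placeholders.
cat > autoscaler-configmap.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaler-config               # placeholder name
data:
  autoscaler-config.yaml: |
    ---
    - projectId: my-spanner-project     # placeholder project ID
      instanceId: my-spanner-instance   # placeholder instance ID
      units: NODES
      minSize: 1
      maxSize: 3
      scalingMethod: STEPWISE
EOF
```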

### Unified Model

![architecture-gke-unified][architecture-gke-unified]

1. Using a [Kubernetes ConfigMap][kubernetes-configmap], you define which
Spanner instances you would like to be managed by the Autoscaler.

2. Using a [Kubernetes CronJob][kubernetes-cronjob], the Autoscaler is
configured to run on a schedule. By default, it runs every two minutes,
though the schedule is configurable.

3. When scheduled, an instance of the unified [Poller][autoscaler-poller] and
[Scaler][autoscaler-scaler] components (henceforth "Autoscaler") is created
as a [Kubernetes Job][kubernetes-job].

4. The Autoscaler queries the [Cloud Monitoring][cloud-monitoring] API to retrieve
the utilization metrics for each Spanner instance.

5. For each Spanner instance, the Autoscaler makes an internal call with a
payload that contains the utilization metrics for that Spanner instance and
some of its corresponding configuration parameters.

6. Using the chosen [scaling method][scaling-methods], the Autoscaler compares
the Spanner instance metrics against the recommended thresholds, plus or minus
an [allowed margin][margins], and determines whether the instance should be
scaled and, if so, the number of nodes or processing units that it should be
scaled to.

7. The Autoscaler retrieves the time when the instance was last scaled from the
state data stored in [Cloud Firestore][cloud-firestore] (or alternatively
[Spanner][spanner]) and compares it with the current time.

8. If the configured cooldown period has passed, then the Autoscaler requests the
Spanner instance to scale out or in.

9. The Autoscaler publishes counters to an [OpenTelemetry Collector][otel-collector],
also running in Kubernetes, which is configured to forward these counters to
[Google Cloud Monitoring][gcm-docs]. See the section
[Metrics in GKE deployment](#metrics-in-gke-deployment).
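
Steps 6 to 8 can be illustrated with a simplified sketch for a single metric. It is not the Autoscaler's implementation: the function names are hypothetical, the inputs are integer percentages and Unix timestamps in seconds, and the example threshold and margin are arbitrary values rather than the Autoscaler's defaults. The real Autoscaler evaluates several utilization metrics and uses the configured scaling method to compute the new size.

```shell
# Simplified, hypothetical sketch of steps 6-8 for a single metric.
# Not the Autoscaler's implementation; integer arithmetic only.

# Step 6: print "out", "in" or "none" by comparing a metric value with
# threshold + margin (upper limit) and threshold - margin (lower limit).
scaling_direction() {
  local value=$1 threshold=$2 margin=$3
  if [ "$value" -gt $((threshold + margin)) ]; then
    echo out
  elif [ "$value" -lt $((threshold - margin)) ]; then
    echo in
  else
    echo none
  fi
}

# Steps 7-8: succeed (exit status 0) only if at least cooldown_minutes have
# elapsed between the last scaling event and now (both Unix epoch seconds).
cooldown_passed() {
  local last_scaled=$1 now=$2 cooldown_minutes=$3
  [ $((now - last_scaled)) -ge $((cooldown_minutes * 60)) ]
}

# Example: 80% utilization, 65% threshold, 5% margin,
# last scaled 10 minutes ago, 5-minute cooldown.
now=$(date +%s)
direction=$(scaling_direction 80 65 5)
if [ "$direction" != none ] && cooldown_passed $((now - 600)) "$now" 5; then
  echo "would request scale-$direction"   # prints: would request scale-out
fi
```

Keeping the direction check and the cooldown check separate mirrors the order of steps 6 and 8: a scaling decision is acted on only after the cooldown has elapsed.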

## Before you begin

@@ -563,10 +603,10 @@ Next, follow the instructions in the
cat autoscaler-config/autoscaler-config*.yaml
```
These two files configure each instance of the Autoscaler that you
scheduled in the previous step. Notice the environment variable
`AUTOSCALER_CONFIG`. You can use this variable to reference a configuration
that will be used by that individual instance of the Autoscaler. This means
that you can configure multiple scaling schedules across multiple Spanner
instances.
@@ -705,8 +745,10 @@ following the instructions above.
```
<!-- LINKS: https://www.markdownguide.org/basic-syntax/#reference-style-links -->
[architecture-gke-decoupled]: ../../resources/architecture-gke-decoupled.png
[architecture-gke-unified]: ../../resources/architecture-gke-unified.png
[autoscaler-poller]: ../../src/poller/README.md
[autoscaler-scaler]: ../../src/scaler/README.md
[autoscaler-config-params]: ../../src/poller/README.md#configuration-parameters
[cron-frequent]: ../../kubernetes/decoupled/autoscaler-pkg/poller/poller.yaml
[cron-hourly]: ../../kubernetes/decoupled/autoscaler-pkg/poller/poller-hourly.yaml