Terraform GCP-module for JupyterHub

This repository defines a Terraform module, which you can use in your code by adding a module configuration and setting its source parameter to the URL of this repository. The module builds a Kubernetes-based JupyterHub in Google Cloud, as used by Brown University.

In general this module of JupyterHub is configured as follows:

  • Two pools: one for the core components, one for user pods
  • Authentication: Google OAuth has been tested and others are possible; the dummy authenticator is the default.
  • DNS: we currently use Infoblox to configure our DNS; we will make that optional in the future.
  • Scale-up and scale-down cron jobs that change the number of replicas so that nodes are already warm for users during class time.
  • Optional shared NFS volume (for shared data, for instance).

For general terraform examples see the examples folder. In practice we deploy one hub per class at Brown. Since most of the deployments are very similar, we use Terragrunt to keep configurations DRY. While our deployment repository is not public at this moment, we hope to provide an example soon.

Getting Started

This module depends on you having GCP credentials of some kind. The module looks for a credential file in JSON format. You should export the following:

GOOGLE_APPLICATION_CREDENTIALS=/path/to/file.json

If the credentials are set correctly, the basic Google Cloud infrastructure will be created successfully.

Additionally, make sure that gcloud has been initialized (gcloud init) with the appropriate service account. This is necessary because this module performs a local-exec to get the cluster credentials. You also need to make sure that KUBECONFIG or KUBE_CONFIG_PATH is set. A typical error seen when the context is not set correctly is:

Error: error installing: Post "http://localhost/apis/apps/v1/namespaces/kube-system/deployments": dial tcp [::1]:80: connect: connection refused
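
A quick preflight check can catch both of these problems before Terraform runs. The following is a minimal sketch and is not part of this module; the function name check_gcp_env is hypothetical, and it only tests that the variables described above are set and that the credential file is readable.

```shell
# Hypothetical helper (not shipped with this module).
# Prints one line per problem; prints nothing when everything looks fine.
check_gcp_env() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set"
  elif [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "credential file is not readable: $GOOGLE_APPLICATION_CREDENTIALS"
  fi
  # The module needs at least one of these to locate the kubeconfig.
  if [ -z "${KUBECONFIG:-}" ] && [ -z "${KUBE_CONFIG_PATH:-}" ]; then
    echo "neither KUBECONFIG nor KUBE_CONFIG_PATH is set"
  fi
}

# Report any problems on stderr.
check_gcp_env >&2
```

The check does not validate the contents of the JSON file or the active gcloud account; it only rules out the most common setup mistakes.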

Finally, this module also configures records in Infoblox, so you'll need credentials for the server. For Brown users, we recommend using 1password-cli to source your secrets into environment variables (ask for access to the credentials), for example:

export INFOBLOX_USERNAME=$(op item get infoblox --field username)
export INFOBLOX_PASSWORD=$(op item get infoblox --field password --reveal)
export INFOBLOX_SERVER=$(op item get infoblox --format json | jq -r '.urls[].href' | awk -F/ '{print $3}')

The following environment variables are required:

INFOBLOX_USERNAME
INFOBLOX_PASSWORD
INFOBLOX_SERVER
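
To confirm that all three variables are populated before running Terraform, a small check like the one below may help. It is an illustrative sketch only; the function name missing_infoblox_vars is hypothetical and not part of this repository.

```shell
# Hypothetical helper: print the name of every required Infoblox
# variable that is unset or empty (prints nothing if all are set).
missing_infoblox_vars() {
  for name in INFOBLOX_USERNAME INFOBLOX_PASSWORD INFOBLOX_SERVER; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

missing=$(missing_infoblox_vars)
if [ -n "$missing" ]; then
  echo "Missing Infoblox variables:" $missing >&2
fi
```

Only the variable names are printed, never their values, so the output is safe to paste into a ticket or chat.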

How to use this module

This repository defines a Terraform module, which you can use in your code by adding a module configuration and setting its source parameter to the URL of this repository. See the examples folder for guidance.

Requirements

| Name        | Version   |
| ----------- | --------- |
| terraform   | >= 1.10.0 |
| google      | 6.15.0    |
| google-beta | 6.15.0    |
| helm        | 2.17.0    |
| kubernetes  | 2.35.1    |

Providers

| Name   | Version |
| ------ | ------- |
| google | 6.15.0  |

Modules

| Name | Source | Version |
| ---- | ------ | ------- |
| external_infoblox_record | git::https://github.com/BrownUniversity/terraform-infoblox-record-a.git | v0.1.6 |
| gke_auth | terraform-google-modules/kubernetes-engine/google//modules/auth | 34.0.0 |
| jhub_cluster | git::https://github.com/BrownUniversity/terraform-gcp-cluster.git | v0.1.11 |
| jhub_helm | ./modules/helm-jhub | n/a |
| jhub_project | git::https://github.com/BrownUniversity/terraform-gcp-project.git | v0.1.7 |
| jhub_vpc | git::https://github.com/BrownUniversity/terraform-gcp-vpc.git | v0.1.5 |
| production_infoblox_record | git::https://github.com/BrownUniversity/terraform-infoblox-record-a.git | v0.1.6 |

Resources

| Name | Type |
| ---- | ---- |
| google_compute_address.static | resource |

Inputs

| Name | Description | Type | Default | Required |
| ---- | ----------- | ---- | ------- | -------- |
| activate_apis | The list of APIs to activate within the project | `list(string)` | `[]` | no |
| auth_secretkeyvaluemap | Key-value map of secret variables used by the authenticator | `map(string)` | `{"hub.config.DummyAuthenticator.password": "dummy_password"}` | no |
| auth_type | Type of OAuth authenticator, e.g. `google` | `string` | `"dummy"` | no |
| auto_create_network | Auto create default network. | `bool` | `false` | no |
| automount_service_account_token | Enable automatic mounting of the service account token | `bool` | `true` | no |
| billing_account | Billing account id. | `string` | n/a | yes |
| cluster_name | Cluster name | `string` | `"default"` | no |
| core_pool_auto_repair | Enable auto-repair of core-component pool | `bool` | `true` | no |
| core_pool_auto_upgrade | Enable auto-upgrade of core-component pool | `bool` | `true` | no |
| core_pool_disk_size_gb | Size of disk for core-component pool | `number` | `100` | no |
| core_pool_disk_type | Type of disk for the core-component pool | `string` | `"pd-standard"` | no |
| core_pool_image_type | Type of image for the core-component pool | `string` | `"COS_CONTAINERD"` | no |
| core_pool_initial_node_count | Number of initial nodes in core-component pool | `number` | `1` | no |
| core_pool_local_ssd_count | Number of SSDs for the core-component pool | `number` | `0` | no |
| core_pool_machine_type | Machine type for the core-component pool | `string` | `"n1-highmem-4"` | no |
| core_pool_max_count | Maximum number of nodes in the core-component pool | `number` | `3` | no |
| core_pool_min_count | Minimum number of nodes in the core-component pool | `number` | `1` | no |
| core_pool_name | Name for the core-component pool | `string` | `"core-pool"` | no |
| core_pool_preemptible | Make core-component pool preemptible | `bool` | `false` | no |
| create_service_account | Defines if service account specified to run nodes should be created. | `bool` | `false` | no |
| create_tls_secret | If set to true, user will be passing tls key and certificate to create a kubernetes secret, and use it in their helm chart | `bool` | `true` | no |
| default_service_account | Project default service account setting: can be one of delete, depriviledge, or keep. | `string` | `"delete"` | no |
| disable_dependent_services | Whether services that are enabled and which depend on this service should also be disabled when this service is destroyed. | `string` | `"true"` | no |
| enable_private_nodes | (Beta) Whether nodes have internal IP addresses only | `bool` | `false` | no |
| folder_id | The ID of a folder to host this project | `string` | n/a | yes |
| gcp_zone | The GCP zone to deploy the runner into. | `string` | `"us-east1-b"` | no |
| helm_deploy_timeout | Time for helm to wait for deployment of chart and downloading of docker image | `number` | `1000` | no |
| helm_values_file | Relative path and file name. Example: values.yaml | `string` | n/a | yes |
| horizontal_pod_autoscaling | Enable horizontal pod autoscaling addon | `bool` | `true` | no |
| http_load_balancing | Enable HTTP load balancer addon | `bool` | `false` | no |
| jhub_helm_version | Version of the JupyterHub Helm Chart Release | `string` | n/a | yes |
| kubernetes_version | The Kubernetes version of the masters. If set to 'latest' it will pull latest available version in the selected region. | `string` | n/a | yes |
| labels | Map of labels for project. | `map(string)` | `{"environment": "automation", "managed_by": "terraform"}` | no |
| logging_service | The logging service that the cluster should write logs to. Available options include logging.googleapis.com, logging.googleapis.com/kubernetes (beta), and none | `string` | `"logging.googleapis.com/kubernetes"` | no |
| maintenance_start_time | Time window specified for daily maintenance operations in RFC3339 format | `string` | `"03:00"` | no |
| monitoring_service | The monitoring service that the cluster should write metrics to. Automatically send metrics from pods in the cluster to the Google Cloud Monitoring API. VM metrics will be collected by Google Compute Engine regardless of this setting. Available options include monitoring.googleapis.com, monitoring.googleapis.com/kubernetes (beta) and none | `string` | `"monitoring.googleapis.com/kubernetes"` | no |
| network_name | Name of the VPC. | `string` | `"kubernetes-vpc"` | no |
| network_policy | Enable network policy addon | `bool` | `true` | no |
| org_id | Organization id. | `number` | n/a | yes |
| project_name | Name of the project. | `string` | n/a | yes |
| range_name_pods | The range name for pods | `string` | `"kubernetes-pods"` | no |
| range_name_services | The range name for services | `string` | `"kubernetes-services"` | no |
| record_domain | The domain on the record. hostname.domain = FQDN | `string` | n/a | yes |
| record_hostname | The hostname on the record. hostname.domain = FQDN | `string` | n/a | yes |
| region | The region to host the cluster in | `string` | `"us-east1"` | no |
| regional | Whether the master node should be regional or zonal | `bool` | `true` | no |
| remove_default_node_pool | Remove default node pool while setting up the cluster | `bool` | `false` | no |
| scale_down_command | Command for scale-down cron job | `list(string)` | `["kubectl", "scale", "--replicas=0", "statefulset/user-placeholder"]` | no |
| scale_down_name | Name of scale-down cron job | `string` | `"scale-down"` | no |
| scale_down_schedule | Schedule for scale-down cron job | `string` | `"1 18 * * 1-5"` | no |
| scale_up_command | Command for scale-up cron job | `list(string)` | `["kubectl", "scale", "--replicas=3", "statefulset/user-placeholder"]` | no |
| scale_up_name | Name of scale-up cron job | `string` | `"scale-up"` | no |
| scale_up_schedule | Schedule for scale-up cron job | `string` | `"1 6 * * 1-5"` | no |
| shared_storage_capacity | Size of the shared volume | `number` | `5` | no |
| site_certificate | File containing the TLS certificate | `string` | n/a | yes |
| site_certificate_key | File containing the TLS certificate key | `string` | n/a | yes |
| subnet_name | Name of the subnet. | `string` | `"kubernetes-subnet"` | no |
| tls_secret_name | TLS secret name used in secret creation, it must match with what is used by user in helm chart | `string` | `"jupyterhub-tls"` | no |
| use_shared_volume | Whether to use a shared NFS volume | `bool` | `false` | no |
| user_pool_auto_repair | Enable auto-repair of user pool | `bool` | `true` | no |
| user_pool_auto_upgrade | Enable auto-upgrade of user pool | `bool` | `true` | no |
| user_pool_disk_size_gb | Size of disk for user pool | `number` | `100` | no |
| user_pool_disk_type | Type of disk for the user pool | `string` | `"pd-standard"` | no |
| user_pool_image_type | Type of image for the user pool | `string` | `"COS_CONTAINERD"` | no |
| user_pool_initial_node_count | Number of initial nodes in user pool | `number` | `1` | no |
| user_pool_local_ssd_count | Number of SSDs for the user pool | `number` | `0` | no |
| user_pool_machine_type | Machine type for the user pool | `string` | `"n1-highmem-4"` | no |
| user_pool_max_count | Maximum number of nodes in the user pool | `number` | `20` | no |
| user_pool_min_count | Minimum number of nodes in the user pool | `number` | `1` | no |
| user_pool_name | Name for the user pool | `string` | `"user-pool"` | no |
| user_pool_preemptible | Make user pool preemptible | `bool` | `false` | no |
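
The default scale_up_command and scale_down_command inputs both run kubectl scale against statefulset/user-placeholder, and the default schedules ("1 6 * * 1-5" and "1 18 * * 1-5") fire at 06:01 and 18:01, Monday through Friday. If you need to warm up or drain nodes outside that schedule, the same command can be run by hand. The sketch below is illustrative only: scale_placeholders and DRY_RUN are hypothetical names, and it assumes your current kubectl context and namespace already point at the hub.

```shell
# Hypothetical helper mirroring the default scale_up_command /
# scale_down_command inputs. With DRY_RUN=1 the command is printed
# instead of executed.
scale_placeholders() {
  replicas=$1
  # Rebuild the argument list as the kubectl command to run.
  set -- kubectl scale "--replicas=${replicas}" statefulset/user-placeholder
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

DRY_RUN=1
scale_placeholders 3   # prints: kubectl scale --replicas=3 statefulset/user-placeholder
```

Unset DRY_RUN (or set it to 0) to actually run the command against the cluster.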

Outputs

| Name | Description |
| ---- | ----------- |
| cluster_name | Cluster name |
| hub_ip | Static IP assigned to the Jupyter Hub |
| location | n/a |
| project_id | Project ID |
| project_name | Project Name |
| region | n/a |
| zones | List of zones in which the cluster resides |

Local Development

Merging Policy

Use GitLab Flow.

  • Create feature branches for features and fixes from default branch
  • Merge only from PR with review
  • After merging to the default branch, a release is drafted by a GitHub Action. Check the draft and publish it once you are satisfied and the tests pass

Terraform

We recommend installing the latest version of Terraform whenever you are updating this module. The module currently requires Terraform >= 1.10.0 (see Requirements). You can install Terraform with Homebrew.

Pre-commit hooks

You should make sure that pre-commit hooks are installed to run the formatter, linter, etc. Install and configure the Terraform pre-commit hooks as follows:

Install dependencies

brew bundle install

Install the pre-commit hook globally

DIR=~/.git-template
git config --global init.templateDir ${DIR}
pre-commit init-templatedir -t pre-commit ${DIR}

To run the hooks specified in .pre-commit-config.yaml:

pre-commit run -a

| Hook name                                        | Description                                                                                                                |
| ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------- |
| `terraform_fmt`                                  | Rewrites all Terraform configuration files to a canonical format.                                                          |
| `terraform_docs`                                 | Inserts input and output documentation into `README.md`.                                                       |
| `terraform_tflint`                               | Validates all Terraform configuration files with [TFLint](https://github.com/terraform-linters/tflint).                              |
| `terraform_tfsec`                                | [TFSec](https://github.com/liamg/tfsec) static analysis of terraform templates to spot potential security issues.     |

GCloud and Infoblox Secrets

This is only needed if running tests locally. The google-cloud-sdk and 1password-cli are included in the Brewfile, so they should already be installed.

This repo includes an env.sh file where you set the path to the Google credentials file and the Infoblox secrets. First, make sure you are logged in to 1Password:

eval $(op signin) 

Then use

source env.sh

to set the related environment variables. If you need to unset them, you can use

deactivate

As of 2022-08, gcloud authentication needs an additional plugin to be installed. Run

gcloud components install gke-gcloud-auth-plugin

See here for more information.

Testing

This repository uses the native terraform tests to test the modules. In the tests directory you can find examples of how each module can be used and the test scripts.

Setup secrets

In addition to the GCLOUD and INFOBLOX variables configured by the env.sh file, we also need to add some additional secret variables.

In the example folders, rename the following files:

  • local-example.tfvars to secrets.auto.tfvars
  • local-example.yaml to secrets.yaml

Set the corresponding values inside the files. They should be ignored automatically by our .gitignore file.

Run the tests

Use the terraform test command to test the modules in this repo. You can also specify the name of the files to run each test individually:

terraform test -filter=tests/test-sample-jhub.tftest.hcl      # runs the test without nfs
terraform test -filter=tests/test-sample-jhub-nfs.tftest.hcl  # runs the test with nfs

Running terraform in a container

If you need finer control when troubleshooting, you can run Terraform directly within the container specified by the Dockerfile.

First, build the Dockerfile with:

docker build -t <image_name> --platform linux/amd64 .

Note that --platform linux/amd64 is necessary for ARM-based systems (e.g. Apple Silicon Macs).

Then run the docker container with

docker run -t -d -v $(pwd):/usr/app --platform linux/amd64 <image_name>

Finally, you can get a shell inside the running container with:

docker exec -it <container_name> /bin/bash

Follow the next section to authenticate to Google Cloud and 1Password.

Troubleshooting

Further troubleshooting requires interacting with the Kubernetes cluster directly, so you'll need to authenticate to the cluster. You can do so, for instance, as follows:

PROJECT=jhub-sample-xxxxx
ZONE=us-east1-b

gcloud container clusters get-credentials default --zone ${ZONE} --project ${PROJECT}

If gcloud is not authenticated, then do so as follows

gcloud auth activate-service-account <service-account> --key-file=<path-to-json-credentials> --project=$PROJECT

CI

This project has three workflows enabled:

  1. PR labeler: When opening a PR to the main branch, a label is assigned automatically according to the name of your feature branch. The labeler follows the rules in pr-labeler.yml

  2. Release Drafter: When merging to master, a release is drafted using the Release-Drafter Action

  3. terraform test is run on every commit unless [skip ci] is added to the commit message.

Maintenance/Upgrades

We aim to upgrade this package at least once a year.

Update the version of Terraform

Use tfenv to manage your versions of Terraform. You can update the version in the .terraform-version file and run tfenv install and tfenv use to install and use the version specified in the file.

You should also update the version of Terraform specified in the versions.tf file.