name: terraform-aws-eks-cluster
license: APACHE2
github_repo: cloudposse/terraform-aws-eks-cluster
badges:
  - name: Latest Release
    image: https://img.shields.io/github/release/cloudposse/terraform-aws-eks-cluster.svg?style=for-the-badge
    url: https://github.com/cloudposse/terraform-aws-eks-cluster/releases/latest
  - name: "Last Update"
    image: https://img.shields.io/github/last-commit/cloudposse/terraform-aws-eks-cluster/main?style=for-the-badge
    url: https://github.com/cloudposse/terraform-aws-eks-cluster/commits/main/
  - name: Slack Community
    image: https://slack.cloudposse.com/for-the-badge.svg
    url: https://slack.cloudposse.com
related:
  - name: terraform-aws-eks-workers
    description: Terraform module to provision an AWS AutoScaling Group, IAM Role, and Security Group for EKS Workers
    url: https://github.com/cloudposse/terraform-aws-eks-workers
  - name: terraform-aws-ec2-autoscale-group
    description: Terraform module to provision Auto Scaling Group and Launch Template on AWS
    url: https://github.com/cloudposse/terraform-aws-ec2-autoscale-group
  - name: terraform-aws-ecs-container-definition
    description: Terraform module to generate well-formed JSON documents (container definitions) that are passed to the aws_ecs_task_definition Terraform resource
    url: https://github.com/cloudposse/terraform-aws-ecs-container-definition
  - name: terraform-aws-ecs-alb-service-task
    description: Terraform module which implements an ECS service which exposes a web service via ALB
    url: https://github.com/cloudposse/terraform-aws-ecs-alb-service-task
  - name: terraform-aws-ecs-web-app
    description: Terraform module that implements a web app on ECS and supports autoscaling, CI/CD, monitoring, ALB integration, and much more
    url: https://github.com/cloudposse/terraform-aws-ecs-web-app
  - name: terraform-aws-ecs-codepipeline
    description: Terraform module for CI/CD with AWS Code Pipeline and Code Build for ECS
    url: https://github.com/cloudposse/terraform-aws-ecs-codepipeline
  - name: terraform-aws-ecs-cloudwatch-autoscaling
    description: Terraform module to autoscale ECS Service based on CloudWatch metrics
    url: https://github.com/cloudposse/terraform-aws-ecs-cloudwatch-autoscaling
  - name: terraform-aws-ecs-cloudwatch-sns-alarms
    description: Terraform module to create CloudWatch Alarms on ECS Service level metrics
    url: https://github.com/cloudposse/terraform-aws-ecs-cloudwatch-sns-alarms
  - name: terraform-aws-ec2-instance
    description: Terraform module for providing a general purpose EC2 instance
    url: https://github.com/cloudposse/terraform-aws-ec2-instance
  - name: terraform-aws-ec2-instance-group
    description: Terraform module for provisioning multiple general purpose EC2 hosts for stateful applications
    url: https://github.com/cloudposse/terraform-aws-ec2-instance-group
description: Terraform module to provision an [EKS](https://aws.amazon.com/eks/) cluster on AWS.
introduction: |-
The module provisions the following resources:
- EKS cluster of master nodes that can be used together with the [terraform-aws-eks-workers](https://github.com/cloudposse/terraform-aws-eks-workers),
[terraform-aws-eks-node-group](https://github.com/cloudposse/terraform-aws-eks-node-group) and
[terraform-aws-eks-fargate-profile](https://github.com/cloudposse/terraform-aws-eks-fargate-profile)
modules to create a full-blown cluster
- IAM Role to allow the cluster to access other AWS services
- Optionally, the module creates and automatically applies an authentication ConfigMap (`aws-auth`) to allow the
worker nodes to join the cluster and to add additional users/roles/accounts. (This option is enabled
by default, but has some caveats noted below. Set `apply_config_map_aws_auth` to `false` to avoid these issues.)

> [!WARNING]
> Release `2.0.0` (previously released as version `0.45.0`) contains some changes that
> could result in your existing EKS cluster being replaced (destroyed and recreated).
> To prevent this, follow the instructions in the [v1 to v2 migration path](./docs/migration-v1-v2.md).

> [!NOTE]
> Every Terraform module that provisions an EKS cluster has faced the challenge that access to the cluster
> is partly controlled by a resource inside the cluster, a ConfigMap called `aws-auth`. You need to be able to access
> the cluster through the Kubernetes API to modify the ConfigMap, because
> [there is no AWS API for it](https://github.com/aws/containers-roadmap/issues/185). This presents
> a problem: how do you authenticate to an API endpoint that you have not yet created?

We use the Terraform Kubernetes provider to access the cluster, and it uses the same underlying library
that `kubectl` uses, so configuration is very similar. However, every kind of configuration we have tried
has failed at some point.
- An authentication token can be retrieved using the `aws_eks_cluster_auth` data source. This works only as
long as the token does not expire while Terraform is running, the token is refreshed during the "plan"
phase before Terraform tries to refresh the state, and the token does not expire in the interval between
"plan" and "apply". Unfortunately, failures of all three kinds have been seen. Nevertheless,
this is the only method that is compatible with Terraform Cloud, so it is the default, and it is the only
method we fully support until AWS [provides an API for managing `aws-auth`](https://github.com/aws/containers-roadmap/issues/185).
(A provider configuration sketch follows this list.)
- After creating the EKS cluster, you can generate a `KUBECONFIG` file that configures access to it.
This works most of the time, but if the file was present and used as part of the configuration to create
the cluster, and then the file gets deleted (as would happen in a CI system like Terraform Cloud), Terraform
will not regenerate the file in time to use it to refresh Terraform's state, and the "plan" phase will fail.
So any `KUBECONFIG` file has to be managed separately.
- An authentication token can be retrieved on demand by using the `exec` feature of the Kubernetes provider
to call `aws eks get-token`. This requires that the `aws` CLI be installed and available to Terraform and that it
has access to sufficient credentials to perform the authentication and is configured to use them. When those
conditions are met, this is the most reliable method, and the one Cloud Posse prefers to use. However, since
it has these requirements that are not always easily met, it is not the default method and it is not
fully supported.
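
To make the trade-offs concrete, here is a minimal Kubernetes provider configuration sketch. It is illustrative only and is not taken from this module's examples: the data source names and the `module.eks_cluster` reference are assumptions you would adapt to your own configuration.

```hcl
# Sketch only: resource names and the module reference are illustrative assumptions.
data "aws_eks_cluster" "this" {
  name = module.eks_cluster.eks_cluster_id
}

# Default method: fetch a short-lived token with the `aws_eks_cluster_auth` data source.
data "aws_eks_cluster_auth" "this" {
  name = module.eks_cluster.eks_cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token

  # Alternative `exec` method: retrieve a token on demand by calling the AWS CLI.
  # Remove the `token` argument above if you enable this block.
  # exec {
  #   api_version = "client.authentication.k8s.io/v1beta1"
  #   command     = "aws"
  #   args        = ["eks", "get-token", "--cluster-name", module.eks_cluster.eks_cluster_id]
  # }
}
```
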
All of the above methods can face additional challenges when using `terraform import` to import
resources into the Terraform state. The `KUBECONFIG` file method is the only sure way to `import` resources, due to
[Terraform limitations](https://github.com/hashicorp/terraform/issues/27934) on providers. You will need to create
the file, of course, but that is easily done with `aws eks update-kubeconfig`. Depending on the situation,
you may also be able to import resources by setting `-var apply_config_map_aws_auth=false` during import.
At the moment, the `exec` option appears to be the most reliable method, so we recommend using it if possible,
but because of the extra requirements it has, we use the data source as the default authentication method.
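
For example, a `KUBECONFIG` file suitable for running `terraform import` can be generated with the AWS CLI; the cluster name and file path here are placeholders:

```
aws eks update-kubeconfig --name <cluster-name> --kubeconfig /path/to/kubeconfig
export KUBECONFIG=/path/to/kubeconfig
```
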
> [!IMPORTANT]
> All of the above methods require network connectivity between the host running the
> `terraform` command and the EKS endpoint. If your EKS cluster does not have public access enabled, this means
> you need to take extra steps, such as using a VPN to provide access to the private endpoint, or running
> `terraform` on a host in the same VPC as the EKS cluster.

> [!WARNING]
> ### Failure during `destroy`
>
> If the cluster is destroyed (via Terraform or otherwise) before the Terraform resource
> responsible for the `aws-auth` ConfigMap is destroyed, Terraform will get stuck trying to delete the ConfigMap,
> because it cannot contact the now destroyed cluster. This can show up as a `connection refused` error (usually
> to `https://localhost/`). The easiest ways to handle this are either to add `-var apply_config_map_aws_auth=false`
> to the `destroy` command or to remove the ConfigMap (`...kubernetes_config_map.aws_auth[0]`) from the Terraform
> state with `terraform state rm`.
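
For example, either of the following works around a stuck `destroy`. The module address is illustrative and must match your own configuration; with `kubernetes_config_map_ignore_role_changes` left at its default of `true`, the resource is named `aws_auth_ignore_changes[0]` instead, as in the `terraform state mv` example further below.

```
terraform destroy -var apply_config_map_aws_auth=false
# or, remove the ConfigMap from the Terraform state first, then destroy
terraform state rm 'module.eks_cluster.kubernetes_config_map.aws_auth[0]'
```
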
> [!NOTE]
> We give you the `kubernetes_config_map_ignore_role_changes` option and default it to `true` for the following reasons:
> - We provision the EKS cluster
> - Then we wait for the cluster to become available (see `null_resource.wait_for_cluster` in [auth.tf](auth.tf))
> - Then we provision the Kubernetes Auth ConfigMap to map and add additional roles/users/accounts to Kubernetes groups
> - That is all we do in this module, but after that, we expect you to use [terraform-aws-eks-node-group](https://github.com/cloudposse/terraform-aws-eks-node-group)
> to provision a managed Node Group
> - Then EKS updates the Auth ConfigMap and adds worker roles to it (for the worker nodes to join the cluster)
> - Since the ConfigMap is modified outside of Terraform state, Terraform wants to update it to remove the worker roles EKS added
> - If you update the ConfigMap without including the worker node roles that EKS added, you will disconnect the worker nodes from the cluster

However, it is possible to get the worker node roles from the terraform-aws-eks-node-group module via Terraform "remote state"
and include them with any other roles you want to add (example code to be published later), so we make
ignoring the role changes optional. (This is what we do for Cloud Posse clients.)
If you do not ignore role changes, you will have no problem making future intentional changes.
The downside of having `kubernetes_config_map_ignore_role_changes` set to true is that if you later want to make changes,
such as adding other IAM roles to Kubernetes groups, you cannot do so via Terraform, because the role changes are ignored.
Because of Terraform restrictions, you cannot simply change `kubernetes_config_map_ignore_role_changes` from `true`
to `false`, apply changes, and set it back to `true` again. Terraform does not allow the
"ignore" settings to be changed on a resource, so `kubernetes_config_map_ignore_role_changes` is implemented as
2 different resources, one with ignore settings and one without. If you want to switch from ignoring to not ignoring,
or vice versa, you must manually move the `aws_auth` resource in the terraform state. Change the setting of
`kubernetes_config_map_ignore_role_changes`, run `terraform plan`, and you will see that an `aws_auth` resource
is planned to be destroyed and another one is planned to be created. Use `terraform state mv` to move the destroyed
resource to the created resource "address", something like
```
terraform state mv 'module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]' 'module.eks_cluster.kubernetes_config_map.aws_auth[0]'
```
Then run `terraform plan` again and you should see only your desired changes made "in place". After applying your
changes, if you want to set `kubernetes_config_map_ignore_role_changes` back to `true`, you will again need to use
`terraform state mv` to move the `auth-map` back to its old "address".
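
For example, reversing the move shown above (using the same illustrative module address):

```
terraform state mv 'module.eks_cluster.kubernetes_config_map.aws_auth[0]' 'module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]'
```
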
usage: |2-
For a complete example, see [examples/complete](examples/complete).
For automated tests of the complete example using [bats](https://github.com/bats-core/bats-core) and [Terratest](https://github.com/gruntwork-io/terratest) (which tests and deploys the example on AWS), see [test](test).
Other examples:
- [terraform-aws-components/eks/cluster](https://github.com/cloudposse/terraform-aws-components/tree/master/modules/eks/cluster) - Cloud Posse's service catalog of "root module" invocations for provisioning reference architectures
```hcl
provider "aws" {
region = var.region
}
module "label" {
source = "cloudposse/label/null"
# Cloud Posse recommends pinning every module to a specific version
# version = "x.x.x"
namespace = var.namespace
name = var.name
stage = var.stage
delimiter = var.delimiter
attributes = ["cluster"]
tags = var.tags
}
locals {
# Prior to Kubernetes 1.19, the usage of the specific kubernetes.io/cluster/* resource tags below are required
# for EKS and Kubernetes to discover and manage networking resources
# https://www.terraform.io/docs/providers/aws/guides/eks-getting-started.html#base-vpc-networking
tags = { "kubernetes.io/cluster/${module.label.id}" = "shared" }
}
module "vpc" {
source = "cloudposse/vpc/aws"
# Cloud Posse recommends pinning every module to a specific version
# version = "x.x.x"
cidr_block = "172.16.0.0/16"
tags = local.tags
context = module.label.context
}
module "subnets" {
source = "cloudposse/dynamic-subnets/aws"
# Cloud Posse recommends pinning every module to a specific version
# version = "x.x.x"
availability_zones = var.availability_zones
vpc_id = module.vpc.vpc_id
igw_id = module.vpc.igw_id
cidr_block = module.vpc.vpc_cidr_block
nat_gateway_enabled = true
nat_instance_enabled = false
tags = local.tags
context = module.label.context
}
module "eks_node_group" {
source = "cloudposse/eks-node-group/aws"
# Cloud Posse recommends pinning every module to a specific version
# version = "x.x.x"
instance_types = [var.instance_type]
subnet_ids = module.subnets.public_subnet_ids
health_check_type = var.health_check_type
min_size = var.min_size
max_size = var.max_size
cluster_name = module.eks_cluster.eks_cluster_id
# Enable the Kubernetes cluster auto-scaler to find the auto-scaling group
cluster_autoscaler_enabled = var.autoscaling_policies_enabled
context = module.label.context
# Ensure the cluster is fully created before trying to add the node group
module_depends_on = module.eks_cluster.kubernetes_config_map_id
}
module "eks_cluster" {
source = "cloudposse/eks-cluster/aws"
# Cloud Posse recommends pinning every module to a specific version
# version = "x.x.x"
vpc_id = module.vpc.vpc_id
subnet_ids = module.subnets.public_subnet_ids
kubernetes_version = var.kubernetes_version
oidc_provider_enabled = true
addons = [
// https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html#vpc-cni-latest-available-version
{
addon_name = "vpc-cni"
addon_version = var.vpc_cni_version
resolve_conflicts_on_create = "NONE"
resolve_conflicts_on_update = "NONE"
service_account_role_arn = null
},
// https://docs.aws.amazon.com/eks/latest/userguide/managing-kube-proxy.html
{
addon_name = "kube-proxy"
addon_version = var.kube_proxy_version
resolve_conflicts_on_create = "NONE"
resolve_conflicts_on_update = "NONE"
service_account_role_arn = null
},
// https://docs.aws.amazon.com/eks/latest/userguide/managing-coredns.html
{
addon_name = "coredns"
addon_version = var.coredns_version
resolve_conflicts_on_create = "NONE"
resolve_conflicts_on_update = "NONE"
service_account_role_arn = null
},
]
addons_depends_on = [module.eks_node_group]
context = module.label.context
cluster_depends_on = [module.subnets]
}
```
Module usage with two unmanaged worker groups:
```hcl
locals {
  # Unfortunately, the `aws_ami` data source attribute `most_recent` (https://github.com/cloudposse/terraform-aws-eks-workers/blob/34a43c25624a6efb3ba5d2770a601d7cb3c0d391/main.tf#L141)
  # does not work as you might expect. If you are not going to use a custom AMI, you should
  # use the `eks_worker_ami_name_filter` variable to set the right Kubernetes version for the EKS workers;
  # otherwise the first version of Kubernetes supported by AWS (v1.11) for EKS workers will be selected, but
  # the EKS control plane will ignore it and use one that matches the version specified by the `kubernetes_version` variable.
  eks_worker_ami_name_filter = "amazon-eks-node-${var.kubernetes_version}*"
}

module "eks_workers" {
  source = "cloudposse/eks-workers/aws"
  # Cloud Posse recommends pinning every module to a specific version
  # version = "x.x.x"
  attributes                         = ["small"]
  instance_type                      = "t3.small"
  eks_worker_ami_name_filter         = local.eks_worker_ami_name_filter
  vpc_id                             = module.vpc.vpc_id
  subnet_ids                         = module.subnets.public_subnet_ids
  health_check_type                  = var.health_check_type
  min_size                           = var.min_size
  max_size                           = var.max_size
  wait_for_capacity_timeout          = var.wait_for_capacity_timeout
  cluster_name                       = module.label.id
  cluster_endpoint                   = module.eks_cluster.eks_cluster_endpoint
  cluster_certificate_authority_data = module.eks_cluster.eks_cluster_certificate_authority_data
  cluster_security_group_id          = module.eks_cluster.eks_cluster_managed_security_group_id

  # Auto-scaling policies and CloudWatch metric alarms
  autoscaling_policies_enabled           = var.autoscaling_policies_enabled
  cpu_utilization_high_threshold_percent = var.cpu_utilization_high_threshold_percent
  cpu_utilization_low_threshold_percent  = var.cpu_utilization_low_threshold_percent

  context = module.label.context
}

module "eks_workers_2" {
  source = "cloudposse/eks-workers/aws"
  # Cloud Posse recommends pinning every module to a specific version
  # version = "x.x.x"
  attributes                         = ["medium"]
  instance_type                      = "t3.medium"
  eks_worker_ami_name_filter         = local.eks_worker_ami_name_filter
  vpc_id                             = module.vpc.vpc_id
  subnet_ids                         = module.subnets.public_subnet_ids
  health_check_type                  = var.health_check_type
  min_size                           = var.min_size
  max_size                           = var.max_size
  wait_for_capacity_timeout          = var.wait_for_capacity_timeout
  cluster_name                       = module.label.id
  cluster_endpoint                   = module.eks_cluster.eks_cluster_endpoint
  cluster_certificate_authority_data = module.eks_cluster.eks_cluster_certificate_authority_data
  cluster_security_group_id          = module.eks_cluster.eks_cluster_managed_security_group_id

  # Auto-scaling policies and CloudWatch metric alarms
  autoscaling_policies_enabled           = var.autoscaling_policies_enabled
  cpu_utilization_high_threshold_percent = var.cpu_utilization_high_threshold_percent
  cpu_utilization_low_threshold_percent  = var.cpu_utilization_low_threshold_percent

  context = module.label.context
}

module "eks_cluster" {
  source = "cloudposse/eks-cluster/aws"
  # Cloud Posse recommends pinning every module to a specific version
  # version = "x.x.x"
  vpc_id                     = module.vpc.vpc_id
  subnet_ids                 = module.subnets.public_subnet_ids
  kubernetes_version         = var.kubernetes_version
  oidc_provider_enabled      = false
  workers_role_arns          = [module.eks_workers.workers_role_arn, module.eks_workers_2.workers_role_arn]
  allowed_security_group_ids = [module.eks_workers.security_group_id, module.eks_workers_2.security_group_id]
  context                    = module.label.context
}
```
include:
  - docs/targets.md
  - docs/terraform.md
contributors: []