[Bug] Addons get deployed before associated IAM roles are created #7992

Open
josegonzalez opened this issue Oct 8, 2024 · 2 comments

josegonzalez commented Oct 8, 2024

What were you trying to accomplish?

I am trying to create a new cluster with the vpc-cni addon configured against an IAM role that is created on the fly (to avoid #7951). Currently, the CloudFormation stacks are created in this order:

  • addons
  • managed node groups
  • service accounts (and associated iam roles)

Because of this, the IAM role for the vpc-cni addon does not yet exist when the addon is deployed, so the vpc-cni pods are never created. Managed node groups come next, and the node group is never marked ready in EKS because the vpc-cni addon is not ready. As a result, cluster creation fails.
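
To make the ordering problem concrete, here is a minimal Python sketch that models the stacks as a dependency graph. It is only an illustration, not eksctl's implementation: the stack labels, the extra "cluster" entry, and the dependency edges are assumptions drawn from the description above.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key must be created after the stacks it lists.
# The labels are illustrative, not eksctl identifiers.
DEPENDS_ON = {
    "service accounts (IAM roles)": {"cluster"},
    # Assumption: an addon with serviceAccountRoleARN needs its role first.
    "addons": {"cluster", "service accounts (IAM roles)"},
    # Assumption: node groups only become ready once vpc-cni is ready.
    "managed node groups": {"cluster", "addons"},
}

# The order described in this issue ("cluster" is prepended as an assumption).
REPORTED_ORDER = [
    "cluster",
    "addons",
    "managed node groups",
    "service accounts (IAM roles)",
]


def creation_order(depends_on):
    """Return one creation order in which every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


def violations(order, depends_on):
    """List (stack, dependency) pairs where the dependency is created too late."""
    position = {name: index for index, name in enumerate(order)}
    return [
        (stack, dep)
        for stack, deps in depends_on.items()
        for dep in sorted(deps)
        if position[dep] > position[stack]
    ]


if __name__ == "__main__":
    print("violations in reported order:", violations(REPORTED_ORDER, DEPENDS_ON))
    print("dependency-respecting order:", creation_order(DEPENDS_ON))
```

Under these assumptions, the reported order has a single violation (addons are created before the service account roles), and the dependency-respecting order places the service accounts stack between the cluster and the addons.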

What happened?

Cluster creation failed.

How to reproduce it?

eksctl create cluster -v 5 -f cluster.yaml

Contents of cluster.yaml below:

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: prod
  region: ap-northeast-1
  version: "1.30"
  tags:
    environment: prod
    managed-by: eksctl

iam:
  serviceAccounts:
  - metadata:
      name: prod-apn1-ebs-csi-driver-role
      namespace: kube-system
    attachPolicyARNs:
    - "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
    tags:
      environment: prod
      managed-by: eksctl
  - metadata:
      name: prod-apn1-vpc-cni-role
      namespace: kube-system
    attachPolicyARNs:
    - "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
    tags:
      environment: prod
      managed-by: eksctl
  withOIDC: true

fargateProfiles:
  - name: default
    selectors:
      - namespace: default
      - namespace: kube-system
  - name: ingress-nginx
    selectors:
      - namespace: ingress-nginx

vpc:
  cidr: 10.129.0.0/16
  autoAllocateIPv6: true
  hostnameType: resource-name
  clusterEndpoints:
    publicAccess: true
    privateAccess: true

cloudWatch:
  clusterLogging:
    enableTypes: ["audit", "authenticator", "controllerManager"]
    logRetentionInDays: 60

managedNodeGroups:
  - name: airflow
    labels:
      role: airflow
    tags:
      environment: prod
      managed-by: eksctl
    instanceType: t3.xlarge
    minSize: 1
    maxSize: 6
    desiredCapacity: 1
    volumeSize: 280
    privateNetworking: true
    iam:
      withAddonPolicies:
        appMesh: true
        appMeshPreview: true
        autoScaler: true
        awsLoadBalancerController: true
        certManager: true
        cloudWatch: true
        ebs: true
        efs: true
        externalDNS: true
        fsx: true
        imageBuilder: true
        xRay: true

addons:
- name: aws-ebs-csi-driver
  serviceAccountRoleARN: arn:aws:iam::1234567890:role/prod-apn1-ebs-csi-driver-role
- name: coredns
- name: kube-proxy
- name: eks-pod-identity-agent
- name: vpc-cni
  serviceAccountRoleARN: arn:aws:iam::1234567890:role/prod-apn1-vpc-cni-role

Logs

https://gist.github.com/josegonzalez/b9b9b5bd0f82603ffe5c60db00232094

Anything else we need to know?

Versions

% eksctl info
eksctl version: 0.191.0-dev+c736924d6.2024-09-27T00:54:42Z
kubectl version: v1.31.1
OS: darwin

github-actions bot commented (Contributor)

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Nov 10, 2024

josegonzalez commented (Author)

This is still a bug.

@github-actions github-actions bot removed the stale label Nov 11, 2024