Fargate tests are flakey #1041

rquitales · 2024-02-22T00:46:27Z

What happened?

Surfaced in #1039. Our daily cron job fails about 33% of the time due to our Fargate tests.

The error from the test is:

```
utils.go:290:
    Error Trace:  /home/runner/work/pulumi-eks/pulumi-eks/examples/utils/utils.go:290
                  /home/runner/work/pulumi-eks/pulumi-eks/examples/utils/utils.go:225
                  /home/runner/work/pulumi-eks/pulumi-eks/examples/utils/utils.go:84
                  /home/runner/work/pulumi-eks/pulumi-eks/examples/utils/utils.go:90
                  /opt/hostedtoolcache/go/1.21.6/x64/src/runtime/asm_amd64.s:1650
    Error:        Not equal:
                  expected: 3
                  actual  : 2
    Test:         TestAccFargate
    Messages:     2 out of 3 Pods are ready
```

It is unclear why only 2 of the 3 pods become ready; this requires further debugging.
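The failing assertion compares the number of ready pods against the total number of pods. The following is a minimal sketch of that kind of check, assuming a simplified pod model: the `podStatus` type and the `countReady` and `readinessMessage` helpers are hypothetical and are not the actual code in `examples/utils/utils.go`. A pod stuck in `Pending` never reports ready, which is enough to turn "3 out of 3" into "2 out of 3".

```go
package main

import "fmt"

// podStatus is a hypothetical, simplified view of a pod: its name and
// whether its Ready condition is true. It is not a Kubernetes API type.
type podStatus struct {
	Name  string
	Ready bool
}

// countReady returns how many of the given pods report Ready.
func countReady(pods []podStatus) int {
	n := 0
	for _, p := range pods {
		if p.Ready {
			n++
		}
	}
	return n
}

// readinessMessage mirrors the shape of the failure message in the test log.
func readinessMessage(pods []podStatus) string {
	return fmt.Sprintf("%d out of %d Pods are ready", countReady(pods), len(pods))
}

func main() {
	// Hypothetical pods; one of them never becomes ready (e.g. stuck in Pending).
	pods := []podStatus{
		{Name: "pod-a", Ready: true},
		{Name: "pod-b", Ready: true},
		{Name: "pod-c", Ready: false},
	}
	expected := len(pods)
	if got := countReady(pods); got != expected {
		fmt.Printf("expected: %d, actual: %d (%s)\n", expected, got, readinessMessage(pods))
	}
}
```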

Example

#1039

Output of `pulumi about`

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).


thomas11 · 2024-02-22T07:51:36Z

@rquitales I think we also need to disable the Fargate tests until this is resolved; otherwise we'll constantly get new P1s like #1043.

Disable Fargate tests until #1041 is resolved #1044

Disable Fargate tests until #1041 is resolved, see that issue for context. Fixes #1043

Add scalar types for most commonly used resource outputs #1445

To ease the impact of the breaking API changes caused by generating the Node.js SDK, we decided to add additional scalar outputs that simplify the UX across all SDKs (for more details [see internal doc](https://docs.google.com/document/d/1f97nmDUG_nrZSllYxu_XSeI7ON8vhZzfVrdBTQQmZzw/edit#heading=h.fbweiu8gc5bw)). This change adds the scalar properties mentioned in the doc, along with acceptance tests for them.

While adding the acceptance tests, I noticed that running pods on Fargate doesn't work deterministically: in some cases the cluster fails to become healthy (CoreDNS stuck in `Pending`). This was caused by a race condition between CoreDNS starting and the Fargate profile being created. If the Fargate profile was deployed after CoreDNS, the CoreDNS pods got stuck in `Pending` because they were assigned to the `default-scheduler` instead of the `fargate-scheduler`. The fix is relatively easy: make CoreDNS depend on the Fargate profile.

I'll update the migration guide separately.

### New properties

| Existing resource | Nested property | New top-level property | Description |
| :---- | :---- | :---- | :---- |
| `clusterSecurityGroup: Output<aws.ec2.SecurityGroup \| undefined>` | | `clusterSecurityGroupId: Output<string>` | The only really useful property of a security group; used to add additional ingress/egress rules. Defaults to the EKS-created security group ID. |
| `nodeSecurityGroup: Output<aws.ec2.SecurityGroup \| undefined>` | | `nodeSecurityGroupId: Output<string>` | |
| `eksClusterIngressRule: Output<aws.ec2.SecurityGroupRule \| undefined>` | | `clusterIngressRuleId: Output<string>` | The only really useful property of a rule. Defaults to `""`. |
| `defaultNodeGroup: Output<eks.NodeGroupData \| undefined>` | | `defaultNodeGroupAsgName: Output<string>` | The only useful property of the default node group is the Auto Scaling group; exposing its name allows users to reference it in IAM roles, tags, etc. Defaults to `""`. |
| `core` | `fargateProfile: Output<aws.eks.FargateProfile \| undefined>` | `fargateProfileId: Output<string>` | The ID of the Fargate profile; can be used to reference it. Defaults to `""`. |
| | | `fargateProfileStatus: Output<string>` | The status of the Fargate profile. Defaults to `""`. |
| | `oidcProvider: Output<aws.iam.OpenIdConnectProvider \| undefined>` | `oidcProviderArn: Output<string>`, `oidcProviderUrl: Output<string>`, `oidcIssuer: Output<string>` | The ARN and URL are needed to set up IAM identities for pods (required for the assume-role policy of the IAM role). Users currently need to trim the `https://` prefix from the URL to actually use it, so we expose it with that already done (`oidcIssuer`) to ease usage. |

Fixes #1041
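The race condition in the PR description comes down to creation order. The following is a minimal Go model of it, assuming (as the description suggests) that a pod is assigned `fargate-scheduler` only if a matching Fargate profile already exists when the pod is created, and otherwise keeps `default-scheduler` and stays `Pending` on a cluster without EC2 nodes. The `deploy` and `schedulerFor` helpers are hypothetical and only simulate ordering; in a Pulumi program the dependency would be declared on the resources themselves (for example with the `dependsOn` resource option), which this sketch does not claim is how #1445 implemented it.

```go
package main

import "fmt"

// Scheduler names as they appear in the PR description.
const (
	fargateScheduler = "fargate-scheduler"
	defaultScheduler = "default-scheduler"
)

// schedulerFor returns the scheduler a pod is assigned at creation time in
// this simplified model: Fargate only if the profile already exists.
func schedulerFor(profileExists bool) string {
	if profileExists {
		return fargateScheduler
	}
	return defaultScheduler
}

// deploy creates the hypothetical resources "fargateProfile" and "coredns"
// in the given order and returns the scheduler the CoreDNS pods end up with.
func deploy(order []string) string {
	profileExists := false
	scheduler := ""
	for _, r := range order {
		switch r {
		case "fargateProfile":
			profileExists = true
		case "coredns":
			// Existing pods are not re-assigned when a profile appears later.
			scheduler = schedulerFor(profileExists)
		}
	}
	return scheduler
}

func main() {
	// Race: CoreDNS happens to be created before the Fargate profile.
	fmt.Println(deploy([]string{"coredns", "fargateProfile"})) // default-scheduler
	// Fix: CoreDNS depends on the profile, so it is always created afterwards.
	fmt.Println(deploy([]string{"fargateProfile", "coredns"})) // fargate-scheduler
}
```

Making the dependency explicit removes the ordering nondeterminism entirely, rather than relying on retries or on CoreDNS happening to start late.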

pulumi-bot · 2024-10-17T16:15:14Z

This issue has been addressed in PR #1445 and shipped in release v3.0.0-beta.2.

rquitales added the needs-triage and kind/engineering labels and removed the needs-triage label Feb 22, 2024

rquitales mentioned this issue Feb 22, 2024

Workflow failure: cron #1039

Closed

thomas11 added a commit that referenced this issue Feb 22, 2024

Disable Fargate tests until #1041 is resolved

23f34fa

thomas11 mentioned this issue Feb 22, 2024

Disable Fargate tests until #1041 is resolved #1044

Merged

thomas11 added a commit that referenced this issue Feb 23, 2024

Disable Fargate tests until #1041 is resolved (#1044)

7853826

Disable Fargate tests until #1041 is resolved, see that issue for context. Fixes #1043

flostadler pushed a commit that referenced this issue Sep 4, 2024

Disable Fargate tests until #1041 is resolved (#1044)

1be7931

Disable Fargate tests until #1041 is resolved, see that issue for context. Fixes #1043

flostadler mentioned this issue Oct 16, 2024

Add scalar types for most commonly used resource outputs #1445

Merged

flostadler closed this as completed in 4664fca Oct 17, 2024

pulumi-bot assigned flostadler Oct 17, 2024

pulumi-bot added the resolution/fixed label Oct 17, 2024
