Flaky Test: JobSet Depends on #786

Open
kannon92 opened this issue Feb 18, 2025 · 6 comments

@kannon92
Contributor

https://testgrid.k8s.io/sig-apps#pull-jobset-test-e2e-main-1-30

I've seen this failure twice on the helm chart PR.

JobSet when DependsOn is enabled on JobSet [It] trainer-node Job depends on launcher Job ready status
/home/prow/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:345
  [FAILED] Timed out after 600.001s.
  Expected
      <int32>: 0
  to equal
      <int32>: 1
  In [It] at: /home/prow/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:397 @ 02/18/25 03:53:02.09

cc @andreyvelich @tenzen-y

@tenzen-y
Member

Just confirming for prioritization:
Is this a blocker for the next release?

@kannon92
Contributor Author

Yea, let's call it that for now. Just want to make sure there isn't anything wrong with the functionality.

@andreyvelich
Member

I just ran the same test locally and it passed for me:

$ ./bin/ginkgo --focus "trainer-node Job depends on launcher Job ready status" ./test/e2e/...

...
Ginkgo ran 1 suite in 28.074639959s
Test Suite Passed

@kannon92 Did you see the same errors in other PRs?

@kannon92
Contributor Author

You can see it on the test grid: it failed on the helm chart PR twice this week.

@tenzen-y
Member

I ran these test cases locally and, after running the E2E test 115 times, hit a different dependsOn failure (log below).
My guess is that the order of dependsOn is not guaranteed.

JobSet when DependsOn is enabled on JobSet trainer-node Job depends on launcher Job ready status
/Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:345
  STEP: Create a JobSet with DependsOn @ 02/23/25 04:49:20.757
  STEP: Verify that only Launcher is created @ 02/23/25 04:49:20.766
  STEP: Wait for Launcher to be in Ready status @ 02/23/25 04:49:20.769
  STEP: Verify that Launcher and Trainer Job is created @ 02/23/25 04:49:32.636
  STEP: Wait for JobSet to be Completed @ 02/23/25 04:49:32.64
  STEP: checking jobset status is: Completed @ 02/23/25 04:49:32.641
  [TIMEDOUT] in [It] - /Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:345 @ 02/23/25 04:49:39.128
• [TIMEDOUT] [18.387 seconds]
JobSet when DependsOn is enabled on JobSet [It] trainer-node Job depends on launcher Job ready status
/Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:345

  [TIMEDOUT] A suite timeout occurred
  In [It] at: /Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:345 @ 02/23/25 04:49:39.128

  This is the Progress Report generated when the suite timeout occurred:
    JobSet when DependsOn is enabled on JobSet trainer-node Job depends on launcher Job ready status (Spec Runtime: 18.375s)
      /Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:345
      In [It] (Node Runtime: 18.371s)
        /Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:345
        At [By Step] checking jobset status is: Completed (Step Runtime: 6.487s)
          /Users/s14554/go/src/sigs.k8s.io/jobset/test/util/util.go:78

        Spec Goroutine
        goroutine 83 [select]
          github.com/onsi/gomega/internal.(*AsyncAssertion).match(0x140002651f0, {0x103923ac8, 0x14000807820}, 0x1, {0x0, 0x0, 0x0})
            /Users/s14554/go/pkg/mod/github.com/onsi/[email protected]/internal/async_assertion.go:546
          github.com/onsi/gomega/internal.(*AsyncAssertion).Should(0x140002651f0, {0x103923ac8, 0x14000807820}, {0x0, 0x0, 0x0})
            /Users/s14554/go/pkg/mod/github.com/onsi/[email protected]/internal/async_assertion.go:145
        > sigs.k8s.io/jobset/test/util.JobSetCompleted({0x103932590, 0x1045f6de0}, {0x10393adc0, 0x1400003b0e0}, 0x14000502540, 0x8bb2c97000)
            /Users/s14554/go/src/sigs.k8s.io/jobset/test/util/util.go:86
              |         }
              |         terminalState := string(jobset.JobSetCompleted)
              >         gomega.Eventually(checkJobSetStatus, timeout, interval).WithArguments(ctx, k8sClient, js, conditions).Should(gomega.Equal(true))
              |         gomega.Eventually(checkJobSetTerminalState, timeout, interval).WithArguments(ctx, k8sClient, js, terminalState).Should(gomega.Equal(true))
              | }
        > sigs.k8s.io/jobset/test/e2e.init.func1.8.2.5()
            /Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:407
              | 
              |         ginkgo.By("Wait for JobSet to be Completed", func() {
              >                 util.JobSetCompleted(ctx, k8sClient, jobSet, timeout)
              |         })
              | })
          github.com/onsi/ginkgo/v2/internal.(*Suite).By(0x140001b6a88, {0x10338a6a3, 0x1f}, {0x1400051df48, 0x1, 0x10336279c?})
            /Users/s14554/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/suite.go:323
          github.com/onsi/ginkgo/v2.By({0x10338a6a3?, 0xc?}, {0x1400051df48?, 0x3?, 0x0?})
            /Users/s14554/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/core_dsl.go:600
        > sigs.k8s.io/jobset/test/e2e.init.func1.8.2()
            /Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:406
              | })
              | 
              > ginkgo.By("Wait for JobSet to be Completed", func() {
              |         util.JobSetCompleted(ctx, k8sClient, jobSet, timeout)
              | })
          github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0x14000622d80?, 0x0?})
            /Users/s14554/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/node.go:475
          github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
            /Users/s14554/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/suite.go:894
          github.com/onsi/ginkgo/v2/internal.(*Suite).runNode in goroutine 9
            /Users/s14554/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/suite.go:881
------------------------------
[ReportAfterSuite] Autogenerated ReportAfterSuite for --junit-report
autogenerated by Ginkgo
[ReportAfterSuite] PASSED [0.002 seconds]
------------------------------

Summarizing 1 Failure:
  [TIMEDOUT] JobSet when DependsOn is enabled on JobSet [It] trainer-node Job depends on launcher Job ready status
  /Users/s14554/go/src/sigs.k8s.io/jobset/test/e2e/e2e_test.go:345

Ran 1 of 7 Specs in 18.411 seconds
FAIL! - Suite Timeout Elapsed -- 0 Passed | 1 Failed | 0 Pending | 6 Skipped
--- FAIL: TestAPIs (18.42s)
FAIL
You're using deprecated Ginkgo functionality:
=============================================
  --ginkgo.slow-spec-threshold is deprecated --slow-spec-threshold has been deprecated and will be removed in a future version of Ginkgo.  This feature has proved to be more noisy than useful.  You can use --poll-progress-after, instead, to get more actionable feedback about potentially slow specs and understand where they might be getting stuck.

To silence deprecations that can be silenced set the following environment variable:
  ACK_GINKGO_DEPRECATIONS=2.22.2


Tests failed on attempt #115
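The STEP lines in the log above describe the ordering the spec expects: only the launcher exists at first, and the trainer is created once the launcher has reached Ready. A minimal sketch of such a gate follows. Every name in it (`dependency`, `canCreate`, the `reached` map) is hypothetical and written for illustration; this is not the JobSet controller's code.

```go
package main

import "fmt"

// dependency is a hypothetical model of one dependsOn entry.
type dependency struct {
	name   string // job that must reach the status first
	status string // status to wait for, e.g. "Ready"
}

// canCreate reports whether a job with the given dependencies may be created,
// given the statuses each job has reached so far (job name -> status -> reached).
func canCreate(deps []dependency, reached map[string]map[string]bool) bool {
	for _, d := range deps {
		if !reached[d.name][d.status] {
			return false
		}
	}
	return true
}

func main() {
	trainerDeps := []dependency{{name: "launcher", status: "Ready"}}

	// Before the launcher is Ready, the trainer must not be created.
	reached := map[string]map[string]bool{}
	fmt.Println(canCreate(trainerDeps, reached)) // false

	// Once the launcher has reported Ready, the trainer may be created.
	reached["launcher"] = map[string]bool{"Ready": true}
	fmt.Println(canCreate(trainerDeps, reached)) // true
}
```

If the real ordering is not guaranteed, as suspected in this comment, the observable symptom would be a job appearing while a gate like this still evaluates to false.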

@andreyvelich
Member

Could it be related to @ahg-g's comment here: #740 (comment)?
