Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helm_resource did not wait for resource_deps on helm_repo #500

Open
nschmeller opened this issue May 25, 2023 · 6 comments
Open

helm_resource did not wait for resource_deps on helm_repo #500

nschmeller opened this issue May 25, 2023 · 6 comments

Comments

@nschmeller
Copy link

Hi all, I bumped into a race condition recently in a CI run where helm_resource was supposed to wait for helm_repo to finish but did not. Here are some logs from what happened:

vm-charts │ Initial Build
grafana-char… │ 
grafana-char… │ Initial Build
grafana-char… │ Running cmd: helm repo add grafana-charts https://grafana.github.io/helm-charts --force-update
    vm-charts │ Running cmd: helm repo add vm-charts https://victoriametrics.github.io/helm-charts --force-update
ingress-char… │ 
ingress-char… │ Initial Build
ingress-char… │ Running cmd: helm repo add ingress-charts https://kubernetes.github.io/ingress-nginx --force-update
    vm-charts │ "vm-charts" has been added to your repositories
   vmoperator │ 
   vmoperator │ Initial Build
   vmoperator │ STEP 1/1 — Deploying
   vmoperator │      Running cmd: python3 /home/runner/.local/share/tilt-dev/tilt_modules/github.com/tilt-dev/tilt-extensions/helm_resource/helm-apply-helper.py
   vmoperator │      Running cmd: ['helm', 'upgrade', '--install', 'vmoperator', 'vm-charts/victoria-metrics-operator']
   vmoperator │      Release "vmoperator" does not exist. Installing it now.
ingress-char… │ "ingress-charts" has been added to your repositories
      grafana │ 
      grafana │ Initial Build
      grafana │ STEP 1/1 — Deploying
      grafana │      Running cmd: python3 /home/runner/.local/share/tilt-dev/tilt_modules/github.com/tilt-dev/tilt-extensions/helm_resource/helm-apply-helper.py --values grafana/values.yaml
      grafana │      Running cmd: ['helm', 'upgrade', '--install', '--values', 'grafana/values.yaml', 'grafana', 'grafana-charts/grafana']
      grafana │      Release "grafana" does not exist. Installing it now.
      grafana │      Error: repo grafana-charts not found
      grafana │      Traceback (most recent call last):
      grafana │        File "/home/runner/.local/share/tilt-dev/tilt_modules/github.com/tilt-dev/tilt-extensions/helm_resource/helm-apply-helper.py", line 66, in <module>
      grafana │          subprocess.check_call(install_cmd, stdout=sys.stderr)
      grafana │        File "/usr/lib/python3.10/subprocess.py", line [36](https://github.com/lacework/agent/actions/runs/5081096334/jobs/9137856036#step:7:37)9, in check_call
      grafana │          raise CalledProcessError(retcode, cmd)
      grafana │      subprocess.CalledProcessError: Command '['helm', 'upgrade', '--install', '--values', 'grafana/values.yaml', 'grafana', 'grafana-charts/grafana']' returned non-zero exit status 1.
      grafana │ 
      grafana │ ERROR: Build Failed: apply command exited with status 1

The Tiltfile section that caused that is

# Install Grafana Helm Charts
helm_repo(name='grafana-charts',
          url='https://grafana.github.io/helm-charts')
helm_resource(name='grafana',
              chart='grafana-charts/grafana',
              flags=['--values', './dev/grafana/values.yaml'],
              port_forwards=[3000])
watch_file('./dev/grafana/values.yaml')

Here, grafana has a dependency on grafana-charts, but in the logs above, the chart was not found!

I went ahead and did some triaging in the hopes that it would be useful. My suspicion for what happened:

  • helm_repo uses the Tiltfile local_resource API (
    local_resource(resource_name, args, allow_parallel=True, **kwargs)
    )
  • However, local_resource returns "ready" to Tilt only when the process is started (and not when it is finished!), ctx: https://docs.tilt.dev/local_resource.html#readiness_probe.
  • That means there is a potential race where:
    • helm_repo is called and starts the helm repo add
    • helm_resource, which is dutifully waiting on helm_repo to signal "ready", gets the all-clear from Tilt as the process was started
    • helm_resource attempts to install the chart
    • helm_resource crashes when it does not find the chart
    • helm_repo finishes installing the chart, alas too late

I propose that helm_repo's call to local_resource adds a new readiness_probe which uses an exec_action of helm repo list -o json | jq -e 'map(select(.name == <name-of-chart>)) != []' to determine success of chart installation.

Let me know if this all makes sense please!

@nicks
Copy link
Member

nicks commented May 25, 2023

I think you're misreading the docs.

local_resource readiness checks apply to the serve_cmd, not the build cmd. The helm_repo function uses the build cmd. I didn't see anything in your logs to suggest grafana deployed before grafana-charts finished.

@nschmeller
Copy link
Author

Gotcha, thanks for the clarification -- could there be another reason why this would be flaky / racy?

@nicks
Copy link
Member

nicks commented May 25, 2023

You say you expected it to wait on resource_deps, but I don't see any resource_deps in the code snippet you posted?

@nschmeller
Copy link
Author

My bad, I pasted the wrong snippet from the Tiltfile 😅 -- the failed CI run happened after I committed code with the resource_deps added:

diff --git a/monitoring/Tiltfile b/monitoring/Tiltfile
index 30cb8f76e4..e0a3696cd7 100644
--- a/monitoring/Tiltfile
+++ b/monitoring/Tiltfile
@@ -56,6 +56,7 @@ helm_repo(name='grafana-charts',
 helm_resource(name='grafana',
               chart='grafana-charts/grafana',
               flags=['--values', './dev/grafana/values.yaml'],
+              resource_deps=['grafana-charts'],
               port_forwards=[3000])
 watch_file('./dev/grafana/values.yaml')

So the state of the Tiltfile when this happened was actually

# Install Grafana Helm Charts
helm_repo(name='grafana-charts',
          url='https://grafana.github.io/helm-charts')
helm_resource(name='grafana',
              chart='grafana-charts/grafana',
              flags=['--values', './dev/grafana/values.yaml'],
              resource_deps=['grafana-charts'],
              port_forwards=[3000])
watch_file('./dev/grafana/values.yaml')

Sorry again for the confusion -- any ideas still?

@nicks
Copy link
Member

nicks commented May 25, 2023

I'm not 100% confident 'helm repos add' is concurrency safe. From your logs, it looks like two repo-adds (repos-add?) ran at the same time, but only one printed, which is a bit suspicious. Maybe we could try removing the allow_parallel param from helm_repo and see if that helps?

nschmeller added a commit to nschmeller/tilt-extensions that referenced this issue May 25, 2023
Multiple invocations of `helm_repo` may not be concurrency safe because
of the execution of `helm repo add`. This commit unsets the
`allow_parallel` argument in `helm_repo` to prevent this bug.

Resolves tilt-dev#500
nschmeller added a commit to nschmeller/tilt-extensions that referenced this issue May 25, 2023
Multiple invocations of `helm_repo` may not be concurrency safe because
of the execution of `helm repo add`. This commit unsets the
`allow_parallel` argument in `helm_repo` to prevent this bug.

Resolves tilt-dev#500

Signed-off-by: Nick Schmeller <[email protected]>
nschmeller added a commit to nschmeller/tilt-extensions that referenced this issue May 25, 2023
Multiple invocations of `helm_repo` may not be concurrency safe because
of the execution of `helm repo add`. This commit unsets the
`allow_parallel` argument in `helm_repo` to prevent this bug.

Resolves tilt-dev#500

Signed-off-by: Nick Schmeller <[email protected]>
@nschmeller
Copy link
Author

I'm not sure how to force the race, but I added the problematic code to the test Tiltfile in helm_resource in the hopes of reproducing. Let me know how that PR looks!

nschmeller added a commit to nschmeller/tilt-extensions that referenced this issue May 25, 2023
Multiple invocations of `helm_repo` may not be concurrency safe because
of the execution of `helm repo add`. This commit unsets the
`allow_parallel` argument in `helm_repo` to prevent this bug.

Resolves tilt-dev#500

Signed-off-by: Nick Schmeller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants