Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

genrule fails with non-existent path using --build_tests_only on Windows #24546

Closed
avdv opened this issue Dec 3, 2024 · 8 comments
Closed

genrule fails with non-existent path using --build_tests_only on Windows #24546

avdv opened this issue Dec 3, 2024 · 8 comments
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@avdv
Copy link
Contributor

avdv commented Dec 3, 2024

Description of the bug:

In a genrule, trying to run a command which gets expanded from a template variable results in an error when using Bazel >= 7.3.0:

C:/b/bk-windows-j20f/bazel-org-repo-root/output/rules_sh-0.5.0/tests/sh_binaries/BUILD.bazel:3:23: Executing genrule //sh_binaries:genrule failed: (Exit 127): bash.exe failed: error executing Genrule command (from target //sh_binaries:genrule)
...
/usr/bin/bash: line 1: c:\\b\\3mrahdhj\\execroot\\_main\\bazel-out\\x64_windows-fastbuild\\bin\\sh_binaries\\hello_world.exe.runfiles/_main/sh_binaries/hello_world: No such file or directory

Note the path looks fishy: c:\\b\\3mrahdhj\\execroot\\_main\\bazel-out\\x64_windows-fastbuild\\bin\\sh_binaries\\hello_world.exe.runfiles/_main/sh_binaries/hello_world

Removing the --build_tests_only flag makes this work.

cc @fmeum

Which category does this issue belong to?

Local Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

You can use rules_sh as reproducer (I'll try to give a minimal example later):

  1. gh repo clone tweag/rules_sh
  2. cd rules_sh/tests
  3. bazel test --build_tests_only ...

Which operating system are you running Bazel on?

Windows

What is the output of bazel info release?

7.3.0 - 7.4.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

Bisecting this points to 282ac62 610fe7b

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

I ran into this error for bazelbuild/bazel-central-registry#3288, see https://buildkite.com/bazel/bcr-presubmit/builds/9106.

@github-actions github-actions bot added the team-Local-Exec Issues and PRs for the Execution (Local) team label Dec 3, 2024
@fmeum
Copy link
Collaborator

fmeum commented Dec 3, 2024

@ADVD Do you know what the path looked like before the breaking commit?

@avdv
Copy link
Contributor Author

avdv commented Dec 3, 2024

Do you know what the path looked like before the breaking commit?

No, I don't know. But I suspect that this is related to the data that is appended to the launcher.exe on Windows. I don't have a Windows machine, which makes this always a bit tricky to debug.

@fmeum fmeum self-assigned this Dec 4, 2024
@fmeum fmeum added P1 I'll work on this now. (Assignee required) and removed untriaged labels Dec 4, 2024
@avdv
Copy link
Contributor Author

avdv commented Dec 5, 2024

@fmeum I just realized that the commit were this starts to fail is actually 610fe7b -- which also makes a lot more sense. (the bazel wrapper we are using always returned exit code 0 so bazelisk bisect reported success for that commit although it failed)

@avdv
Copy link
Contributor Author

avdv commented Dec 5, 2024

Here is a CI run with two jobs, one is using bazel from 610fe7b
(bad) the other its parent (good): https://github.com/tweag/rules_sh/actions/runs/12177585935

Also, the error also occurs when just trying to build the //sh_binaries:genrule target.

Here's a hexdump of the end of the hello_world.exe (for the failed build):

00001390  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 62  |...............b|
000013a0  69 6e 61 72 79 5f 74 79  70 65 3d 42 61 73 68 00  |inary_type=Bash.|
000013b0  77 6f 72 6b 73 70 61 63  65 5f 6e 61 6d 65 3d 5f  |workspace_name=_|
000013c0  6d 61 69 6e 00 73 79 6d  6c 69 6e 6b 5f 72 75 6e  |main.symlink_run|
000013d0  66 69 6c 65 73 5f 65 6e  61 62 6c 65 64 3d 30 00  |files_enabled=0.|
000013e0  62 61 73 68 5f 62 69 6e  5f 70 61 74 68 3d 43 3a  |bash_bin_path=C:|
000013f0  2f 6d 73 79 73 36 34 2f  75 73 72 2f 62 69 6e 2f  |/msys64/usr/bin/|
00001400  62 61 73 68 2e 65 78 65  00 62 61 73 68 5f 66 69  |bash.exe.bash_fi|
00001410  6c 65 5f 72 6c 6f 63 61  74 69 6f 6e 70 61 74 68  |le_rlocationpath|
00001420  3d 5f 6d 61 69 6e 2f 73  68 5f 62 69 6e 61 72 69  |=_main/sh_binari|
00001430  65 73 2f 68 65 6c 6c 6f  5f 77 6f 72 6c 64 00 a0  |es/hello_world..|
00001440  00 00 00 00 00 00 00                              |.......|

@fmeum
Copy link
Collaborator

fmeum commented Dec 5, 2024

Thanks for the update. I think I know what's up, but will need to confirm my theory. The commit you originally identified as the culprit actually has a bug, so thanks for making me aware of that. :-)

@fmeum
Copy link
Collaborator

fmeum commented Dec 5, 2024

While it is true that the culprit commit changed how the sh_binary launcher finds the script, it did so in a way that is transparent to consumers that correctly include the targets runfiles. If that's not done, an sh_binary target that uses runfiles also wouldn't have worked before that change. I am seeing failures on tweag/rules_sh#95 locally on macOS that point to this being the issue (rather than a Bazel bug).

@fmeum
Copy link
Collaborator

fmeum commented Dec 5, 2024

This is ultimately caused by https://github.com/tweag/rules_sh/blob/dfd9051483ebcd0cd84902291b52270617ca8028/sh/private/defs.bzl#L35-L38: Genrule tools expect a runfiles structure in which each has a sibling .runfiles directory or .runfiles_manifest file, but Bazel will instead create a single one next to this fake executable.

Without a solution to #15486, you probably have to create a wrapper script around each individual tool that sets the runfiles environment variables to the single shared runfiles location.

@avdv
Copy link
Contributor Author

avdv commented Dec 6, 2024

Thank you for digging into this @fmeum!

This is ultimately caused by https://github.com/tweag/rules_sh/blob/dfd9051483ebcd0cd84902291b52270617ca8028/sh/private/defs.bzl#L35-L38: Genrule tools expect a runfiles structure in which each has a sibling .runfiles directory or .runfiles_manifest file, but Bazel will instead create a single one next to this fake executable.

OK, I see. Closing.

@avdv avdv closed this as completed Dec 6, 2024
@fmeum fmeum reopened this Dec 6, 2024
@fmeum fmeum closed this as not planned Won't fix, can't repro, duplicate, stale Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug
Projects
None yet
Development

No branches or pull requests

5 participants