Debug aid #823

Open
BMurri opened this issue Jan 23, 2025 · 0 comments · May be fixed by #822
Labels
  • Deployability: Enable TES is easy to deploy for end users
  • enhancement: New feature or request
  • Robustness: Enable users can run tasks w/o bugs or with mitigation of known bugs
  • TES Priority: P1: Groomed to a Priority 1 issue
  • Troubleshooting: Enable users to identify and debug errors

BMurri (Collaborator) commented Jan 23, 2025

Problem:
The hardest problems to troubleshoot are the ones where the only information generated is an exit code that carries no useful meaning.

Solution:
Provide some means of accessing any information produced on the compute node before the task ends.

Describe alternatives you've considered
Providing a solution for #555 is larger in scope, and something is needed sooner.

Code dependencies
Will this require code changes in:

  • CoA, for new and/or existing deployments? No
  • TES standalone, for new and/or existing deployments? No
  • Terra, for new and/or existing deployments? No
  • Build pipeline? No
  • Integration tests? No

Additional context
As envisioned, this is to cover one specific scenario: a very repeatable failure on the compute nodes where no logs of any kind are generated by the task runner, and the batch task ends with a non-zero exit code (usually 10, at the time of this writing).
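
To make the scenario concrete, here is a minimal sketch of one way such a debug aid could behave. It is an illustration only, not the TES task runner or the change proposed in the linked pull request. The function name `run_and_capture`, the file names, and the idea of bundling the working directory into a tarball are all assumptions made for this example; how the bundle would then leave the compute node is not addressed.

```python
import subprocess
import sys
import tarfile
from pathlib import Path


def run_and_capture(command, work_dir, bundle_path):
    """Run `command`; on a non-zero exit, archive everything under `work_dir`.

    Hypothetical helper for illustration. Returns the command's exit code.
    `bundle_path` should point outside `work_dir`.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Write stdout/stderr to disk so something survives even if the
    # command itself produces no logs of its own.
    with open(work_dir / "stdout.txt", "wb") as out, \
            open(work_dir / "stderr.txt", "wb") as err:
        result = subprocess.run(command, stdout=out, stderr=err, cwd=work_dir)

    if result.returncode != 0:
        # Record the exit code next to whatever else is on disk, then
        # bundle the whole directory before the task ends.
        (work_dir / "exit_code.txt").write_text(str(result.returncode))
        with tarfile.open(bundle_path, "w:gz") as bundle:
            bundle.add(work_dir, arcname="debug")

    return result.returncode


if __name__ == "__main__":
    # Simulate a task that fails with exit code 10 and leaves no logs.
    failing = [sys.executable, "-c", "import sys; sys.exit(10)"]
    code = run_and_capture(failing, "task_work", "task_debug.tar.gz")
    print("exit code:", code)
```

The point of the sketch is only that the capture happens in the same process that observes the exit code, so the information is gathered before the node is released.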

BMurri added the Deployability, enhancement, Robustness, TES Priority: P1, and Troubleshooting labels Jan 23, 2025
BMurri added this to the next milestone Jan 23, 2025
BMurri self-assigned this Jan 23, 2025
BMurri linked a pull request Jan 23, 2025 that will close this issue