feat(app): revised configure_torch_cuda_allocator() & testing strategy #7733

Conversation
Force-pushed from 372d3e8 to 9867b32
@psychedelicious

```python
import sys
import traceback
from io import StringIO
from multiprocessing import Process, Queue


def subprocess_target(queue, func, *args, **kwargs):
    # Capture the child's stdout/stderr so the parent can assert on them.
    stdout_capture = StringIO()
    stderr_capture = StringIO()
    sys.stdout = stdout_capture
    sys.stderr = stderr_capture
    try:
        func(*args, **kwargs)
    except Exception:
        # Uncaught exceptions are rerouted to the captured stderr.
        print(traceback.format_exc(), file=sys.stderr)
    finally:
        stdout = stdout_capture.getvalue().rstrip("\n")
        stderr = stderr_capture.getvalue().rstrip("\n")
        queue.put((stdout, stderr))


def run_function_in_subprocess(func, *args, **kwargs):
    queue = Queue()
    p = Process(target=subprocess_target, args=(queue, func, *args), kwargs=kwargs)
    p.start()
    # Drain the queue before joining to avoid a deadlock if the output is large.
    stdout, stderr = queue.get()
    p.join()
    exit_code = p.exitcode
    return stdout, stderr, exit_code


def example():
    print("hi spencer")
    print("oh no", file=sys.stderr)
    sys.exit(42)


if __name__ == "__main__":
    stdout, stderr, exit_code = run_function_in_subprocess(example)
    assert stdout == "hi spencer"
    assert stderr == "oh no"
    assert exit_code == 42
```
Great work figuring out a very subtle bug!
Thanks! That's a nicer way to run the isolated code, but it is a bit awkward with arbitrary functions. If I understand correctly, you have to manually catch errors, reroute messages to stderr, and exit manually. The hacky function-to-script approach kinda gives you that for free. I'll keep the current implementation but keep this more pythonic method in mind.
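For reference, the "function-to-script" approach mentioned above could look something like the sketch below. This is a hypothetical illustration, not the helper actually added in this PR; `run_function_as_script` is an assumed name, and it only works for functions that are fully self-contained (no closures, all imports inside the body).

```python
import inspect
import subprocess
import sys
import textwrap


def run_function_as_script(func):
    # Turn the function's source into a standalone script that defines the
    # function and then calls it at module scope.
    source = textwrap.dedent(inspect.getsource(func))
    script = f"{source}\n{func.__name__}()\n"
    result = subprocess.run([sys.executable, "-c", script], capture_output=True, text=True)
    # Uncaught exceptions land in stderr with a non-zero exit code, and
    # sys.exit() is honoured, without any manual rerouting.
    return result.stdout.rstrip("\n"), result.stderr.rstrip("\n"), result.returncode
```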
This allows our tests to run in an isolated environment. For tests that implicitly depend on import behaviour, this can prevent side-effects. The function should only be used for tests.
…rch_cuda_allocator tests

Force-pushed from 9867b32 to bb0cd59
Hey @psychedelicious, just to clarify - in this example, the arbitrary function being executed is:

```python
def example():
    print("hi spencer")
    print("oh no", file=sys.stderr)
    sys.exit(42)
```

The caller of `run_function_in_subprocess()` then gets back the captured stdout, stderr, and exit code.
Summary

`configure_torch_cuda_allocator()` changes:

- Log a warning instead of raising when `PYTORCH_CUDA_ALLOC_CONF` is set to a different value than is configured in `invokeai.yaml`.
- Handle the case where `PYTORCH_CUDA_ALLOC_CONF` is set to the same value that is configured in `invokeai.yaml`.
- Check for `torch` having been imported in the util. Note that this depends on the name of the module. So, if `torch` was imported with something like `import torch as foo`, then the check won't work. But this should cover 99.9% of cases.

A hypothetical sketch of this behaviour is shown below.
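This is an illustrative sketch only; the exact messages, signature, and control flow are assumptions based on the description above, not the PR's actual code.

```python
import os
import sys


def configure_torch_cuda_allocator(alloc_conf: str, logger) -> None:
    # The import check is name-based: it looks for the module under its
    # canonical name, which is why unusual import mechanisms can slip past
    # it (see the caveat above).
    if "torch" in sys.modules:
        raise RuntimeError("Must be called before torch is imported.")

    prev = os.environ.get("PYTORCH_CUDA_ALLOC_CONF")
    if prev == alloc_conf:
        # Env var already matches the invokeai.yaml setting: nothing to do.
        return
    if prev is not None:
        # Previously a hard error; now just a warning (see Testing changes).
        logger.warning(
            f"PYTORCH_CUDA_ALLOC_CONF is already set to '{prev}', "
            f"which differs from the configured value '{alloc_conf}'."
        )
        return

    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf
```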
Testing changes

I was running into an issue where I had the `PYTORCH_CUDA_ALLOC_CONF` env var set on my system and a test for `configure_torch_cuda_allocator()` failed. The function must be called before torch is imported, and the test is intended to confirm that behaviour.

By setting the env var before running the test, `configure_torch_cuda_allocator()` raises before getting to the code that the test intends to test.

To reproduce on `main`:

```sh
pytest tests/app/util/test_torch_cuda_allocator.py
export PYTORCH_CUDA_ALLOC_CONF="backend:native"
pytest tests/app/util/test_torch_cuda_allocator.py
```
I fixed this by instead changing that first raise to be a log. Then, I added explicit tests to ensure the function behaved correctly when torch was imported before calling it.
But this doesn't work. The tests rely on implicit torch import behaviour - it configures itself based on the `PYTORCH_CUDA_ALLOC_CONF` env var on import. However, our test suite imports torch in many places, including this very test file. By the time we run the tests, torch is already imported. Catch-22!

So the test is unable to test the interaction of `configure_torch_cuda_allocator()` and torch's import behaviour as it should.

I could not find a simple, cross-platform way to have the tests run in isolated environments. Apparently `del sys.modules['module_name']` essentially un-imports a module, but it doesn't work for torch (maybe it relates to native code execution?).
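For context, here is roughly why the `sys.modules` trick falls short. This is an illustrative sketch, not code from the PR:

```python
import sys

import json  # a pure-Python stdlib module, used here only for illustration

del sys.modules["json"]
import json  # not in sys.modules anymore, so the module code re-executes

# This works for pure-Python modules, but an extension-heavy package like
# torch keeps native state from its first initialization in the process,
# so "un-importing" it this way doesn't give a clean slate.
```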
So, I brute-forced it and added a utility `dangerously_run_function_in_subprocess()` to run a python function in a subprocess. This utility takes a function without any args, runs it in a subprocess, and returns stdout/stderr/exit code. This allows a test to run in total isolation! The function to be run cannot have any closures, of course.

The utility is tested separately and used to test the `configure_torch_cuda_allocator()` utility in an isolated environment.
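As an illustration, a test using the utility might look like the sketch below. The import paths, the function signature, and the assertion are assumptions based on this description, not the actual test code.

```python
from tests.test_utils import dangerously_run_function_in_subprocess  # path assumed


def _scenario():
    # Runs in a fresh subprocess, so torch is guaranteed not to be imported
    # yet. Everything must be imported inside the function body, because the
    # function cannot rely on closures or on the parent process's modules.
    import logging
    import os

    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:native"

    from invokeai.app.util.torch_cuda_allocator import configure_torch_cuda_allocator  # path assumed

    configure_torch_cuda_allocator("backend:cudaMallocAsync", logging.getLogger())


def test_warns_when_env_var_differs():
    stdout, stderr, exit_code = dangerously_run_function_in_subprocess(_scenario)
    # After this PR, a mismatched env var logs a warning instead of raising.
    assert exit_code == 0
```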
Related Issues / Discussions

Closes #7731
QA Instructions

Try combinations of "backend:native" and "backend:cudaMallocAsync" for the env var and the setting in `invokeai.yaml`.

The app should:
Merge Plan

n/a

Checklist

- What's New copy (if doing a release after this PR)