Stanford CS149 Assignment 4 getting red error [TEN404] Internal tensorizer error: TensorInitialization:Expect NeuronReduceMacro! #1058

Closed
kpodosin opened this issue Dec 6, 2024 · 2 comments

kpodosin commented Dec 6, 2024

Here is the full error:
(aws_neuron_venv_pytorch_p310) ubuntu@ip-172-31-37-252:~/cs149assign4/part2$ python3 test_harness.py
Running correctness test for conv2d kernel with smaller images...[TEN404] Internal tensorizer error: TensorInitialization:Expect NeuronReduceMacro! - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
Traceback (most recent call last):
File "/home/ubuntu/cs149assign4/part2/test_harness.py", line 183, in <module>
test_result = test_correctness_conv2d_kernel(conv2d, use_larger_images=False)
File "/home/ubuntu/cs149assign4/part2/test_harness.py", line 85, in test_correctness_conv2d_kernel
out = kernel(*args, **kwargs)
File "neuronxcc/nki/compile.py", line 95, in neuronxcc.nki.compile.GenericKernel.__call__
File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 174, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.Kernel.__call__
File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 422, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel.post_process_call
File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 425, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel.post_process_call
File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 508, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel._compile
RuntimeError: Compilation failed for fused_conv2d_maxpool with error Command '['neuronx-cc', 'compile', '--framework', 'XLA', 'penguin.py', '--internal-tensorizer-opt-level=nki', '--pipeline', 'compile', 'SaveTemps', '--target', 'trn1', '--disable-internal-io-dge', '--output=file.neff']' returned non-zero exit status 70.

I'm not sure what to do. I restarted my AWS instance and ran the test again, but I still get the same error.

Contributor

AWSNB commented Dec 6, 2024

Hi @kpodosin,

Sorry you are running into this issue. Please see a couple of other threads that reported the TEN404 error and check whether the tips and suggestions there work for you:

#1055
#1052
#1049

While we work on making the compiler output more helpful, the compiler may already be emitting useful errors that the test harness hides. See the tip from @aws-serina-tan in #1054 (comment).
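One way to see everything the compiler prints is to run the harness in a subprocess with the two debug variables named in the error message set. This is only a rough sketch, not the tip from #1054: the helper name `run_with_debug_env`, the value `"1"`, and merging stderr into stdout are all assumptions made for illustration.

```python
import os
import subprocess
import sys


def run_with_debug_env(cmd=None):
    """Run cmd with the XLA debug variables set; return (exit_code, output).

    Hypothetical helper. The variable names come from the error message;
    the value "1" is an assumption.
    """
    if cmd is None:
        # Assumes test_harness.py is in the current directory, as in the report.
        cmd = [sys.executable, "test_harness.py"]
    env = dict(os.environ)
    env["XLA_IR_DEBUG"] = "1"
    env["XLA_HLO_DEBUG"] = "1"
    # Merge stderr into stdout so compiler diagnostics stay in order.
    proc = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout
```

Calling `run_with_debug_env()` from the `part2` directory and printing the returned output should show the full compiler text alongside the harness output, if any is being produced.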

Author

kpodosin commented Dec 8, 2024

The issue was that I was using +=. Thanks.
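The thread does not show the kernel or say what replaced the +=, so the following is general Python background only. It sketches why a tracing compiler can treat `a += b` differently from `a = a + b`: the two forms dispatch to different methods. `TracedTensor` is a hypothetical stand-in written for this example, not an NKI type, and the recorded op names are made up.

```python
class TracedTensor:
    """Hypothetical stand-in for a tensor seen by a tracer (not an NKI type)."""

    def __init__(self, name, ops=None):
        self.name = name
        self.ops = ops if ops is not None else []

    def __add__(self, other):
        # `a + b` builds a new value; this toy tracer records a plain add.
        return TracedTensor(f"({self.name} + {other.name})", self.ops + ["add"])

    def __iadd__(self, other):
        # `a += b` updates `a` in place; this toy tracer records an accumulate.
        self.name = f"({self.name} += {other.name})"
        self.ops.append("accumulate")
        return self


tile = TracedTensor("tile")

acc = TracedTensor("acc")
acc += tile            # dispatches to __iadd__
in_place_ops = acc.ops

out = TracedTensor("acc")
out = out + tile       # dispatches to __add__
explicit_ops = out.ops

print(in_place_ops)    # ['accumulate']
print(explicit_ops)    # ['add']
```

If a kernel hits the same TEN404 error, rewriting an in-place update as an explicit assignment is one thing to try, with the caveat that this thread does not confirm which rewrite fixed it.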
