Internal Tensorizer Error #1052

Open
ewang675 opened this issue Dec 5, 2024 · 13 comments

ewang675 commented Dec 5, 2024

We are getting an Internal Tensorizer Error. The error logs are copied below.

[TEN404] Internal tensorizer error: TensorInitialization:Expect NeuronReduceMacro! - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

Traceback (most recent call last):
  File "/home/ubuntu/asst4-trainium/part2/test_harness.py", line 219, in <module>
    test_result = test_correctness_conv2d_kernel(
  File "/home/ubuntu/asst4-trainium/part2/test_harness.py", line 85, in test_correctness_conv2d_kernel
    out = kernel(*args, **kwargs)
  File "neuronxcc/nki/compile.py", line 92, in neuronxcc.nki.compile.GenericKernel.__call__
  File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 174, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.Kernel.__call__
  File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 422, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel.post_process_call
  File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 425, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel.post_process_call
  File "neuronxcc/starfish/penguin/targets/nki/TraceKernel.py", line 508, in neuronxcc.starfish.penguin.targets.nki.TraceKernel.BaremetalKernel._compile
RuntimeError: Compilation failed for fused_conv2d_maxpool with error Command '['neuronx-cc', 'compile', '--framework', 'XLA', 'penguin.py', '--internal-tensorizer-opt-level=nki', '--pipeline', 'compile', 'SaveTemps', '--target', 'trn1', '--disable-internal-io-dge', '--output=file.neff']' returned non-zero exit status 70.

The source code is attached. The issue seems to be on line 173 of conv2d.py, in the call to nl.max(...).

The code can be run using the command:
python3 test_harness.py --test_maxpool

Please use the --test_maxpool flag; without it, the relevant segment of code will not run and the error will not be produced.

part2.zip

AWSNB commented Dec 5, 2024

@ewang675 a couple of quick things to try:

Change out_tile += broadcasted_bias to out_tile = out_tile + broadcasted_bias.
(We likely have a bug in += when it is used as an in-place update on sbuf or outside an affine loop.)

In general, change all a += b to a = a + b for now.
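
For example, something along these lines (shapes here are illustrative, and out_tile / broadcasted_bias are stand-ins for the tensors in your kernel):

import neuronxcc.nki.language as nl

P, F = 128, 512  # illustrative tile shape
out_tile = nl.zeros((P, F), nl.float32, buffer=nl.sbuf)
broadcasted_bias = nl.zeros((P, F), nl.float32, buffer=nl.sbuf)

# instead of the in-place update: out_tile += broadcasted_bias
out_tile = out_tile + broadcasted_bias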

ewang675 commented Dec 5, 2024

Hello, thank you! When I make this change, the first 'a' in 'a = a + b' seems to get interpreted as a new variable. For example:

res_psum = nl.zeros((TILE_C_OUT, TILE_H * out_width), nl.float32, buffer=nl.psum)
for k in nl.affine_range(n_tiles_c_in):
    # ... other code ...
    res_psum = res_psum + nl.matmul(weight_[k, :, :, i, j], rhs_tile, transpose_x=True)

conv_result += nl.copy(res_psum, dtype=X_out.dtype)

I get the error:

conv_result += nl.copy(res_psum, dtype=X_out.dtype)
SyntaxError: local variable 'res_psum' is referenced outside of its parent scope (loop k at f/home/ubuntu/asst4-trainium/part2/conv2d.py:149)!

Similar errors occur when replacing 'a += b' with 'a = a + b' in other places.

AWSNB commented Dec 5, 2024

@aws-zhehongb ^^

AWSNB commented Dec 5, 2024

@ewang675 I asked our experts to look at it, but my initial suspicion is that += nl.copy() is the issue.

Could you try:

# copy to sbuf
temp_result = nl.copy(res_psum, dtype=X_out.dtype)
# add temp to accumulated results
conv_result = nl.add(conv_result, temp_result)
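
For reference, a self-contained sketch of those two steps (shapes are illustrative, and nl.float32 stands in for X_out.dtype; in the real kernel res_psum comes from the nl.matmul accumulation):

import neuronxcc.nki.language as nl

P, F = 128, 512
conv_result = nl.zeros((P, F), nl.float32, buffer=nl.sbuf)
res_psum = nl.zeros((P, F), nl.float32, buffer=nl.psum)

temp_result = nl.copy(res_psum, dtype=nl.float32)  # copy the PSUM tile into SBUF
conv_result = nl.add(conv_result, temp_result)     # explicit add instead of +=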

ewang675 commented Dec 5, 2024

Hello, thank you for taking a look! We actually managed to get things working with your initial suggestion: out_tile = out_tile + bias still worked even though the first out_tile was treated as a distinct object of some kind (since this wasn't in an inner loop). However, for the specific case named above, changing the code to:

res_psum = nl.zeros((TILE_C_OUT, TILE_H * out_width), nl.float32, buffer=nl.psum)
for k in nl.affine_range(n_tiles_c_in):
    # ... other code ...
    res_psum = res_psum + nl.matmul(weight_[k, :, :, i, j], rhs_tile, transpose_x=True)

temp_result = nl.copy(res_psum, dtype=X_out.dtype)
conv_result = nl.add(conv_result, temp_result)

results in a different error:

SyntaxError: local variable 'conv_result' is referenced outside of its parent scope (loop j at f/home/ubuntu/asst4-trainium/part2/conv2d.py:145)!

(For context, conv_result is declared outside the loop that this whole block sits in, and is referenced after that loop.)

Really appreciate your help above, though!

AWSNB commented Dec 5, 2024

@aws-zhehongb

@aws-serina-tan

To assign to the originally declared tensor, we would need conv_result[...] = nl.add(conv_result, res_psum). Can you give that a try?

AWSNB commented Dec 5, 2024

@ewang675 the following info explains many of the issues you have seen so far, and matches your understanding of the error:

a = nl.copy(...) creates a new variable a.
If you want to assign to a pre-declared variable a that was created with nl.ndarray, nl.zeros, etc., you should use a[...] = nl.copy(...).

There are a few places like this in your code.
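
As a quick illustration of the distinction (shapes are illustrative, and the snippet is assumed to sit inside an NKI kernel body):

import neuronxcc.nki.language as nl

acc = nl.zeros((128, 512), nl.float32, buffer=nl.sbuf)   # pre-declared result tensor
tile = nl.zeros((128, 512), nl.float32, buffer=nl.sbuf)  # some tile to copy from

new_acc = nl.copy(tile)   # rebinding: creates a brand-new tile; acc is untouched
acc[...] = nl.copy(tile)  # indexed assignment: writes into the tensor declared
                          # above, so the value survives loop scopes and later reads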

AWSNB commented Dec 5, 2024

@ewang675 did it work?

ewang675 commented Dec 5, 2024

I'm getting a different error now:

conv_result[...] = conv_result + nl.copy(res_psum, dtype=X_out.dtype)

SyntaxError: Unexpected output dependencies, missing indices in the dst access: j, i

This error doesn't happen with:

conv_result += nl.copy(res_psum, dtype=X_out.dtype)

@aws-zhehongb

Could you change the i and j loops to nl.sequential_range?
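
A minimal sketch of that change (the loop bounds of 3 are illustrative stand-ins for the actual i and j ranges in conv2d.py); nl.sequential_range runs iterations in order and allows loop-carried dependencies, whereas nl.affine_range assumes the iterations are independent:

import neuronxcc.nki.language as nl

# before: for i in nl.affine_range(3): ...
for i in nl.sequential_range(3):
    for j in nl.sequential_range(3):
        pass  # body that reads and updates conv_result across iterations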

@aws-serina-tan

Is the new error related to #1055?

@aws-zhehongb

A generic reduction like

conv_result[...] = conv_result + nl.copy(res_psum, dtype=X_out.dtype)

is treated as a loop-carried dependency. If you see

SyntaxError: Unexpected output dependencies, missing indices in the dst access: j, i

check this out: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/nki/api/nki.errors.html#err-unexpected-output-dependencies
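
Putting the two suggestions in this thread together, here is a minimal, self-contained sketch: nl.sequential_range for the loops whose body reads and writes the same pre-declared tensor, and indexed assignment to write the result back. The shapes, loop bounds, and the update tile are illustrative stand-ins, not the code from part2.zip.

import neuronxcc.nki.language as nl

P, F = 128, 512                                        # illustrative tile shape
acc = nl.zeros((P, F), nl.float32, buffer=nl.sbuf)     # pre-declared result tensor
update = nl.zeros((P, F), nl.float32, buffer=nl.sbuf)  # stand-in for the per-iteration term

for i in nl.sequential_range(3):      # sequential: iteration (i, j) reads the acc
    for j in nl.sequential_range(3):  # value written by the previous iteration
        acc[...] = nl.add(acc, update)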
