
Reduction, missing indices in dst access #1055

Closed
andxalex opened this issue Dec 5, 2024 · 9 comments
andxalex commented Dec 5, 2024

Hello, I just saw in #1052 that the intended way to perform a reduction is:

  • a[...] = nl.add(a,b)

However, making that change gives a new error:

  File "/home/ubuntu/CS149/asst4-trainium/part2/conv2d.py", line 144, in fused_conv2d_maxpool
    out_row[...] = nl.add(out_row,nl.matmul(x = weights_copy[ii, jj, k, n, :, :],
SyntaxError: Unexpected output dependencies, missing indices in the dst access: ii, jj

I am unsure what this error means, and also why it is only generated when performing the reduction as above, and not with a[...] += b
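(Editor's note, not part of the original comment.) The distinction the compiler is drawing can be sketched in plain NumPy: writing the same destination on every iteration of a loop is a loop-carried accumulation, whereas an affine (parallelizable) loop requires each iteration to write a distinct slice, i.e. the loop index must appear in the destination access. Shapes K, M, N below are made up for illustration.

```python
import numpy as np

K = 4              # reduction length (e.g. input-channel tiles); assumed
M, N = 8, 8        # output tile shape; assumed

rng = np.random.default_rng(0)
a = rng.random((K, M, N)).astype(np.float32)
b = rng.random((K, N, N)).astype(np.float32)

# Accumulating into one destination across iterations is a loop-carried
# dependency: iteration k reads what iteration k-1 wrote. In NKI this
# needs a sequential loop (nl.sequential_range), not an affine one.
out = np.zeros((M, N), dtype=np.float32)
for k in range(K):
    out += a[k] @ b[k]

# An affine (parallel) loop is only safe when each iteration writes a
# distinct slice, i.e. the loop index appears in the dst access.
partial = np.empty((K, M, N), dtype=np.float32)
for k in range(K):
    partial[k] = a[k] @ b[k]

assert np.allclose(out, partial.sum(axis=0), atol=1e-4)
```

This is (on my reading) why the compiler flags "missing indices in the dst access: ii, jj": the destination is written identically on every (ii, jj) iteration of loops the compiler expected to be independent.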

private repo link:

https://github.com/andxalex/CS149/blob/main/asst4-trainium/part2/conv2d.py

Changing to a[...] += b generates:

    [TEN404] Internal tensorizer error: BirCodeGenLoop:tensorcopy src start_partition(0) or dst start_partition(ii) is not multiple of 
    partitions_per_bank (32). tensorcopy:           float32<1 x 1> TongaSB partitions[2] float32 [3, 3, 128, 128] 
    %'weights_copy'[height_i,width_i,ii,jj] = tensor_copy(float32<1 x 1> TongaSB partitions[0] float32 [1, 147456] 
    %'weights.6'[0,ii,0,jj,height_i,width_i]) # id=8, , src_id=None, instances=147456 # dl = tensor_op_name:  |  [[];[]] -> [[];[]]   - 
    Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain 
    more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

AWSNB commented Dec 5, 2024

Can you give @AWSNB permission?


andxalex commented Dec 5, 2024

AWSNB

Added

@aws-qieqingy
Contributor

Hi! It is correct to use out_row[...] += nl.add(...).

The BirCodeGenLoop error comes from the following loop:

    # Need to move dimensions around as such
    for height_i in nl.affine_range(filter_height):
        for width_i in nl.affine_range(filter_width):
            for out_i in nl.affine_range(n_tiles_c_out):
                for in_i in nl.affine_range(n_tiles_c_in):
                    for ii in nl.affine_range(c_out_pmax):
                        for jj in nl.affine_range(c_in_pmax):
                            weights_copy[height_i, width_i, out_i, in_i, ii, jj] = nl.copy(weights[out_i, ii, in_i, jj, height_i, width_i])

As the error message suggests, the start partition of a tensor copy must be a multiple of 32. However, we are iterating through the partition dimension of weights_copy with the loop variable ii, which produces the error.

In general, we should not iterate through the partition dimension. Instead, we should copy/load into the partition dimension in batch.
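(Editor's note, not part of the original comment.) The anti-pattern and its fix can be illustrated with a plain NumPy analogue; the tile shape below is an assumption, chosen to resemble a typical 128-partition on-chip tile.

```python
import numpy as np

P, F = 128, 64                     # partition dim, free dim; assumed
rng = np.random.default_rng(1)
src = rng.random((P, F)).astype(np.float32)

# Anti-pattern: iterate the partition dimension one row at a time.
# In NKI this would force per-row tensor copies whose start partition
# is ii, which is generally not a multiple of the 32-partition bank size.
dst = np.empty_like(src)
for ii in range(P):
    dst[ii, :] = src[ii, :]

# Preferred: copy the whole partition dimension in one batch
# (in NKI, index it with ':' rather than a loop variable).
dst_batched = src.copy()

assert np.array_equal(dst, dst_batched)
```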

@ggumen ggumen added the NKI label Dec 5, 2024

andxalex commented Dec 5, 2024

Thanks @aws-qieqingy, would this be the intended way to perform the copy?

    # Need to move dimensions around as such
    for height_i in nl.affine_range(filter_height):
        for width_i in nl.affine_range(filter_width):
            for out_i in nl.affine_range(n_tiles_c_out):
                for in_i in nl.affine_range(n_tiles_c_in):
                    # weights_copy[height_i, width_i, out_i, in_i, :, :] = nl.copy(weights[out_i, :, in_i, :, height_i, width_i])
                    for jj in nl.affine_range(c_in_pmax):
                        weights_copy[height_i, width_i, out_i, in_i, :, jj] = nl.copy(weights[out_i, :, in_i, jj, height_i, width_i])

@aws-qieqingy
Contributor

It is still not quite right. We should also copy the free dimension in batch. In addition, the partition dimension of weights is the first dimension, which might need to be revised as well.
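(Editor's note, not part of the original comment.) A NumPy analogue of the two variants, with all shapes assumed: weights is laid out as [out_i, ii, in_i, jj, h, w] and weights_copy wants [h, w, out_i, in_i, ii, jj]. The attempt above batches ii but still loops over jj, so each copy moves only one element along the free dimension; fully batching both dimensions reduces the whole move to a transpose.

```python
import numpy as np

O, P, I, Q, H, W = 2, 4, 2, 4, 3, 3   # tile counts/sizes; assumed
rng = np.random.default_rng(2)
weights = rng.random((O, P, I, Q, H, W)).astype(np.float32)

# Partially batched (as in the attempt above): ii is batched with ':',
# but jj is still a scalar loop over the free dimension.
partial = np.empty((H, W, O, I, P, Q), dtype=np.float32)
for h in range(H):
    for w in range(W):
        for o in range(O):
            for i in range(I):
                for jj in range(Q):
                    partial[h, w, o, i, :, jj] = weights[o, :, i, jj, h, w]

# Fully batched: both the partition (ii) and free (jj) dimensions move
# in one copy per (h, w, o, i); in NumPy the whole thing is a transpose.
full = np.transpose(weights, (4, 5, 0, 2, 1, 3))

assert np.array_equal(partial, full)
```

In NKI terms this would correspond to indexing both trailing dimensions with ':' in a single nl.copy per (height_i, width_i, out_i, in_i), along the lines of the commented-out line in the snippet above; exact validity depends on which axis is the partition dimension of each tensor.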


andxalex commented Dec 5, 2024

Thank you so much for your time, @aws-zhehongb and @aws-qieqingy. The code executes correctly when using --simulate now.

However, dropping the simulation flag produces different results. I've tried to avoid any inferred dependencies by replacing all loops with nl.sequential_range, though this didn't fix the problem.

I would greatly appreciate any pointers! (#1051 is similar so moving there)

@aws-zhehongb

How can we reproduce your error? Did you share your private repo?


andxalex commented Dec 5, 2024

How can we reproduce your error? Did you share your private repo?

Updated comment in #1051
