
Reduction, missing indices in dst access #1055

Closed
andxalex opened this issue Dec 5, 2024 · 9 comments
andxalex commented Dec 5, 2024

Hello, I just saw in #1052 that the intended way to perform a reduction is:

  • a[...] = nl.add(a,b)

However, making that change gives a new error:

  File "/home/ubuntu/CS149/asst4-trainium/part2/conv2d.py", line 144, in fused_conv2d_maxpool
    out_row[...] = nl.add(out_row,nl.matmul(x = weights_copy[ii, jj, k, n, :, :],
SyntaxError: Unexpected output dependencies, missing indices in the dst access: ii, jj

I am unsure what this error means, and also why it is only generated when performing the reduction as above, and not with a[...] += b
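(Editor's note, not part of the original comment.) The distinction the compiler is drawing can be sketched in plain NumPy: writing the same destination on every iteration of a loop is a loop-carried accumulation, whereas an affine (parallelizable) loop requires each iteration to write a distinct slice, i.e. the loop index must appear in the destination access. Shapes K, M, N below are made up for illustration.

```python
import numpy as np

K = 4              # reduction length (e.g. input-channel tiles); assumed
M, N = 8, 8        # output tile shape; assumed

rng = np.random.default_rng(0)
a = rng.random((K, M, N)).astype(np.float32)
b = rng.random((K, N, N)).astype(np.float32)

# Accumulating into one destination across iterations is a loop-carried
# dependency: iteration k reads what iteration k-1 wrote. In NKI this
# needs a sequential loop (nl.sequential_range), not an affine one.
out = np.zeros((M, N), dtype=np.float32)
for k in range(K):
    out += a[k] @ b[k]

# An affine (parallel) loop is only safe when each iteration writes a
# distinct slice, i.e. the loop index appears in the dst access.
partial = np.empty((K, M, N), dtype=np.float32)
for k in range(K):
    partial[k] = a[k] @ b[k]

assert np.allclose(out, partial.sum(axis=0), atol=1e-4)
```

This is (on my reading) why the compiler flags "missing indices in the dst access: ii, jj": the destination is written identically on every (ii, jj) iteration of loops the compiler expected to be independent.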

private repo link:

https://github.com/andxalex/CS149/blob/main/asst4-trainium/part2/conv2d.py

Changing to a[...] += b generates:

    [TEN404] Internal tensorizer error: BirCodeGenLoop:tensorcopy src start_partition(0) or dst start_partition(ii) is not multiple of 
    partitions_per_bank (32). tensorcopy:           float32<1 x 1> TongaSB partitions[2] float32 [3, 3, 128, 128] 
    %'weights_copy'[height_i,width_i,ii,jj] = tensor_copy(float32<1 x 1> TongaSB partitions[0] float32 [1, 147456] 
    %'weights.6'[0,ii,0,jj,height_i,width_i]) # id=8, , src_id=None, instances=147456 # dl = tensor_op_name:  |  [[];[]] -> [[];[]]   - 
    Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain 
    more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

AWSNB commented Dec 5, 2024

Can you give @AWSNB permission?


andxalex commented Dec 5, 2024

AWSNB

Added

@aws-qieqingy
Contributor

Hi! It is correct to use out_row[...] += nl.add(...).

The BirCodeGenLoop error comes from the following loop:

    # Need to move dimensions around as such
    for height_i in nl.affine_range(filter_height):
        for width_i in nl.affine_range(filter_width):
            for out_i in nl.affine_range(n_tiles_c_out):
                for in_i in nl.affine_range(n_tiles_c_in):
                    for ii in nl.affine_range(c_out_pmax):
                        for jj in nl.affine_range(c_in_pmax):
                            weights_copy[height_i, width_i, out_i, in_i, ii, jj] = nl.copy(weights[out_i, ii, in_i, jj, height_i, width_i])

As the error message suggests, the start partition of a tensor copy must be a multiple of 32. However, we are iterating through the partition dimension of weights_copy with the loop variable ii, which produces the error.

In general, we should not iterate through the partition dimension. Instead, we should copy/load into the partition dimension in batch.
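(Editor's note, not part of the original comment.) The anti-pattern and its fix can be illustrated with a plain NumPy analogue; the tile shape below is an assumption, chosen to resemble a typical 128-partition on-chip tile.

```python
import numpy as np

P, F = 128, 64                     # partition dim, free dim; assumed
rng = np.random.default_rng(1)
src = rng.random((P, F)).astype(np.float32)

# Anti-pattern: iterate the partition dimension one row at a time.
# In NKI this would force per-row tensor copies whose start partition
# is ii, which is generally not a multiple of the 32-partition bank size.
dst = np.empty_like(src)
for ii in range(P):
    dst[ii, :] = src[ii, :]

# Preferred: copy the whole partition dimension in one batch
# (in NKI, index it with ':' rather than a loop variable).
dst_batched = src.copy()

assert np.array_equal(dst, dst_batched)
```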

@ggumen ggumen added the NKI label Dec 5, 2024

andxalex commented Dec 5, 2024

Thanks @aws-qieqingy, would this be the intended way to perform the copy?

    # Need to move dimensions around as such
    for height_i in nl.affine_range(filter_height):
        for width_i in nl.affine_range(filter_width):
            for out_i in nl.affine_range(n_tiles_c_out):
                for in_i in nl.affine_range(n_tiles_c_in):
                    # weights_copy[height_i, width_i, out_i, in_i, :, :] = nl.copy(weights[out_i, :, in_i, :, height_i, width_i])
                    for jj in nl.affine_range(c_in_pmax):
                        weights_copy[height_i, width_i, out_i, in_i, :, jj] = nl.copy(weights[out_i, :, in_i, jj, height_i, width_i])

@aws-qieqingy
Contributor

It is still not quite right. We should also copy the free dimension in batch. In addition, the partition dimension of weights is the first dimension, which might need to be revised as well.
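(Editor's note, not part of the original comment.) A NumPy analogue of the two variants, with all shapes assumed: weights is laid out as [out_i, ii, in_i, jj, h, w] and weights_copy wants [h, w, out_i, in_i, ii, jj]. The attempt above batches ii but still loops over jj, so each copy moves only one element along the free dimension; fully batching both dimensions reduces the whole move to a transpose.

```python
import numpy as np

O, P, I, Q, H, W = 2, 4, 2, 4, 3, 3   # tile counts/sizes; assumed
rng = np.random.default_rng(2)
weights = rng.random((O, P, I, Q, H, W)).astype(np.float32)

# Partially batched (as in the attempt above): ii is batched with ':',
# but jj is still a scalar loop over the free dimension.
partial = np.empty((H, W, O, I, P, Q), dtype=np.float32)
for h in range(H):
    for w in range(W):
        for o in range(O):
            for i in range(I):
                for jj in range(Q):
                    partial[h, w, o, i, :, jj] = weights[o, :, i, jj, h, w]

# Fully batched: both the partition (ii) and free (jj) dimensions move
# in one copy per (h, w, o, i); in NumPy the whole thing is a transpose.
full = np.transpose(weights, (4, 5, 0, 2, 1, 3))

assert np.array_equal(partial, full)
```

In NKI terms this would correspond to indexing both trailing dimensions with ':' in a single nl.copy per (height_i, width_i, out_i, in_i), along the lines of the commented-out line in the snippet above; exact validity depends on which axis is the partition dimension of each tensor.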


andxalex commented Dec 5, 2024

Thank you so much for your time, @aws-zhehongb and @aws-qieqingy. The code executes correctly when using --simulate now.

However, dropping the simulation flag produces different results. I've tried to avoid any inferred dependencies by replacing all loops with nl.sequential_range, though this didn't fix the problem.

I would greatly appreciate any pointers! (#1051 is similar so moving there)

@aws-zhehongb

How can we reproduce your error? Did you share your private repo?


andxalex commented Dec 5, 2024

How can we reproduce your error? Did you share your private repo?

Updated comment in #1051
