Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xtheadvector fp16 conversion intrinsic caused crash in optimized build #24

Open
nihui opened this issue Dec 23, 2024 · 10 comments
Open

Comments

@nihui
Copy link

nihui commented Dec 23, 2024

Xuantie-900-gcc-linux-6.6.0-glibc-x86_64-V3.0.1
Xuantie-qemu-x86_64-Ubuntu-20.04-V5.0.4-B20241127-1130

riscv64-unknown-linux-gnu-gcc -march=rv64gc_zfh_xtheadvector_xtheadc -mabi=lp64d -static fp16.c -o fp16 -O2
qemu-riscv64 -cpu c906fdv ./fp16

output

a = -8  -7  -6  -5  -4  -3  -2  -1  0  1  2  3  4  5  6  7  
a = -8  -7  -6  -5  -4  -3  -2  -1  0  1  2  3  4  5  6  7  
Segmentation fault (core dumped)

minimal test

#include <stdio.h>
#include <riscv_vector.h>

int main()
{
    float a[16] = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};

    fprintf(stderr, "a = ");
    for (int i=0; i<16; i++)
    {
        fprintf(stderr, "%.0f  ", a[i]);
    }
    fprintf(stderr, "\n");

    float* ptr = a;

    int n = 16;
    while (n > 0)
    {
        size_t vl = __riscv_vsetvl_e32m8(n);

        // fp32 -> fp16 -> fp32
        vfloat32m8_t _p = __riscv_vle32_v_f32m8(ptr, vl);
        vfloat16m4_t _half = __riscv_vfncvt_f_f_w_f16m4(_p, vl);
        vfloat32m8_t _out = __riscv_vfwcvt_f_f_v_f32m8(_half, vl);
        __riscv_vse32_v_f32m8(ptr, _out, vl);

        ptr += vl;
        n -= vl;
    }

    fprintf(stderr, "a = ");
    for (int i=0; i<16; i++)
    {
        fprintf(stderr, "%.0f  ", a[i]);
    }
    fprintf(stderr, "\n");

    return 0;
}
@nihui
Copy link
Author

nihui commented Dec 23, 2024

compiling for c908v with rvv-1.0 + zvfh works fine, using the same compiler and qemu

riscv64-unknown-linux-gnu-gcc -march=rv64gcv_zfh_zvfh_xtheadc -mabi=lp64d -static fp16.c -o fp16-v -O2
qemu-riscv64 -cpu c908v ./fp16-v

@pz9115
Copy link

pz9115 commented Dec 23, 2024

I want to reproduce the error, can you help me to check is this case right?

/topt/xt/lib/gcc/riscv64-unknown-linux-gnu/10.4.0/include/riscv_vector.h:34:2: error: #error "Vector intrinsics require the vector extension."
   34 | #error "Vector intrinsics require the vector extension."
      |  ^~~~~
fp16.c: In function 'main':
fp16.c:20:21: warning: implicit declaration of function '__riscv_vsetvl_e32m8' [-Wimplicit-function-declaration]
   20 |         size_t vl = __riscv_vsetvl_e32m8(n);
      |                     ^~~~~~~~~~~~~~~~~~~~
fp16.c:23:9: error: unknown type name 'vfloat32m8_t'
   23 |         vfloat32m8_t _p = __riscv_vle32_v_f32m8(ptr, vl);
      |         ^~~~~~~~~~~~
fp16.c:23:27: warning: implicit declaration of function '__riscv_vle32_v_f32m8' [-Wimplicit-function-declaration]
   23 |         vfloat32m8_t _p = __riscv_vle32_v_f32m8(ptr, vl);
      |                           ^~~~~~~~~~~~~~~~~~~~~
fp16.c:24:9: error: unknown type name 'vfloat16m4_t'
   24 |         vfloat16m4_t _half = __riscv_vfncvt_f_f_w_f16m4(_p, vl);
      |         ^~~~~~~~~~~~
fp16.c:24:30: warning: implicit declaration of function '__riscv_vfncvt_f_f_w_f16m4' [-Wimplicit-function-declaration]
   24 |         vfloat16m4_t _half = __riscv_vfncvt_f_f_w_f16m4(_p, vl);
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
fp16.c:25:9: error: unknown type name 'vfloat32m8_t'
   25 |         vfloat32m8_t _out = __riscv_vfwcvt_f_f_v_f32m8(_half, vl);
      |         ^~~~~~~~~~~~
fp16.c:25:29: warning: implicit declaration of function '__riscv_vfwcvt_f_f_v_f32m8' [-Wimplicit-function-declaration]
   25 |         vfloat32m8_t _out = __riscv_vfwcvt_f_f_v_f32m8(_half, vl);
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~
fp16.c:26:9: warning: implicit declaration of function '__riscv_vse32_v_f32m8' [-Wimplicit-function-declaration]
   26 |         __riscv_vse32_v_f32m8(ptr, _out, vl);
      |         ^~~~~~~~~~~~~~~~~~~~~

@nihui
Copy link
Author

nihui commented Dec 23, 2024

@nihui
Copy link
Author

nihui commented Dec 23, 2024

This issue can also be reproduced using upstream gcc without xtheadc extension, optimized

gcc git commit b43bb6591f7f934f9807a2cae3b53fdbe8d27169

riscv64-unknown-linux-gnu-gcc -march=rv64gc_xtheadvector fp16.c -o fp16 -O2 -static
qemu-riscv64 -cpu c906fdv ./fp16
a = -8  -7  -6  -5  -4  -3  -2  -1  0  1  2  3  4  5  6  7  
a = -8  -7  -6  -5  -4  -3  -2  -1  0  1  2  3  4  5  6  7  
Segmentation fault (core dumped)

@pz9115
Copy link

pz9115 commented Dec 23, 2024

It seems have the alignment problem, when I set __attribute__((aligned(64))) at a, -O2 passed, will check the detail later.

#include <stdio.h>
#include <riscv_vector.h>

int main()
{
    float a[16] __attribute__((aligned(64))) = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};

    fprintf(stderr, "a = ");
    for (int i=0; i<16; i++)
    {
        fprintf(stderr, "%.0f  ", a[i]);
    }
    fprintf(stderr, "\n");

    float* ptr = a;

    int n = 16;
    while (n > 0)
    {
        size_t vl = __riscv_vsetvl_e32m8(n);

        // fp32 -> fp16 -> fp32
        vfloat32m8_t _p = __riscv_vle32_v_f32m8(ptr, vl);
        vfloat16m4_t _half = __riscv_vfncvt_f_f_w_f16m4(_p, vl);
        vfloat32m8_t _out = __riscv_vfwcvt_f_f_v_f32m8(_half, vl);
        __riscv_vse32_v_f32m8(ptr, _out, vl);

        ptr += vl;
        n -= vl;
    }

    fprintf(stderr, "a = ");
    for (int i=0; i<16; i++)
    {
        fprintf(stderr, "%.0f  ", a[i]);
    }
    fprintf(stderr, "\n");

    return 0;
}

@majin2020
Copy link

majin2020 commented Dec 23, 2024

I'm very sorry to let you encounter this problem, because in the new version of the compiler we as much as possible to reuse the community's RVV 1.0 framework, so we need to deal with many differences 1.0 RVV. Although we dealt with it as best we could, there were obviously still some omissions.

Regarding this problem, it is because the XTheadVector did not handle the logic of vsetvl properly and still simply reused RVV 1.0. In RVV 1.0, the vsetvli zero,zero,e32,m8 indicates that avl does not change (avl = 16), but in XTheadVector, it indicates that avl takes the maximum value (avl = 32). This results in out-of-bounds occurring when the vector var is stored. We will fix it as soon as possible.

Thanks again.

@nihui
Copy link
Author

nihui commented Jan 8, 2025

reported to upstream https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118357

@nihui
Copy link
Author

nihui commented Jan 14, 2025

gcc-mirror/gcc@8d577a0 may fix this

@majin2020
Copy link

gcc-mirror/gcc@8d577a0 may fix this

Sorry. The patch is to solve the bug of RVV 1.0, not for this issue. I will try to fix this problem in the near future.

@majin2020
Copy link

Hi, @nihui

At present, the upstream has been repaired, and the xuantie-gnu-toolchain will also be repaired in the next version. Thanks.

Best regards,
Jin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants