xtheadvector fp16 conversion intrinsic caused crash in optimized build #24

nihui · 2024-12-23T09:47:58Z

Xuantie-900-gcc-linux-6.6.0-glibc-x86_64-V3.0.1
Xuantie-qemu-x86_64-Ubuntu-20.04-V5.0.4-B20241127-1130

riscv64-unknown-linux-gnu-gcc -march=rv64gc_zfh_xtheadvector_xtheadc -mabi=lp64d -static fp16.c -o fp16 -O2
qemu-riscv64 -cpu c906fdv ./fp16

output

a = -8  -7  -6  -5  -4  -3  -2  -1  0  1  2  3  4  5  6  7  
a = -8  -7  -6  -5  -4  -3  -2  -1  0  1  2  3  4  5  6  7  
Segmentation fault (core dumped)

minimal test

#include <stdio.h>
#include <riscv_vector.h>

int main()
{
    float a[16] = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};

    fprintf(stderr, "a = ");
    for (int i=0; i<16; i++)
    {
        fprintf(stderr, "%.0f  ", a[i]);
    }
    fprintf(stderr, "\n");

    float* ptr = a;

    int n = 16;
    while (n > 0)
    {
        size_t vl = __riscv_vsetvl_e32m8(n);

        // fp32 -> fp16 -> fp32
        vfloat32m8_t _p = __riscv_vle32_v_f32m8(ptr, vl);
        vfloat16m4_t _half = __riscv_vfncvt_f_f_w_f16m4(_p, vl);
        vfloat32m8_t _out = __riscv_vfwcvt_f_f_v_f32m8(_half, vl);
        __riscv_vse32_v_f32m8(ptr, _out, vl);

        ptr += vl;
        n -= vl;
    }

    fprintf(stderr, "a = ");
    for (int i=0; i<16; i++)
    {
        fprintf(stderr, "%.0f  ", a[i]);
    }
    fprintf(stderr, "\n");

    return 0;
}

nihui · 2024-12-23T10:47:25Z

compiling for c908v with rvv-1.0 + zvfh works fine, using the same compiler and qemu

riscv64-unknown-linux-gnu-gcc -march=rv64gcv_zfh_zvfh_xtheadc -mabi=lp64d -static fp16.c -o fp16-v -O2
qemu-riscv64 -cpu c908v ./fp16-v

pz9115 · 2024-12-23T10:59:53Z

I want to reproduce the error, can you help me to check is this case right?

/topt/xt/lib/gcc/riscv64-unknown-linux-gnu/10.4.0/include/riscv_vector.h:34:2: error: #error "Vector intrinsics require the vector extension."
   34 | #error "Vector intrinsics require the vector extension."
      |  ^~~~~
fp16.c: In function 'main':
fp16.c:20:21: warning: implicit declaration of function '__riscv_vsetvl_e32m8' [-Wimplicit-function-declaration]
   20 |         size_t vl = __riscv_vsetvl_e32m8(n);
      |                     ^~~~~~~~~~~~~~~~~~~~
fp16.c:23:9: error: unknown type name 'vfloat32m8_t'
   23 |         vfloat32m8_t _p = __riscv_vle32_v_f32m8(ptr, vl);
      |         ^~~~~~~~~~~~
fp16.c:23:27: warning: implicit declaration of function '__riscv_vle32_v_f32m8' [-Wimplicit-function-declaration]
   23 |         vfloat32m8_t _p = __riscv_vle32_v_f32m8(ptr, vl);
      |                           ^~~~~~~~~~~~~~~~~~~~~
fp16.c:24:9: error: unknown type name 'vfloat16m4_t'
   24 |         vfloat16m4_t _half = __riscv_vfncvt_f_f_w_f16m4(_p, vl);
      |         ^~~~~~~~~~~~
fp16.c:24:30: warning: implicit declaration of function '__riscv_vfncvt_f_f_w_f16m4' [-Wimplicit-function-declaration]
   24 |         vfloat16m4_t _half = __riscv_vfncvt_f_f_w_f16m4(_p, vl);
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~
fp16.c:25:9: error: unknown type name 'vfloat32m8_t'
   25 |         vfloat32m8_t _out = __riscv_vfwcvt_f_f_v_f32m8(_half, vl);
      |         ^~~~~~~~~~~~
fp16.c:25:29: warning: implicit declaration of function '__riscv_vfwcvt_f_f_v_f32m8' [-Wimplicit-function-declaration]
   25 |         vfloat32m8_t _out = __riscv_vfwcvt_f_f_v_f32m8(_half, vl);
      |                             ^~~~~~~~~~~~~~~~~~~~~~~~~~
fp16.c:26:9: warning: implicit declaration of function '__riscv_vse32_v_f32m8' [-Wimplicit-function-declaration]
   26 |         __riscv_vse32_v_f32m8(ptr, _out, vl);
      |         ^~~~~~~~~~~~~~~~~~~~~

nihui · 2024-12-23T11:34:16Z

toolchain download https://www.xrvm.cn/community/download?id=4382928864901402624
qemu download https://www.xrvm.cn/community/download?id=4382612884035801088

nihui · 2024-12-23T11:41:03Z

This issue can also be reproduced using upstream gcc without xtheadc extension, optimized

gcc git commit b43bb6591f7f934f9807a2cae3b53fdbe8d27169

riscv64-unknown-linux-gnu-gcc -march=rv64gc_xtheadvector fp16.c -o fp16 -O2 -static
qemu-riscv64 -cpu c906fdv ./fp16
a = -8  -7  -6  -5  -4  -3  -2  -1  0  1  2  3  4  5  6  7  
a = -8  -7  -6  -5  -4  -3  -2  -1  0  1  2  3  4  5  6  7  
Segmentation fault (core dumped)

pz9115 · 2024-12-23T16:00:20Z

It seems have the alignment problem, when I set __attribute__((aligned(64))) at a, -O2 passed, will check the detail later.

#include <stdio.h>
#include <riscv_vector.h>

int main()
{
    float a[16] __attribute__((aligned(64))) = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};

    fprintf(stderr, "a = ");
    for (int i=0; i<16; i++)
    {
        fprintf(stderr, "%.0f  ", a[i]);
    }
    fprintf(stderr, "\n");

    float* ptr = a;

    int n = 16;
    while (n > 0)
    {
        size_t vl = __riscv_vsetvl_e32m8(n);

        // fp32 -> fp16 -> fp32
        vfloat32m8_t _p = __riscv_vle32_v_f32m8(ptr, vl);
        vfloat16m4_t _half = __riscv_vfncvt_f_f_w_f16m4(_p, vl);
        vfloat32m8_t _out = __riscv_vfwcvt_f_f_v_f32m8(_half, vl);
        __riscv_vse32_v_f32m8(ptr, _out, vl);

        ptr += vl;
        n -= vl;
    }

    fprintf(stderr, "a = ");
    for (int i=0; i<16; i++)
    {
        fprintf(stderr, "%.0f  ", a[i]);
    }
    fprintf(stderr, "\n");

    return 0;
}

majin2020 · 2024-12-23T17:18:09Z

I'm very sorry to let you encounter this problem, because in the new version of the compiler we as much as possible to reuse the community's RVV 1.0 framework, so we need to deal with many differences 1.0 RVV. Although we dealt with it as best we could, there were obviously still some omissions.

Regarding this problem, it is because the XTheadVector did not handle the logic of vsetvl properly and still simply reused RVV 1.0. In RVV 1.0, the vsetvli zero,zero,e32,m8 indicates that avl does not change (avl = 16), but in XTheadVector, it indicates that avl takes the maximum value (avl = 32). This results in out-of-bounds occurring when the vector var is stored. We will fix it as soon as possible.

Thanks again.

nihui · 2025-01-08T13:58:40Z

reported to upstream https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118357

nihui · 2025-01-14T05:09:34Z

gcc-mirror/gcc@8d577a0 may fix this

majin2020 · 2025-01-16T08:19:41Z

gcc-mirror/gcc@8d577a0 may fix this

Sorry. The patch is to solve the bug of RVV 1.0, not for this issue. I will try to fix this problem in the near future.

majin2020 · 2025-01-21T13:19:16Z

Hi, @nihui

At present, the upstream has been repaired, and the xuantie-gnu-toolchain will also be repaired in the next version. Thanks.

Best regards,
Jin

nihui mentioned this issue Dec 23, 2024

update ruapu to detect zfh zvfh xtheadvector Tencent/ncnn#5841

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

xtheadvector fp16 conversion intrinsic caused crash in optimized build #24

xtheadvector fp16 conversion intrinsic caused crash in optimized build #24

nihui commented Dec 23, 2024

nihui commented Dec 23, 2024

pz9115 commented Dec 23, 2024 •

edited

Loading

nihui commented Dec 23, 2024

nihui commented Dec 23, 2024 •

edited

Loading

pz9115 commented Dec 23, 2024 •

edited

Loading

majin2020 commented Dec 23, 2024 •

edited

Loading

nihui commented Jan 8, 2025

nihui commented Jan 14, 2025

majin2020 commented Jan 16, 2025

majin2020 commented Jan 21, 2025

xtheadvector fp16 conversion intrinsic caused crash in optimized build #24

xtheadvector fp16 conversion intrinsic caused crash in optimized build #24

Comments

nihui commented Dec 23, 2024

nihui commented Dec 23, 2024

pz9115 commented Dec 23, 2024 • edited Loading

nihui commented Dec 23, 2024

nihui commented Dec 23, 2024 • edited Loading

pz9115 commented Dec 23, 2024 • edited Loading

majin2020 commented Dec 23, 2024 • edited Loading

nihui commented Jan 8, 2025

nihui commented Jan 14, 2025

majin2020 commented Jan 16, 2025

majin2020 commented Jan 21, 2025

pz9115 commented Dec 23, 2024 •

edited

Loading

nihui commented Dec 23, 2024 •

edited

Loading

pz9115 commented Dec 23, 2024 •

edited

Loading

majin2020 commented Dec 23, 2024 •

edited

Loading