[CIR][CIRGen][Builtin][Neon] Lower neon_vabs_v and neon_vabsq_v #1081

Open · wants to merge 1 commit into base: main
Conversation

@ghehg (Collaborator) commented Nov 7, 2024

This implements the same approach as OG: call the LLVM AArch64 intrinsic, which eventually becomes an ARM64 instruction.
However, there is a clear alternative: extend CIR::AbsOp and CIR::FAbsOp to support vector types and only lower them at the LLVM lowering stage to LLVM::FAbsOp or LLVM::AbsOp, provided the LLVM dialect does the right thing at target lowering by eventually translating them to the LLVM AArch64 intrinsic.

The question is whether that is worth doing.

Anyway, I'm putting up this diff for suggestions and ideas.
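For reference (my own minimal snippet, not part of this PR's tests), this is the kind of C++ source that reaches the two builtins in the title; OG lowers them to llvm.aarch64.neon.abs.v4i16 and llvm.aarch64.neon.abs.v8i16:

```cpp
#include <arm_neon.h>

// vabs_s16 goes through the neon_vabs_v builtin (64-bit vector),
// vabsq_s16 through neon_vabsq_v (128-bit vector).
int16x4_t abs64(int16x4_t v) { return vabs_s16(v); }   // -> llvm.aarch64.neon.abs.v4i16
int16x8_t abs128(int16x8_t v) { return vabsq_s16(v); } // -> llvm.aarch64.neon.abs.v8i16
```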

@ghehg marked this pull request as ready for review November 7, 2024 19:40
@bcardosolopes (Member)

> The question is whether that is worth doing.

It's always better to unify and/or map target-specific things to something generic (easier for optimizers to understand later). As the person who has been working on this for some time, what's your take? When using C++ source that leads to CIR::AbsOp and CIR::FAbsOp with vectors against OG, does the LLVM output get llvm.aarch64.neon.abs.v4i16 and llvm.aarch64.neon.abs.v8i16 when the vector sizes match?

@ghehg (Collaborator, Author) commented Nov 10, 2024

> > The question is whether that is worth doing.
>
> It's always better to unify and/or map target-specific things to something generic (easier for optimizers to understand later). As the person who has been working on this for some time, what's your take?

I've been thinking about the same thing (just using CIR::AbsOp and CIR::FAbsOp and only lowering to intrinsics later).

> When using C++ source that leads to CIR::AbsOp and CIR::FAbsOp with vectors against OG, does the LLVM output get llvm.aarch64.neon.abs.v4i16 and llvm.aarch64.neon.abs.v8i16 when the vector sizes match?

Unfortunately, no. I just ran the experiment: using LLVM::AbsOp gives us LLVM IR like
%3 = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %0, i1 false)
so it does not use the NEON-specific intrinsics, even for the triple "aarch64-none-linux-android24".

But it might be OK: we could do our own target lowering at LoweringPrepare and lower vector-typed CIR::AbsOp and CIR::FAbsOp to CIR::LLVMIntrinsicCallOp when the target supports NEON, so we can take advantage of the hardware NEON features.
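Roughly the shape that rewrite could take (a sketch under assumptions, not code from this PR: `targetSupportsNeon` is a hypothetical helper, and the accessors and LLVMIntrinsicCallOp builder arguments are guesses rather than the actual CIR API):

```cpp
#include "mlir/IR/PatternMatch.h"
// plus whatever header declares the CIR ops in this project

// Hypothetical query for the target's NEON support; not a real CIR helper.
bool targetSupportsNeon(mlir::Operation *op);

// Sketch: divert vector-typed cir::AbsOp to the NEON intrinsic call when the
// target has NEON; otherwise fall back to the existing generic lowering.
struct LowerVectorAbs : public mlir::OpRewritePattern<cir::AbsOp> {
  using OpRewritePattern::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(cir::AbsOp op,
                  mlir::PatternRewriter &rewriter) const override {
    auto vecTy = mlir::dyn_cast<cir::VectorType>(op.getType());
    if (!vecTy || !targetSupportsNeon(op.getOperation()))
      return mlir::failure(); // scalar abs, or no NEON: leave it alone

    // Mirror what OG emits for neon_vabs_v / neon_vabsq_v (builder signature
    // assumed here).
    rewriter.replaceOpWithNewOp<cir::LLVMIntrinsicCallOp>(
        op, rewriter.getStringAttr("aarch64.neon.abs"), vecTy,
        op->getOperand(0));
    return mlir::success();
  }
};
```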

@ghehg (Collaborator, Author) commented Nov 10, 2024


> Unfortunately, no. I just ran the experiment: using LLVM::AbsOp gives us LLVM IR like %3 = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %0, i1 false), so it does not use the NEON-specific intrinsics, even for the triple "aarch64-none-linux-android24".
>
> But it might be OK: we could do our own target lowering at LoweringPrepare and lower vector-typed CIR::AbsOp and CIR::FAbsOp to CIR::LLVMIntrinsicCallOp when the target supports NEON, so we can take advantage of the hardware NEON features.

Very interesting. [Traditional clang codegen seems to want to use NEON abs only for the NEON abs builtin](https://godbolt.org/z/v8Ysxdrd1). But the same exact hardware instruction, abs v0.4s, v0.4s, is generated for both the generic and the NEON intrinsics.
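For concreteness, a reconstruction of the comparison (not the exact Godbolt source): with optimizations enabled, both forms compile to abs v0.4s, v0.4s on AArch64, even though only the second goes through the NEON builtin path in OG:

```cpp
#include <arm_neon.h>

// Generic form: OG emits the target-independent llvm.abs.v4i32.
int32x4_t abs_generic(int32x4_t v) { return __builtin_elementwise_abs(v); }

// NEON builtin form: OG emits llvm.aarch64.neon.abs.v4i32.
int32x4_t abs_neon(int32x4_t v) { return vabsq_s32(v); }

// Both end up as: abs v0.4s, v0.4s
```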

@ghehg (Collaborator, Author) commented Nov 11, 2024

My current plan is to first extend AbsOp to take vector types (PR).
Next, extend FAbsOp along with the other FpUnaryOps to support vector types.
Then we revisit this PR to decide whether to just use AbsOp and FAbsOp and lower them to the generic llvm.abs/llvm.fabs, which would give us a chance to optimize them in LLVM passes, or to lower them (either at codegen or at the lowering stage) to llvm.aarch64.neon.abs, which would not be optimized away in the generated assembly/machine code under -O0.
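To illustrate that tradeoff with a hypothetical example of mine (not from this thread): the generic llvm.abs is visible to LLVM's generic folds, e.g. instcombine should collapse a repeated abs, whereas llvm.aarch64.neon.abs is opaque to those passes and is expected to survive untouched:

```cpp
#include <arm_neon.h>

// Generic lowering: instcombine can fold abs(abs(x)) into a single llvm.abs.
int32x4_t abs_twice_generic(int32x4_t v) {
  return __builtin_elementwise_abs(__builtin_elementwise_abs(v));
}

// Target-intrinsic lowering: generic passes don't know this intrinsic, so both
// llvm.aarch64.neon.abs calls will typically remain.
int32x4_t abs_twice_neon(int32x4_t v) {
  return vabsq_s32(vabsq_s32(v));
}
```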

@bcardosolopes (Member)

> But the same exact hardware instruction, abs v0.4s, v0.4s, is generated for both the generic and the NEON intrinsics.

Nice. As long as the final ASM is the same, it feels like we're good to map all those things to the same higher-level CIR operation.

> Then we revisit this PR to decide whether to just use AbsOp and FAbsOp

Sounds good, thanks
