[SYCL] Add pass to finish SROA for cooperative matrices #15038

MrSidims · 2024-08-12T12:50:04Z

SROA has trouble processing structures that contain a TargetExtType. In some cases this leads to an OpAccessChain that targets a structure containing a cooperative matrix while its indices are set up to access the matrix itself, which is invalid. This PR adds a routine that finds the alloca of such a structure and replaces it with an alloca of the cooperative matrix type.
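To make the rewrite described above concrete, here is a minimal sketch in plain C++ rather than the LLVM API. `ToyType`, `ToyAlloca` and `unwrapMatrixAlloca` are hypothetical names used only for illustration, and the rule that the matrix is the first element of the wrapper struct is an assumption, not something taken from the patch.

```cpp
#include <vector>

// Hypothetical stand-in for an LLVM type: a cooperative matrix (opaque target
// type), a struct with element types, or anything else.
struct ToyType {
  enum Kind { Matrix, Struct, Other };
  Kind kind;
  std::vector<const ToyType *> elements; // only meaningful for Struct
};

// Hypothetical stand-in for an alloca instruction.
struct ToyAlloca {
  const ToyType *allocatedType;
};

// If the alloca allocates a struct whose first element is a matrix (an
// assumption of this sketch), retarget the alloca to the matrix type itself.
// Returns true if the alloca was changed, false if it was left untouched.
bool unwrapMatrixAlloca(ToyAlloca &A) {
  const ToyType *T = A.allocatedType;
  if (T == nullptr || T->kind != ToyType::Struct || T->elements.empty())
    return false;
  const ToyType *Inner = T->elements[0];
  if (Inner == nullptr || Inner->kind != ToyType::Matrix)
    return false; // e.g. a struct nested in a struct: do nothing
  A.allocatedType = Inner;
  return true;
}
```

In this model a struct that wraps another struct is left unchanged, which mirrors the "do nothing" behaviour discussed later in the review for the nested case.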

MrSidims · 2024-08-12T15:09:14Z

@bwlodarcz @VyacheslavLevytskyy for some reason I can't add you as reviewers, but please take a look when convenient

VyacheslavLevytskyy · 2024-08-12T15:42:35Z

llvm-spirv/lib/SPIRV/SPIRVRegularizeLLVM.cpp

+        Instruction *Ptr =
+            dyn_cast<Instruction>(CI->getArgOperand(0)->stripPointerCasts());
+        StructType *WrapperMatrixTy =
+            dyn_cast<StructType>(cast<AllocaInst>(Ptr)->getAllocatedType());


I guess that "CoopMatrix wrapped into a struct" covers all practical use cases, but I just wonder if it's possible to have any other nested type composed of CoopMatrix where this line would lead to a crash, because Argument #0 was not an alloca instruction wrt. the wrapped matrix type? In other words, would it make any sense to check Ptr != nullptr and dyn_cast<AllocaInst>(Ptr) != nullptr?

because Argument #0 was not an alloca instruction wrt. the wrapped matrix type

Technically, it's unexpected, but still possible if we store the pointer in a GV and then load from it. I've added a sanity check for this. In real life, if this appears, the translator will still fail to translate the code, but it will report an error during OpAccessChain validation.
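The kind of sanity check discussed here can be sketched without LLVM. In LLVM itself, `cast<>` asserts on a type mismatch while `dyn_cast<>` returns null. The toy hierarchy below imitates that with `dynamic_cast`. All class and function names (`ToyValue`, `ToyAllocaInst`, `tryRewrite`, and so on) are hypothetical, and this is not the check that was actually committed.

```cpp
// Hypothetical miniature of an IR value hierarchy.
struct ToyValue {
  virtual ~ToyValue() = default;
};
struct ToyInstruction : ToyValue {};
struct ToyAllocaInst : ToyInstruction {};
struct ToyLoadInst : ToyInstruction {}; // e.g. pointer loaded back from a GV

// Null-tolerant checked downcast: returns nullptr instead of crashing when
// the operand is missing or is not an alloca.
const ToyAllocaInst *getAllocaOrNull(const ToyValue *V) {
  if (V == nullptr)
    return nullptr;
  return dynamic_cast<const ToyAllocaInst *>(V);
}

// Bail out early when the pointer operand is not an alloca; the real rewrite
// would only run after this guard.
bool tryRewrite(const ToyValue *PtrOperand) {
  const ToyAllocaInst *AI = getAllocaOrNull(PtrOperand);
  if (AI == nullptr)
    return false; // leave the call untouched; later validation reports it
  return true;    // placeholder for the actual transformation
}
```

With a guard like this, a pointer that comes from a load is skipped instead of tripping an assertion.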

but I just wonder if it's possible to have any other nested type composed of CoopMatrix where this line would lead to a crash

When we have a matrix in a structure in a structure, SROA will manage to replace the outer structure with the inner one (though without promoting the cooperative matrix). With O0 this of course won't happen, but we still won't crash here, since for the outer structure we would still have an alloca. MatrixTy in this case will be nullptr, so finishSROACooperativeMatrix would do nothing and the translator would again fail during OpAccessChain validation. I'm not keen to handle this non-practical case in a W/A; I'd rather fix LLVM SROA, but that requires more time.

AlexeySachkov · 2024-08-12T20:24:46Z

llvm-spirv/lib/SPIRV/SPIRVRegularizeLLVM.cpp

@@ -288,6 +288,49 @@ void SPIRVRegularizeLLVMBase::expandVIDWithSYCLTypeByValComp(Function *F) {
      nullptr, &Attrs, true);
 }

+// intel/llvm customization


Why can't we make it a part of the upstream translator? If we can't, we should have a tracker somewhere listing all our translator customizations, so this PR should be recorded there as a reminder for us to revert it.

What about turning this into a pass which we can run in our device compilation pipeline instead of hacking it into the translator?

If/when landed I'll definitely add this PR to #7592

Why can't we make it a part of the upstream translator?

I'm not sure about this approach. We definitely need to do something for matrix unwrapping, if only for the -O0 case: maybe a pass in sycl-post-link or in the translator, or, as an alternative to both, some sort of translator builtin (like __translate_sampler_initializer and others in OpenCL), with the translator inserting the definition of this builtin. I'd like to try unwrapping like this first, without introducing a new LLVM IR entity such as a translator builtin, but I want to ensure that such a solution is stable.

What about turning this into a pass which we can run in our device compilation pipeline instead of hacking it into the translator?

I haven't thought about it tbh, can move it there.

Signed-off-by: Sidorov, Dmitry <[email protected]>

MrSidims · 2024-08-26T20:25:04Z

@AlexeySachkov thanks for the suggestions! Applied the comments and moved this to a pass executed during sycl-post-link.

MrSidims · 2024-08-29T12:05:44Z

@intel/dpcpp-tools-reviewers @intel/dpcpp-esimd-reviewers please take a look

MrSidims · 2024-08-29T12:07:26Z

llvm/lib/SYCLLowerIR/SYCLProcessJointMatrix.cpp

+// from sycl::joint_matrix class object if it's used in __spirv_AccessChain
+// function call. It's necessary because otherwise OpAccessChain indices would
+// be wrong.
+bool transformAccessChain(Function *F) {


Currently it's the only function, but I see a few other candidates in the translator that might be moved into this pass (of course, if we decide to introduce it).

sarnex

no flags for esimd

asudarsa · 2024-08-29T15:05:30Z

llvm/include/llvm/SYCLLowerIR/SYCLProcessJointMatrix.h

+//===----------------------------------------------------------------------===//
+//
+// A transformation pass which mutates Joint Matrix builtin calls to make them
+// conformat with SPIR-V friendly LLVM IR specification.


Suggested change

// conformat with SPIR-V friendly LLVM IR specification.

// conformant with SPIR-V friendly LLVM IR specification.

asudarsa · 2024-08-29T15:10:55Z

llvm/tools/sycl-post-link/sycl-post-link.cpp

@@ -795,6 +796,10 @@ processInputModule(std::unique_ptr<Module> M) {
  if (isModuleUsingAsan(*M))
    Modified |= runModulePass<SanitizeDeviceGlobalPass>(*M);

+  // Transform Joint Matrix builtin calls to align them with SPIR-V friendly


Is there a reason why we need to invoke this pass inside sycl-post-link? Can we not run this pass in the standard LLVM pipeline?

Thanks

I'm open to suggestions if something matches better, but the considerations are the following:

If optimizations are enabled I want this transformation to happen in the end of the pipeline;

If optimizations are disabled I want this transformation anyway.

In both cases sycl-post-link seems like a good match.
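The two considerations above can be illustrated with a toy pipeline builder. This is only a sketch in plain C++: `buildToyPipeline` and the pass name `joint-matrix-fixup` are hypothetical, and the optimization passes listed are merely examples of what an optimized pipeline might run before it.

```cpp
#include <string>
#include <vector>

// Builds an ordered list of pass names. The fix-up pass is appended
// unconditionally and always last, so it runs after any optimizations and
// also when no optimizations are scheduled at all.
std::vector<std::string> buildToyPipeline(bool OptimizationsEnabled) {
  std::vector<std::string> Passes;
  if (OptimizationsEnabled) {
    // Example optimization passes; the real pipeline is far larger.
    Passes.push_back("sroa");
    Passes.push_back("instcombine");
  }
  // Hypothetical name for the joint matrix transformation discussed here.
  Passes.push_back("joint-matrix-fixup");
  return Passes;
}
```

Either placement under discussion (sycl-post-link or the end of the standard pipeline) satisfies this shape, as long as the pass is scheduled unconditionally and after the optimizations.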

I think this is reasonable. We will eventually be trying to refactor sycl-post-link anyways.
Thanks Dmitry

I'm open to suggestions if something matches better, but the considerations are the following:

If optimizations are enabled I want this transformation to happen in the end of the pipeline;

If optimizations are disabled I want this transformation anyway.

In both cases sycl-post-link seems like a good match.

Both can be achieved with a standard optimization pipeline, we have a whole group of such passes:

llvm/clang/lib/CodeGen/BackendUtil.cpp

Lines 1125 to 1166 in 7f9e251

if (LangOpts.SYCLIsDevice) {
  MPM.addPass(SYCLMutatePrintfAddrspacePass());
  if (LangOpts.EnableDAEInSpirKernels)
    MPM.addPass(DeadArgumentEliminationSYCLPass());

  // Rerun aspect propagation without warning diagnostics.
  MPM.addPass(
      SYCLPropagateAspectsUsagePass(/*FP64ConvEmu=*/CodeGenOpts.FP64ConvEmu,
                                    /*ExcludeAspects=*/{},
                                    /*ValidateAspects=*/false));

  // Add attribute corresponding to optimization level.
  MPM.addPass(SYCLAddOptLevelAttributePass(CodeGenOpts.OptimizationLevel));

  // Add SPIRITTAnnotations pass to the pass manager if
  // -fsycl-instrument-device-code option was passed. This option can be
  // used only with spir or spirv triple.
  if (CodeGenOpts.SPIRITTAnnotations) {
    assert(
        TargetTriple.isSPIROrSPIRV() &&
        "ITT annotations can only be added to a module with spir target");
    MPM.addPass(SPIRITTAnnotationsPass());
  }

  // Allocate static local memory in SYCL kernel scope for each allocation
  // call.
  MPM.addPass(SYCLLowerWGLocalMemoryPass());

  // Process properties and annotations
  MPM.addPass(CompileTimePropertiesPass());

  // Record SYCL aspect names (this should come after propagating aspects
  // and before cleaning up metadata)
  MPM.addPass(RecordSYCLAspectNamesPass());

  if (TargetTriple.isNVPTX())
    MPM.addPass(SYCLCreateNVVMAnnotationsPass());

  // Remove SYCL metadata added by the frontend, like sycl_aspects
  // Note, this pass should be at the end of the pipeline
  MPM.addPass(CleanupSYCLMetadataPass());
}

asudarsa · 2024-08-29T15:14:51Z

llvm/lib/SYCLLowerIR/SYCLProcessJointMatrix.cpp

@@ -0,0 +1,78 @@
+//===- SYCLProcessJointMatrix.cpp - SYCL Joint Matrix transformation Pass -===//


Nit: Maybe we can rename this to 'SYCLJointMatrixTransform.cpp'? I'm not too attached to it and am OK with leaving it as is. Just a thought. Thanks

asudarsa

LGTM. Just one question about where to invoke this pass from.

Thanks

asudarsa

LGTM. Thanks for clarification.

MrSidims · 2024-09-02T23:02:04Z

@intel/llvm-gatekeepers please help with merge

steffenlarsen · 2024-09-03T06:23:38Z

Failure on Windows is after testing and is infrastructural.

sarnex · 2024-09-03T14:34:54Z

@MrSidims Seeing postcommit XPASS failures on DG2, can you take a look?

https://github.com/intel/llvm/actions/runs/10678654371/job/29596502570

MrSidims requested a review from a team as a code owner August 12, 2024 12:50

MrSidims mentioned this pull request Aug 12, 2024

[SYCL][Matrix] Use KHR cooperative matrix instructions instead of Intel's #13817

Merged

MrSidims requested review from asudarsa and LU-JOHN August 12, 2024 14:33

VyacheslavLevytskyy reviewed Aug 12, 2024

VyacheslavLevytskyy approved these changes Aug 12, 2024

AlexeySachkov reviewed Aug 12, 2024

MrSidims added 4 commits August 26, 2024 05:03

add a check

bc02347

apply

803663d

move to a pass

2afe786

MrSidims force-pushed the wa-access-chain-sroa branch from 425dc6a to 2afe786 Compare August 26, 2024 20:16

MrSidims requested review from a team as code owners August 26, 2024 20:16

MrSidims changed the title ~~[SPIR-V] Add W/A to finish SROA for cooperative matrices~~ [SYCL] Add pass to finish SROA for cooperative matrices Aug 26, 2024

fix format

8166435

MrSidims requested a review from AlexeySachkov August 26, 2024 20:22

fix typo

8ef7b7e

MrSidims commented Aug 29, 2024

sarnex approved these changes Aug 29, 2024

sarnex requested a review from a team August 29, 2024 14:08

asudarsa reviewed Aug 29, 2024

asudarsa approved these changes Aug 29, 2024

rename

a9ec3d5

Merge remote-tracking branch 'origin/sycl' into wa-access-chain-sroa

8534879

MrSidims requested a review from a team September 2, 2024 23:01

steffenlarsen merged commit 8730002 into intel:sycl Sep 3, 2024
12 of 13 checks passed
