Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add pass to finish SROA for cooperative matrices #15038

Merged
merged 8 commits into from
Sep 3, 2024

Conversation

MrSidims
Copy link
Contributor

SROA has troubles processing structures containing TargetExtType, which in some cases will lead to a situation, when OpAccessChain would attempt to access a structure containing a cooperative matrix, but with indexes set to access the matrix itself, which is invalid. This PR adds routine that finds alloca to such structure and replaces it with alloca to cooperative matrix type.

@MrSidims
Copy link
Contributor Author

MrSidims commented Aug 12, 2024

@bwlodarcz @VyacheslavLevytskyy for some reasons can't add you as reviewers, yet please take a look when convenient

Instruction *Ptr =
dyn_cast<Instruction>(CI->getArgOperand(0)->stripPointerCasts());
StructType *WrapperMatrixTy =
dyn_cast<StructType>(cast<AllocaInst>(Ptr)->getAllocatedType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that "CoopMatrix wrapped into a struct" covers all practical use cases, but I just wonder if it's possible to have any other nested type composed of CoopMatrix where this line would lead to a crash, because Argument #0 was not an alloca instruction wrt. the wrapped matrix type? In other words, would it make any sense to check Ptr != nullptr and dyn_cast<AllocaInst>(Ptr) != nullptr?

Copy link
Contributor Author

@MrSidims MrSidims Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because Argument #0 was not an alloca instruction wrt. the wrapped matrix type

Technically, it's unexpected, but yet possible if we store pointer in a GV and then load from it. I've added a sanity check for this. In real life if this appear translator will still fail to translate the code, but it would report an error during OpAccessChain validation.

but I just wonder if it's possible to have any other nested type composed of CoopMatrix where this line would lead to a crash

When we have matrix in a structure in a structure SROA will manage to replace the outer structure with the internal one (yet without promoting cooperative matrix). With O0 it of cause won't happen, but we still won't crash here as yet for the outer structure we would have an alloca. MatrixTy in this case will be nullptr, so finishSROACooperativeMatrix would do nothing and the translator would still again fail during OpAccessChain validation. I'm not keen to handle this non-practical case for a W/A, instead I'd rather fix LLVM SROA, but it requires more time.

@@ -288,6 +288,49 @@ void SPIRVRegularizeLLVMBase::expandVIDWithSYCLTypeByValComp(Function *F) {
nullptr, &Attrs, true);
}

// intel/llvm customization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we make it a part of the upstream translator? If we can't, we should have a tracker somewhere listing all our translator customizations, so this PR should be recorded there as a reminder for us to revert it.

What about turning this into a pass which we can run in our device compilation pipeline instead of hacking it into the translator?

Copy link
Contributor Author

@MrSidims MrSidims Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If/when landed I'll definitely add this PR to #7592

Why can't we make it a part of the upstream translator?

I'm not sure about this approach. We definitely need to do something for matrix unwrapping just for -O0 case, may be a pass in sycl-post-link or in the translator or alternative for both of them is adding some sort of the translator builtin (like __translate_sampler_initializer and others in OpenCL) with the translator inserting the definition of this builtin. I'd like to try unwrapping like this first without introducing new LLVM IR entity like translator builtin, but want to ensure that such solution is stable.

What about turning this into a pass which we can run in our device compilation pipeline instead of hacking it into the translator?

I haven't though about it tbh, can move it there.

SROA has troubles processing structures containing TargetExtType,
which in some cases will lead to a situation, when OpAccessChain would
attempt to access a structure containing a cooperative matrix, but with
indexes set to access the matrix itself, which is invalid. This PR adds
routine that finds alloca to such structure and replaces it with alloca
to cooperative matrix type.

Signed-off-by: Sidorov, Dmitry <[email protected]>
Signed-off-by: Sidorov, Dmitry <[email protected]>
Signed-off-by: Sidorov, Dmitry <[email protected]>
Signed-off-by: Sidorov, Dmitry <[email protected]>
@MrSidims MrSidims force-pushed the wa-access-chain-sroa branch from 425dc6a to 2afe786 Compare August 26, 2024 20:16
@MrSidims MrSidims requested review from a team as code owners August 26, 2024 20:16
@MrSidims MrSidims changed the title [SPIR-V] Add W/A to finish SROA for cooperative matrices [SYCL] Add pass to finish SROA for cooperative matrices Aug 26, 2024
Signed-off-by: Sidorov, Dmitry <[email protected]>
@MrSidims
Copy link
Contributor Author

@AlexeySachkov thanks for the suggestions! Applied comments and move to be a pass executed during sycl-post-link.

Signed-off-by: Sidorov, Dmitry <[email protected]>
@MrSidims
Copy link
Contributor Author

@intel/dpcpp-tools-reviewers @intel/dpcpp-esimd-reviewers please take a look

// from sycl::joint_matrix class object if it's used in __spirv_AccessChain
// function call. It's necessary because otherwise OpAccessChain indices would
// be wrong.
bool transformAccessChain(Function *F) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently it's the only function, but I see few other candidates in the translator that might be moved into this pass (of cause if we decide to introduce it)

Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no flags for esimd

@sarnex sarnex requested a review from a team August 29, 2024 14:08
//===----------------------------------------------------------------------===//
//
// A transformation pass which mutates Joint Matrix builtin calls to make them
// conformat with SPIR-V friendly LLVM IR specification.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// conformat with SPIR-V friendly LLVM IR specification.
// conformant with SPIR-V friendly LLVM IR specification.

@@ -795,6 +796,10 @@ processInputModule(std::unique_ptr<Module> M) {
if (isModuleUsingAsan(*M))
Modified |= runModulePass<SanitizeDeviceGlobalPass>(*M);

// Transform Joint Matrix builtin calls to align them with SPIR-V friendly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we need to invoke this pass inside sycl-post-link? Can we not run this pass in the standard LLVM pipeline?

Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions if something matches better, but the considerations are the following:

  1. If optimizations are enabled I want this transformation to happen in the end of the pipeline;
  2. If optimizations are disabled I want this transformation anyway.

In both cases sycl-post-link seems like a good match.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is reasonable. We will eventually be trying to refactor sycl-post-link anyways.
Thanks Dmitry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to suggestions if something matches better, but the considerations are the following:

  1. If optimizations are enabled I want this transformation to happen in the end of the pipeline;
  2. If optimizations are disabled I want this transformation anyway.

In both cases sycl-post-link seems like a good match.

Both can be achieved with a standard optimization pipeline, we have a whole group of such passes:

if (LangOpts.SYCLIsDevice) {
MPM.addPass(SYCLMutatePrintfAddrspacePass());
if (LangOpts.EnableDAEInSpirKernels)
MPM.addPass(DeadArgumentEliminationSYCLPass());
// Rerun aspect propagation without warning diagnostics.
MPM.addPass(
SYCLPropagateAspectsUsagePass(/*FP64ConvEmu=*/CodeGenOpts.FP64ConvEmu,
/*ExcludeAspects=*/{},
/*ValidateAspects=*/false));
// Add attribute corresponding to optimization level.
MPM.addPass(SYCLAddOptLevelAttributePass(CodeGenOpts.OptimizationLevel));
// Add SPIRITTAnnotations pass to the pass manager if
// -fsycl-instrument-device-code option was passed. This option can be
// used only with spir or spirv triple.
if (CodeGenOpts.SPIRITTAnnotations) {
assert(
TargetTriple.isSPIROrSPIRV() &&
"ITT annotations can only be added to a module with spir target");
MPM.addPass(SPIRITTAnnotationsPass());
}
// Allocate static local memory in SYCL kernel scope for each allocation
// call.
MPM.addPass(SYCLLowerWGLocalMemoryPass());
// Process properties and annotations
MPM.addPass(CompileTimePropertiesPass());
// Record SYCL aspect names (this should come after propagating aspects
// and before cleaning up metadata)
MPM.addPass(RecordSYCLAspectNamesPass());
if (TargetTriple.isNVPTX())
MPM.addPass(SYCLCreateNVVMAnnotationsPass());
// Remove SYCL metadata added by the frontend, like sycl_aspects
// Note, this pass should be at the end of the pipeline
MPM.addPass(CleanupSYCLMetadataPass());
}

@@ -0,0 +1,78 @@
//===- SYCLProcessJointMatrix.cpp - SYCL Joint Matrix transformation Pass -===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: May be we can rename this as 'SYCLJointMatrixTransform.cpp'? I am not too attached to it. I am ok to leave it as it is. Just a thought. Thanks

Copy link
Contributor

@asudarsa asudarsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one question about where to invoke this pass from.

Thanks

Copy link
Contributor

@asudarsa asudarsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for clarification.

Signed-off-by: Sidorov, Dmitry <[email protected]>
@MrSidims
Copy link
Contributor Author

MrSidims commented Sep 2, 2024

@intel/llvm-gatekeepers please help with merge

@steffenlarsen
Copy link
Contributor

Failure on Windows is after testing and is infrastructural.

@steffenlarsen steffenlarsen merged commit 8730002 into intel:sycl Sep 3, 2024
12 of 13 checks passed
@sarnex
Copy link
Contributor

sarnex commented Sep 3, 2024

@MrSidims Seeing postcommit XPASS failures on DG2, can you take a look?

https://github.com/intel/llvm/actions/runs/10678654371/job/29596502570

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants