-
Notifications
You must be signed in to change notification settings - Fork 755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Add pass to finish SROA for cooperative matrices #15038
Conversation
@bwlodarcz @VyacheslavLevytskyy for some reasons can't add you as reviewers, yet please take a look when convenient |
Instruction *Ptr = | ||
dyn_cast<Instruction>(CI->getArgOperand(0)->stripPointerCasts()); | ||
StructType *WrapperMatrixTy = | ||
dyn_cast<StructType>(cast<AllocaInst>(Ptr)->getAllocatedType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that "CoopMatrix wrapped into a struct" covers all practical use cases, but I just wonder if it's possible to have any other nested type composed of CoopMatrix where this line would lead to a crash, because Argument #0 was not an alloca instruction wrt. the wrapped matrix type? In other words, would it make any sense to check Ptr != nullptr
and dyn_cast<AllocaInst>(Ptr) != nullptr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because Argument #0 was not an alloca instruction wrt. the wrapped matrix type
Technically, it's unexpected, but yet possible if we store pointer in a GV and then load from it. I've added a sanity check for this. In real life if this appear translator will still fail to translate the code, but it would report an error during OpAccessChain
validation.
but I just wonder if it's possible to have any other nested type composed of CoopMatrix where this line would lead to a crash
When we have matrix in a structure in a structure SROA will manage to replace the outer structure with the internal one (yet without promoting cooperative matrix). With O0 it of cause won't happen, but we still won't crash here as yet for the outer structure we would have an alloca. MatrixTy
in this case will be nullptr, so finishSROACooperativeMatrix
would do nothing and the translator would still again fail during OpAccessChain
validation. I'm not keen to handle this non-practical case for a W/A, instead I'd rather fix LLVM SROA, but it requires more time.
@@ -288,6 +288,49 @@ void SPIRVRegularizeLLVMBase::expandVIDWithSYCLTypeByValComp(Function *F) { | |||
nullptr, &Attrs, true); | |||
} | |||
|
|||
// intel/llvm customization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we make it a part of the upstream translator? If we can't, we should have a tracker somewhere listing all our translator customizations, so this PR should be recorded there as a reminder for us to revert it.
What about turning this into a pass which we can run in our device compilation pipeline instead of hacking it into the translator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If/when landed I'll definitely add this PR to #7592
Why can't we make it a part of the upstream translator?
I'm not sure about this approach. We definitely need to do something
for matrix unwrapping just for -O0 case, may be a pass in sycl-post-link or in the translator or alternative for both of them is adding some sort of the translator builtin (like __translate_sampler_initializer and others in OpenCL) with the translator inserting the definition of this builtin. I'd like to try unwrapping like this first without introducing new LLVM IR entity like translator builtin, but want to ensure that such solution is stable.
What about turning this into a pass which we can run in our device compilation pipeline instead of hacking it into the translator?
I haven't though about it tbh, can move it there.
SROA has troubles processing structures containing TargetExtType, which in some cases will lead to a situation, when OpAccessChain would attempt to access a structure containing a cooperative matrix, but with indexes set to access the matrix itself, which is invalid. This PR adds routine that finds alloca to such structure and replaces it with alloca to cooperative matrix type. Signed-off-by: Sidorov, Dmitry <[email protected]>
Signed-off-by: Sidorov, Dmitry <[email protected]>
Signed-off-by: Sidorov, Dmitry <[email protected]>
Signed-off-by: Sidorov, Dmitry <[email protected]>
425dc6a
to
2afe786
Compare
Signed-off-by: Sidorov, Dmitry <[email protected]>
@AlexeySachkov thanks for the suggestions! Applied comments and move to be a pass executed during sycl-post-link. |
Signed-off-by: Sidorov, Dmitry <[email protected]>
@intel/dpcpp-tools-reviewers @intel/dpcpp-esimd-reviewers please take a look |
// from sycl::joint_matrix class object if it's used in __spirv_AccessChain | ||
// function call. It's necessary because otherwise OpAccessChain indices would | ||
// be wrong. | ||
bool transformAccessChain(Function *F) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently it's the only function, but I see few other candidates in the translator that might be moved into this pass (of cause if we decide to introduce it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no flags for esimd
//===----------------------------------------------------------------------===// | ||
// | ||
// A transformation pass which mutates Joint Matrix builtin calls to make them | ||
// conformat with SPIR-V friendly LLVM IR specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// conformat with SPIR-V friendly LLVM IR specification. | |
// conformant with SPIR-V friendly LLVM IR specification. |
@@ -795,6 +796,10 @@ processInputModule(std::unique_ptr<Module> M) { | |||
if (isModuleUsingAsan(*M)) | |||
Modified |= runModulePass<SanitizeDeviceGlobalPass>(*M); | |||
|
|||
// Transform Joint Matrix builtin calls to align them with SPIR-V friendly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we need to invoke this pass inside sycl-post-link? Can we not run this pass in the standard LLVM pipeline?
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to suggestions if something matches better, but the considerations are the following:
- If optimizations are enabled I want this transformation to happen in the end of the pipeline;
- If optimizations are disabled I want this transformation anyway.
In both cases sycl-post-link seems like a good match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is reasonable. We will eventually be trying to refactor sycl-post-link anyways.
Thanks Dmitry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to suggestions if something matches better, but the considerations are the following:
- If optimizations are enabled I want this transformation to happen in the end of the pipeline;
- If optimizations are disabled I want this transformation anyway.
In both cases sycl-post-link seems like a good match.
Both can be achieved with a standard optimization pipeline, we have a whole group of such passes:
llvm/clang/lib/CodeGen/BackendUtil.cpp
Lines 1125 to 1166 in 7f9e251
if (LangOpts.SYCLIsDevice) { | |
MPM.addPass(SYCLMutatePrintfAddrspacePass()); | |
if (LangOpts.EnableDAEInSpirKernels) | |
MPM.addPass(DeadArgumentEliminationSYCLPass()); | |
// Rerun aspect propagation without warning diagnostics. | |
MPM.addPass( | |
SYCLPropagateAspectsUsagePass(/*FP64ConvEmu=*/CodeGenOpts.FP64ConvEmu, | |
/*ExcludeAspects=*/{}, | |
/*ValidateAspects=*/false)); | |
// Add attribute corresponding to optimization level. | |
MPM.addPass(SYCLAddOptLevelAttributePass(CodeGenOpts.OptimizationLevel)); | |
// Add SPIRITTAnnotations pass to the pass manager if | |
// -fsycl-instrument-device-code option was passed. This option can be | |
// used only with spir or spirv triple. | |
if (CodeGenOpts.SPIRITTAnnotations) { | |
assert( | |
TargetTriple.isSPIROrSPIRV() && | |
"ITT annotations can only be added to a module with spir target"); | |
MPM.addPass(SPIRITTAnnotationsPass()); | |
} | |
// Allocate static local memory in SYCL kernel scope for each allocation | |
// call. | |
MPM.addPass(SYCLLowerWGLocalMemoryPass()); | |
// Process properties and annotations | |
MPM.addPass(CompileTimePropertiesPass()); | |
// Record SYCL aspect names (this should come after propagating aspects | |
// and before cleaning up metadata) | |
MPM.addPass(RecordSYCLAspectNamesPass()); | |
if (TargetTriple.isNVPTX()) | |
MPM.addPass(SYCLCreateNVVMAnnotationsPass()); | |
// Remove SYCL metadata added by the frontend, like sycl_aspects | |
// Note, this pass should be at the end of the pipeline | |
MPM.addPass(CleanupSYCLMetadataPass()); | |
} |
@@ -0,0 +1,78 @@ | |||
//===- SYCLProcessJointMatrix.cpp - SYCL Joint Matrix transformation Pass -===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: May be we can rename this as 'SYCLJointMatrixTransform.cpp'? I am not too attached to it. I am ok to leave it as it is. Just a thought. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just one question about where to invoke this pass from.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for clarification.
Signed-off-by: Sidorov, Dmitry <[email protected]>
@intel/llvm-gatekeepers please help with merge |
Failure on Windows is after testing and is infrastructural. |
@MrSidims Seeing postcommit XPASS failures on DG2, can you take a look? https://github.com/intel/llvm/actions/runs/10678654371/job/29596502570 |
SROA has troubles processing structures containing TargetExtType, which in some cases will lead to a situation, when OpAccessChain would attempt to access a structure containing a cooperative matrix, but with indexes set to access the matrix itself, which is invalid. This PR adds routine that finds alloca to such structure and replaces it with alloca to cooperative matrix type.