LLVM and SPIRV-LLVM-Translator pulldown (WW04 2025) #16781
- Added support for AArch64-specific build attributes. - Print AArch64 build attributes to assembly. - Parse AArch64 build attributes from assembly. - Emit AArch64 build attributes to ELF. Specification: ARM-software/abi-aa#230
…ap. (#123813) Currently we make two memory allocations for each PyOperation: a Python object, and the PyOperation class itself. With some care we can allocate the PyOperation inline inside the Python object, saving us a malloc() call per object and perhaps improving cache locality.
This PR replaces some instances of `undef` with `function argument value` or `poison` or `concrete values` in several tests under `llvm/test/Transforms/` directory. These changes align with modern LLVM standards for better-defined behavior and test determinism. If this small PR is okay and gets merged, I will work on the rest. This is inspired by [this project](https://discourse.llvm.org/t/gsoc-2024-remove-undefined-behavior-from-tests/77236/29), work done on this by @leewei05
Users of the PlayStation SDK aren't given the means to create or run static executables. Uses of `-static` are limited to a few specialized cases within SIE. A `--build-id` isn't wanted in those cases. SIE tracker: TOOLCHAIN-16704
Summary: Previously, managed variables didn't work in rdc mode using the new driver because we just didn't register them. This was previously ignored because we didn't have enough space in the current struct format. This patch amends that by just emitting a struct pair for the two variables and using the single pointer. In the future, a more extensible entry format would be nice, but that can be done later.
This reverts commit 43177b5.
…gn-comprison (#122127) - add an option `EnableQtSupport`, that makes C++17 `q20::cmp_*` alternative available for Qt-based applications.
No test changes with this removed and it appears to be obsolete.
…#123803) This fixes a compile-time regression caused by #116645, where an entry basic block with a very large number of allocas and other instructions caused SROA to take ~100× its expected runtime, as every alloca (with ~2 uses) now calls this method to find the order of those few instructions, rescanning the very large basic block every single time. Since this code was originally written, Instructions now have ordering numbers available to determine relative order without unnecessarily scanning the basic block.
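The difference between rescanning a block and using cached ordering numbers can be sketched with a toy model. `ToyInst` and `ToyBlock` below are hypothetical types invented for illustration, not LLVM classes; in LLVM itself the analogous query is `Instruction::comesBefore`. This is a minimal sketch of the idea, not the code from the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instruction: an id plus a cached position.
struct ToyInst {
  int Id;
  std::size_t Order;
};

// Hypothetical stand-in for a basic block.
struct ToyBlock {
  std::vector<ToyInst> Insts;
  bool OrderValid = false;

  // Assign each instruction its position once: O(n) for the whole block.
  void renumber() {
    for (std::size_t I = 0; I < Insts.size(); ++I)
      Insts[I].Order = I;
    OrderValid = true;
  }

  // Scan-based query: walks the block on every call, so each query is O(n).
  bool comesBeforeByScan(const ToyInst *A, const ToyInst *B) const {
    if (A == B)
      return false;
    for (const ToyInst &I : Insts) {
      if (&I == A)
        return true;
      if (&I == B)
        return false;
    }
    return false;
  }

  // Order-based query: numbers the block lazily, then compares in O(1).
  bool comesBeforeByOrder(const ToyInst *A, const ToyInst *B) {
    if (!OrderValid)
      renumber();
    return A->Order < B->Order;
  }
};
```

With many allocas that each ask about a handful of instructions, the scan-based query costs one walk of the block per call, while the order-based query pays for a single numbering pass and then answers each call in constant time.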
It is sufficient to just use `HAVE_DLOPEN`.
This patch fixes: mlir/lib/Dialect/Tosa/Transforms/TosaInferShapes.cpp:309:7: error: variable 'errs' set but not used [-Werror,-Wunused-but-set-variable]
…827) This commit addresses some uncertainty raised in 84fa175 as to which features Apple M4 has.
…… (#120566) …the distributed IR case. This patch allows `nd_load` and `nd_store` to preserve the tensor descriptor shape during distribution to SIMT. The validation now expects the distributed instruction to retain the `sg_map` attribute and uses it to verify the consistency.
I think the std::begin/end were to work around an old gcc bug. Hopefully we don't need them anymore.
This holds a physical register unit or virtual register and mask. While I was here I've used emplace_back and removed an unneeded use of a template.
…t r… (#122726)" This reverts commit c3ba6f3. We are seeing performance regressions of up to 40% on some compilations with this patch. We will investigate and reland after fixing the performance issues.
CONFLICT (content): Merge conflict in libclc/clc/include/clc/clcmacro.h
CONFLICT (content): Merge conflict in libclc/generic/lib/common/mix.cl
CONFLICT (content): Merge conflict in libclc/generic/lib/common/mix.inc
CONFLICT (content): Merge conflict in libclc/generic/lib/math/mad.cl
CONFLICT (modify/delete): libclc/generic/lib/math/mad.inc deleted in c8eb865 and modified in HEAD. Version HEAD of libclc/generic/lib/math/mad.inc left in tree.
CONFLICT (modify/delete): libclc/generic/lib/math/sincospiF_piby4.h deleted in HEAD and modified in c8eb865. Version c8eb865 of libclc/generic/lib/math/sincospiF_piby4.h left in tree.
CONFLICT (content): Merge conflict in libclc/libspirv/lib/generic/math/clc_exp10.cl
CONFLICT (content): Merge conflict in libclc/libspirv/lib/generic/math/clc_hypot.cl
CONFLICT (content): Merge conflict in libclc/libspirv/lib/generic/math/clc_pow.cl
@@ -517,7 +517,7 @@ void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
 static bool shouldIncludePTX(const ArgList &Args, StringRef InputArch) {
   // The new driver does not include PTX by default to avoid overhead.
   bool includePTX = !Args.hasFlag(options::OPT_offload_new_driver,
-                                  options::OPT_no_offload_new_driver, true);
+                                  options::OPT_no_offload_new_driver, false); // INTEL
I can't really speak to what's going on here, sorry. I suspect this indicates we're not passing the right flag to control the new offload driver? The `false` should essentially be equivalent to us explicitly passing `-fno-offload-new-driver` to the driver.
Perhaps this is okay for now but we need to investigate this properly.
The new offload driver is currently not enabled by default for intel/llvm. The plan is to move to the new model this year.
Makes sense, thanks. But might it be easier to explicitly disable the new offload driver by passing the option, rather than have to change the default values of various `hasFlag` checks?
Yes, this is the easiest workaround to let CUDA SYCL use the new offload driver for now. Once we switch the default to the new offload driver, we should remove this workaround.
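For readers following the `hasFlag` discussion, here is a minimal sketch of how the third argument acts as the fallback when neither spelling is on the command line. `ToyArgList` is a hypothetical stand-in and not the real `llvm::opt::ArgList`. The positive spelling `-foffload-new-driver` is an assumption; only `-fno-offload-new-driver` appears in the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical toy argument list; NOT the real llvm::opt::ArgList.
struct ToyArgList {
  std::vector<std::string> Args;

  // Models hasFlag(Pos, Neg, Default): the last occurrence of either
  // spelling wins; if neither is present, Default is returned.
  bool hasFlag(const std::string &Pos, const std::string &Neg,
               bool Default) const {
    for (auto It = Args.rbegin(); It != Args.rend(); ++It) {
      if (*It == Pos)
        return true;
      if (*It == Neg)
        return false;
    }
    return Default;
  }
};

// Same shape as the expression in the diff, with the default exposed as a
// parameter so both variants can be compared. Flag spellings are assumed.
bool includePTXByDefault(const ToyArgList &Args, bool NewDriverDefault) {
  return !Args.hasFlag("-foffload-new-driver", "-fno-offload-new-driver",
                       NewDriverDefault);
}
```

In this model, an empty command line with a default of `false` gives the same result as passing `-fno-offload-new-driver` explicitly with a default of `true`, which is the equivalence the first review comment describes. An explicit flag always overrides the default.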
@intel/llvm-gatekeepers I think this is ready for merge. The last CI run was successful, and the changes since then are mostly NFC (I have tested the NVPTX codegen tests locally). The current CI is broken, so please merge when CI is fixed.
CI should be fixed, so ping me when CI passes and this is ready for merge.
This is ready for merge now. @sarnex The failures in post-commit e2e-line intel arc are common to others.
/merge
Fri 31 Jan 2025 02:45:47 PM UTC --- Start to merge the commit into sycl branch. It will take several minutes.
Fri 31 Jan 2025 02:51:02 PM UTC --- Merge the branch in this PR to base automatically. Will close the PR later.
LLVM: llvm/llvm-project@915f3ed
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@cec12d6cf46306d