Temporary C microtile is too small sometimes #850
Comments
Note to self: Altra has a hard-coded max stack buffer size, which is bad.
devinamatthews added a commit that referenced this issue on Feb 5, 2025
Details:
- See #850 for details on the problem.
- This is a temporary fix which should work for sdcz data types.
- Altra architectures may still not fully work for MP/MD as the stack buffer size is hard-coded.
Merged
devinamatthews added a commit that referenced this issue on Feb 8, 2025
Details:
- This PR adds CircleCI testing in addition to TravisCI and Appveyor.
- All of the same tests as on Travis are run, except that different hardware typically ends up being used (usually Zen on Travis, Xeon Platinum on Circle). This has actually exposed a couple of bugs (see #850 and #852).
- The `travis` directory has been renamed to `ci` as it is now shared.
- Running SDE on CircleCI is a bit problematic because glibc changed how CPUID detection is done. This requires running some architectures with different hardware definition files and forcing a config via `BLIS_ARCH_TYPE`.
A temporary C microtile is used in various places, such as on the diagonal of a symmetric matrix (GEMMT), where care must be taken not to write to the unstored portion. The size of this microtile is assumed to be no larger than twice the combined size of all vector registers (on the assumption that "real" microtiles fit in registers, plus some slack). However, several conditions cause a larger microtile to be written:
Currently, this means that a microtile may be "inflated" by as much as 4x. In the future, with a wider range of data types, this factor could be even larger.
This problem occurs concretely for the SKX configuration when doing zsssgemmt.
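To make the size arithmetic concrete, here is a minimal sketch in C of the check implied above. It is not BLIS code: the helper names are hypothetical, and the register count, register width, and microtile dimensions used in the note below are assumed values chosen for illustration, not numbers taken from the SKX configuration.

```c
#include <stddef.h>

/* Hypothetical helper: capacity of the temporary C microtile under the
   "twice the size of all vector registers" assumption. */
static size_t tmp_c_capacity(size_t num_vregs, size_t vreg_bytes)
{
    return 2 * num_vregs * vreg_bytes;
}

/* Hypothetical helper: bytes needed to store an mr x nr microtile whose
   elements occupy elem_bytes each. */
static size_t microtile_bytes(size_t mr, size_t nr, size_t elem_bytes)
{
    return mr * nr * elem_bytes;
}

/* Hypothetical helper: how much the stored microtile grows when the
   microkernel computes in one type but C is stored in a wider type. */
static size_t inflation_factor(size_t stored_elem_bytes, size_t compute_elem_bytes)
{
    return stored_elem_bytes / compute_elem_bytes;
}

/* Returns 1 if the microtile fits in the temporary buffer, 0 otherwise. */
static int tmp_c_fits(size_t mr, size_t nr, size_t elem_bytes,
                      size_t num_vregs, size_t vreg_bytes)
{
    return microtile_bytes(mr, nr, elem_bytes)
           <= tmp_c_capacity(num_vregs, vreg_bytes);
}
```

Assuming 32 vector registers of 64 bytes each, the capacity is 2 * 32 * 64 = 4096 bytes. An assumed 32 x 12 microtile of 4-byte single-precision elements needs 1536 bytes and fits. The same tile stored as 16-byte double-precision complex elements is inflated by 16 / 4 = 4x to 6144 bytes, which exceeds the buffer.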