Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYSTEMDS-3815] Fused table sequence #2181

Closed
wants to merge 23 commits into from

Conversation

Baunsgaard
Copy link
Contributor

This commit contains a new fused operator for:

table(seq(1, nrow(A)), A, w)

This operation is useful for efficient word embedding of a token matrix A.

This commit contains a new fused operator for:

table(seq(1, nrow(A)), A, w)

That removes the need to generate a vector of incrementing integers in the size of A.
@mboehm7
Copy link
Contributor

mboehm7 commented Jan 16, 2025

Is that an extension (for weighted variants?) of the existing rexpand instruction?

@Baunsgaard
Copy link
Contributor Author

Is that an extension (for weighted variants?) of the existing rexpand instruction?

assuming I understand the current implementation.
I made it work for both. It is constructed to avoid allocating the seq(1, nrow(A)), since this was the main bottleneck.

@Baunsgaard
Copy link
Contributor Author

just to answer the question, yes it is an extension of rexpand.

@mboehm7
Copy link
Contributor

mboehm7 commented Jan 16, 2025

What irritates me a bit is that the rexpand kernels are implemented in LibMatrixReorg, which you not even touch. If this code is for the weighted versions of rexpand, please move it there which would also prevent creating too many Lib* classes which were meant for types of operations with many kernels, not individual operations.

@Baunsgaard
Copy link
Contributor Author

What irritates me a bit is that the rexpand kernels are implemented in LibMatrixReorg, which you not even touch. If this code is for the weighted versions of rexpand, please move it there which would also prevent creating too many Lib* classes which were meant for types of operations with many kernels, not individual operations.

okay will do, I just had a tendency lately to make Lib files since it made it 'cleaner' in my mind for containing parallel and non parallel calls.

@Baunsgaard
Copy link
Contributor Author

Performance improvements:

Before:
Total elapsed time:             16.876 sec.
  1  ctableexpand   12.974    100
  2  seq             2.885    100]

After Uncompressed:
Total elapsed time:             10.496 sec.
  1  ctableexpand    9.236    100

After Compressed:
Total elapsed time:             4.973 sec.
  1  ctableexpand    4.089    100

The change unfortunately makes the current fed_tableexpand instruction invalid and therefore have to be reimplemented.

Copy link

codecov bot commented Jan 17, 2025

Codecov Report

Attention: Patch coverage is 93.41085% with 17 lines in your changes missing coverage. Please review.

Project coverage is 71.89%. Comparing base (751b55f) to head (fd02895).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...ache/sysds/runtime/matrix/data/LibMatrixReorg.java 89.04% 3 Missing and 5 partials ⚠️
...ache/sysds/runtime/compress/lib/CLALibRexpand.java 94.20% 1 Missing and 3 partials ⚠️
...runtime/instructions/fed/CtableFEDInstruction.java 33.33% 0 Missing and 2 partials ⚠️
src/main/java/org/apache/sysds/hops/TernaryOp.java 85.71% 0 Missing and 1 partial ⚠️
...rewrite/RewriteAlgebraicSimplificationDynamic.java 90.90% 0 Missing and 1 partial ⚠️
src/main/java/org/apache/sysds/lops/Ctable.java 75.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2181      +/-   ##
============================================
+ Coverage     71.84%   71.89%   +0.04%     
- Complexity    44591    44702     +111     
============================================
  Files          1447     1449       +2     
  Lines        168934   169175     +241     
  Branches      32931    32979      +48     
============================================
+ Hits         121374   121622     +248     
+ Misses        38232    38228       -4     
+ Partials       9328     9325       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants