[SYSTEMDS-3644] Compressed transform encode #2171

Closed
wants to merge 81 commits into from


Baunsgaard
Contributor

No description provided.

@Baunsgaard Baunsgaard force-pushed the FrameBlockUpdates branch 2 times, most recently from e795ec9 to 909535c Compare January 9, 2025 12:33

codecov bot commented Jan 9, 2025

Codecov Report

Attention: Patch coverage is 75.84781% with 292 lines in your changes missing coverage. Please review.

Project coverage is 71.86%. Comparing base (8f5a42c) to head (3487c88).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...sds/runtime/transform/encode/CompressedEncode.java | 85.27% | 34 Missing and 14 partials ⚠️ |
| ...e/compress/colgroup/ColGroupUncompressedArray.java | 8.00% | 46 Missing ⚠️ |
| ...rg/apache/sysds/runtime/io/FrameReaderTextCSV.java | 70.93% | 16 Missing and 9 partials ⚠️ |
| ...a/org/apache/sysds/runtime/io/IOUtilFunctions.java | 11.11% | 23 Missing and 1 partial ⚠️ |
| ...pache/sysds/runtime/io/FrameWriterBinaryBlock.java | 38.23% | 19 Missing and 2 partials ⚠️ |
| ...apache/sysds/runtime/compress/io/DictWritable.java | 29.62% | 17 Missing and 2 partials ⚠️ |
| ...rg/apache/sysds/runtime/util/CommonThreadPool.java | 73.84% | 17 Missing ⚠️ |
| ...sysds/runtime/frame/data/columns/HashMapToInt.java | 89.54% | 12 Missing and 4 partials ⚠️ |
| ...pache/sysds/runtime/io/FrameReaderBinaryBlock.java | 20.00% | 11 Missing and 1 partial ⚠️ |
| ...ime/transform/encode/ColumnEncoderPassThrough.java | 66.66% | 9 Missing and 1 partial ⚠️ |
| ... and 19 more | | |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2171      +/-   ##
============================================
- Coverage     71.88%   71.86%   -0.02%     
- Complexity    44455    44604     +149     
============================================
  Files          1445     1447       +2     
  Lines        168259   168934     +675     
  Branches      32847    32931      +84     
============================================
+ Hits         120949   121409     +460     
- Misses        38020    38196     +176     
- Partials       9290     9329      +39     


@Baunsgaard
Contributor Author

Baunsgaard commented Jan 15, 2025

This commit merges the BWARE optimizations into transform encode. Attached are the full logs for various transformations.
Most transformations improve in performance.

Notable results include:

Passthrough compressed is about 3x faster.

Before:
Transform Encode Perf: rows: 10000000 schema:[UINT4, UINT4, UINT4, UINT4, UINT4, UINT4, UINT4, UINT4, UINT4, UINT4]
{}
                             Normal,  207.217+- 11.819 ms,           
                         Compressed,  336.523+- 26.097 ms,   

After:
Transform Encode Perf: rows: 10000000 schema:[UINT4, UINT4, UINT4, UINT4, UINT4, UINT4, UINT4, UINT4, UINT4, UINT4]
{}
                             Normal,  212.838+-  4.184 ms,           
                         Compressed,  106.941+- 43.943 ms,  

Hash to dummycode encode is about 2.7x faster uncompressed.

Before:
Transform Encode Perf: rows: 10000000 schema:[INT32, INT32, INT32, INT32, INT32, INT32, INT32, INT32, INT32, INT32]
{ids:true, hash:[1,2,3,4,5,6,7,8,9,10], K:10, dummycode:[1,2,3,4,5,6,7,8,9,10]}
                             Normal, 6556.340+-466.647 ms,           
                         Compressed,  445.022+- 29.709 ms,  
After:
Transform Encode Perf: rows: 10000000 schema:[INT32, INT32, INT32, INT32, INT32, INT32, INT32, INT32, INT32, INT32]
{ids:true, hash:[1,2,3,4,5,6,7,8,9,10], K:10, dummycode:[1,2,3,4,5,6,7,8,9,10]}
                             Normal, 2435.954+-267.029 ms,   
                         Compressed,  472.872+- 35.932 ms,    
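
For readers unfamiliar with the spec used in this benchmark, here is a minimal sketch of what `hash` with `K:10` followed by `dummycode` on the same column amounts to conceptually: each value is hashed into one of K buckets, and the bucket id is then one-hot encoded into K output cells. This is an illustration only, not the SystemDS code path. The class and method names are hypothetical, and the hash function (`Integer.hashCode` with `Math.floorMod`) is an assumption chosen for simplicity.

```java
import java.util.Arrays;

// Hypothetical illustration of hash-then-dummycode; not the SystemDS implementation.
class HashDummycodeSketch {

    // Map a value to a 1-based bucket id in [1, k].
    // Math.floorMod keeps the result non-negative for negative inputs.
    static int hashBucket(int value, int k) {
        return Math.floorMod(Integer.hashCode(value), k) + 1;
    }

    // Dummycode (one-hot) a 1-based bucket id into a row of k cells.
    static double[] dummycode(int bucket, int k) {
        double[] row = new double[k];
        row[bucket - 1] = 1.0;
        return row;
    }

    // Encode one input column: each row becomes k cells with exactly one 1.0.
    static double[][] encodeColumn(int[] column, int k) {
        double[][] out = new double[column.length][];
        for (int i = 0; i < column.length; i++)
            out[i] = dummycode(hashBucket(column[i], k), k);
        return out;
    }

    public static void main(String[] args) {
        int[] column = {7, 42, -3, 7};
        for (double[] row : encodeColumn(column, 10))
            System.out.println(Arrays.toString(row));
    }
}
```

Under these assumptions, every encoded row holds a single non-zero cell, and equal input values always land in the same bucket.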

Full logs:
BeforeSU1.md
afterSU1.md
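
The headline speedups can be checked from the mean runtimes reported in the logs above. A small sketch of that arithmetic follows; the class and method names are hypothetical, and only the four mean values are taken from the logs.

```java
// Hypothetical helper for checking the reported speedups; not part of the PR.
class SpeedupSketch {

    // Ratio of mean runtimes; values above 1 mean the "after" run is faster.
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    public static void main(String[] args) {
        // Mean runtimes in ms, copied from the before/after logs.
        System.out.printf("passthrough, compressed: %.2fx%n", speedup(336.523, 106.941));
        System.out.printf("hash + dummycode, normal: %.2fx%n", speedup(6556.340, 2435.954));
    }
}
```

These ratios use means only. The compressed passthrough run after the change reports a spread of about 44 ms on a 107 ms mean, so that ratio is approximate.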
