A memory efficient implementation of the .mtx reading function #3389

gjeuken · 2024-11-28T13:19:13Z

Closes #
Tests included or not required because: tests already exist in test_datasets.py

Release notes not necessary because: this is a backend change

The pandas read_csv function is very memory-intensive, which makes loading data (especially large datasets from the EBI Single Cell Expression Atlas) impossible on computers with 16 GB of RAM or less. Subsequent analysis of such datasets with scanpy, however, works well on these computers.

Loading the data in chunks, using the same pandas function, solves this problem.
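A minimal sketch of chunked .mtx loading with pandas, assuming a plain-text MatrixMarket `coordinate real general` file with 1-based indices (no `pattern` or symmetric variants). The function `read_mtx_chunked` and its `chunksize` parameter are hypothetical names for illustration; this is not the code merged in this PR. Each chunk's columns are appended to lists and concatenated once at the end.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy import sparse


def read_mtx_chunked(path: str, chunksize: int = 10_000_000) -> sparse.csr_matrix:
    """Hypothetical helper: read a MatrixMarket coordinate file chunk by chunk.

    Assumes a 'coordinate real/integer general' file with 1-based indices.
    """
    # Count the '%' banner/comment lines and parse the "n_rows n_cols nnz" size line.
    n_header = 0
    with open(path) as f:
        for line in f:
            n_header += 1
            if not line.startswith("%"):
                n_rows, n_cols, nnz = (int(x) for x in line.split())
                break
    if nnz == 0:
        return sparse.csr_matrix((n_rows, n_cols), dtype=np.float64)

    rows, cols, vals = [], [], []
    # Parse at most `chunksize` triplet lines at a time with the same read_csv function.
    with pd.read_csv(
        path,
        sep=r"\s+",
        header=None,
        names=["row", "col", "value"],
        skiprows=n_header,
        chunksize=chunksize,
    ) as reader:
        for chunk in reader:
            # Append each chunk's columns as NumPy arrays, converted to 0-based indices.
            rows.append(chunk["row"].to_numpy() - 1)
            cols.append(chunk["col"].to_numpy() - 1)
            vals.append(chunk["value"].to_numpy())

    # Concatenate once at the end and build the sparse matrix in a single step.
    coo = sparse.coo_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(n_rows, n_cols),
    )
    return coo.tocsr()


# Usage: a tiny 3 x 4 matrix with 4 non-zeros, read two lines at a time.
demo = (
    "%%MatrixMarket matrix coordinate real general\n"
    "% a comment line\n"
    "3 4 4\n"
    "1 1 1.5\n"
    "2 3 2.0\n"
    "3 4 -1.0\n"
    "1 4 0.5\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.mtx")
    with open(path, "w") as f:
        f.write(demo)
    X = read_mtx_chunked(path, chunksize=2)

print(X.toarray())
```

The final `np.concatenate` still needs the full triplet arrays in memory; chunking presumably helps by bounding the temporary memory read_csv uses while parsing to one chunk's worth of lines.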

codecov · 2024-11-28T13:35:55Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.41%. Comparing base (9741ca6) to head (73c2a21).
Report is 9 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3389      +/-   ##
==========================================
- Coverage   75.44%   75.41%   -0.04%     
==========================================
  Files         113      113              
  Lines       13250    13266      +16     
==========================================
+ Hits         9997    10005       +8     
- Misses       3253     3261       +8

| Files with missing lines | Coverage Δ |
|---|---|
| src/scanpy/datasets/_ebi_expression_atlas.py | `94.44% <100.00%> (+0.46%)` ⬆️ |

... and 3 files with indirect coverage changes


flying-sheep

Good idea! Some small notes:

src/scanpy/datasets/_ebi_expression_atlas.py


gjeuken · 2025-02-22T15:35:15Z

Please note that the chunk size of 1e7 was chosen rather arbitrarily.
This value solves the loading issue on computers with 8 or 16 GB of RAM without iterating over too many chunks.
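As a rough sanity check on the 1e7 figure, the parsed column data for one chunk can be estimated, assuming pandas infers `int64` for the two index columns and `float64` for the values (8 bytes each). This ignores the parser's own temporary buffers, so actual peak usage per chunk will be higher.

```python
import numpy as np
import pandas as pd

# Assumed dtypes: int64 row/col indices and float64 values (8 bytes each).
n = 1_000
sample = pd.DataFrame(
    {
        "row": np.ones(n, dtype=np.int64),
        "col": np.ones(n, dtype=np.int64),
        "value": np.ones(n, dtype=np.float64),
    }
)
# Column data only (index excluded), divided by the number of rows.
bytes_per_row = sample.memory_usage(index=False).sum() / n

chunksize = 10_000_000  # the 1e7 chunk size from this PR
chunk_gb = chunksize * bytes_per_row / 1e9
print(f"{bytes_per_row:.0f} bytes/row, ~{chunk_gb:.2f} GB of column data per chunk")
```

By this estimate, one chunk's parsed columns take roughly a quarter of a gigabyte, which is small relative to 8 or 16 GB of RAM.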

flying-sheep · 2025-02-25T14:40:52Z

Thank you! Since you don’t seem interested in appearing in a release note:

… I’ll merge this as-is. If you want a release note after all, please comment, and I’ll add one!


memory efficient mtx loading

fa91b73

Zethson assigned Intron7 Nov 28, 2024

Zethson added the Area – Performance 🐌 label Nov 28, 2024

Zethson and others added 2 commits January 27, 2025 08:29

Merge branch 'main' into mem_fix

25789c5

[pre-commit.ci] auto fixes from pre-commit.com hooks

792b5e2

for more information, see https://pre-commit.ci

flying-sheep reviewed Jan 27, 2025


src/scanpy/datasets/_ebi_expression_atlas.py (outdated, resolved)

src/scanpy/datasets/_ebi_expression_atlas.py (outdated, resolved)

Zethson unassigned Intron7 Jan 27, 2025

gjeuken and others added 2 commits February 22, 2025 16:26

replace sum by append

cf6cc28

[pre-commit.ci] auto fixes from pre-commit.com hooks

73c2a21

for more information, see https://pre-commit.ci

flying-sheep added this to the 1.11.1 milestone Feb 25, 2025

flying-sheep merged commit f6a665b into scverse:main Feb 25, 2025
15 of 16 checks passed

meeseeksmachine pushed a commit to meeseeksmachine/scanpy that referenced this pull request Feb 25, 2025

Backport PR scverse#3389: A memory efficient implementation of the .mtx reading function

1baada2

meeseeksmachine mentioned this pull request Feb 25, 2025

Backport PR #3389 on branch 1.11.x (A memory efficient implementation of the .mtx reading function) #3483

Merged

flying-sheep mentioned this pull request Feb 25, 2025

Further .mtx reading improvements #3484

Draft

flying-sheep pushed a commit that referenced this pull request Feb 25, 2025

Backport PR #3389: A memory efficient implementation of the .mtx reading function (#3483)

Co-authored-by: Gustavo Jeuken <[email protected]>

93a1651
