Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer boxed slices over vectors internally #863

Merged
merged 2 commits into from
Jan 25, 2024

Conversation

Philippe-Cholet
Copy link
Member

While reading This Week in Rust 531 and specifically Identifying Rust's collect::<Vec<_>>() memory leak footgun, I got the remainder that

The proper data structure for fixed-length lists is Box<[T]> rather than Vec<T>.

So I went through our codebase and found three places to apply this: two are committed and MultiProduct. I thought I would wait #835 to be merged to do it there too but I reached one limitation of edition 2018 (in last specialization) so I'm not.

The other vectors we internally use do not have a fixed length. Such as Combinations that has a growing vector in order for Powerset to work too.
(And I obviously do not want to break our iterators generating vectors when it could be boxed slices.)

Copy link

codecov bot commented Jan 25, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (7a1c22b) 93.48% compared to head (cc0123f) 93.77%.
Report is 1 commits behind head on master.

Files Patch % Lines
src/exactly_one_err.rs 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #863      +/-   ##
==========================================
+ Coverage   93.48%   93.77%   +0.28%     
==========================================
  Files          48       48              
  Lines        6771     6696      -75     
==========================================
- Hits         6330     6279      -51     
+ Misses        441      417      -24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good catch. I think we should do this and trust that the additional into_boxed_slice does not need to reallocate, and, thus, can be optimized to a no-op.

As for the space consumption problem: Should we shrink_to_fit to avoid this?

@Philippe-Cholet
Copy link
Member Author

Philippe-Cholet commented Jan 25, 2024

@phimuemue I noted that collect to Box<[_]> does .collect::<Vec<_>>().into_boxed_slice() in the std so if it reallocates then the std thinks it's the way.

into_boxed_slice does shrink_to_fit so I'm not sure I understand your "space consumption problem".

The only place I've thought of shrink_to_fit was for our pub fn kmerge_by.

Copy link
Contributor

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since they're stored in state enums this is worth considering, and since it looks like they're created from ExactSizeIterator sources anyway so it's probably not going to mean an extra re-allocation (as the collect will use that size hint), this makes good sense to me.

@phimuemue
Copy link
Member

into_boxed_slice does shrink_to_fit so I'm not sure I understand your "space consumption problem".

I was thinking about the cases where we keep the Vec: Should this shrink_to_fit then? (We can also postpone this.)

@phimuemue phimuemue added this pull request to the merge queue Jan 25, 2024
@Philippe-Cholet Philippe-Cholet added this to the next milestone Jan 25, 2024
Merged via the queue into rust-itertools:master with commit a0411d6 Jan 25, 2024
12 of 13 checks passed
@Philippe-Cholet Philippe-Cholet deleted the boxed-slices branch January 25, 2024 17:44
@Philippe-Cholet
Copy link
Member Author

@phimuemue I think it has sense when we know it won't ever grow and we don't repeatedly shrink it. There is not that many cases in itertools: kmerge_by definition only I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants