-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce retention and early materialization of VirtualActionInput
#24891
base: master
Are you sure you want to change the base?
Conversation
@justinhorvitz Context: I'm currently investigating higher memory usage with path mapping enabled and am thus planning to make it compatible with your command line preprocessing logic. While working on that, I noticed that Bazel may not make good use of this optimization as it's RE implementation may retain full param file contents and some other places would at least temporarily materialize the full file just to calculate a digest. This PR is meant to address that. Ignoring the specifics of the integration with Bazel's RE backend, which other folks can review, do you consider it reasonable? |
61fbbcb
to
14916d4
Compare
// TODO: Avoid materializing the entire file in memory. This requires changing the | ||
// upload to be driven by an OutputStream rather than consuming an InputStream. | ||
remoteCacheClient.uploadBlob( | ||
context, digest, () -> virtualActionInput.getBytes().newInput()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might virtualActionInput.getBytes()
be called multiple times by Chunker
? The approach we use for the Google-internal RE is:
- Materialize the
ByteString
as late as possible, using double-checked locking, storing it as a field in the "supplier" - The "supplier" has a
close()
method that is called when all chunks have been sent to the CAS. Calling this nulls theByteString
field.
This is a good catch. The google-internal RE code does not do this. Instead, it makes use of I left a comment about the way you're passing a supplier into |
3adf3ed
to
560b5b2
Compare
ada9cdb
to
5555522
Compare
@bazel-io fork 8.1.0 |
@fmeum |
5555522
to
ed0a1ee
Compare
@iancha1992 Done! |
b82dcdc introduced a way for parameter files to be materialized lazily and in a streaming fashion. For this to be effective, the contents of
VirtualActionInput
s should not be retained and, ideally, never or only briefly materialized in full.This is achieved by replacing the digest computations with a streaming approach and avoiding retention of
VirtualActionInput
inMerkleTree
s, which may be cached. All cache protocols are streamlined to accept aBlob
, which is a closeable supplier ofInputStream
s that is automatically closed after the upload has completed (which may require retries).Along the way, file nodes in
MerkleTree
s are stored unwrapped to reduce memory usage.