Implement disk-based DataCache with no eviction #593
Conversation
A few comments from a quick pass.
```rust
/// Get the relative path for the given block.
fn get_path_for_key(&self, cache_key: &CacheKey) -> PathBuf {
    let mut path = self.cache_directory.join("v1");
```
Besides/instead of the version (which is not really needed for a temporary cache), should we include the process id, or something equivalent, to ensure the path is unique?
Maybe for this type it could just be a token to be passed at construction.
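For illustration, one way the construction-time token could slot in. This is a minimal sketch: the `CacheKey` fields, the `unique_token` name, and the constructor signature are assumptions, not the actual types in this PR.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical cache key fields; the real type in this PR may differ.
struct CacheKey {
    s3_key: String,
    etag: String,
}

struct DiskDataCache {
    cache_directory: PathBuf,
    /// Token fixed at construction (e.g. a process id or random suffix)
    /// so concurrent mounts don't collide on the same paths. Assumed
    /// field, per the suggestion above.
    unique_token: String,
}

impl DiskDataCache {
    fn new(cache_directory: impl AsRef<Path>, unique_token: String) -> Self {
        Self {
            cache_directory: cache_directory.as_ref().to_path_buf(),
            unique_token,
        }
    }

    /// Get the relative path for the given block, namespaced by both the
    /// on-disk format version and the construction-time token.
    fn get_path_for_key(&self, cache_key: &CacheKey) -> PathBuf {
        let mut path = self.cache_directory.join("v1");
        path.push(&self.unique_token);
        path.push(&cache_key.s3_key);
        path.push(&cache_key.etag);
        path
    }
}
```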
I'll push an update to this PR now, but won't address this today. Let me think about it.
I left the versioning in, although I'm not super happy about (a) not purging the cache on mount and (b) not having things like etag and s3key in the blocks themselves on disk.
Versioning here seems good. We definitely need some safety around (a) not using a non-empty cache directory where we might overwrite existing files, and (b) coordinating enough to avoid multiple Mountpoints trying to use the same directory. But happy for that to come in a followup.
For (a), we'll need to think about whether we should allow non-empty directories that appear to be Mountpoint caches from a previous execution (it's perhaps very cumbersome for customers to have to manually rm -rf on every restart), and if so, whether to clean them up or not.
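A minimal sketch of what the guard for (a) could look like, assuming the policy is simply to refuse a non-empty directory; the actual behavior was deferred to a followup and may well differ.

```rust
use std::io;
use std::path::Path;

/// One possible guard for (a): refuse to use a cache directory that
/// already contains entries we didn't create. Sketch only; the PR
/// defers the real policy (including recognizing and cleaning up
/// previous Mountpoint caches) to a later discussion.
fn check_cache_directory_empty(cache_directory: &Path) -> io::Result<()> {
    if std::fs::read_dir(cache_directory)?.next().is_some() {
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            "cache directory is not empty; refusing to reuse it",
        ));
    }
    Ok(())
}
```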
Keeping the versioning for now, and also including it in the block data persisted to disk.
Deferring the safety around a non-empty cache directory and concurrent access to the cache from different Mountpoint processes to a later discussion.
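As a sketch of what persisting the version and identifying metadata alongside each block could look like. The header layout and field names here are assumptions for illustration, not the format this PR actually writes.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// On-disk format version; bump when the layout changes so stale
/// blocks from older executions can be detected and discarded.
const CACHE_FORMAT_VERSION: u32 = 1;

/// Write a block to disk with a small header identifying what it is.
/// Hypothetical layout: version, then length-prefixed key, etag, data.
fn write_block(path: &Path, s3_key: &str, etag: &str, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(&CACHE_FORMAT_VERSION.to_le_bytes())?;
    // Length-prefix each field so the reader can parse them back and
    // compare against the expected key before trusting the contents.
    for field in [s3_key.as_bytes(), etag.as_bytes(), data] {
        file.write_all(&(field.len() as u64).to_le_bytes())?;
        file.write_all(field)?;
    }
    Ok(())
}
```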
…ache Signed-off-by: Daniel Carl Jones <[email protected]>
Just pushed a few changes based on some feedback, but there are still some items to address tomorrow.
Planning to update this PR to get it into a state we're happy to merge and iterate on later.
…ny FS-specific max number of dir entries Signed-off-by: Daniel Carl Jones <[email protected]>
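The commit message above refers to filesystem-specific limits on directory entries. A common mitigation, sketched here with assumed names and an illustrative fan-out shape rather than the PR's actual layout, is to spread blocks across fixed-depth subdirectories derived from a hash of the key:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Spread cache entries across subdirectories so no single directory
/// accumulates more entries than the filesystem handles comfortably.
/// The two-level, 256-way fan-out here is an illustrative choice.
fn fanned_out_path(cache_directory: &Path, s3_key: &str, block_index: u64) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    s3_key.hash(&mut hasher);
    block_index.hash(&mut hasher);
    let hash = hasher.finish();

    let mut path = cache_directory.join("v1");
    path.push(format!("{:02x}", (hash >> 8) & 0xff)); // first fan-out level
    path.push(format!("{:02x}", hash & 0xff)); // second fan-out level
    path.push(format!("{}.block", block_index));
    path
}
```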
I'll merge this now. If there's any more feedback, please still add it and I can follow up in another PR.
Description of change
Initial version of the data cache on disk.
We now write data to disk, including some metadata to check that we haven't mixed up blocks.
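For illustration, a read path that checks persisted metadata before returning data might look like the following sketch. The field names and header layout are assumptions mirroring the hypothetical write sketch earlier in the thread, not the PR's actual format.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Read a block back and verify its header matches the expected key,
/// treating any mismatch as a cache miss (Ok(None)). Sketch only; the
/// real validation in this PR may differ.
fn read_block(path: &Path, expected_s3_key: &str, expected_etag: &str) -> io::Result<Option<Vec<u8>>> {
    let mut file = File::open(path)?;

    let mut version = [0u8; 4];
    file.read_exact(&mut version)?;
    if u32::from_le_bytes(version) != 1 {
        return Ok(None); // unknown on-disk format version
    }

    // Read one length-prefixed field, matching the assumed write format.
    let mut read_field = |file: &mut File| -> io::Result<Vec<u8>> {
        let mut len = [0u8; 8];
        file.read_exact(&mut len)?;
        let mut buf = vec![0u8; u64::from_le_bytes(len) as usize];
        file.read_exact(&mut buf)?;
        Ok(buf)
    };

    let s3_key = read_field(&mut file)?;
    let etag = read_field(&mut file)?;
    if s3_key != expected_s3_key.as_bytes() || etag != expected_etag.as_bytes() {
        return Ok(None); // block belongs to a different object: don't serve it
    }
    Ok(Some(read_field(&mut file)?))
}
```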
Related: #255
Does this change impact existing behavior?
No. This code is not yet adopted in main releases.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).