Implement disk-based DataCache with no eviction #593

Merged
merged 25 commits into from
Nov 3, 2023
Commits
cb6958f Implement disk-based DataCache with no checksums or eviction (dannycjones, Oct 26, 2023)
8eb82b0 Fix typos (dannycjones, Nov 1, 2023)
55fe548 Replace Base64URL encoding with Base64URLUnpadded encoding for data c… (dannycjones, Nov 1, 2023)
d15596d Ensure cached indicies are sorted in DiskDataCache (dannycjones, Nov 1, 2023)
ad08838 Add trace message when creating block in cache (dannycjones, Nov 1, 2023)
6e79390 WIP: Add checksums to on-disk cache (dannycjones, Oct 26, 2023)
bebe52a Remove cached_block_indices implementation on DiskDataCache (dannycjones, Nov 2, 2023)
c153ac7 Move version identifier to constant (dannycjones, Nov 2, 2023)
55524b2 Replace SerializableCrc32c with u32 (dannycjones, Nov 2, 2023)
aa12176 Update DataBlock::new(..) to return Result (dannycjones, Nov 2, 2023)
524d865 Add verification of block metadata to unpack after reading (dannycjones, Nov 2, 2023)
9e8b407 Replace Base64 encoding with SHA256 hash (dannycjones, Nov 2, 2023)
a62c87b Add TODO to split directories into sub-directories to avoid hitting a… (dannycjones, Nov 2, 2023)
6f55978 Remove intermediate buffers when (de)serializing DataBlock with bincode (dannycjones, Nov 2, 2023)
ec5814c Add cache version identifer to the start of blocks written to disk (dannycjones, Nov 2, 2023)
3d9e8b1 Fix comment on ETag::into_inner (dannycjones, Nov 2, 2023)
f5b134a Add rustdoc to DataBlock::new (dannycjones, Nov 2, 2023)
5debfc5 Fix typo in rustdoc for DataBlock::data (dannycjones, Nov 2, 2023)
56efa0c Add expected version to data block read error message (dannycjones, Nov 2, 2023)
a429d39 Split DataBlock header fields into BlockHeader (dannycjones, Nov 3, 2023)
b05936d Add checksum validation on on-disk cache DataBlock header contents (dannycjones, Nov 3, 2023)
6dd6b16 Remove outdated TODO (dannycjones, Nov 3, 2023)
4a49d2c Add test for detecting when DataBlock requires version bump (dannycjones, Nov 3, 2023)
6f1f17c Refactor errors for DataBlock (dannycjones, Nov 3, 2023)
86841a4 Rename DataBlock to DiskBlock (dannycjones, Nov 3, 2023)
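
Several of the commits in this PR add a version identifier at the start of each block written to disk and include the expected version in the read error. A minimal sketch of that idea, using only the standard library, might look like the following. The constant `CACHE_VERSION` and its value, the length-prefixed layout, and the names `write_block`, `read_block`, and `ReadError` are all assumptions made for illustration. The PR itself serializes blocks with bincode, and its actual on-disk layout is not shown on this page.

```rust
use std::io::{self, Read, Write};

/// Hypothetical version tag; the real constant and its value are not shown in this diff.
const CACHE_VERSION: &[u8; 4] = b"V1\0\0";

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum ReadError {
    Io(String),
    VersionMismatch { expected: [u8; 4], found: [u8; 4] },
}

/// Write the version tag first, then the payload length, then the payload.
fn write_block<W: Write>(mut w: W, payload: &[u8]) -> io::Result<()> {
    w.write_all(CACHE_VERSION)?;
    w.write_all(&(payload.len() as u64).to_le_bytes())?;
    w.write_all(payload)
}

/// Read a block back, rejecting it before touching the payload if the version tag differs.
fn read_block<R: Read>(mut r: R) -> Result<Vec<u8>, ReadError> {
    let io_err = |e: io::Error| ReadError::Io(e.to_string());

    let mut version = [0u8; 4];
    r.read_exact(&mut version).map_err(io_err)?;
    if &version != CACHE_VERSION {
        return Err(ReadError::VersionMismatch {
            expected: *CACHE_VERSION,
            found: version,
        });
    }

    let mut len = [0u8; 8];
    r.read_exact(&mut len).map_err(io_err)?;
    // A real implementation would bound this length before allocating.
    let mut payload = vec![0u8; u64::from_le_bytes(len) as usize];
    r.read_exact(&mut payload).map_err(io_err)?;
    Ok(payload)
}
```

Checking the tag before anything else means a reader built against a different layout fails with a clear error instead of misinterpreting the bytes that follow.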
14 changes: 10 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions mountpoint-s3-client/src/object_client.rs
@@ -31,6 +31,11 @@ impl ETag {
         &self.etag
     }

+    /// Unpack the [String] contained by the [ETag] wrapper
+    pub fn into_inner(self) -> String {
+        self.etag
+    }
+
     /// Creating default etag for tests
     #[doc(hidden)]
     pub fn for_tests() -> Self {
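
The new `into_inner` consumes the wrapper and returns the owned `String`, so a caller that no longer needs the `ETag` can reuse the allocation instead of cloning. The sketch below shows the difference between borrowing and consuming. The struct is a stand-in with only the `etag` field visible in this diff; the borrowing accessor name `as_str` and the helper `key_prefix` are assumptions for illustration, not part of the client crate's confirmed API.

```rust
/// Stand-in for the client's `ETag` wrapper; only the field visible in the diff is modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ETag {
    etag: String,
}

impl ETag {
    /// Borrow the value; the wrapper stays usable afterwards (accessor name assumed).
    pub fn as_str(&self) -> &str {
        &self.etag
    }

    /// Consume the wrapper and hand back the owned `String` without cloning.
    pub fn into_inner(self) -> String {
        self.etag
    }
}

/// Hypothetical helper: build a key prefix from an ETag the caller no longer needs.
pub fn key_prefix(etag: ETag) -> String {
    let mut key = etag.into_inner(); // moves the String out of the wrapper
    key.push('/');
    key
}
```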
6 changes: 5 additions & 1 deletion mountpoint-s3/Cargo.toml
@@ -14,7 +14,7 @@ anyhow = { version = "1.0.64", features = ["backtrace"] }
 async-channel = "1.8.0"
 async-lock = "2.6.0"
 async-trait = "0.1.57"
-bytes = "1.2.1"
+bytes = { version = "1.2.1", features = ["serde"] }
 clap = { version = "4.1.9", features = ["derive"] }
 crc32c = "0.6.3"
 ctrlc = { version = "3.2.3", features = ["termination"] }
@@ -36,6 +36,10 @@ nix = "0.26.2"
 time = { version = "0.3.17", features = ["macros", "formatting"] }
 const_format = "0.2.30"
 serde_json = "1.0.95"
+serde = { version = "1.0.190", features = ["derive"] }
+bincode = "1.3.3"
+sha2 = "0.10.6"
+hex = "0.4.3"

 [target.'cfg(target_os = "linux")'.dependencies]
 procfs = { version = "0.15.1", default-features = false }
10 changes: 10 additions & 0 deletions mountpoint-s3/src/checksums.rs
@@ -151,6 +151,16 @@ impl ChecksummedBytes {
         }
         Ok(())
     }
+
+    /// Provide the underlying bytes and the associated checksum,
+    /// which may be recalculated if the checksum covers a larger slice than the current slice.
+    /// Validation may or may not be triggered, and **bytes or checksum may be corrupt** even if result returns [Ok].
+    ///
+    /// If you are only interested in the underlying bytes, **you should use `into_bytes()`**.
+    pub fn into_inner(self) -> Result<(Bytes, Crc32c), IntegrityError> {
+        self.shrink_to_fit()?;
+        Ok((self.curr_slice, self.checksum))
+    }
 }

 impl Default for ChecksummedBytes {
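
The rustdoc on `into_inner` warns that validation "may or may not be triggered". A simplified model can make that concrete: if the stored checksum already covers exactly the current slice, it can be returned as-is with no check; if it covers a larger buffer, the buffer has to be validated and the checksum recomputed for the slice. The sketch below is an assumption-laden stand-in, not the actual `ChecksummedBytes`: the `Checksummed` struct, its fields, the unit `IntegrityError`, and the bitwise `crc32c` function (used here in place of the `crc32c` crate) are all hypothetical.

```rust
use std::ops::Range;

/// Bitwise CRC32C (Castagnoli), standing in for the `crc32c` crate in this sketch.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
struct IntegrityError;

/// Simplified stand-in: `checksum` covers all of `buffer`; `range` selects the current slice.
struct Checksummed {
    buffer: Vec<u8>,
    range: Range<usize>,
    checksum: u32,
}

impl Checksummed {
    fn new(buffer: Vec<u8>) -> Self {
        let checksum = crc32c(&buffer);
        let range = 0..buffer.len();
        Self { buffer, range, checksum }
    }

    /// Narrow the current slice; the stored checksum still covers the whole buffer.
    fn slice(mut self, sub: Range<usize>) -> Self {
        let start = self.range.start + sub.start;
        let end = self.range.start + sub.end;
        assert!(start <= end && end <= self.range.end, "sub-range out of bounds");
        self.range = start..end;
        self
    }

    fn into_inner(self) -> Result<(Vec<u8>, u32), IntegrityError> {
        if self.range == (0..self.buffer.len()) {
            // Checksum already matches the slice: returned unvalidated.
            return Ok((self.buffer, self.checksum));
        }
        // Checksum covers more than the slice: validate the buffer, then recompute.
        if crc32c(&self.buffer) != self.checksum {
            return Err(IntegrityError);
        }
        let bytes = self.buffer[self.range.clone()].to_vec();
        let checksum = crc32c(&bytes);
        Ok((bytes, checksum))
    }
}
```

In this model, corruption is only detected on the second path, which is why a caller that receives `Ok` still has to verify the returned bytes against the returned checksum before trusting them.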
1 change: 1 addition & 0 deletions mountpoint-s3/src/data_cache.rs
@@ -4,6 +4,7 @@
 //! reducing both the number of requests as well as the latency for the reads.
 //! Ultimately, this means reduced cost in terms of S3 billing as well as compute time.

+pub mod disk_data_cache;
 pub mod in_memory_data_cache;

 use std::ops::RangeBounds;