Question: How can I verify the integrity of the outboard bytes using the blake3 root hash of the file? #42
That's a really interesting question. Consider a five-chunk Bao tree that looks like this:
In that case the outboard tree contains these nodes:
Currently I don't think there's any way to use the public Bao API to validate those nodes if you don't have the chunks available to do a complete outboard decoding. But in theory you could do it: starting from the root hash, you could check the root node's pair of child chaining values ("CVs"), then check each parent node against the CV claimed for it, working top-down through the tree. Having done this, you'd know that all the parent nodes were consistent with the root hash. But you still wouldn't know whether the CVs at the bottom of the tree actually correspond to the chunk contents.
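To make that top-down idea concrete, here's a toy sketch of the traversal. Note this is not BLAKE3 and not the bao crate: the hash function is a stand-in from the standard library, and all the names (`toy_cv`, `Parent`, `verify_parents`) are made up for illustration. It shows why checking parents alone still leaves the leaves unverified.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Toy stand-in for a chaining value. Real BLAKE3 CVs are 32 bytes and
// computed with the parent-node compression function, not this.
fn toy_cv(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

// One parent node of the outboard tree: the CVs of its two children.
struct Parent {
    left: u64,
    right: u64,
}

// Check parent nodes in pre-order against an expected CV, recursing
// `levels` deep. Leaves can't be checked here: without the chunk data we
// can't recompute the bottom-level CVs, which is exactly the gap above.
fn verify_parents(expected: u64, nodes: &[Parent], next: &mut usize, levels: u32) -> bool {
    if levels == 0 {
        return true;
    }
    let node = &nodes[*next];
    *next += 1;
    if toy_cv(node.left, node.right) != expected {
        return false;
    }
    verify_parents(node.left, nodes, next, levels - 1)
        && verify_parents(node.right, nodes, next, levels - 1)
}

fn main() {
    // A balanced four-chunk tree: a root node over two bottom parents.
    let leaf = [1u64, 2, 3, 4];
    let p_left = toy_cv(leaf[0], leaf[1]);
    let p_right = toy_cv(leaf[2], leaf[3]);
    let root = toy_cv(p_left, p_right);
    let nodes = vec![
        Parent { left: p_left, right: p_right },
        Parent { left: leaf[0], right: leaf[1] },
        Parent { left: leaf[2], right: leaf[3] },
    ];
    let mut next = 0;
    assert!(verify_parents(root, &nodes, &mut next, 2));

    // Corrupting any parent node breaks the check.
    let mut bad = nodes;
    bad[2].left ^= 1;
    let mut next = 0;
    assert!(!verify_parents(root, &bad, &mut next, 2));
    println!("toy parent verification ok");
}
```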
Maybe you could tell me more about your use case? Are you worried that your storage/transport isn't reliable, or are you worried about attackers trying to DOS you? In both of those cases I'm not sure whether validating the outboard tree directly is the best way to go. One thing I want to highlight just in case:
Thanks for your answer! Right now I'm storing files as the file bytes followed by the Bao outboard bytes. The generated CID (content identifier) consists of the blake3 root hash and the total size of the file without the bao metadata.

So if I know the CID of a specific video I want to watch, I just ask on the network whether someone stores it. If yes, I first download the BAO_OUTBOARD_BYTES part of the file; because I know the total size of the file without the bao metadata from the CID, I can directly download the correct range. After fetching the bao outboard bytes, I can start streaming the parts of the video/file I'm interested in, and verify the integrity of those parts on the fly with the root blake3 hash (from the CID) and the bao bytes I fetched.

The outboard bytes are at the end of the file because some applications using the network have their own integrity verification mechanisms (for example, authenticated encryption), and in that case they can start streaming the file directly without needing to skip over potential bao bytes at the start of the file.
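Locating that trailing range is a small arithmetic exercise. The sketch below assumes the outboard layout from the Bao spec (an 8-byte length header plus one 64-byte parent node per chunk beyond the first), applied to the 256 KiB chunk groups used in this thread; the function names are illustrative, not part of any real API.

```rust
// Assumed chunk-group size from the chunk_groups fork discussed here.
const GROUP_SIZE: u64 = 256 * 1024;

// Number of chunk groups the file occupies (at least 1, even when empty).
fn num_groups(file_size: u64) -> u64 {
    if file_size == 0 {
        1
    } else {
        (file_size + GROUP_SIZE - 1) / GROUP_SIZE
    }
}

// Assumed outboard size: 8-byte length header plus one 64-byte parent
// node per chunk group after the first.
fn outboard_size(file_size: u64) -> u64 {
    8 + 64 * (num_groups(file_size) - 1)
}

// Byte range of the outboard within "file bytes || outboard bytes",
// given the file size known from the CID.
fn outboard_range(file_size: u64) -> std::ops::Range<u64> {
    file_size..file_size + outboard_size(file_size)
}

fn main() {
    // A 1 MB file spans 4 chunk groups, so the outboard is
    // 8 + 64 * 3 = 200 bytes, stored right after the file bytes.
    assert_eq!(outboard_size(1_000_000), 200);
    assert_eq!(outboard_range(1_000_000), 1_000_000..1_000_200);
    // A file that fits in one group needs only the 8-byte header.
    assert_eq!(outboard_size(100_000), 8);
    println!("outboard range arithmetic ok");
}
```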
So to be precise, I don't really need to verify the entire bao outboard bytes against the root hash; I just thought that would be easier to implement because I'm holding the entire bao tree in memory anyway. I think I'm looking for an API that looks like this:

```rust
pub fn verify_integrity(bytes: Vec<u8>, offset: u32, bao_outboard_bytes: Vec<u8>, blake3_hash: Vec<u8>) -> bool
```
My instinct is that you can't fully vet the outboard tree until you have the contents, and it's probably not worth the complexity to get partway there.
Are you implementing video seeking through this API? Have you considered adding some code on the server/sender side to extract "slices"? That's Bao's way of saying "I want to read just this range of bytes; serve me those bytes along with just the specific tree nodes I need to verify them."
Thanks for your help! I ended up with this, which seems to do what I need (using code from #34 (comment)):

```rust
use std::io::{Cursor, Read};

// `FakeSeeker` and `from_vec_to_array` are the helpers from #34 (comment).
pub fn verify_integrity(
    chunk_bytes: Vec<u8>,
    offset: u64,
    bao_outboard_bytes: Vec<u8>,
    blake3_hash: Vec<u8>,
) -> Result<u8, anyhow::Error> {
    // Extract the slice for this 256 KiB chunk group, together with the
    // tree nodes needed to verify it...
    let mut extractor = bao::encode::SliceExtractor::new_outboard(
        FakeSeeker::new(&chunk_bytes[..]),
        Cursor::new(&bao_outboard_bytes),
        offset,
        262144,
    );
    let mut slice = Vec::new();
    extractor.read_to_end(&mut slice)?;

    // ...then decode it against the root hash. Decoding returns an error
    // if the bytes are inconsistent with the hash.
    let mut decoded = Vec::new();
    let mut decoder = bao::decode::SliceDecoder::new(
        &*slice,
        &bao::Hash::from(from_vec_to_array(blake3_hash)),
        offset,
        262144,
    );
    decoder.read_to_end(&mut decoded)?;
    Ok(1)
}
```

The video seeking/streaming is implemented in my code directly, because it has some optimizations for streaming over the network like caching chunks. I'm just calling this new method to verify the integrity of a chunk before storing it in the local disk cache and showing it to the user.
If you have the ability to seek through your file contents from the client side, then there's no particular need to use slices at all:

```rust
use std::error::Error;
use std::io::prelude::*;
use std::io::{Cursor, SeekFrom};

fn main() -> Result<(), Box<dyn Error>> {
    let bigfile = vec![0x42; 1_000_000];
    let (outboard, hash) = bao::encode::outboard(&bigfile);
    let read_start = 123_456;
    let read_len = 654_321;
    let mut reader =
        bao::decode::Decoder::new_outboard(Cursor::new(&bigfile), Cursor::new(&outboard), &hash);
    reader.seek(SeekFrom::Start(read_start as u64))?;
    // Here we read the 654,321 content bytes into a Vec, but we could e.g. write them to stdout
    // without ever allocating memory to hold the whole thing.
    let mut decoded = vec![0; read_len];
    reader.read_exact(&mut decoded)?;
    assert_eq!(decoded, &bigfile[read_start..][..read_len]);
    Ok(())
}
```

Note that in the above example, the client doesn't even need the whole outboard tree. It only needs something like a reader over the outboard bytes that can serve the specific nodes the decoder asks for.

The advantage of slicing is that it doesn't require the client to be able to seek (or to do a network round trip for each seek), and it also doesn't require the client to have two readers/cursors open:

```rust
use std::error::Error;
use std::io::prelude::*;
use std::io::Cursor;

fn main() -> Result<(), Box<dyn Error>> {
    let bigfile = vec![0x42; 1_000_000];
    let (outboard, hash) = bao::encode::outboard(&bigfile);
    let read_start = 123_456;
    let read_len = 654_321;
    // The server side provides this stream over e.g. HTTP.
    let mut slice_stream = bao::encode::SliceExtractor::new_outboard(
        Cursor::new(&bigfile),
        Cursor::new(&outboard),
        read_start,
        read_len,
    );
    // The client side consumes the slice stream. The slice itself is never materialized on either
    // end.
    let mut decode_stream =
        bao::decode::SliceDecoder::new(&mut slice_stream, &hash, read_start, read_len);
    // Here we read the 654,321 content bytes into a Vec, but we could e.g. write them to stdout
    // without ever allocating memory to hold the whole thing.
    let mut decoded = Vec::new();
    decode_stream.read_to_end(&mut decoded)?;
    assert_eq!(
        decoded,
        &bigfile[read_start as usize..][..read_len as usize],
    );
    Ok(())
}
```
Thanks, your second example is an optimized version of what I need. I can't really use the first example, because I'm calling the API from native Dart code, or from JavaScript in the web browser (via WASM), and the file itself isn't downloaded/streamed in Rust. So implementing a seekable Cursor in Rust doesn't seem like the right solution. That's also the reason I don't really care about the decode result; I just want to verify the integrity.
Hi, how can I verify the integrity of the outboard bytes using the blake3 root hash of the file, in Rust?
The use case here is that I have a 32-byte blake3 hash for a file from a trusted source and download the entire outboard metadata into memory first and verify it before starting to stream. The reason for this is that I usually want to stream the entire file and that streaming happens over the network, where reading the entire outboard bytes first is usually faster than having to send a ton of requests for small byte ranges.
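Some back-of-the-envelope numbers illustrate why fetching the whole outboard up front is cheap, especially with larger chunk groups. The formula below (8-byte length header plus one 64-byte parent node per chunk beyond the first) is assumed from the Bao spec; the function name is made up.

```rust
// Rough outboard size for a file, comparing Bao's default 1 KiB chunks
// with 256 KiB chunk groups.
fn outboard_bytes(file_size: u64, chunk_size: u64) -> u64 {
    let chunks = if file_size == 0 {
        1
    } else {
        (file_size + chunk_size - 1) / chunk_size
    };
    8 + 64 * (chunks - 1)
}

fn main() {
    let gib = 1u64 << 30;
    // Default 1 KiB chunks: ~64 MiB of outboard data for a 1 GiB file.
    assert_eq!(outboard_bytes(gib, 1024), 67_108_808);
    // 256 KiB chunk groups: ~256 KiB of outboard data, one small request.
    assert_eq!(outboard_bytes(gib, 256 * 1024), 262_088);
    println!("outboard overhead arithmetic ok");
}
```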
I forked https://github.com/oconnor663/bao/tree/chunk_groups and changed the chunk group size to 256 KiB, but I don't think that's relevant for this question.