-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
floresta-chain: new optimized chainstore #251
base: master
Are you sure you want to change the base?
Conversation
My understanding is that |
That was my understanding too, but for some reason, even before using the cache (second part of #169) it didn't really flush on its own, and if we had an unclean shutdown, we would lose our progress (or a good part of it). It would also become increasingly more CPU and IO heavy as we made progress, I suspect it's due to the block locator getting bigger as our chain grows. But the biggest problem for me, that I couldn't find an alternative, was the heap spike. It would always crash on my phone, before or after #169. With this PR, it runs fine! |
Isn't that the expected behavior of a node in IBD, as it moves from old empty blocks to more recent ones? Also was the OS not flushing on its own on desktop or on mobile? |
Not this much, at least not for headers. They are small and have constant-size
Both. At least on my setup. |
4f43d68
to
9961e8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice changes, heres mine superficial review... I still didnt finished.
Youre right, this needs a lot of testing and review.
It looks a nice job!
9f33428
to
66f70e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial review, looks good so far! I still have to check flat_chain_store.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly nits, but I think I have found an issue with the update_block_index
method impl, as explained below.
} | ||
} | ||
|
||
/// The (short) hash function we use to compute where is the map a given block height should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/where is the map/where in the map
fn update_block_index(&self, height: u32, hash: BlockHash) -> Result<(), Self::Error> { | ||
let index = IndexEntry::new(height); | ||
unsafe { | ||
self.block_index | ||
.set_index_for_hash(hash, index, |height| self.get_block_header_by_index(height)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this update_block_index
method impl we update the block index with a new mainchain block (tagging the block index as mainchain
). But if this is done for reorging the chain, we would also need to update the tag for the reorged block indexes to fork
. Otherwise we would end up with two mainchain
-tagged block hashes with the same height.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A straightforward fix would be to create a update_fork_block_index
, such that it is called recursively within mark_chain_as_inactive
(similar to what happens with update_block_index
within mark_chain_as_active
). And conditionally compile such method if experimental-db
is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a function that unmarks an index positions if it goes from mainchain to fork chain
unsafe fn unset_index_pos(&self, header: DiskBlockHeader) -> Result<(), FlatChainstoreError> {
let pos = self.hash_map_find_pos(header.block_hash(), |_| {
Ok(DiskBlockHeaderAndHash {
header,
hash: header.block_hash(),
})
})?;
match pos {
Entry::Empty(_) => {}
Entry::Occupied(pos) => pos.write(IndexEntry::new(0)),
}
Ok(())
}
and updated save_fork_block
:
unsafe fn save_fork_block(&self, header: DiskBlockHeader) -> Result<(), FlatChainstoreError> {
let metadata = self.get_metadata_mut()?;
let fork_blocks = metadata.fork_count;
let offset = size_of::<DiskBlockHeaderAndHash>() * fork_blocks as usize;
let ptr = self.fork_headers.as_ptr().wrapping_add(offset) as *mut DiskBlockHeaderAndHash;
ptr.write(DiskBlockHeaderAndHash {
header,
hash: header.block_hash(),
});
self.block_index.unset_index_pos(header)?;
let mut index = IndexEntry::new(fork_blocks);
index.set_fork();
self.block_index
.set_index_for_hash(header.block_hash(), index, |height| {
self.get_block_header_by_index(height)
})?;
metadata.fork_count += 1;
Ok(())
}
73fad85
to
5175e30
Compare
The current chainstore is based on `kv`, but it has a few problems: - When we flush, we get a huge heap spike - We are getting a 2 or 3 times overhead on headers - It gets kinda slow to retrieve headers during IBD if we flush early This commit introduces a bare-bones, ad-hock store that consists in two parts: - A open addressing, file backed and memory-mapped hash map to keep the relation block_hash -> block_height - A flat file that contains block headers serialized, in ascending order - A LRU cache to avoid going througth the map every time To recover a header, given the block height, we simply use pointer arithmetic inside the flat file. If we need to get from the block hash, use the map first, then find it inside the flat file. This has the advantage of not needing explicit flushes (the os will flush it in fixed intervals), flushes are async (the os will do it), we get caching for free (mmap-ed pages will stay in memory if we need) and our cache can react to system constraints, because the kernel will always know how much memory we sill have
5175e30
to
0597417
Compare
The current chainstore is based on
kv
, but it has a few problems:This commit introduces a bare-bones, ad-hock store that consists in two parts:
To recover a header, given the block height, we simply use pointer arithmetic inside the flat file. If we need to get from the block hash, use the map first, then find it inside the flat file. This has the advantage of not needing explicit flushes (the os will flush it in fixed intervals), flushes are async (the os will do it), we get caching for free (mmap-ed pages will stay in memory if we need) and our cache can react to system constraints, because the kernel will always know how much memory we sill have