optimize compressed CLVM serialization #562
Open
Overview
This patch adds a new class, `TreeCache`, which combines the functionality of `ReadCacheLookup` and `ObjectCache` into one data structure.
In the incremental serializer (`Serializer`) it replaces the use of `ReadCacheLookup` and `ObjectCache` with the new `TreeCache`. This means the original `node_to_bytes_backrefs()` is unchanged.
This is part of the larger effort to farm full compressed blocks. Today we fill the block and then compress. We don't attempt to keep adding transactions into the space freed up by compression.
There are 3 commits. The first is the main change, which primarily introduces the new `TreeCache` class. The second uses SHA-1 hashing to de-duplicate sub trees, and the third uses a bump allocator for allocating paths during the search.
TreeCache
The way compressed CLVM serialization works is described in some detail here. The `TreeCache` separates the serialization into two steps:
1. de-duplication (`update()`)
2. serialization (`push()`, `pop2_and_cons()` and `find_path()`)
de-duplication
In the de-duplication step, the entire tree (of `NodePtr`) is traversed and a "shadow" tree is built. This tree is stored in a `Vec<NodeEntry>` and each node is referenced by its index into this array. This shadow tree maintains some metadata about the corresponding `NodePtr` node.
The `NodeEntry` tree is de-duplicated as it's being built, which means that a node may have multiple parents. We record the parent of each location it's de-duplicated from.
serialization
As we serialize the tree, we maintain the "parse stack", tracking the state of the deserializer. This is necessary to form valid back-references for the deserializer to follow. Nodes are `push()`ed to the stack and popped and joined in pairs.
The interesting part is `find_path()`. Before serializing a node, we try to find a back-reference to it by calling `find_path()`. This function does the following:
- Look up the `NodeEntry` corresponding to the tree node we're looking for
As we traverse parents up the tree, we may encounter a node that's `on_stack`, meaning this node is somewhere on the current parse stack. This is another branch to search. Traversing the stack is simpler than traversing the tree, as it doesn't branch. It's possible to encounter a node deep down on the stack whereas one of its parents is high up on the stack. The path following the parent will reach the target first, and will be the shortest path.
An earlier version of `find_path()` could also find paths to stack nodes, i.e. not a node in the original tree, but the node that makes up a link in the parse stack. However, this feature was quite expensive, as every stack push would require computing the tree hash for that new node.
This is a major difference between `ReadCacheLookup` and `TreeCache`. We don't believe it's essential to be able to form such back-references in the common case of serializing a block generator. We recently optimized the de-serializer to expect the common case to not point onto the stack (here).
PathBuilder
`PathBuilder` is a utility class to help build CLVM paths. A path is a sequence of bits, read from right to left (from least significant to most), where each bit determines whether to follow the left (0) or right (1) side of a tree node.
The main challenge is to avoid re-allocations and minimize bit manipulations until we want to convert it into a CLVM path. The bits start out left-aligned, and we right-align them once when we're done.
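A minimal sketch of the left-align-then-shift idea. It assumes the path is built from the target node up toward the root, so bits are appended from most significant (just below the terminating 1 bit that ends every CLVM path) to least significant. The names `SketchPathBuilder`, `ChildPos`, `push` and `done`, as well as the byte layout, are illustrative assumptions and not the PR's actual `PathBuilder` API.

```rust
/// Which child to follow at one step of a path (hypothetical name).
#[derive(Clone, Copy)]
enum ChildPos {
    Left,  // encoded as bit 0
    Right, // encoded as bit 1
}

/// Illustrative builder: bits are appended left-aligned (from the top bit
/// of the first byte downward) and shifted into place once in `done()`.
struct SketchPathBuilder {
    bytes: Vec<u8>,
    used: u8, // number of bits used in the last byte, always 1..=8
}

impl SketchPathBuilder {
    fn new() -> Self {
        // Start with the terminating 1 bit, which ends up as the most
        // significant set bit of the finished path.
        SketchPathBuilder { bytes: vec![0x80], used: 1 }
    }

    /// Append one step. Steps pushed later are less significant, i.e.
    /// they are followed earlier when the path is traversed.
    fn push(&mut self, dir: ChildPos) {
        if self.used == 8 {
            self.bytes.push(0);
            self.used = 0;
        }
        if let ChildPos::Right = dir {
            let last = self.bytes.last_mut().unwrap();
            *last |= 0x80u8 >> self.used;
        }
        self.used += 1;
    }

    /// Right-align all bits with a single pass over the buffer.
    fn done(mut self) -> Vec<u8> {
        let shift = 8 - self.used; // 0..=7 unused low bits in the last byte
        if shift > 0 {
            let mut carry = 0u8;
            for b in self.bytes.iter_mut() {
                let next_carry = *b << (8 - shift);
                *b = (*b >> shift) | carry;
                carry = next_carry;
            }
        }
        self.bytes
    }
}
```

Under these assumptions, pushing `Left` and then `Right` yields the single byte `0x05`, the CLVM path for "first of rest": the deserializer reads the low bit 1 (right), then 0 (left), then stops at the terminating 1.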
VisitedNodes
`VisitedNodes` behaves like a `HashSet<u32>`, but for dense indices. It uses a bitfield under the hood. It is used to mark which indices in the shadow tree we have already visited (required to terminate search branches in the breadth-first search). It's also used to indicate whether a node has been serialized or not. If it has not, we can't form a valid path to it.
SHA-1 hashing
We compute tree hashes only to identify identical sub trees. We don't actually need to know the SHA-256 tree-hash. SHA-1 is a cheaper hash to compute and we save time by using it instead of SHA-256. To mitigate the weaker hash properties, we salt the hashes. Every time we serialize a tree, we will use a different salt.
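The idea of identifying identical sub trees by a salted hash can be sketched as follows. Rust's standard library has no SHA-1, so this sketch swaps in the keyed hasher behind `std::collections::hash_map::RandomState`, which picks fresh random keys per instance and so plays the role of a per-serialization salt. The `Tree`, `Entry` and `SketchDedup` types are hypothetical stand-ins, and a 64-bit hash is far weaker than what the patch uses; this only illustrates the shape of the technique.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

/// A toy tree (hypothetical; the real code walks `NodePtr` values).
enum Tree {
    Atom(Vec<u8>),
    Pair(Box<Tree>, Box<Tree>),
}

/// One entry per unique sub tree, referenced by its index.
struct Entry {
    children: Option<(u32, u32)>, // None for atoms
}

struct SketchDedup {
    salt: RandomState,          // fresh random keys for every instance
    by_hash: HashMap<u64, u32>, // salted hash -> index into `entries`
    entries: Vec<Entry>,
}

impl SketchDedup {
    fn new() -> Self {
        SketchDedup {
            salt: RandomState::new(),
            by_hash: HashMap::new(),
            entries: Vec::new(),
        }
    }

    /// Returns the salted hash and the de-duplicated index of `t`.
    /// Recursive for brevity; a deep tree would need an explicit stack.
    fn insert(&mut self, t: &Tree) -> (u64, u32) {
        let (hash, children) = match t {
            Tree::Atom(bytes) => {
                let mut h = self.salt.build_hasher();
                h.write_u8(1); // domain tag for atoms
                h.write(bytes);
                (h.finish(), None)
            }
            Tree::Pair(left, right) => {
                let (lh, li) = self.insert(left);
                let (rh, ri) = self.insert(right);
                let mut h = self.salt.build_hasher();
                h.write_u8(2); // domain tag for pairs
                h.write_u64(lh);
                h.write_u64(rh);
                (h.finish(), Some((li, ri)))
            }
        };
        // Reuse the existing entry if this hash was seen before.
        let next = self.entries.len() as u32;
        let idx = *self.by_hash.entry(hash).or_insert(next);
        if idx == next {
            self.entries.push(Entry { children });
        }
        (hash, idx)
    }
}
```

Because the hashes are only compared within one `SketchDedup` instance, they never need to match the hashes from another run, which is what makes a per-run salt possible.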
bump allocator
The breadth-first search traverses a lot of branches and causes a lot of small allocations and deallocations during the search phase. Using a local arena with a bump allocator for these saves a lot of time.
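As a rough illustration of why this helps, here is a small index-based arena in safe Rust with bump-style allocation: each allocation just appends to one buffer, nothing is freed individually, and the whole arena is reset at once. The names `SketchArena` and `Span` are hypothetical, and the PR does not say which allocator implementation it uses, so this is a sketch of the general pattern only.

```rust
/// Handle to a byte range inside the arena (hypothetical name).
#[derive(Clone, Copy)]
struct Span {
    start: usize,
    len: usize,
}

/// Bump-style arena: allocation appends to one buffer and individual
/// allocations are never freed. Everything is released by `reset()`.
struct SketchArena {
    buf: Vec<u8>,
}

impl SketchArena {
    fn with_capacity(bytes: usize) -> Self {
        SketchArena { buf: Vec::with_capacity(bytes) }
    }

    /// Copy `data` into the arena and return a handle to it. As long as
    /// the capacity is sufficient this is just a copy and a length bump.
    fn alloc(&mut self, data: &[u8]) -> Span {
        let start = self.buf.len();
        self.buf.extend_from_slice(data);
        Span { start, len: data.len() }
    }

    /// Resolve a handle back into the bytes it refers to.
    fn get(&self, span: Span) -> &[u8] {
        &self.buf[span.start..span.start + span.len]
    }

    /// Drop all allocations at once, keeping the memory for reuse by the
    /// next search.
    fn reset(&mut self) {
        self.buf.clear();
    }
}
```

In a search that creates many short-lived candidate paths, each `alloc` replaces a separate heap allocation, and a single `reset()` replaces all the matching deallocations.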
Performance
The justification for this change is performance. There are two separate considerations:
1. Serializing
2. Undo-state in the `Serializer`, where we may need to undo the addition of the most recent transaction, in case it made the block exceed the max cost limit.
The `TreeCache` is especially beneficial for (2) but also gives a material speed-up for (1).
1. Serializing
Benchmarks on Ubuntu Threadripper.
(numbers are run-time, normalized to the shortest)
Threadripper output (before, after, critcmp)
RPi output (before, after, critcmp)
MacOS output (before, after, critcmp)
2. Undo-state
In `chia_rs` there is a test of `build_compressed_block.rs`, which builds a block incrementally, one transaction at a time. For each transaction, it needs to save the undo state in case the transaction makes the block exceed the limit.
Timings on Ubuntu Threadripper:
complete output (before, after)
Tests
The most important tests are fuzzers that verify certain properties.
tree_cache
The `tree_cache` fuzzer builds a random CLVM tree (`make_tree()`) and picks a random node in it. It then starts to traverse the tree as if it were serializing it, and once the selected node has been "serialized" it ensures that the `TreeCache` produces valid paths to this node at every step of the traversal.
The paths are checked to never be longer than the serialized length. It also verifies that the path can be looked up by `traverse_path()` and that the resulting node compares equal to the one we were trying to find.
serializer_cmp
The `serializer_cmp` fuzzer generates a random tree and traverses it as if it were being serialized. It maintains the serialization state both with `TreeCache` and with `ObjectCache` and `ReadCacheLookup`, and ensures that every path found is equivalent. Some back-references have several equally good paths, and which one we use isn't important. The property that's checked is that they are equal in length.
The one exception is that `TreeCache` cannot generate paths onto the stack itself, just items in the stack. This needs a special case in the fuzzer.