optimize compressed CLVM serialization #562

arvidn · 2025-02-25T11:32:05Z

Overview

This patch adds a new class, TreeCache, which combines the functionality of ReadCacheLookup and ObjectCache into one data structure.

In the incremental serializer (Serializer), it replaces the use of ReadCacheLookup and ObjectCache with the new TreeCache. The original node_to_bytes_backrefs() is unchanged.

This is part of the larger effort to farm full compressed blocks. Today we fill the block and then compress. We don't attempt to keep adding transactions into the space freed up by compression.

There are 3 commits. The first is the main change, which introduces the new TreeCache class. The second uses SHA-1 hashing to de-duplicate sub trees. The third uses a bump allocator for allocating paths during the search.

TreeCache

The way compressed CLVM serialization works is described in some detail here. The TreeCache separates the serialization into two steps.

The de-duplication step (update())
The serialization and path-finding step (push(), pop2_and_cons() and find_path())

de-duplication

In the de-duplication step, the entire tree (of NodePtr) is traversed and a "shadow" tree is built. This tree is stored in a Vec<NodeEntry> and each node is referenced by its index into this array. The shadow tree maintains some metadata about the corresponding NodePtr node:

The tree hash of the node (used for de-duplication)
The serialized length of the node. The number of bytes it would take to serialize the node is used as the upper limit on how long a path we're willing to form to reference it.
Parents. The NodeEntry tree is de-duplicated as it's being built, which means that a node may have multiple parents. We record the parent of each location it's de-duplicated from.
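
The de-duplication step can be sketched roughly as below. This is a minimal illustration, not the code in the patch: the `Tree`, `Key` and `ShadowTree` types and the field names are hypothetical, the traversal is recursive for brevity, and a structural key (atom bytes, or the indices of the two children) stands in for the tree hash.

```rust
use std::collections::HashMap;

/// Stand-in for a CLVM tree; the real code walks `NodePtr`s.
enum Tree {
    Atom(Vec<u8>),
    Pair(Box<Tree>, Box<Tree>),
}

/// Hypothetical shadow-tree entry (field names are illustrative).
struct NodeEntry {
    serialized_length: u64,
    /// (index of parent, true if this node is the parent's right child)
    parents: Vec<(u32, bool)>,
}

/// Structural identity of a node, used here in place of a tree hash.
#[derive(PartialEq, Eq, Hash)]
enum Key {
    Atom(Vec<u8>),
    Pair(u32, u32),
}

#[derive(Default)]
struct ShadowTree {
    entries: Vec<NodeEntry>,
    index: HashMap<Key, u32>,
}

/// Number of bytes a CLVM atom occupies when serialized.
fn atom_len(a: &[u8]) -> u64 {
    let n = a.len() as u64;
    match a.len() {
        0 => 1,
        1 if a[0] < 0x80 => 1,
        l if l < 0x40 => 1 + n,
        l if l < 0x2000 => 2 + n,
        l if l < 0x10_0000 => 3 + n,
        l if l < 0x800_0000 => 4 + n,
        _ => 5 + n,
    }
}

impl ShadowTree {
    /// Returns the index of the de-duplicated entry for `node`.
    fn update(&mut self, node: &Tree) -> u32 {
        let (key, serialized_length) = match node {
            Tree::Atom(a) => (Key::Atom(a.clone()), atom_len(a)),
            Tree::Pair(l, r) => {
                let (li, ri) = (self.update(l), self.update(r));
                // one byte for the pair marker plus both children
                let len = 1
                    + self.entries[li as usize].serialized_length
                    + self.entries[ri as usize].serialized_length;
                (Key::Pair(li, ri), len)
            }
        };
        if let Some(&idx) = self.index.get(&key) {
            return idx; // identical sub tree seen before: reuse its entry
        }
        let idx = self.entries.len() as u32;
        if let Key::Pair(li, ri) = &key {
            // a new pair becomes an additional parent of both children
            self.entries[*li as usize].parents.push((idx, false));
            self.entries[*ri as usize].parents.push((idx, true));
        }
        self.entries.push(NodeEntry { serialized_length, parents: Vec::new() });
        self.index.insert(key, idx);
        idx
    }
}
```

For the tree ((1 . 2) . (1 . 2)) this produces four entries rather than seven, and the shared (1 . 2) entry records the root as its parent twice, once per side.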

serialization

As we serialize the tree, we maintain the "parse stack", tracking the state of the deserializer. This is necessary to form valid back-references for the deserializer to follow. Nodes are push()ed to the stack and popped and joined in pairs.
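
As a rough picture of the parse stack operations, here is a minimal sketch. It assumes a plain `Vec` as the stack and a hypothetical `Tree` type; in the real serializer the stack mirrors the deserializer's stack, which is what lets back-references be expressed as paths from the top of the stack.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Atom(Vec<u8>),
    Pair(Box<Tree>, Box<Tree>),
}

/// Hypothetical model of the deserializer's value stack.
#[derive(Default)]
struct ParseStack {
    items: Vec<Tree>,
}

impl ParseStack {
    /// A node has been fully parsed: put it on the stack.
    fn push(&mut self, node: Tree) {
        self.items.push(node);
    }

    /// Both children of a pair are on the stack: join them.
    /// The right child was parsed last, so it is popped first.
    fn pop2_and_cons(&mut self) {
        let right = self.items.pop().expect("stack underflow");
        let left = self.items.pop().expect("stack underflow");
        self.items.push(Tree::Pair(Box::new(left), Box::new(right)));
    }
}
```

Serializing the pair (a . b) corresponds to push(a), push(b), pop2_and_cons(), after which the stack holds the single item (a . b).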

The interesting part is find_path(). Before serializing a node, we try to find a back-reference to it by calling find_path(). This function does the following:

look up the NodeEntry corresponding to the tree node we're looking for
perform a breadth-first search up the tree (following all parents)
If we find a path that terminates at the top of the stack that is short enough, return it
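
The search can be sketched as follows. This is a simplified illustration under several assumptions: the top of the stack is modeled as a single `root` index, stack traversal and the "has been serialized" check are left out, the path is built in a `u64` (so `max_bits` must stay below 63), and `find_path_sketch` and its parameters are hypothetical names.

```rust
use std::collections::VecDeque;

/// Hypothetical shadow-tree entry, reduced to what the search needs.
struct NodeEntry {
    /// (index of parent, true if this node is the parent's right child)
    parents: Vec<(u32, bool)>,
}

/// Breadth-first search from `target` up through all parents until `root`
/// is reached. Returns the CLVM path from `root` to `target`, or None if
/// no path of at most `max_bits` steps exists.
fn find_path_sketch(
    entries: &[NodeEntry],
    target: u32,
    root: u32,
    max_bits: u32,
) -> Option<u64> {
    let mut visited = vec![false; entries.len()];
    visited[target as usize] = true;
    // (node, path so far, number of steps). The path starts as the
    // terminating 1-bit; every step up shifts it and adds a lower bit.
    let mut queue = VecDeque::from([(target, 1u64, 0u32)]);
    while let Some((node, path, bits)) = queue.pop_front() {
        if node == root {
            return Some(path);
        }
        if bits == max_bits {
            continue; // any longer path is not worth emitting
        }
        for &(parent, is_right) in &entries[node as usize].parents {
            if !visited[parent as usize] {
                visited[parent as usize] = true;
                queue.push_back((parent, (path << 1) | is_right as u64, bits + 1));
            }
        }
    }
    None
}
```

Because the search is breadth-first, the first time the root is reached is along a shortest path, and marking nodes as visited keeps shared sub trees from being explored more than once.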

As we traverse parents up the tree, we may encounter a node that's on_stack, meaning this node is somewhere on the current parse stack. This is another branch to search. Traversing the stack is simpler than traversing the tree, as it doesn't branch. It's possible to encounter a node deep down on the stack whereas one of its parents is high up on the stack. The path following the parent will reach the target first, and will be the shortest path.

An earlier version of find_path() could also find paths to stack nodes, i.e. not a node in the original tree, but the node that makes up a link in the parse stack. However, this feature was quite expensive, as every stack push would require computing the tree hash for that new node.

This is a major difference between ReadCacheLookup and TreeCache. We don't believe it's essential to be able to form such back-references in the common case of serializing a block generator. We recently optimized the de-serializer to expect the common case to not point onto the stack (here).

PathBuilder

PathBuilder is a utility class to help build CLVM paths. A path is a collection of bits, read from the right to left (from least significant to most) and each bit determines whether to follow the left (0) or right (1) side of a tree node.

The main challenge is to avoid re-allocations and minimize bit manipulations until we want to convert it into a CLVM path. The bits start out left-aligned, and we right-align them once when we're done.
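
One way to picture the left-aligned representation is the sketch below. It is not the PathBuilder in the patch: `PathSketch`, its byte layout (big-endian, bits filled from the most significant bit of the first byte) and its method names are assumptions made for illustration. Steps are pushed in the order the search discovers them, from the target up towards the root, starting with the terminating 1-bit.

```rust
/// Hypothetical path builder: bits are appended left-aligned and the
/// buffer is shifted into place only once, in `done()`.
struct PathSketch {
    bytes: Vec<u8>,
    bit_len: usize,
}

impl PathSketch {
    fn new() -> Self {
        let mut p = PathSketch { bytes: Vec::new(), bit_len: 0 };
        p.push(true); // the terminating 1-bit ends up most significant
        p
    }

    /// Append one step: false = left (0), true = right (1).
    fn push(&mut self, right: bool) {
        let (byte, bit) = (self.bit_len / 8, self.bit_len % 8);
        if bit == 0 {
            self.bytes.push(0);
        }
        if right {
            self.bytes[byte] |= 0x80 >> bit;
        }
        self.bit_len += 1;
    }

    /// Right-align the bits so the buffer reads as a big-endian integer.
    fn done(mut self) -> Vec<u8> {
        let shift = (8 - self.bit_len % 8) % 8;
        if shift > 0 {
            let mut carry = 0u8;
            for b in self.bytes.iter_mut() {
                let next = *b << (8 - shift); // bits that move to the next byte
                *b = (*b >> shift) | carry;
                carry = next;
            }
        }
        self.bytes
    }
}
```

Pushing "left" and then "right" yields the single byte 0b101 (5): from the root, go right first (least significant bit), then left.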

VisitedNodes

VisitedNodes implements the equivalent of a HashSet<u32>, but for dense indices. It uses a bitfield under the hood. It is used to mark which nodes in the shadow tree we have already visited (required to terminate search branches in the breadth-first search). It's also used to indicate whether a node has been serialized or not. If it has not, we can't form a valid path to it.
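
A bitfield-backed set for dense indices can be sketched like this. The type and method names (`VisitedSketch`, `visit`, `is_visited`) are illustrative and not taken from the patch.

```rust
/// Hypothetical dense-index set: one bit per node index.
struct VisitedSketch {
    bits: Vec<u64>,
}

impl VisitedSketch {
    fn new(num_nodes: u32) -> Self {
        VisitedSketch { bits: vec![0; (num_nodes as usize + 63) / 64] }
    }

    /// Marks `idx` as visited and returns whether it already was.
    fn visit(&mut self, idx: u32) -> bool {
        let (word, mask) = ((idx / 64) as usize, 1u64 << (idx % 64));
        let seen = self.bits[word] & mask != 0;
        self.bits[word] |= mask;
        seen
    }

    fn is_visited(&self, idx: u32) -> bool {
        self.bits[(idx / 64) as usize] & (1u64 << (idx % 64)) != 0
    }
}
```

Compared to a hash set, membership tests and insertions are a shift and a mask, and the whole set for N nodes takes N bits.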

SHA-1 hashing

We compute tree hashes only to identify identical sub trees. We don't actually need to know the SHA-256 tree hash. SHA-1 is cheaper to compute, and we save time by using it instead of SHA-256. To mitigate the weaker hash properties, we salt the hashes. Every time we serialize a tree, we use a different salt.
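
The idea of a salted tree hash can be sketched as below. To stay self-contained, this uses the standard library's `DefaultHasher` (a 64-bit, non-cryptographic hash) as a stand-in for SHA-1, takes the salt as a parameter rather than generating it, and mixes the salt in as a prefix of every hash. All three are assumptions; the patch may do this differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

enum Tree {
    Atom(Vec<u8>),
    Pair(Box<Tree>, Box<Tree>),
}

/// Salted tree hash with the usual tree-hash shape: atoms hash as
/// (1, bytes) and pairs as (2, left hash, right hash), each prefixed
/// with the salt. `DefaultHasher` is only a stand-in for SHA-1.
fn salted_tree_hash(node: &Tree, salt: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(salt);
    match node {
        Tree::Atom(a) => {
            h.write_u8(1);
            h.write(a);
        }
        Tree::Pair(l, r) => {
            h.write_u8(2);
            h.write_u64(salted_tree_hash(l, salt));
            h.write_u64(salted_tree_hash(r, salt));
        }
    }
    h.finish()
}
```

Within one serialization (one salt), identical sub trees always hash to the same value, which is all the de-duplication step needs. An attacker who doesn't know the salt can't prepare colliding sub trees in advance.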

bump allocator

The breadth-first search traverses a lot of branches and causes a lot of small allocations and deallocations during the search phase. Using a local arena with a bump allocator for these saves a lot of time.
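
To show why this is cheap, here is a toy bump allocator over a byte buffer. It is not bumpalo and not the code in the patch; `BumpSketch` and its methods are hypothetical. An allocation only advances an offset, and everything is released at once by resetting that offset.

```rust
use std::ops::Range;

/// Hypothetical fixed-capacity arena. Allocations are ranges into `buf`.
struct BumpSketch {
    buf: Vec<u8>,
    used: usize,
}

impl BumpSketch {
    fn with_capacity(cap: usize) -> Self {
        BumpSketch { buf: vec![0; cap], used: 0 }
    }

    /// Reserve `len` bytes by bumping the offset. No per-allocation
    /// bookkeeping and no individual frees.
    fn alloc(&mut self, len: usize) -> Option<Range<usize>> {
        let end = self.used.checked_add(len)?;
        if end > self.buf.len() {
            return None; // a real arena would grow with a new chunk here
        }
        let range = self.used..end;
        self.used = end;
        Some(range)
    }

    /// Free every allocation at once, e.g. when a search finishes.
    fn reset(&mut self) {
        self.used = 0;
    }
}
```

A caller would write a partial path into `arena.buf[range]` and call `reset()` at the end of each search, so the many short-lived allocations never reach the global allocator.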

Performance

The justification for this change is performance. There are two separate considerations:

1. The cost of finding duplicate nodes and paths to already serialized nodes. This is the normal cost of just serializing with back-refs.
2. The cost of creating and restoring "undo state". This is only a feature of Serializer, where we may need to undo the addition of the most recent transaction in case it made the block exceed the max cost limit.

The TreeCache is especially beneficial for (2) but also gives a material speed-up for (1).

1. Serializing

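Benchmarks on an Ubuntu Threadripper, a Raspberry Pi 5 and a MacOS M1.
(Numbers are run-times on main, normalized to the run-time with this patch. For example, Serializer 0 on the Threadripper went from 546.0 ms to 222.9 ms, a speed-up of 2.45.)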

benchmark	speed-up (Threadripper)	speed-up (RPi-5)	speed-up (MacOS M1)
Serializer 0	2.45	2.55	2.87
Serializer 1	3.44	3.63	4.05
Serializer 2	6.58	7.43	11.10
Serializer 3	3.87	4.94	5.14
Serializer 4	3.20	4.19	5.04

Threadripper output

before:
$ cargo bench --bench serialize "Serializer " -- --save-baseline main
Compiling clvmr v0.12.0 (/home/arvid/dev/clvm_rs)
Finished `bench` profile [optimized] target(s) in 6.51s
Running benches/serialize.rs (target/release/deps/serialize-e0c68296f62ee72e)
Benchmarking serialize/Serializer 0: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 54.9s, or reduce sample count to 10.
serialize/Serializer 0 time: [543.00 ms 545.99 ms 549.14 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking serialize/Serializer 1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 68.7s, or reduce sample count to 10.
serialize/Serializer 1 time: [660.55 ms 663.68 ms 666.97 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking serialize/Serializer 2: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
serialize/Serializer 2 time: [1.2468 ms 1.2522 ms 1.2578 ms]
serialize/Serializer 3 time: [10.600 ms 10.642 ms 10.686 ms]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
Benchmarking serialize/Serializer 4: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
serialize/Serializer 4 time: [1.2964 ms 1.3022 ms 1.3083 ms]
Found 12 outliers among 100 measurements (12.00%)
12 (12.00%) high mild

after:

$ cargo bench --bench serialize "Serializer " -- --save-baseline tree-cache
Compiling clvmr v0.12.0 (/home/arvid/dev/clvm_rs)
Finished `bench` profile [optimized] target(s) in 6.41s
Running benches/serialize.rs (target/release/deps/serialize-7130b2ac7ee765d8)
Benchmarking serialize/Serializer 0: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 22.5s, or reduce sample count to 20.
serialize/Serializer 0 time: [221.44 ms 222.88 ms 224.52 ms]
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
Benchmarking serialize/Serializer 1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 19.6s, or reduce sample count to 20.
serialize/Serializer 1 time: [191.23 ms 192.84 ms 194.58 ms]
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
serialize/Serializer 2 time: [190.05 µs 191.86 µs 193.78 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
serialize/Serializer 3 time: [2.7350 ms 2.7505 ms 2.7741 ms]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
serialize/Serializer 4 time: [405.22 µs 407.20 µs 409.27 µs]

critcmp:

group main tree-cache
----- ---- ----------
serialize/Serializer 0 2.45 546.0±15.88ms ? ?/sec 1.00 222.9±7.86ms ? ?/sec
serialize/Serializer 1 3.44 663.7±16.57ms ? ?/sec 1.00 192.8±8.58ms ? ?/sec
serialize/Serializer 2 6.58 1250.8±23.55µs ? ?/sec 1.00 190.0±6.04µs ? ?/sec
serialize/Serializer 3 3.87 10.6±0.22ms ? ?/sec 1.00 2.8±0.10ms ? ?/sec
serialize/Serializer 4 3.20 1305.5±26.01µs ? ?/sec 1.00 408.5±7.36µs ? ?/sec

RPi output

before:
$ cargo bench --bench serialize "Serializer " -- --save-baseline main
Compiling clvmr v0.12.0 (/home/arvid/dev/clvm_rs)
Finished `bench` profile [optimized] target(s) in 24.70s
Running benches/serialize.rs (target/release/deps/serialize-0f4468c8710377bc)
Benchmarking serialize/Serializer 0: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 97.0s, or reduce sample count to 10.
serialize/Serializer 0 time: [967.98 ms 968.73 ms 969.50 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking serialize/Serializer 1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 113.5s, or reduce sample count to 10.
serialize/Serializer 1 time: [1.1068 s 1.1073 s 1.1078 s]
serialize/Serializer 2 time: [2.6834 ms 2.6896 ms 2.6962 ms]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
serialize/Serializer 3 time: [22.471 ms 22.678 ms 22.936 ms]
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
serialize/Serializer 4 time: [2.4508 ms 2.4730 ms 2.5070 ms]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe

after:

$ cargo bench --bench serialize "Serializer " -- --save-baseline tree-cache
Finished `bench` profile [optimized] target(s) in 0.10s
Running benches/serialize.rs (target/release/deps/serialize-b5f6251ad4a7b738)
Benchmarking serialize/Serializer 0: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 37.8s, or reduce sample count to 10.
serialize/Serializer 0 time: [380.06 ms 380.36 ms 380.65 ms]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low severe
1 (1.00%) low mild
Benchmarking serialize/Serializer 1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 30.7s, or reduce sample count to 10.
serialize/Serializer 1 time: [305.00 ms 305.40 ms 305.85 ms]
Found 10 outliers among 100 measurements (10.00%)
10 (10.00%) high severe
serialize/Serializer 2 time: [358.36 µs 360.11 µs 361.95 µs]
serialize/Serializer 3 time: [4.5726 ms 4.5894 ms 4.6072 ms]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
serialize/Serializer 4 time: [588.32 µs 590.53 µs 592.75 µs]

critcmp:

group main tree-cache
----- ---- ----------
serialize/Serializer 0 2.55 968.7±3.89ms ? ?/sec 1.00 380.4±1.50ms ? ?/sec
serialize/Serializer 1 3.63 1107.3±2.46ms ? ?/sec 1.00 305.4±2.19ms ? ?/sec
serialize/Serializer 2 7.43 2.7±0.03ms ? ?/sec 1.00 362.2±8.53µs ? ?/sec
serialize/Serializer 3 4.94 22.7±1.20ms ? ?/sec 1.00 4.6±0.09ms ? ?/sec
serialize/Serializer 4 4.19 2.5±0.15ms ? ?/sec 1.00 590.8±9.00µs ? ?/sec

MacOS output

before:
$ cargo bench --bench serialize "Serializer " -- --save-baseline main
Compiling clvmr v0.12.0 (/Users/arvid/Documents/dev/clvm_rs)
Finished `bench` profile [optimized] target(s) in 10.44s
Running benches/serialize.rs (target/release/deps/serialize-7a920c3288d8e0d3)
Benchmarking serialize/Serializer 0: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 44.2s, or reduce sample count to 10.
serialize/Serializer 0 time: [420.62 ms 421.77 ms 423.00 ms]
Found 15 outliers among 100 measurements (15.00%)
15 (15.00%) high mild
Benchmarking serialize/Serializer 1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 49.5s, or reduce sample count to 10.
serialize/Serializer 1 time: [491.93 ms 492.36 ms 492.83 ms]
Found 15 outliers among 100 measurements (15.00%)
5 (5.00%) high mild
10 (10.00%) high severe
Benchmarking serialize/Serializer 2: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.0s, enable flat sampling, or reduce sample count to 50.
serialize/Serializer 2 time: [1.6148 ms 1.6174 ms 1.6202 ms]
Found 19 outliers among 100 measurements (19.00%)
17 (17.00%) low severe
1 (1.00%) high mild
1 (1.00%) high severe
serialize/Serializer 3 time: [10.395 ms 10.399 ms 10.403 ms]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
Benchmarking serialize/Serializer 4: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 60.
serialize/Serializer 4 time: [1.2827 ms 1.2849 ms 1.2875 ms]
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe

after:

$ cargo bench --bench serialize "Serializer " -- --save-baseline tree-cache
Compiling clvmr v0.12.0 (/Users/arvid/Documents/dev/clvm_rs)
Finished `bench` profile [optimized] target(s) in 8.85s
Running benches/serialize.rs (target/release/deps/serialize-225aa20e6ccff72f)
Benchmarking serialize/Serializer 0: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 14.7s, or reduce sample count to 30.
serialize/Serializer 0 time: [146.24 ms 146.83 ms 147.59 ms]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking serialize/Serializer 1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 12.2s, or reduce sample count to 40.
serialize/Serializer 1 time: [121.35 ms 121.70 ms 122.07 ms]
serialize/Serializer 2 time: [144.81 µs 145.14 µs 145.61 µs]
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
serialize/Serializer 3 time: [2.0211 ms 2.0218 ms 2.0225 ms]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
1 (1.00%) high mild
6 (6.00%) high severe
serialize/Serializer 4 time: [253.77 µs 254.66 µs 255.73 µs]
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) high mild
9 (9.00%) high severe

critcmp:

group main tree-cache
----- ---- ----------
serialize/Serializer 0 2.87 421.8±6.11ms ? ?/sec 1.00 146.8±3.50ms ? ?/sec
serialize/Serializer 1 4.05 492.4±2.30ms ? ?/sec 1.00 121.7±1.87ms ? ?/sec
serialize/Serializer 2 11.10 1611.0±37.16µs ? ?/sec 1.00 145.1±1.30µs ? ?/sec
serialize/Serializer 3 5.14 10.4±0.02ms ? ?/sec 1.00 2.0±0.00ms ? ?/sec
serialize/Serializer 4 5.04 1286.0±10.18µs ? ?/sec 1.00 255.3±6.99µs ? ?/sec

2. Undo-state

In chia_rs there is a test of build_compressed_block.rs, which builds a block incrementally, 1 transaction at a time. For each transaction, it needs to save the undo state in case it exceeds the limit.
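
The save-and-restore pattern can be sketched as below. This is only an illustration of the pattern, not the Serializer API: `BlockSketch`, `UndoState`, `add` and `restore` are hypothetical, and the real undo state also has to cover the serializer's cache. The sketch assumes append-only storage, so the undo state is just the sizes to roll back to.

```rust
/// Hypothetical incremental block builder with a cost limit.
struct BlockSketch {
    bytes: Vec<u8>,
    cost: u64,
    max_cost: u64,
}

/// Everything needed to roll back the most recent addition.
struct UndoState {
    len: usize,
    cost: u64,
}

impl BlockSketch {
    /// Try to add one transaction. Returns false, leaving the block
    /// unchanged, if it would exceed the cost limit.
    fn add(&mut self, tx: &[u8], tx_cost: u64) -> bool {
        let undo = UndoState { len: self.bytes.len(), cost: self.cost };
        self.bytes.extend_from_slice(tx);
        self.cost += tx_cost;
        if self.cost > self.max_cost {
            self.restore(undo);
            return false;
        }
        true
    }

    fn restore(&mut self, undo: UndoState) {
        self.bytes.truncate(undo.len);
        self.cost = undo.cost;
    }
}
```

Since the state is saved for every transaction, the cost of creating it matters as much as the cost of restoring it.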

Timings on Ubuntu Threadripper:

test	before	after	speed up
0	3.35 s	0.71 s	4.72
1	2.62 s	0.66 s	3.97
2	3.16 s	0.66 s	4.79
3	0.97 s	0.15 s	6.47
4	2.89 s	0.61 s	4.74
5	1.13 s	0.21 s	5.38
6	2.58 s	0.60 s	4.30
7	10.70 s	1.36 s	7.87
8	3.58 s	0.75 s	4.77
9	2.68 s	0.67 s	4.00
10	2.45 s	0.62 s	3.95
11	2.53 s	0.57 s	4.44
12	2.48 s	0.64 s	3.88
13	3.34 s	0.70 s	4.77
14	10.86 s	1.39 s	7.81
15	2.88 s	0.72 s	4.00
16	3.22 s	0.64 s	5.03
17	0.98 s	0.18 s	5.44
18	0.98 s	0.17 s	5.76
19	2.10 s	0.60 s	3.50
20	2.11 s	0.64 s	3.30
21	3.42 s	0.71 s	4.82
22	11.51 s	1.31 s	8.79
23	3.14 s	0.63 s	4.98
24	11.34 s	1.33 s	8.53
25	2.41 s	0.60 s	4.02
26	1.28 s	0.21 s	6.10
27	2.44 s	0.59 s	4.14
28	2.49 s	0.69 s	3.60
29	2.49 s	0.70 s	3.56

complete output

before:
running 1 test
loading spend bundles from disk
loaded 90 spend bundles
idx: 0 built block in 3.35 seconds, cost: 10998575664 skipped: 3 longest-call: 2.12s TX: 37
idx: 1 built block in 2.62 seconds, cost: 10983357358 skipped: 7 longest-call: 1.76s TX: 32
idx: 2 built block in 3.16 seconds, cost: 10847563861 skipped: 7 longest-call: 2.34s TX: 62
idx: 3 built block in 0.97 seconds, cost: 10965747249 skipped: 7 longest-call: 0.24s TX: 54
idx: 4 built block in 2.89 seconds, cost: 10754122135 skipped: 7 longest-call: 1.80s TX: 40
idx: 5 built block in 1.13 seconds, cost: 10583555007 skipped: 7 longest-call: 0.28s TX: 46
idx: 6 built block in 2.58 seconds, cost: 10993197686 skipped: 7 longest-call: 1.80s TX: 55
idx: 7 built block in 10.70 seconds, cost: 10977464946 skipped: 7 longest-call: 8.72s TX: 40
idx: 8 built block in 3.58 seconds, cost: 10758545088 skipped: 7 longest-call: 1.68s TX: 39
idx: 9 built block in 2.68 seconds, cost: 10889601079 skipped: 7 longest-call: 1.61s TX: 36
idx: 10 built block in 2.45 seconds, cost: 10996639192 skipped: 3 longest-call: 1.90s TX: 29
idx: 11 built block in 2.53 seconds, cost: 10970637745 skipped: 7 longest-call: 1.64s TX: 34
idx: 12 built block in 2.48 seconds, cost: 10996973255 skipped: 6 longest-call: 1.52s TX: 37
idx: 13 built block in 3.34 seconds, cost: 10999105794 skipped: 6 longest-call: 2.33s TX: 31
test gen::build_compressed_block::tests::test_build_block has been running for over 60 seconds
idx: 14 built block in 10.86 seconds, cost: 10994129896 skipped: 6 longest-call: 7.66s TX: 40
idx: 15 built block in 2.88 seconds, cost: 10664917802 skipped: 7 longest-call: 1.52s TX: 55
idx: 16 built block in 3.22 seconds, cost: 10994836257 skipped: 6 longest-call: 2.03s TX: 56
idx: 17 built block in 0.98 seconds, cost: 10913252607 skipped: 7 longest-call: 0.24s TX: 42
idx: 18 built block in 0.98 seconds, cost: 10990565458 skipped: 7 longest-call: 0.25s TX: 51
idx: 19 built block in 2.10 seconds, cost: 10997538040 skipped: 3 longest-call: 1.61s TX: 28
idx: 20 built block in 2.11 seconds, cost: 10995570352 skipped: 0 longest-call: 1.70s TX: 23
idx: 21 built block in 3.42 seconds, cost: 10793651464 skipped: 7 longest-call: 2.04s TX: 61
idx: 22 built block in 11.51 seconds, cost: 10996436556 skipped: 4 longest-call: 9.08s TX: 27
idx: 23 built block in 3.14 seconds, cost: 10741182550 skipped: 7 longest-call: 2.01s TX: 47
idx: 24 built block in 11.34 seconds, cost: 10996512566 skipped: 4 longest-call: 9.26s TX: 35
idx: 25 built block in 2.41 seconds, cost: 10511794612 skipped: 7 longest-call: 1.55s TX: 31
idx: 26 built block in 1.28 seconds, cost: 10880611048 skipped: 7 longest-call: 0.29s TX: 55
idx: 27 built block in 2.44 seconds, cost: 10761481781 skipped: 7 longest-call: 1.75s TX: 50
idx: 28 built block in 2.49 seconds, cost: 10700285540 skipped: 7 longest-call: 1.88s TX: 33
idx: 29 built block in 2.49 seconds, cost: 10999810076 skipped: 4 longest-call: 1.53s TX: 32

after:

running 1 test
loading spend bundles from disk
loaded 90 spend bundles
idx: 0 built block in 0.71 seconds, cost: 10998575664 skipped: 3 longest-call: 0.52s TX: 37
idx: 1 built block in 0.66 seconds, cost: 10983357358 skipped: 7 longest-call: 0.52s TX: 32
idx: 2 built block in 0.66 seconds, cost: 10847563861 skipped: 7 longest-call: 0.53s TX: 62
idx: 3 built block in 0.15 seconds, cost: 10965747249 skipped: 7 longest-call: 0.04s TX: 54
idx: 4 built block in 0.61 seconds, cost: 10754122135 skipped: 7 longest-call: 0.49s TX: 40
idx: 5 built block in 0.21 seconds, cost: 10583555007 skipped: 7 longest-call: 0.04s TX: 46
idx: 6 built block in 0.60 seconds, cost: 10993197686 skipped: 7 longest-call: 0.50s TX: 55
idx: 7 built block in 1.36 seconds, cost: 10977464946 skipped: 7 longest-call: 0.74s TX: 40
idx: 8 built block in 0.75 seconds, cost: 10758545088 skipped: 7 longest-call: 0.49s TX: 39
idx: 9 built block in 0.67 seconds, cost: 10889601079 skipped: 7 longest-call: 0.54s TX: 36
idx: 10 built block in 0.62 seconds, cost: 10996639192 skipped: 3 longest-call: 0.51s TX: 29
idx: 11 built block in 0.57 seconds, cost: 10970637745 skipped: 7 longest-call: 0.50s TX: 34
idx: 12 built block in 0.64 seconds, cost: 10996973255 skipped: 6 longest-call: 0.52s TX: 37
idx: 13 built block in 0.70 seconds, cost: 10999105794 skipped: 6 longest-call: 0.53s TX: 31
idx: 14 built block in 1.39 seconds, cost: 10994129896 skipped: 6 longest-call: 0.78s TX: 40
idx: 15 built block in 0.72 seconds, cost: 10664917802 skipped: 7 longest-call: 0.56s TX: 55
idx: 16 built block in 0.64 seconds, cost: 10994836257 skipped: 6 longest-call: 0.51s TX: 56
idx: 17 built block in 0.18 seconds, cost: 10913252607 skipped: 7 longest-call: 0.04s TX: 42
idx: 18 built block in 0.17 seconds, cost: 10990565458 skipped: 7 longest-call: 0.04s TX: 51
idx: 19 built block in 0.60 seconds, cost: 10997538040 skipped: 3 longest-call: 0.50s TX: 28
idx: 20 built block in 0.64 seconds, cost: 10995570352 skipped: 0 longest-call: 0.56s TX: 23
idx: 21 built block in 0.71 seconds, cost: 10793651464 skipped: 7 longest-call: 0.54s TX: 61
idx: 22 built block in 1.31 seconds, cost: 10996436556 skipped: 4 longest-call: 0.75s TX: 27
idx: 23 built block in 0.63 seconds, cost: 10741182550 skipped: 7 longest-call: 0.50s TX: 47
idx: 24 built block in 1.33 seconds, cost: 10996512566 skipped: 4 longest-call: 0.76s TX: 35
idx: 25 built block in 0.60 seconds, cost: 10511794612 skipped: 7 longest-call: 0.50s TX: 31
idx: 26 built block in 0.21 seconds, cost: 10880611048 skipped: 7 longest-call: 0.04s TX: 55
idx: 27 built block in 0.59 seconds, cost: 10761481781 skipped: 7 longest-call: 0.49s TX: 50
idx: 28 built block in 0.69 seconds, cost: 10700285540 skipped: 7 longest-call: 0.58s TX: 33
idx: 29 built block in 0.70 seconds, cost: 10999810076 skipped: 4 longest-call: 0.56s TX: 32

Tests

The most important tests are fuzzers that verify certain properties.

tree_cache

The tree_cache fuzzer builds a random CLVM tree (make_tree()) and picks a random node in it.

It then traverses the tree as if it were serializing it. Once the selected node has been "serialized", it ensures that the TreeCache produces valid paths to this node at every step of the traversal.

The paths are checked to never be longer than the serialized length. It also verifies that the path can be looked up by traverse_path() and that the resulting node compares equal to the one we were trying to find.
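
The property can be sketched as follows. `follow_path` is a simplified stand-in for traverse_path(), limited to paths that fit in a `u64` and treating path 0 as invalid. How the fuzzer measures path length is not stated above, so comparing the path's byte length against the node's serialized length is an assumption, as are the `Tree` type and the function names.

```rust
#[derive(Debug, PartialEq)]
enum Tree {
    Atom(Vec<u8>),
    Pair(Box<Tree>, Box<Tree>),
}

/// Follow a CLVM path from `node`: bits are read from the least
/// significant end, 0 = left, 1 = right, and the top 1-bit terminates.
fn follow_path(mut node: &Tree, mut path: u64) -> Option<&Tree> {
    if path == 0 {
        return None;
    }
    while path > 1 {
        node = match node {
            Tree::Pair(left, right) => {
                if path & 1 == 0 { &**left } else { &**right }
            }
            Tree::Atom(_) => return None, // cannot descend into an atom
        };
        path >>= 1;
    }
    Some(node)
}

/// The checked property: the path is no longer than just serializing
/// the node, and it resolves to a node equal to the target.
fn path_is_valid(root: &Tree, path: u64, target: &Tree, serialized_len: u32) -> bool {
    let path_bytes = (64 - path.leading_zeros() + 7) / 8;
    path_bytes <= serialized_len && follow_path(root, path) == Some(target)
}
```

For the tree (a . (b . c)), path 5 (0b101) goes right and then left and resolves to b.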

serializer_cmp

The serializer_cmp fuzzer generates a random tree and traverses it as if it were being serialized. It maintains the serialization state both with TreeCache and with ObjectCache and ReadCacheLookup, and ensures that the paths they find are equivalent. Some back-references have several equally good paths, and which one we use isn't important. The property that's checked is that they are equal in length.

The one exception is that TreeCache cannot generate paths to the stack itself, only to items on the stack. This needs a special case in the fuzzer.

coveralls-official · 2025-02-25T13:22:31Z

Pull Request Test Coverage Report for Build 13531514861

Details

576 of 589 (97.79%) changed or added relevant lines in 5 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.3%) to 94.454%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/serde/path_builder.rs	88	93	94.62%
src/serde/tree_cache.rs	383	391	97.95%

Totals
Change from base Build 13517775949:	0.3%
Covered Lines:	6847
Relevant Lines:	7249

💛 - Coveralls

arvidn force-pushed the tree-cache-simplified branch from 0f04c63 to 316cc83 Compare February 25, 2025 11:46

arvidn changed the title ~~Tree cache simplified~~ optimize compressed CLVM serialization Feb 25, 2025

arvidn force-pushed the tree-cache-simplified branch 2 times, most recently from 94fe645 to 57fe04d Compare February 25, 2025 15:37

arvidn added 3 commits February 25, 2025 22:40

TreeCache to speed up serialization with backrefs

96573cb

use SHA-1 to deduplicate sub trees

f4f73c4

use bump allocator (bumpalo) to speed up allocations for partial path…

54d8325

…s, in find_path()

arvidn force-pushed the tree-cache-simplified branch from 57fe04d to 54d8325 Compare February 25, 2025 21:41

arvidn marked this pull request as ready for review February 25, 2025 23:29

arvidn requested a review from richardkiss February 25, 2025 23:29
