Add frozen support to roaring64 (#688)
* Array-backed ART

* Array-backed r64

* ART serialization

* r64 frozen serialization

* Synthetic benchmarks for r64

* Address review comments

* Add random insert / remove benchmark

* Link free nodes together

This adds the index of the next free node into a newly freed node, or `capacity` if there are no more free indices.

This significantly speeds up finding the next free index, which is important for add+remove workloads.

Benchmarks
Old:
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
r64InsertRemoveRandom/0        127 ns          127 ns      5461079
r64InsertRemoveRandom/1      31633 ns        31604 ns        24028
r64InsertRemoveRandom/2      30782 ns        30769 ns        21859
r64InsertRemoveRandom/3      31985 ns        31969 ns        21558
r64InsertRemoveRandom/4        356 ns          356 ns      1962694
r64InsertRemoveRandom/5      28972 ns        28962 ns        21366
r64InsertRemoveRandom/6      30632 ns        30623 ns        22682
r64InsertRemoveRandom/7        448 ns          448 ns      1601550
r64InsertRemoveRandom/8      32506 ns        32495 ns        21591
r64InsertRemoveRandom/9        689 ns          689 ns      1002237
cppInsertRemoveRandom/0        131 ns          131 ns      5319673
cppInsertRemoveRandom/1      16106 ns        16104 ns        43632
cppInsertRemoveRandom/2       3881 ns         3881 ns       180087
cppInsertRemoveRandom/3       3582 ns         3582 ns       171298
cppInsertRemoveRandom/4        403 ns          402 ns      1666697
cppInsertRemoveRandom/5        993 ns          993 ns       706038
cppInsertRemoveRandom/6       4039 ns         4038 ns       172421
cppInsertRemoveRandom/7        469 ns          469 ns      1440197
cppInsertRemoveRandom/8       1454 ns         1454 ns       633551
cppInsertRemoveRandom/9        654 ns          654 ns      1091588
setInsertRemoveRandom/0       1944 ns         1943 ns       368926
setInsertRemoveRandom/1       1955 ns         1953 ns       404931
setInsertRemoveRandom/2       1911 ns         1910 ns       358466
setInsertRemoveRandom/3       1953 ns         1951 ns       362351
setInsertRemoveRandom/4       2104 ns         2102 ns       321387
setInsertRemoveRandom/5       1944 ns         1943 ns       354836
setInsertRemoveRandom/6       1835 ns         1835 ns       359099
setInsertRemoveRandom/7       1970 ns         1968 ns       372625
setInsertRemoveRandom/8       1894 ns         1892 ns       355456
setInsertRemoveRandom/9       1659 ns         1659 ns       355902

New:
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
r64InsertRemoveRandom/0        128 ns          128 ns      5614266
r64InsertRemoveRandom/1        935 ns          935 ns       739679
r64InsertRemoveRandom/2        916 ns          916 ns       739944
r64InsertRemoveRandom/3        936 ns          936 ns       690708
r64InsertRemoveRandom/4        368 ns          368 ns      1957642
r64InsertRemoveRandom/5       1141 ns         1140 ns       592505
r64InsertRemoveRandom/6       1139 ns         1138 ns       657840
r64InsertRemoveRandom/7        481 ns          481 ns      1434967
r64InsertRemoveRandom/8       1447 ns         1446 ns       484463
r64InsertRemoveRandom/9        721 ns          721 ns      1017456
cppInsertRemoveRandom/0        134 ns          134 ns      5524804
cppInsertRemoveRandom/1      15616 ns        15608 ns        47666
cppInsertRemoveRandom/2       3855 ns         3854 ns       180265
cppInsertRemoveRandom/3       3809 ns         3808 ns       183595
cppInsertRemoveRandom/4        412 ns          412 ns      1695708
cppInsertRemoveRandom/5       1012 ns         1011 ns       713501
cppInsertRemoveRandom/6       3410 ns         3409 ns       199214
cppInsertRemoveRandom/7        474 ns          474 ns      1496740
cppInsertRemoveRandom/8       1421 ns         1420 ns       465868
cppInsertRemoveRandom/9        564 ns          564 ns      1148076
setInsertRemoveRandom/0       1956 ns         1956 ns       351283
setInsertRemoveRandom/1       1959 ns         1958 ns       355766
setInsertRemoveRandom/2       1886 ns         1885 ns       357406
setInsertRemoveRandom/3       1905 ns         1904 ns       355235
setInsertRemoveRandom/4       1945 ns         1944 ns       364599
setInsertRemoveRandom/5       1902 ns         1902 ns       350312
setInsertRemoveRandom/6       1907 ns         1906 ns       346962
setInsertRemoveRandom/7       1937 ns         1936 ns       356168
setInsertRemoveRandom/8       1881 ns         1880 ns       341472
setInsertRemoveRandom/9       1962 ns         1961 ns       350643

* Sort free lists in art_shrink_to_fit

This avoids a bug in the following scenario:
  art->leaves = [2,0,x]
  art->first_free[leaf_type] = 1

Where `2` and `0` are pointers to the next free index, and `x` is an occupied
leaf. In this case, if `art_shrink_to_fit` was called, then we would have the
following result:
  art->leaves = [2,x,0]
  art->first_free[leaf_type] = 0

This is not fully shrunken, and therefore wrong. Sorting the free indices fixes
this scenario. Before `art_shrink_to_fit`:
  art->leaves = [1,2,x]
  art->first_free[leaf_type] = 0

After `art_shrink_to_fit`:
  art->leaves = [x,2,3]
  art->first_free[leaf_type] = 1

* Minor cleanups to ART and r64 internals

* Replace size_t with uint64_t where applicable

Also replace malloc+memset with calloc.

* Use a generic pointer array for ART nodes

This, combined with a static array of node type sizes, allows us to generically manipulate the nodes.

* Correct outdated comment

* Always try to shrink containers

* Replace size_t with uint64_t where applicable in r64

* Check if ART is shrunken when checking if r64 is shrunken
SLieve authored Feb 28, 2025
1 parent 34b2271 commit f30bfa9
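
The "Link free nodes together" item in the commit message describes an intrusive free list: a freed slot stores the index of the next free slot, and the value `capacity` marks the end of the list. The following is a minimal sketch of that idea under stated assumptions. The names `slot_t`, `pool_t`, `pool_grow`, `pool_alloc`, and `pool_release` are hypothetical and are not CRoaring's types or functions; the growth policy (doubling) is also an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical fixed-size slot; not CRoaring's actual node layout.
typedef union slot_u {
    uint64_t next_free;  // meaningful only while the slot is free
    uint64_t payload;    // meaningful only while the slot is occupied
} slot_t;

typedef struct pool_s {
    slot_t *slots;
    uint64_t capacity;
    uint64_t first_free;  // equals capacity when no slot is free
} pool_t;

// Grows the array and threads the new slots onto the free list.
// Only called when the free list is empty (first_free == capacity).
static int pool_grow(pool_t *p) {
    uint64_t new_cap = p->capacity == 0 ? 2 : p->capacity * 2;
    slot_t *s = realloc(p->slots, new_cap * sizeof(slot_t));
    if (s == NULL) return 0;
    for (uint64_t i = p->capacity; i < new_cap; i++) {
        s[i].next_free = i + 1;  // the last new slot ends up holding new_cap
    }
    p->slots = s;
    p->first_free = p->capacity;  // first newly added slot
    p->capacity = new_cap;
    return 1;
}

// Pops the head of the free list in O(1). Returns the slot index, or
// UINT64_MAX if the pool could not grow.
static uint64_t pool_alloc(pool_t *p, uint64_t payload) {
    if (p->first_free == p->capacity && !pool_grow(p)) return UINT64_MAX;
    uint64_t idx = p->first_free;
    p->first_free = p->slots[idx].next_free;
    p->slots[idx].payload = payload;
    return idx;
}

// Pushes a slot onto the head of the free list in O(1).
static void pool_release(pool_t *p, uint64_t idx) {
    p->slots[idx].next_free = p->first_free;
    p->first_free = idx;
}
```

Starting from a zero-initialized `pool_t`, alternating `pool_alloc` and `pool_release` calls reuse the most recently freed index without scanning the array, which is the property the commit message credits for the faster add+remove workloads.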
Showing 8 changed files with 2,929 additions and 1,124 deletions.
127 changes: 88 additions & 39 deletions include/roaring/art/art.h
@@ -19,8 +19,8 @@
* chunks _differ_. This means that if there are two entries with different
* high 48 bits, then there is only one inner node containing the common key
* prefix, and two leaves.
* * Intrusive leaves: the leaf struct is included in user values. This removes
* a layer of indirection.
* * Mostly pointer-free: nodes are referred to by index rather than pointer,
* so that the structure can be deserialized with a backing buffer.
*/

// Fixed length of keys in the ART. All keys are assumed to be of this length.
@@ -33,25 +33,33 @@ namespace internal {
#endif

typedef uint8_t art_key_chunk_t;
typedef struct art_node_s art_node_t;

// Internal node reference type. Contains the node typecode in the low 8 bits,
// and the index in the relevant node array in the high 48 bits. Has a value of
// CROARING_ART_NULL_REF when pointing to a non-existent node.
typedef uint64_t art_ref_t;

typedef void art_node_t;

/**
* Wrapper to allow an empty tree.
* The ART is empty when root is a null ref.
*
* Each node type has its own dynamic array of node structs, indexed by
* art_ref_t. The arrays are expanded as needed, and shrink only when
* `shrink_to_fit` is called.
*/
typedef struct art_s {
art_node_t *root;
art_ref_t root;

// Indexed by node typecode, thus 1 larger than they need to be for
// convenience. `first_free` indicates the index where the first free node
// lives, which may be equal to the capacity.
uint64_t first_free[6];
uint64_t capacities[6];
art_node_t *nodes[6];
} art_t;

/**
* Values inserted into the tree have to be cast-able to art_val_t. This
* improves performance by reducing indirection.
*
* NOTE: Value pointers must be unique! This is because each value struct
* contains the key corresponding to the value.
*/
typedef struct art_val_s {
art_key_chunk_t key[ART_KEY_BYTES];
} art_val_t;
typedef uint64_t art_val_t;

/**
* Compares two keys, returns their relative order:
@@ -63,14 +71,21 @@ int art_compare_keys(const art_key_chunk_t key1[],
const art_key_chunk_t key2[]);

/**
* Inserts the given key and value.
* Initializes the ART.
*/
void art_init_cleared(art_t *art);

/**
* Inserts the given key and value. Returns a pointer to the value inserted,
* valid as long as the ART is not modified.
*/
void art_insert(art_t *art, const art_key_chunk_t *key, art_val_t *val);
art_val_t *art_insert(art_t *art, const art_key_chunk_t *key, art_val_t val);

/**
* Returns the value erased, NULL if not found.
* Returns true if a value was erased. Sets `*erased_val` to the value erased,
* if any.
*/
art_val_t *art_erase(art_t *art, const art_key_chunk_t *key);
bool art_erase(art_t *art, const art_key_chunk_t *key, art_val_t *erased_val);

/**
* Returns the value associated with the given key, NULL if not found.
@@ -83,42 +98,39 @@ art_val_t *art_find(const art_t *art, const art_key_chunk_t *key);
bool art_is_empty(const art_t *art);

/**
* Frees the nodes of the ART except the values, which the user is expected to
* free.
* Frees the contents of the ART. Should not be called when using
* `art_deserialize_frozen_safe`.
*/
void art_free(art_t *art);

/**
* Returns the size in bytes of the ART. Includes size of pointers to values,
* but not the values themselves.
*/
size_t art_size_in_bytes(const art_t *art);

/**
* Prints the ART using printf, useful for debugging.
*/
void art_printf(const art_t *art);

/**
* Callback for validating the value stored in a leaf.
* Callback for validating the value stored in a leaf. `context` is a
* user-provided value passed to the callback without modification.
*
* Should return true if the value is valid, false otherwise
* If false is returned, `*reason` should be set to a static string describing
* the reason for the failure.
*/
typedef bool (*art_validate_cb_t)(const art_val_t *val, const char **reason);
typedef bool (*art_validate_cb_t)(const art_val_t val, const char **reason,
void *context);

/**
* Validate the ART tree, ensuring it is internally consistent.
* Validate the ART tree, ensuring it is internally consistent. `context` is a
* user-provided value passed to the callback without modification.
*/
bool art_internal_validate(const art_t *art, const char **reason,
art_validate_cb_t validate_cb);
art_validate_cb_t validate_cb, void *context);

/**
* ART-internal iterator bookkeeping. Users should treat this as an opaque type.
*/
typedef struct art_iterator_frame_s {
art_node_t *node;
art_ref_t ref;
uint8_t index_in_node;
} art_iterator_frame_t;

@@ -130,6 +142,8 @@ typedef struct art_iterator_s {
art_key_chunk_t key[ART_KEY_BYTES];
art_val_t *value;

art_t *art;

uint8_t depth; // Key depth
uint8_t frame; // Node depth

@@ -143,19 +157,19 @@ typedef struct art_iterator_s {
* depending on `first`. The iterator is not valid if there are no entries in
* the ART.
*/
art_iterator_t art_init_iterator(const art_t *art, bool first);
art_iterator_t art_init_iterator(art_t *art, bool first);

/**
* Returns an initialized iterator positioned at a key equal to or greater than
* the given key, if it exists.
*/
art_iterator_t art_lower_bound(const art_t *art, const art_key_chunk_t *key);
art_iterator_t art_lower_bound(art_t *art, const art_key_chunk_t *key);

/**
* Returns an initialized iterator positioned at a key greater than the given
* key, if it exists.
*/
art_iterator_t art_upper_bound(const art_t *art, const art_key_chunk_t *key);
art_iterator_t art_upper_bound(art_t *art, const art_key_chunk_t *key);

/**
* The following iterator movement functions return true if a new entry was
@@ -174,14 +188,49 @@ bool art_iterator_lower_bound(art_iterator_t *iterator,
/**
* Insert the value and positions the iterator at the key.
*/
void art_iterator_insert(art_t *art, art_iterator_t *iterator,
const art_key_chunk_t *key, art_val_t *val);
void art_iterator_insert(art_iterator_t *iterator, const art_key_chunk_t *key,
art_val_t val);

/**
* Erase the value pointed at by the iterator. Moves the iterator to the next
* leaf. Returns the value erased or NULL if nothing was erased.
* leaf.
* Returns true if a value was erased. Sets `*erased_val` to the value erased,
* if any.
*/
bool art_iterator_erase(art_iterator_t *iterator, art_val_t *erased_val);

/**
* Shrinks the internal arrays in the ART to remove any unused elements. Returns
* the number of bytes freed.
*/
size_t art_shrink_to_fit(art_t *art);

/**
* Returns true if the ART has no unused elements.
*/
bool art_is_shrunken(const art_t *art);

/**
* Returns the serialized size in bytes.
* Requires `art_shrink_to_fit` to be called first.
*/
size_t art_size_in_bytes(const art_t *art);

/**
* Serializes the ART and returns the number of bytes written. Returns 0 on
* error. Requires `art_shrink_to_fit` to be called first.
*/
size_t art_serialize(const art_t *art, char *buf);

/**
* Deserializes the ART from a serialized buffer, reading up to `maxbytes`
* bytes. Returns 0 on error. Requires `buf` to be 8 byte aligned.
*
* An ART deserialized in this way should only be used in a readonly context. The
* underlying buffer must not be freed before the ART. `art_free` should not be
* called on the ART deserialized in this way.
*/
art_val_t *art_iterator_erase(art_t *art, art_iterator_t *iterator);
size_t art_frozen_view(const char *buf, size_t maxbytes, art_t *art);

#ifdef __cplusplus
} // extern "C"
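
The `art_ref_t` comment in this header says a reference carries the node typecode in its low 8 bits and an array index in its upper bits, and the commit message mentions a generic pointer array combined with a static array of node type sizes. A minimal sketch of how such a reference could be packed and resolved follows. The helper names `ref_pack`, `ref_typecode`, `ref_index`, and `ref_deref` are hypothetical, and the shift-by-8 layout is an assumption consistent with the comment, not a copy of CRoaring's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ref_t;  // stands in for art_ref_t in this sketch

// Packs a typecode (low 8 bits) and an array index (remaining upper bits).
static inline ref_t ref_pack(uint8_t typecode, uint64_t index) {
    return (index << 8) | (ref_t)typecode;
}

static inline uint8_t ref_typecode(ref_t ref) { return (uint8_t)(ref & 0xFF); }

static inline uint64_t ref_index(ref_t ref) { return ref >> 8; }

// Resolves a reference against per-type arrays. `nodes[t]` is the base of
// the array for typecode t, and `sizes[t]` is the element size for that type.
static inline void *ref_deref(void *const nodes[], const size_t sizes[],
                              ref_t ref) {
    uint8_t t = ref_typecode(ref);
    return (char *)nodes[t] + ref_index(ref) * sizes[t];
}
```

Because a reference is an index rather than an address, the same value stays meaningful when the arrays are reallocated or when they live inside a deserialized buffer, which is the motivation the header gives for the "mostly pointer-free" design.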
55 changes: 54 additions & 1 deletion include/roaring/roaring64.h
@@ -17,7 +17,7 @@ namespace api {
#endif

typedef struct roaring64_bitmap_s roaring64_bitmap_t;
typedef struct roaring64_leaf_s roaring64_leaf_t;
typedef uint64_t roaring64_leaf_t;
typedef struct roaring64_iterator_s roaring64_iterator_t;

/**
@@ -312,6 +312,12 @@ uint64_t roaring64_bitmap_maximum(const roaring64_bitmap_t *r);
*/
bool roaring64_bitmap_run_optimize(roaring64_bitmap_t *r);

/**
* Shrinks internal arrays to eliminate any unused capacity. Returns the number
* of bytes freed.
*/
size_t roaring64_bitmap_shrink_to_fit(roaring64_bitmap_t *r);

/**
* (For advanced users.)
* Collect statistics about the bitmap
@@ -564,6 +570,53 @@ size_t roaring64_bitmap_portable_deserialize_size(const char *buf,
roaring64_bitmap_t *roaring64_bitmap_portable_deserialize_safe(const char *buf,
size_t maxbytes);

/**
* Returns the number of bytes required to serialize this bitmap in a "frozen"
* format. This is not compatible with any other serialization formats.
*
* `roaring64_bitmap_shrink_to_fit()` must be called before this method.
*/
size_t roaring64_bitmap_frozen_size_in_bytes(const roaring64_bitmap_t *r);

/**
* Serializes the bitmap in a "frozen" format. The given buffer must be at least
* `roaring64_bitmap_frozen_size_in_bytes()` in size. Returns the number of
* bytes used for serialization.
*
* `roaring64_bitmap_shrink_to_fit()` must be called before this method.
*
* The frozen format is optimized for speed of (de)serialization, as well as
* allowing the user to create a bitmap based on a memory mapped file, which is
* possible because the format mimics the memory layout of the bitmap.
*
* Because the format mimics the memory layout of the bitmap, the format is not
* fixed across releases of Roaring Bitmaps, and may change in future releases.
*
* This function is endian-sensitive. If you have a big-endian system (e.g., a
* mainframe IBM s390x), the data format is going to be big-endian and not
* compatible with little-endian systems.
*/
size_t roaring64_bitmap_frozen_serialize(const roaring64_bitmap_t *r,
char *buf);

/**
* Creates a readonly bitmap that is a view of the given buffer. The buffer
* must be created with `roaring64_bitmap_frozen_serialize()`, and must be
* aligned by 64 bytes.
*
* Returns NULL if deserialization fails.
*
* The returned bitmap must only be used in a readonly manner. The bitmap must
* be freed using `roaring64_bitmap_free()` as normal. The backing buffer must
* only be freed after the bitmap.
*
* This function is endian-sensitive. If you have a big-endian system (e.g., a
* mainframe IBM s390x), the data format is going to be big-endian and not
* compatible with little-endian systems.
*/
roaring64_bitmap_t *roaring64_bitmap_frozen_view(const char *buf,
size_t maxbytes);

/**
* Iterate over the bitmap elements. The function `iterator` is called once for
* all the values with `ptr` (can be NULL) as the second parameter of each call.
4 changes: 4 additions & 0 deletions microbenchmarks/CMakeLists.txt
@@ -25,3 +25,7 @@ add_executable(bench bench.cpp)
target_link_libraries(bench PRIVATE roaring)
target_link_libraries(bench PRIVATE benchmark::benchmark)
target_compile_definitions(bench PRIVATE BENCHMARK_DATA_DIR="${BENCHMARK_DATA_DIR}")

add_executable(synthetic_bench synthetic_bench.cpp)
target_link_libraries(synthetic_bench PRIVATE roaring)
target_link_libraries(synthetic_bench PRIVATE benchmark::benchmark)
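
The `roaring64_bitmap_frozen_view()` comment in `roaring64.h` requires the buffer to be aligned by 64 bytes. One way a caller might obtain such a buffer is C11 `aligned_alloc`, sketched below. The names `FROZEN_ALIGNMENT`, `round_up_to_alignment`, `alloc_frozen_buffer`, and `is_frozen_aligned` are hypothetical helpers for illustration and are not part of CRoaring. The sketch assumes a C11 standard library; `aligned_alloc` is not available on every toolchain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FROZEN_ALIGNMENT 64  // alignment stated in the header comment

// Rounds n up to a multiple of the alignment. C11 aligned_alloc expects the
// size to be a multiple of the alignment.
static size_t round_up_to_alignment(size_t n) {
    return (n + FROZEN_ALIGNMENT - 1) / FROZEN_ALIGNMENT * FROZEN_ALIGNMENT;
}

// Allocates a 64-byte-aligned buffer of at least `size` bytes (size > 0).
// Returns NULL on failure. Release the buffer with free().
static char *alloc_frozen_buffer(size_t size) {
    return (char *)aligned_alloc(FROZEN_ALIGNMENT, round_up_to_alignment(size));
}

// Checks whether a pointer satisfies the 64-byte alignment requirement.
static int is_frozen_aligned(const char *buf) {
    return ((uintptr_t)buf % FROZEN_ALIGNMENT) == 0;
}
```

In a program linked against the library, the size would come from `roaring64_bitmap_frozen_size_in_bytes()` after calling `roaring64_bitmap_shrink_to_fit()`, the buffer would be filled by `roaring64_bitmap_frozen_serialize()`, and the result passed to `roaring64_bitmap_frozen_view()`. Per the header comments, the view is freed with `roaring64_bitmap_free()` before the backing buffer is released.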