Adding support for sharing memory between the module and the engine #1804
+221
−24
Overview
Sharing memory between the module and engine reduces memory overhead by eliminating redundant copies of stored records in the module. This is particularly beneficial for search workloads that require indexing large volumes of documents.
Vectors
Vector similarity search requires storing large volumes of high-dimensional vectors. For example, a single vector with 512 dimensions consumes 2048 bytes (at 4 bytes per dimension), and typical workloads often involve millions of vectors. Because there is no memory-sharing mechanism between the module and the engine, ValkeySearch currently doubles memory consumption when indexing vectors, which significantly increases operational costs. This limitation introduces adoption friction and reduces ValkeySearch's competitiveness.
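To make the arithmetic above concrete, here is a small sketch of the footprint calculation. It assumes 4-byte (float32) elements, which is what 2048 bytes for 512 dimensions implies, and it counts only raw vector payload, not index structures or allocator overhead. The function names are illustrative and not part of ValkeySearch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one vector: dimensions times bytes per element. */
static size_t vector_bytes(size_t dims, size_t elem_size) {
    return dims * elem_size;
}

/* Raw payload for `count` vectors held in `copies` places.
 * copies = 2 models the duplicated module + engine storage,
 * copies = 1 models a single shared buffer. */
static uint64_t total_bytes(uint64_t count, size_t dims,
                            size_t elem_size, unsigned copies) {
    return count * (uint64_t)vector_bytes(dims, elem_size) * copies;
}
```

Under these assumptions, one million 512-dimension vectors take 2,048,000,000 bytes when stored once and 4,096,000,000 bytes when stored twice, so sharing would save roughly 2 GB of payload per million vectors.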
Technical Details
Memory Allocation Strategy
At a fundamental level, there are two primary allocation strategies: either the engine allocates the shared memory, or the module does.
For ValkeySearch, it is crucial that vectors reside in cache-aligned memory to maximize SIMD optimizations. Allowing the module to allocate memory provides greater flexibility for different use cases, though it introduces slightly higher implementation complexity.
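As an illustration of what module-side, cache-aligned allocation could look like, the sketch below uses C11 `aligned_alloc`. The 64-byte cache-line size and the helper names are assumptions made for the example; the actual allocator used by ValkeySearch is not described here.

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache-line size; platform dependent */

/* Round n up to the next multiple of CACHE_LINE. */
static size_t round_up_to_line(size_t n) {
    return (n + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

/* Allocate a cache-line-aligned buffer for `dims` floats.
 * aligned_alloc expects the size to be a multiple of the alignment,
 * hence the rounding. Release the buffer with free(). */
static float *alloc_aligned_vector(size_t dims) {
    return aligned_alloc(CACHE_LINE, round_up_to_line(dims * sizeof(float)));
}

/* True when p starts on a cache-line boundary. */
static int is_cache_aligned(const void *p) {
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

If the engine owned the allocation instead, the module would have no direct way to request this alignment, which is one reason module-side allocation is the more flexible option.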
Shared SDS
Shared SDS, a new data type, facilitates module-engine memory sharing with thread-safe intrusive reference counting. It preserves SDS semantics and structure while adding ref-counting and a free callback for statistics tracking.
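The idea of thread-safe intrusive reference counting with a free callback can be sketched as follows. This is not the Shared SDS layout from the PR: the struct, field names, and callback signature are hypothetical and only show the general technique of keeping an atomic counter in the same allocation as the string bytes.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef void (*free_cb_fn)(size_t len); /* hypothetical statistics hook */

/* Hypothetical layout: the counter sits in front of the payload (intrusive). */
typedef struct {
    atomic_uint refcount;
    free_cb_fn on_free;
    size_t len;
    char buf[]; /* payload followed by a NUL terminator */
} shared_buf;

static shared_buf *shared_buf_create(const char *data, size_t len, free_cb_fn cb) {
    shared_buf *s = malloc(sizeof(*s) + len + 1);
    if (!s) return NULL;
    atomic_init(&s->refcount, 1); /* the creator holds the first reference */
    s->on_free = cb;
    s->len = len;
    memcpy(s->buf, data, len);
    s->buf[len] = '\0';
    return s;
}

static void shared_buf_retain(shared_buf *s) {
    atomic_fetch_add_explicit(&s->refcount, 1, memory_order_relaxed);
}

/* Drops one reference; returns 1 if this call freed the buffer. */
static int shared_buf_release(shared_buf *s) {
    if (atomic_fetch_sub_explicit(&s->refcount, 1, memory_order_acq_rel) == 1) {
        if (s->on_free) s->on_free(s->len); /* e.g. update memory statistics */
        free(s);
        return 1;
    }
    return 0;
}

/* Example callback: tallies the bytes that were released. */
static atomic_size_t freed_bytes;
static void count_freed(size_t len) { atomic_fetch_add(&freed_bytes, len); }
```

With this shape, the module and the engine can each hold a reference to the same bytes, and the memory is returned only when the last holder releases it.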
New Module APIs
VM_CreateSharedSDS: Creates a new Shared SDS.
VM_SharedSDSPtrLen: Retrieves the raw buffer pointer and length of a Shared SDS.
VM_ReleaseSharedSDS: Decreases the Shared SDS ref-count by 1.
Extended Module APIs
VM_HashSet: Supports setting a shared SDS in the hash.
VM_HashGet: Retrieves a shared SDS and increments its ref-count by 1.
Engine Hash Data-Type
ValkeySearch indexes documents that reside in the engine as t_hash data-type records. While JSON is also supported, it is out of scope for this discussion. The t_hash implementation is based on either listpack for small datasets or hashtable for larger ones.
Since listpack performs deep copies, it cannot support intrusive ref-counting semantics. As a result, if listpack is the underlying data type when a shared SDS is set, e.g. by calling VM_HashSet, the hash is converted to hashtable. Additionally, for the same reason, a shared SDS is never stored as an embedded value in a hashtable entry.
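The reasoning about deep copies can be illustrated with a toy model. The one-field "hash" below is entirely hypothetical and far simpler than the engine's t_hash. It only shows that a copying encoding ends up with different memory than the caller's buffer, while storing a shared value requires switching to an encoding that keeps the pointer itself. Ref-count handling is omitted, and the caller keeps ownership of the shared buffer.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { ENC_COMPACT, ENC_TABLE } toy_encoding;

/* Toy one-field "hash": just enough to show the encoding decision. */
typedef struct {
    toy_encoding enc;
    char *value;      /* owned copy, or a borrowed shared pointer */
    int value_shared; /* 1 when `value` is borrowed and must not be freed */
} toy_hash;

/* Plain set: the bytes are deep-copied, as a packed encoding would do. */
static int toy_set_copy(toy_hash *h, const char *v) {
    size_t n = strlen(v) + 1;
    char *copy = malloc(n);
    if (!copy) return -1;
    memcpy(copy, v, n);
    if (!h->value_shared) free(h->value);
    h->value = copy;
    h->value_shared = 0;
    return 0;
}

/* Shared set: the pointer itself must be kept, so the compact
 * (copying) encoding is abandoned before storing it. */
static void toy_set_shared(toy_hash *h, char *shared) {
    if (h->enc == ENC_COMPACT) h->enc = ENC_TABLE;
    if (!h->value_shared) free(h->value);
    h->value = shared;
    h->value_shared = 1;
}
```

After `toy_set_copy`, the stored pointer differs from the caller's buffer, so a reference count attached to the original would no longer describe the stored bytes. After `toy_set_shared`, both sides point at the same memory, which is the property the conversion to hashtable preserves.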