Update KVrocks search index encoding documentation (#241)
* Update KVrocks search index encoding documentation

* Update community/kvrocks-search-index-encoding.md

* Update community/kvrocks-search-index-encoding.md

---------

Co-authored-by: Twice <[email protected]>
Beihao-Zhou and PragmaTwice authored Aug 2, 2024
1 parent 3765f0c commit b476325
Showing 1 changed file with 50 additions and 0 deletions.
50 changes: 50 additions & 0 deletions community/kvrocks-search-index-encoding.md
@@ -33,6 +33,7 @@ The common encoding format of key is as follows:
|-------------|-------------|
| tag | 1 |
| numeric | 2 |
| vector | 3 |

The common encoding format of a *field flag* is:

@@ -92,6 +93,28 @@ where *separator* currently can only be an ASCII character, and case sensitive c
| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | -> | 1 byte |
```

### HNSW Vector Field Metadata

This metadata format stores the settings of a vector field indexed with the HNSW (Hierarchical Navigable Small World) algorithm: the vector properties (type and dimension), the distance metric, and the parameters that tune graph construction and search.

```
| namespace | FIELD_META | index name | field name | | field flag | vector type | dimension | distance metric | initial cap | m | ef construction | ef runtime | epsilon | number of levels |
|-----------|------------|------------|------------| -> |------------|-------------|-----------|-----------------|-------------|-----------|-----------------|------------|---------|------------------|
| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | -> | 1 byte | 1 byte | 2 bytes | 1 byte | 4 bytes | 2 bytes | 4 bytes | 4 bytes | 8 bytes | 2 bytes |
```

#### Required attributes
- **vector type**: The element type of the stored vectors (e.g., `FLOAT32`, `FLOAT64`); currently Kvrocks supports only `FLOAT64`.
- **dimension**: The dimensionality of the vectors (number of elements in each vector).
- **distance metric**: Metric used to compute the distance between two vectors: `L2`, `IP`, or `COSINE`.
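
As a rough illustration of the three metrics, the Python sketch below uses common conventions for turning a similarity into a distance (smaller means closer): Euclidean distance for `L2`, `1 - a·b` for `IP`, and `1 - cos(a, b)` for `COSINE`. These formulas are an assumption; Kvrocks may, for example, use squared Euclidean distance or a different sign convention internally.

```python
import math
from typing import Sequence


def _check_same_dimension(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain Euclidean distance (assumption: not squared).
    _check_same_dimension(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # A larger inner product means "more similar", so flip it into a distance.
    return 1.0 - dot(a, b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot(a, b) / (norm_a * norm_b)


# Lookup by the metric names used in the field metadata.
DISTANCE_FUNCTIONS = {"L2": l2_distance, "IP": ip_distance, "COSINE": cosine_distance}
```

Unlike `COSINE`, the `IP` distance is only meaningful as a similarity ranking when vectors are normalized, which is why the two are usually offered as separate metrics.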

#### Optional attributes
- **initial cap**: Initial capacity of the HNSW graph, i.e. the number of elements it is initially sized for; the default is 500000.
- **m**: Maximum number of edges per node in the HNSW graph; the default is 16.
- **ef construction**: Size of the dynamic candidate list during index construction; the default is 200.
- **ef runtime**: Size of the dynamic candidate list during search; the default is 10.
- **epsilon**: Epsilon value for approximate search, controlling the trade-off between search precision and speed; the default is 0.01.
- **number of levels**: Number of levels in the HNSW graph, affecting the hierarchical structure of the graph.

## Index data encoding

Index data refers to the information stored after indexing the real data,
@@ -112,3 +135,30 @@ which is used to quickly get corresponding data in subsequent query processes.
|-----------|---------|------------|------------|-----------------|------------| -> |------------|
| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | 8 bytes | 4+B bytes | -> | 0 byte |
```

### HNSW Vector field

#### HNSW graph entry types

| hnsw type | enum value |
|--------------|-------------|
| NODE | 1 |
| EDGE | 2 |

#### HNSW node index encoding

```
| namespace | FIELD | index name | field name | level | hnsw type | user key | | num of neighbours | vector dimension | vector data |
|-----------|---------|------------|------------|-----------|----------------|------------| -> |-------------------|------------------|-----------------------|
| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | 2 bytes | NODE (1 byte) | 4+B bytes | -> | 2 bytes | 2 bytes | dimension * 8 bytes |
```
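
The following Python sketch assembles a node key and value following the layout above. It assumes that `1+X` and `4+Y` denote a 1-byte or 4-byte big-endian length prefix followed by the raw bytes, that vector elements are big-endian IEEE 754 doubles (consistent with `FLOAT64`-only support), and that the FIELD key-type byte is `0x02`; that byte value, like the helper names, is hypothetical.

```python
import struct
from typing import List, Sequence, Tuple

HNSW_NODE = 1          # NODE entry type from the HNSW graph entry types table
FIELD_KEY_TYPE = 0x02  # hypothetical value of the FIELD key-type byte


def _sized(data: bytes, length_fmt: str) -> bytes:
    # Length prefix (">B" = 1 byte, ">I" = 4 bytes, big-endian) followed by the raw bytes.
    return struct.pack(length_fmt, len(data)) + data


def node_key(ns: bytes, index: bytes, field: bytes, level: int, user_key: bytes) -> bytes:
    return (
        _sized(ns, ">B")                        # namespace: 1+X bytes
        + bytes([FIELD_KEY_TYPE])               # FIELD: 1 byte
        + _sized(index, ">I")                   # index name: 4+Y bytes
        + _sized(field, ">I")                   # field name: 4+Z bytes
        + struct.pack(">HB", level, HNSW_NODE)  # level (2 bytes) + hnsw type (1 byte)
        + _sized(user_key, ">I")                # user key: 4+B bytes
    )


def node_value(num_neighbours: int, vector: Sequence[float]) -> bytes:
    # num of neighbours (2 bytes) + vector dimension (2 bytes) + dimension * 8 bytes of doubles
    return struct.pack(f">HH{len(vector)}d", num_neighbours, len(vector), *vector)


def decode_node_value(raw: bytes) -> Tuple[int, List[float]]:
    num_neighbours, dim = struct.unpack_from(">HH", raw, 0)
    if len(raw) != 4 + 8 * dim:
        raise ValueError("value length does not match the stored dimension")
    return num_neighbours, list(struct.unpack_from(f">{dim}d", raw, 4))
```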

#### HNSW edge index encoding

```
| namespace | FIELD | index name | field name | level | hnsw type | user key 1 | user key 2 | | null |
|-----------|---------|------------|------------|-----------|----------------|------------|------------| -> |------------|
| 1+X bytes | 1 byte | 4+Y bytes | 4+Z bytes | 2 bytes | EDGE (1 byte) | 4+B bytes | 4+C bytes | -> | 0 byte |
```

where *user key 1* and *user key 2* represent the endpoints of an edge at a specific level within the HNSW graph.
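
One consequence of this layout, sketched below in Python, is that all edges leaving a given node at a given level share a common key prefix, so the node's neighbours can be listed with an ordered prefix scan. A sorted list of byte strings stands in for a RocksDB iterator (bytewise ordering, like RocksDB's default comparator). The sketch assumes a 1-byte or 4-byte big-endian length prefix for the `1+X` and `4+…` fields and a hypothetical FIELD key-type byte of `0x02`; whether Kvrocks stores each undirected edge in both directions is not stated here, so the example simply stores the pairs it is given.

```python
import struct
from bisect import bisect_left
from typing import List

HNSW_EDGE = 2          # EDGE entry type from the HNSW graph entry types table
FIELD_KEY_TYPE = 0x02  # hypothetical value of the FIELD key-type byte


def _sized(data: bytes, length_fmt: str) -> bytes:
    # Big-endian length prefix followed by the raw bytes.
    return struct.pack(length_fmt, len(data)) + data


def edge_prefix(ns: bytes, index: bytes, field: bytes, level: int, user_key: bytes) -> bytes:
    # Every edge leaving `user_key` at `level` starts with exactly these bytes.
    return (_sized(ns, ">B") + bytes([FIELD_KEY_TYPE])
            + _sized(index, ">I") + _sized(field, ">I")
            + struct.pack(">HB", level, HNSW_EDGE) + _sized(user_key, ">I"))


def edge_key(ns: bytes, index: bytes, field: bytes, level: int,
             key1: bytes, key2: bytes) -> bytes:
    # The value stored under this key is empty (0 bytes).
    return edge_prefix(ns, index, field, level, key1) + _sized(key2, ">I")


def neighbours(sorted_keys: List[bytes], ns: bytes, index: bytes, field: bytes,
               level: int, user_key: bytes) -> List[bytes]:
    # Seek to the first key >= prefix, then read until the prefix stops matching.
    prefix = edge_prefix(ns, index, field, level, user_key)
    result = []
    i = bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        rest = sorted_keys[i][len(prefix):]
        (n,) = struct.unpack_from(">I", rest, 0)
        result.append(rest[4:4 + n])  # user key 2 of this edge
        i += 1
    return result
```

Because each user key carries its own length prefix, the scan prefix for `doc1` cannot accidentally match edges stored under `doc10`.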
