Blog post about new hash table (WIP) #195
Conversation
Signed-off-by: Viktor Söderqvist <[email protected]>
| Valkey 8.0 | ? bytes |
| Valkey 8.1 | ? bytes |

The benchmarks below were run using a key size of N and a value size of M bytes, without pipelining.
Let's add something for set/zset/hash and see if we get even more performance and memory savings, since those data types are hash tables inside of a hash table. :)
Yes... Feel free to replace these tables with some completely different tests.
There's a fixed overhead for the key and then per field-value. Still I'd like to see a table of memory savings per element/field/etc. for these types.
I want to do hash value embedding (to save the value pointer and an extra allocation), and Ran noticed that our embedded sds (key and field) are sds8 even when they should be sds5, so we could save two more bytes for those. That's because they're copied from an EMBSTR robj value, and those are always sds8. I have some idea to fix that too, though.
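For context on where the two bytes come from, the header layouts are roughly as in sds.h (simplified here): sds5 packs the string length into the flags byte, while sds8 carries separate len and alloc bytes.

```c
#include <stdint.h>

/* Simplified from sds.h. The sds5 header is 1 byte because the string
 * length lives in the flags byte; the sds8 header is 3 bytes, hence
 * the 2-byte difference per embedded string. */
struct __attribute__((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 bits type, 5 bits string length */
    char buf[];
};
struct __attribute__((__packed__)) sdshdr8 {
    uint8_t len;         /* used length */
    uint8_t alloc;       /* allocated length, excluding header and null term */
    unsigned char flags; /* 3 bits type, 5 bits unused */
    char buf[];
};
```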
Why not use an open-source state-of-the-art hash table implementation such as
Swiss tables? The answer is that we require some specific features, apart from
the basic operations like add, lookup, replace, delete:
Another reason: a Swiss table is very fast, but it stores the elements directly in a contiguous array, which requires that the elements all be the same size. Because our elements vary in size, we had to choose a different design: cache-line sized buckets with element pointers. (This idea was mentioned at the end of the Swiss table talk; up to you if you want to make that reference though.)
No, you can store pointers in a Swiss table, just like how we store pointers in our bucket layout. The pointers are the fixed-size elements, no?
I don't think it allows a custom key-value entry design like we do though. It can be either a set or a map (key and value) IIUC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could have picked an off-the-shelf implementation even if we couldn't embed key and value the way we do, as long as it would be better than dict. It's good to use a battle-tested, ready-to-use one too. It's easier to get it right, and less work... I think scan and incremental rehashing were clearly blockers though.
Overall I think the content is really good. Is @SoftlyRaining planning on adding more information about benchmark results?
title= "A new hash table"
date= 2025-03-20 00:00:00
description= "Designing a state-of-the-art hash table implementation"
authors= [ "zuiderkwast", "SoftlyRaining"]
Y'all need author pages, or it won't render correctly.
I'll just blank out authors then and write my name at the bottom of the article text instead. 😆
authors= [ "zuiderkwast", "SoftlyRaining"] | |
authors= [] |
Memory usage for keys of length N and value of length M bytes. TBD.

| Version    | Memory usage per key |
|------------|----------------------|
| Valkey 7.2 | ? bytes              |
| Valkey 8.0 | ? bytes              |
| Valkey 8.1 | ? bytes              |
@SoftlyRaining Are you working on generating these numbers?
memory usage by roughly 20 bytes per key-value pair and improve the latency and
CPU usage by roughly 10% for instances without I/O threading.

Results
A typical blog would leave the results till the end, and talk about the constraints up front.
Just move the Results section to the end?
I'll keep a brief text in the abstract above: "we have managed to reduce the memory usage by roughly 20 bytes per key-value pair and improve the latency and CPU usage by roughly 10% for instances without I/O threading"?
The slowest operation when looking up a key-value pair is by far reading from
main memory (RAM). A key point when optimizing a hash table is therefore to
make sure we have as few memory accesses as possible. Ideally, the memory
being read is already in the CPU cache, which is much faster memory that
belongs to the CPU.
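To make the cost concrete, here is a hypothetical chained lookup (illustrative types, not Valkey's actual code). Each pointer dereference is a dependent load, so cache misses serialize into full round trips to main memory instead of overlapping:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chained-table node, for illustration only. */
struct entry {
    const char *key;
    void *val;
    struct entry *next;
};

/* Each dereference is a dependent load: the address of the next read
 * is only known after the previous one completes. */
void *chained_lookup(struct entry **table, size_t mask,
                     size_t hash, const char *key) {
    for (struct entry *e = table[hash & mask]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) /* reading the key is another load */
            return e->val;
    return NULL;
}
```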
I think this background is OK, but I think it would be beneficial to show the previous implementation, and then explain how we iterated on it to the new one. Then it becomes easier to show the memory accesses that are getting removed. You already show the previous design later on, so we can probably move it earlier.
A good general framing is, show the current state (the dict), highlight the gaps (lots of memory accesses and allocations), introduce a new concept that solves that (cache friendly), then discuss the solution. Most of that content is here, just needs a little bit of moving around.
When a computer loads some data from the main memory into the CPU cache, it does
so in blocks of one cache line. The cache-line size is 64 bytes on almost all
modern hardware. Recent work on hash tables, such as "Swiss tables", is highly
Should probably link the swiss table design notes or youtube video here.
To look up a key "FOO" and access the value "BAR", Valkey still had to read from
memory four times. If there is a hash collision, it has to follow two more
pointers for each hash collision and thus read twice more from memory (the key
and the next pointer).

In Valkey 8.0, an optimization was made to embed the key ("FOO" in the drawing
above) in the dictEntry, eliminating one pointer and one memory access.
We have a full blog post dedicated to embedding the key, so let's just start with the state of the world in 7.2.
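For reference, the 7.2-era entry looks roughly like this (simplified from dict.h; details abbreviated):

```c
#include <stdint.h>

/* Simplified sketch of the pre-8.0 chained dict entry. Reaching "BAR"
 * from the table means hopping through separately allocated objects,
 * each hop a likely cache miss: table slot -> dictEntry -> key sds
 * ("FOO"), and dictEntry -> value robj -> value bytes ("BAR"). */
typedef struct dictEntry {
    void *key;              /* points to the key string */
    union {
        void *val;          /* points to the value object */
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; /* collision chain: two more reads per hop */
} dictEntry;
```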
authors= [ "zuiderkwast", "SoftlyRaining"] | ||
+++ | ||
|
||
Valkey is essentially a giant hash table attached to the network. A hash table |
Probably want a better hook. We can talk about how many caching workloads are bound on storing data, so being able to store more data allows you to reduce the size of your clusters.
In the new hash table designed for Valkey 8.1, the table consists of buckets of
64 bytes, one cache line. Each bucket can store up to seven elements. Keys that
map to the same bucket are all stored within it. The bucket also has a
metadata section which contains a one-byte secondary hash for each key. This is
Need to explain a bit about what the secondary hash is doing here; it's not really explained what it is or how it solves hash collisions.
I'm not sure we did any testing, but this is also really good for caching workloads, since we save a memory access most of the time when there is a cache miss. A common cache hit rate is like 80%, so that 20% gets a nice speed boost as well.
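A hypothetical sketch of what the text describes (names and bit assignments are illustrative, not Valkey's actual definitions): the one-byte secondary hash lets a lookup reject non-matching slots without dereferencing their element pointers, so a probe usually costs one cache line plus at most one pointer chase.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 64-byte bucket: 8 bytes of metadata + 7 element pointers. */
struct bucket {
    uint8_t presence;   /* one bit per slot: is the slot in use?   */
    uint8_t hashes[7];  /* one-byte secondary hash per slot        */
    void *elements[7];  /* 7 * 8 = 56 bytes of element pointers    */
};                      /* 1 + 7 + 56 = 64 bytes = one cache line  */

int element_matches_key(void *element, const char *key); /* hypothetical */

void *bucket_lookup(struct bucket *b, uint64_t hash, const char *key) {
    uint8_t h2 = (uint8_t)(hash >> 56); /* e.g. the top byte of the hash */
    for (int i = 0; i < 7; i++) {
        /* The byte compare filters out almost all non-matching slots,
         * so the element pointer (a potential cache miss) is only
         * followed when the secondary hash agrees. */
        if ((b->presence & (1u << i)) && b->hashes[i] == h2 &&
            element_matches_key(b->elements[i], key))
            return b->elements[i];
    }
    return NULL;
}
```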
Iterator prefetching
--------------------
This is technically a separate point. It addresses some of the content of the blog, but ultimately feels a bit random at the end. Maybe we drop it and write another followup blog on memory prefetching?
We could include it in the 8.1 overview blog post. I'm not sure anyone wants to write a prefetching blog post specifically.
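If it stays in this post, a short sketch may be all it needs (hypothetical loop, assuming GCC/Clang's __builtin_prefetch; types are stand-ins):

```c
#include <stddef.h>
#include <stdint.h>

struct bucket { uint8_t bytes[64]; }; /* stand-in for the real bucket */
void scan_bucket(struct bucket *b);   /* hypothetical per-bucket callback */

void scan_all(struct bucket *buckets, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* Start loading the next cache line while the current bucket
         * is being processed, hiding part of the memory latency. */
        if (i + 1 < n) __builtin_prefetch(&buckets[i + 1]);
        scan_bucket(&buckets[i]);
    }
}
```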
Signed-off-by: Viktor Söderqvist <[email protected]>
The new hash table is one of the highlights of the upcoming 8.1 release.