Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarks vs google dens hash map and flat_hash_map #1

Open
aminya opened this issue Jan 8, 2021 · 7 comments
Open

Benchmarks vs google dens hash map and flat_hash_map #1

aminya opened this issue Jan 8, 2021 · 7 comments

Comments

@aminya
Copy link

aminya commented Jan 8, 2021

Could you add some benchmarks for the library

https://github.com/skarupke/flat_hash_map

@aminya aminya changed the title Benchmarks vs google dens hash map Benchmarks vs google dens hash map and flat_hash_map Jan 8, 2021
@ktprime
Copy link

ktprime commented Feb 13, 2022

a good hash map, it's quite same as one of my emhash8(https://github.com/ktprime/emhash/blob/master/hash_table8.hpp) design
I have added it into my bench (https://github.com/ktprime/emhash/blob/master/bench/ebench.cpp).

@ktprime
Copy link

ktprime commented Jun 23, 2022

if index moves from node to bucket, your hashmap will be more efficient(memory and performance)

@Jiwan
Copy link
Owner

Jiwan commented Jun 23, 2022

Hi @ktprime would you mind expanding a bit more on your idea of moving the index?

@ktprime
Copy link

ktprime commented Jun 24, 2022

I use the following bucket vector as index(metadata)

    struct Bucket
    {
        size_type next;   //next collision index in current bucket vector. 
        size_type index; //index in node vector
    };

.....
Bucket bucket_[];
NodeType node[]; //std::pair<key,value> NodeType;

your current implemention is simple(use integer index/vector to replace chained map's pointer/list) and also efficient,
but it's slow for finding miss because of serach/comparation in node vector may produces more cache miss than in bucket vector(small dataset).

@ktprime
Copy link

ktprime commented Jun 24, 2022

huangyuanbing1@ubuntu:~/map_benchmark/build ```
./bench_jiwan_dense_hash_map__robin_hood_hash IterateIntegers

"jg::dense_hash_map"; "robin_hood::hash"; "IterateIntegers"; "01"; "iterate while adding"; 20833333325000; 0.63688; 2.08984
"jg::dense_hash_map"; "robin_hood::hash"; "IterateIntegers"; "02"; "iterate while removing"; 62498750000000; 0.636369; 2.08984

huangyuanbing1@ubuntu:~/map_benchmark/build
./bench_ktprime_hash_table8__robin_hood_hash IterateIntegers

"emhash8::HashMap"; "robin_hood::hash"; "IterateIntegers"; "01"; "iterate while adding"; 20833333325000; 0.566379; 1.56641
"emhash8::HashMap"; "robin_hood::hash"; "IterateIntegers"; "02"; "iterate while removing"; 62498750000000; 0.558062; 1.56641

at present the two hashmap(dense hash map) is the fastest iteration implemention compared with other's flat hash map in martinus's benchmark code.

@Jiwan
Copy link
Owner

Jiwan commented Jun 24, 2022

Right. I am getting what you mean now. By moving the next index into the bucket's structure, you are removing some unnecessary gaps in between the nodes.

I believe that a colleague of mine tried that strategy on our dense_hash_map but couldn't see any significant improvements and we abandoned that idea. But it seems that you have reproducible results which are really interesting. Thanks for pointing it out 👍

I will think of the trade-offs of doing it this way (if there is any of course) and could attempt to improve that hash_map.

@Fios-Dev
Copy link

Have you considered implementing this suggestion as well as updating to C++20? I know you mentioned interest in doing so in your blog post

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants