Hard to achieve high recall rate(recall@R) for huge workload #4190
EvagelineFEI started this conversation in General
Hi @EvagelineFEI, what does the data distribution look like for the smaller datasets and for the larger ones? Is the data randomly sampled, so that the distribution is uniform?
@EvagelineFEI how many query vectors are you using? All 10k from sift1b? And can you paste how you are computing recall@100?
Hello team,
I am using faiss.IndexHNSWFlat, and my dataset is sift1b from http://corpus-texmex.irisa.fr/.
I add 10M vectors to the index and then adjust the parameters (M, efc, efs) to improve the recall rate (recall@100).
I have tried many combinations (M from 5 to 500), but my best result is stuck at 0.93.
My precision calculation code does not seem to contain an error, and the results are better for smaller datasets (such as 1M or 2M vectors, where the precision reaches 0.96 or higher). What could be the problem?
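For reference, one common way to define recall@k is the mean fraction of the true k nearest neighbors that appear among the k returned ids. The sketch below assumes that definition and assumes the ground truth was computed over exactly the vectors that were added to the index (for example the 10M subset, not the full 1B set). The function name `recall_at_k` is hypothetical, and this is not the calculation code referred to above.

```python
def recall_at_k(found, truth, k):
    """Mean overlap between returned top-k ids and true top-k ids.

    found, truth: one sequence of neighbor ids per query (lists or
    array rows), each holding at least k entries, best match first.
    """
    if len(found) != len(truth):
        raise ValueError("found and truth must cover the same queries")
    if len(found) == 0:
        raise ValueError("at least one query is required")

    hits = 0
    for found_row, truth_row in zip(found, truth):
        found_ids = {int(i) for i in found_row[:k]}
        truth_ids = {int(i) for i in truth_row[:k]}
        # Ids that the search returned and that are true neighbors.
        hits += len(found_ids & truth_ids)
    return hits / (k * len(found))


if __name__ == "__main__":
    found = [[1, 2, 3], [4, 5, 6]]
    truth = [[1, 2, 9], [6, 5, 4]]
    print(recall_at_k(found, truth, 3))
```

Note that some papers use recall@R to mean the fraction of queries whose single true nearest neighbor appears in the top R results, which gives different numbers from the overlap definition sketched here.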
Some of my machine info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
Stepping: 1
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.00