Low recall with SPACEV1B dataset on GPU #3593

karthik86248 · 2024-04-04T11:23:26Z

Summary

I have downloaded 1 billion vectors (a subset) of the original SPACEV1B dataset hosted on the SPTAG repo. The ground truth was computed manually.

I'm using a slightly modified version of the bench_gpu_1bn.py script file to run ANN on SPACEV1B.

The reported recall values are relatively low (around 0.3). The index used is OPQ24_96,IVF262144,PQ24.
I tried experimenting with higher PQ values such as 38 and 32, but saw no significant improvement.
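
Since the ground truth was computed manually, it may be worth double-checking how recall is measured before tuning the index further. Below is a minimal pure-Python sketch of the "1-recall@R" metric (the fraction of queries whose true nearest neighbor appears among the first R returned ids), which is, as far as I understand, the kind of number this type of benchmark prints. The function name `recall_at_r` and the toy ids are hypothetical and are not taken from the benchmark script.

```python
def recall_at_r(gt_nearest, results, r):
    """Fraction of queries whose true nearest neighbor is in the top-r results.

    gt_nearest: one ground-truth nearest-neighbor id per query.
    results:    one ranked list of returned ids per query.
    """
    if len(gt_nearest) != len(results):
        raise ValueError("one ground-truth id is needed per query")
    hits = sum(
        1 for true_id, ids in zip(gt_nearest, results) if true_id in ids[:r]
    )
    return hits / len(gt_nearest)


# Toy data: 3 queries, 3 returned ids each (made-up values for illustration).
gt_nearest = [7, 2, 9]
results = [[7, 1, 3], [5, 2, 8], [4, 6, 0]]

for r in (1, 2, 3):
    print(f"1-R@{r}: {recall_at_r(gt_nearest, results, r):.3f}")
```

If the ground-truth ids refer to positions in the full dataset while the index was built on a subset (or the other way round), a metric like this will stay low no matter which index is used, so that mismatch is worth ruling out first.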

The big-ann-benchmarks competition baseline suggests the index IVF1048576,SQ8, which I plan to try next. However, the GPU benchmark scripts in the Faiss repo don't seem to support this SQ index.

Platform

OS: Ubuntu 22.04.1 LTS

Faiss version: 1.7.4

Installed from: Conda

Faiss compilation options:

Running on:

CPU
GPU

Interface:

C++
Python

Reproduction instructions

mdouze · 2024-04-05T22:15:54Z

Collaborator

SPACEV1B should not be too problematic; see Table 2 in https://proceedings.mlr.press/v176/simhadri22a/simhadri22a.pdf
Support for SQ8 was added to Faiss after the GPU bench script was written. However, if you run on a single GPU, the dataset may not fit in GPU RAM.
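
To make the memory concern concrete, here is a rough back-of-the-envelope sketch of the code storage for 1 billion vectors. It assumes SPACEV1B vectors have 100 dimensions (an assumption not stated in this thread), that PQ24 stores 24 bytes per vector and that SQ8 stores one byte per dimension. It ignores vector ids, coarse centroids and scratch space, so real usage would be higher.

```python
def code_gib(n_vectors, bytes_per_vector):
    """Size of the stored codes in GiB, excluding ids and other overhead."""
    return n_vectors * bytes_per_vector / 2**30


N = 10**9  # number of database vectors
D = 100    # assumed dimensionality of SPACEV1B (not stated in the thread)

pq24_gib = code_gib(N, 24)  # PQ24: 24 sub-quantizers x 8 bits = 24 bytes
sq8_gib = code_gib(N, D)    # SQ8: one byte per dimension

print(f"PQ24 codes: {pq24_gib:.1f} GiB")
print(f"SQ8 codes:  {sq8_gib:.1f} GiB")
```

Under these assumptions the SQ8 codes alone take roughly four times the space of the PQ24 codes, which is why a single GPU is unlikely to hold them.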
