Document and benchmark performance tradeoffs between sqlite-vec and FAISS inline VectorDB providers #1165

Open
franciscojavierarceo opened this issue Feb 20, 2025 · 4 comments
Labels
enhancement New feature or request

Comments

@franciscojavierarceo
Contributor

🚀 Describe the new functionality needed

faiss is purely in-memory, whereas sqlite-vec stores its data on disk.

Measuring the latency/memory tradeoffs on a single machine and documenting them is likely sufficient to give users an understanding of the pros and cons of each VectorDB.
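As a starting point for the latency side of this, here is a minimal sketch of a backend-agnostic timing harness. It assumes each provider can be wrapped in a `search(query, k)` callable; `brute_force_backend` and `measure_latency` are hypothetical names for illustration, and the pure-Python brute-force search is only a stand-in, not the faiss or sqlite-vec API. Memory usage would need a separate measurement (e.g., process RSS and on-disk file size), which this sketch does not cover.

```python
import random
import statistics
import time
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Assumed interface: any backend wrapped as search(query, k) -> list of row ids.
SearchFn = Callable[[Vector, int], List[int]]


def brute_force_backend(corpus: List[Vector]) -> SearchFn:
    """Stand-in backend: exact squared-L2 search in pure Python (placeholder only)."""

    def search(query: Vector, k: int) -> List[int]:
        dists = [
            (sum((a - b) ** 2 for a, b in zip(vec, query)), idx)
            for idx, vec in enumerate(corpus)
        ]
        dists.sort()
        return [idx for _, idx in dists[:k]]

    return search


def measure_latency(search: SearchFn, queries: List[Vector], k: int = 5) -> dict:
    """Time each query individually and summarize latency in milliseconds."""
    samples = []
    for q in queries:
        start = time.perf_counter()
        search(q, k)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "n": len(samples),
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
    }


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 16
    corpus = [[rng.random() for _ in range(dim)] for _ in range(500)]
    queries = [[rng.random() for _ in range(dim)] for _ in range(20)]
    print(measure_latency(brute_force_backend(corpus), queries))
```

Swapping in a real provider would only require a function with the same `search(query, k)` shape, so the same queries and the same summary statistics can be reused across backends.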

Additionally, as sqlite-vec adds new functionality (e.g., #1158), it would be useful to have a benchmark dataset for evaluating the retrieval efficacy. We could use, e.g., the CISI dataset for Information Retrieval for benchmarking.
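For the retrieval-efficacy part, a rough sketch of the scoring step could look like the following. It assumes the benchmark dataset provides relevance judgments as a mapping from query id to the set of relevant document ids (the usual shape for IR test collections), and that each provider's results are collected as ranked lists of document ids. The function names and the choice of recall@k and MRR are illustrative assumptions, not something already decided in this issue.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[int], relevant: Set[int]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    run: Dict[str, List[int]], qrels: Dict[str, Set[int]], k: int = 10
) -> Dict[str, float]:
    """Average recall@k and MRR over every query that has relevance judgments."""
    if not qrels:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    recalls, rranks = [], []
    for query_id, relevant in qrels.items():
        retrieved = run.get(query_id, [])  # a missing query counts as no results
        recalls.append(recall_at_k(retrieved, relevant, k))
        rranks.append(reciprocal_rank(retrieved, relevant))
    return {
        "recall_at_k": sum(recalls) / len(recalls),
        "mrr": sum(rranks) / len(rranks),
    }
```

Running the same `evaluate` call on the ranked lists produced by each provider would give directly comparable numbers for the same query set.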

💡 Why is this needed? What if we don't build it?

This is needed to give users more options when using RAG. Some users may have a large set of documents, and documenting the tradeoffs may help them make a better-informed decision.

Other thoughts

No response

@franciscojavierarceo franciscojavierarceo added the enhancement New feature or request label Feb 20, 2025
@yanxi0830
Contributor

Thanks! It would be great to have a comprehensive comparison of the available VectorDBs. I think starting off with faiss vs. sqlite-vec benchmarks would also guide our decision on which vector_db to provide in the default templates.

https://github.com/zilliztech/VectorDBBench looks relevant; it compares different vector DBs out there.

@franciscojavierarceo
Contributor Author

Thanks @yanxi0830 ! I'll see if I can use that.

@jwm4
Contributor

jwm4 commented Feb 23, 2025

For anyone planning to work on this, I would note that there is an open issue ( #1082 ) to add inline Qdrant support too. I think that could also be a serious contender for inclusion in the default templates, but I agree that we'd want head-to-head benchmarks for speed and scalability to make an informed decision about this.

@franciscojavierarceo
Contributor Author

I am planning on working on this. I'll make the analysis script available so folks at Qdrant, Milvus, and others can also use it. 👍


3 participants