🚀 Describe the new functionality needed

faiss is purely in-memory, while sqlite-vec persists to disk. Understanding and documenting the latency/memory tradeoffs on a single machine is likely sufficient to give users a sense of the pros and cons of each VectorDB.
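A benchmark harness for this mainly needs to time the index-build and query phases and record peak memory. Here is a minimal sketch using a brute-force NumPy scan as a stand-in backend; the `bench` helper and its `build`/`search` callables are illustrative placeholders (not part of the faiss or sqlite-vec APIs) that a real adapter for either library would slot into.

```python
import time
import tracemalloc

import numpy as np

DIM = 64


def bench(build, search, n_queries=100):
    """Time index build and per-query latency; report peak Python-heap memory.

    `build()` returns an index object; `search(index, query)` returns
    neighbor ids. Both are placeholders for a real faiss / sqlite-vec adapter.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    index = build()
    build_s = time.perf_counter() - t0

    rng = np.random.default_rng(0)
    queries = rng.standard_normal((n_queries, DIM)).astype(np.float32)
    t0 = time.perf_counter()
    for q in queries:
        search(index, q)
    query_ms = (time.perf_counter() - t0) * 1000 / n_queries
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"build_s": build_s, "query_ms": query_ms, "peak_mb": peak / 2**20}


# Brute-force stand-in for an actual vector index: the "index" is just
# the raw matrix, and search is an exact L2 scan over all rows.
VECTORS = np.random.default_rng(1).standard_normal((10_000, DIM)).astype(np.float32)

stats = bench(
    build=lambda: VECTORS,
    search=lambda idx, q: np.argsort(((idx - q) ** 2).sum(axis=1))[:10],
)
print(stats)
```

Note that `tracemalloc` only sees Python-heap allocations; for faiss's native memory a process-level measure (e.g. RSS) would be needed.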
Additionally, as sqlite-vec gains new functionality (e.g., #1158), it would be useful to have a benchmark dataset for evaluating retrieval efficacy. We could use, for example, the CISI dataset for Information Retrieval.
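For efficacy on a dataset like CISI, the core metric is simple: for each query, compare the ids returned by the VectorDB against the relevance judgments. A sketch of recall@k (the function name, toy qrels, and result lists below are illustrative, not from any of the libraries above):

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Toy example: relevance judgments (qrels) for two queries, plus the
# ranked ids a hypothetical VectorDB returned for each.
qrels = {"q1": {3, 7}, "q2": {5}}
results = {"q1": [7, 1, 3, 9], "q2": [2, 4, 8]}

scores = {q: recall_at_k(results[q], qrels[q], k=3) for q in qrels}
print(scores)  # → {'q1': 1.0, 'q2': 0.0}
```

The same loop would run unchanged over faiss and sqlite-vec results, making the efficacy comparison backend-agnostic.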
💡 Why is this needed? What if we don't build it?
This is needed to give users more options when using RAG. Some users may have a large set of documents, and documenting the tradeoffs may help them make a better-informed decision.
Other thoughts
No response
Thanks! It would be great to have a comprehensive comparison of the available VectorDBs. I think starting off with faiss vs. sqlite-vec benchmarks would also guide our decision on which vector_db to provide in the default templates.
For anyone planning to work on this, I would note that there is an open issue (#1082) to add inline Qdrant support too. I think that could also be a serious contender for inclusion in the default templates, but I agree that we'd want head-to-head benchmarks for speed and scalability to make an informed decision.