Feature Request: expanded embedding capabilities #12

Open
bhazzard opened this issue Jun 23, 2023 · 2 comments
Comments

@bhazzard

Butterfish is supercool. I'd like expanded embedding capabilities. Primarily, I'd like to see:

  • More robust searching capabilities
  • Ability to query my index remotely
  • Ability to integrate index with other software

Potential Approaches:

If this feature is prioritized, there are likely numerous approaches to implementation. A few I've thought of:

  • Pinecone as an alternative vector index & query engine: Provide a means to switch from using local indexing and brute force querying to Pinecone. Perhaps through environment variables or a config file.
  • Ability to sync a local index up to Pinecone: Provide a command like butterfish indexsync . that upserts to and deletes from Pinecone so that it mirrors the embeddings in a local index.
  • VectorDB Abstraction/Plugins: Provide an extension point where multiple vector database implementations could be built and selected through user configuration. That way you could eventually support local, Pinecone, Milvus, Weaviate, etc.
  • Implement efficient search algorithms and remote querying: Keep a local-only solution, but implement more efficient search algorithms, and implement a way to query your local index remotely.
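A rough sketch of the "VectorDB Abstraction/Plugins" idea in Go, the language Butterfish is written in. Everything here is hypothetical and is not Butterfish's current API: the `VectorStore` interface, the `Register`/`Open` registry, and the in-memory `localStore` backend are illustrative names only. The backend is chosen by a name string, which could come from an environment variable or a config file, and the local backend answers queries by brute-force cosine similarity.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Match is one query result: a stored vector's ID and its cosine similarity.
type Match struct {
	ID    string
	Score float64
}

// VectorStore is a hypothetical extension point. Each backend (local,
// Pinecone, Milvus, Weaviate, ...) would implement these three methods.
type VectorStore interface {
	Upsert(id string, vec []float64) error
	Delete(id string) error
	Query(q []float64, topK int) ([]Match, error)
}

// backends maps a backend name (e.g. read from config) to a constructor.
var backends = map[string]func() VectorStore{}

// Register makes a backend selectable by name.
func Register(name string, ctor func() VectorStore) { backends[name] = ctor }

// Open constructs the backend the user configured, or returns an error.
func Open(name string) (VectorStore, error) {
	ctor, ok := backends[name]
	if !ok {
		return nil, fmt.Errorf("unknown vector store %q", name)
	}
	return ctor(), nil
}

// localStore keeps vectors in memory and answers queries by brute force.
type localStore struct{ vecs map[string][]float64 }

func (s *localStore) Upsert(id string, vec []float64) error {
	s.vecs[id] = vec
	return nil
}

func (s *localStore) Delete(id string) error {
	delete(s.vecs, id)
	return nil
}

// Query scores every stored vector against q and returns the topK best.
func (s *localStore) Query(q []float64, topK int) ([]Match, error) {
	matches := make([]Match, 0, len(s.vecs))
	for id, v := range s.vecs {
		if len(v) != len(q) {
			return nil, fmt.Errorf("dimension mismatch for %q", id)
		}
		matches = append(matches, Match{ID: id, Score: cosine(q, v)})
	}
	sort.Slice(matches, func(i, j int) bool { return matches[i].Score > matches[j].Score })
	if topK >= 0 && topK < len(matches) {
		matches = matches[:topK]
	}
	return matches, nil
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func init() {
	Register("local", func() VectorStore { return &localStore{vecs: map[string][]float64{}} })
}
```

Under this layout a Pinecone or Milvus backend could live in its own file that calls `Register` from its own `init`, guarded by a Go build constraint (a `//go:build` line), so its client library is only compiled in when someone opts in.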
@bhazzard
Author

Also for this and #13, if we can figure out a solution approach that you're happy with (and that I'm capable of implementing), I'm happy to contribute.

@bakks
Owner

bakks commented Jun 24, 2023

Heyo @bhazzard - these are good thoughts and I'm bullish on this. I've been most focused on shell dev, and for personal reasons (just had a baby) I haven't had as much time lately for development. If you want to contribute these, I'm all for it! Tag me on PRs.

Some specific notes:

  • I haven't gone down the external vector store road because I haven't used one heavily, and my understanding is that the threshold at which a non-brute-force vector search starts to make sense is actually fairly high (I've read that around 100k vectors is where it starts to matter, which matches my intuition). BUT I totally buy that if you're already using an external vector store, it makes sense.
  • In general the local vector management code is fairly primitive so don't be afraid of modifying it.
  • If butterfish does ship with external vector store capabilities, I would want to avoid installing shared libs by default, i.e. that install should be optional.
  • The idea of allowing external querying of your local index, similar to how the plugin accesses your local machine, is something I hadn't thought of, and it's a cool idea. If you want to implement it, I would copy how the plugin setup currently works.
  • The plugin goes through a separate private repo of mine that I use to manage https://butterfi.sh. The pathway is OpenAI plugin -> butterfi.sh server -> local gRPC client. I will add you to the private repo. If you implement this, I'm happy to host it; up to you.
  • Also, if you contribute: I don't have a contributor agreement. I think it's fine to just comment with "I'll assign copyright to the butterfish project under the MIT License" and let's call it done 😃.
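To make the brute-force cost in the first note concrete: a brute-force query computes one dot product per stored vector, so it costs O(n·d) for n vectors of dimension d, and the ~100k figure is the owner's rule of thumb for when that starts to hurt. Below that point, a cheap local improvement is to avoid sorting all n scores and keep only a size-k min-heap, which makes selection O(n log k). This is a minimal sketch, assuming vectors are already normalized to unit length (so the dot product equals cosine similarity) and all have the query's dimension; `topK` and `minHeap` are hypothetical names, not Butterfish code.

```go
package main

import (
	"container/heap"
	"sort"
)

// scored pairs a vector's index with its similarity to the query.
type scored struct {
	idx   int
	score float64
}

// minHeap orders by ascending score, so the weakest of the current
// top k sits at the root and can be evicted cheaply.
type minHeap []scored

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].score < h[j].score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(scored)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the indices of the k vectors with the largest dot product
// against q, best first. Assumes every vector has len(q) components.
func topK(vecs [][]float64, q []float64, k int) []int {
	if k <= 0 {
		return nil
	}
	h := &minHeap{}
	for i, v := range vecs {
		var dot float64
		for j := range q {
			dot += q[j] * v[j]
		}
		if h.Len() < k {
			heap.Push(h, scored{i, dot})
		} else if dot > (*h)[0].score {
			// Replace the weakest kept candidate and restore heap order.
			(*h)[0] = scored{i, dot}
			heap.Fix(h, 0)
		}
	}
	// Sort the k survivors best-first for the caller.
	out := append([]scored(nil), (*h)...)
	sort.Slice(out, func(a, b int) bool { return out[a].score > out[b].score })
	ids := make([]int, len(out))
	for i, s := range out {
		ids[i] = s.idx
	}
	return ids
}
```

This only trims the selection step; the O(n·d) scoring pass remains, which is why an approximate index (or an external store like Pinecone) is what actually changes the picture at large n.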

Anyway I'll add you to the separate repo and reach out separately. HAVE AT IT!
