feat: adds VectorIndex extension #378

Merged (7 commits) on Feb 9, 2025

@jayhack (Contributor) commented Feb 9, 2025

Introduces a very naive vector index as an "extension":

  • Get all embeddings from OpenAI
  • Store them in a NumPy array
  • Ability to save to and load from disk
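
A minimal sketch of the three steps above, not the actual `VectorIndex` code from this PR. The names `NaiveVectorIndex`, `embed_fn` and `toy_embed` are hypothetical. The embedding call is injected as a plain function so the example runs offline; in practice that function would wrap the OpenAI embeddings API. The `.npz` on-disk format is also an assumption.

```python
import numpy as np


class NaiveVectorIndex:
    """Keeps one embedding per file path in a single 2-D NumPy array."""

    def __init__(self, embed_fn):
        # embed_fn (hypothetical): maps a list of texts to one vector per text.
        self.embed_fn = embed_fn
        self.paths = []
        self.vectors = np.empty((0, 0), dtype=np.float32)

    def build(self, files):
        """Embed every file's content and stack the results row by row."""
        self.paths = list(files)
        embeddings = self.embed_fn([files[p] for p in self.paths])
        self.vectors = np.asarray(embeddings, dtype=np.float32)

    def save(self, target):
        """Write paths and vectors to one .npz archive."""
        with open(target, "wb") as fh:
            np.savez(fh, paths=np.array(self.paths, dtype=str), vectors=self.vectors)

    def load(self, source):
        """Read paths and vectors back from an archive written by save()."""
        with open(source, "rb") as fh:
            data = np.load(fh)
            self.paths = data["paths"].tolist()
            self.vectors = data["vectors"]


def toy_embed(texts):
    """Stand-in embedder (NOT a real model): text length and vowel count."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]
```

Usage would look like `idx = NaiveVectorIndex(toy_embed); idx.build({"a.py": "import os"}); idx.save("index.npz")`, with a real embedder swapped in for `toy_embed`.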

Initial investigation shows that indexing all of PyTorch takes about 50 MB of memory and 2.5 minutes.
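
For a rough sense of how that memory figure scales, a dense array costs rows × dimensions × bytes per value. The sketch below is back-of-envelope arithmetic only: the 1536-dimension, float32 layout is an assumption for illustration, not something this PR states, and `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Size of a dense (num_vectors x dim) array; float32 uses 4 bytes per value."""
    return num_vectors * dim * bytes_per_value


# Assumed for illustration only: 1536-dimensional float32 vectors.
bytes_per_vector = index_size_bytes(1, 1536)
# How many such vectors fit in 50 MB (taking 1 MB = 1,000,000 bytes).
vectors_in_50_mb = 50 * 1000 * 1000 // bytes_per_vector
```

Under those assumptions each vector is 6144 bytes, so 50 MB holds roughly 8,100 vectors.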

Future iterations on this can show how to:

  • invalidate embeddings when a file blob hash changes
  • store on a symbol level
  • compute symbol-level embeddings including their extended context

etc.
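
As one possible shape for the first item, invalidation can compare each file's current content hash with the hash recorded when it was embedded. This is a sketch under assumptions: `blob_hash` and `stale_paths` are hypothetical names, and the Git-style blob hash is one choice of hash, not necessarily what a future iteration would use.

```python
import hashlib


def blob_hash(content):
    """Git-style blob hash: SHA-1 over a 'blob <size>' + NUL header plus the content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def stale_paths(files, indexed_hashes):
    """Return paths that are new or whose hash differs from the one stored at index time."""
    return [
        path
        for path, content in files.items()
        if indexed_hashes.get(path) != blob_hash(content)
    ]
```

Only the paths returned by `stale_paths` would need to be re-embedded; unchanged files keep their stored vectors.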

This is designed to be used as input for other APIs, like the "semantic search" tool, LlamaIndex retrievers etc.
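
To illustrate how a downstream "semantic search" tool might consume such an index, here is a hedged sketch that ranks stored vectors by cosine similarity to a query vector. The function name `top_k` and its signature are hypothetical and are not part of this PR.

```python
import numpy as np


def top_k(query_vec, vectors, paths, k=3):
    """Rank indexed paths by cosine similarity to the query vector."""
    query = np.asarray(query_vec, dtype=np.float32)
    matrix = np.asarray(vectors, dtype=np.float32)
    # Normalize so the dot product equals cosine similarity; eps avoids divide-by-zero.
    eps = 1e-12
    query = query / (np.linalg.norm(query) + eps)
    matrix = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + eps)
    scores = matrix @ query
    # Indices of the k highest scores, best first.
    best = np.argsort(-scores)[:k]
    return [(paths[i], float(scores[i])) for i in best]
```

The query vector would come from the same embedding model used to build the index, so that both live in the same vector space.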

@jayhack jayhack requested review from codegen-team and a team as code owners February 9, 2025 21:51

codecov bot commented Feb 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

@jayhack jayhack merged commit e2b2da2 into develop Feb 9, 2025
23 of 24 checks passed
@jayhack jayhack deleted the jay/vector-index branch February 9, 2025 23:28

🎉 This PR is included in version 0.6.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

tomcodgen pushed a commit that referenced this pull request Feb 10, 2025