Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weaviate embeddings rework #77

Merged
merged 15 commits into from
Mar 4, 2024
Merged

Weaviate embeddings rework #77

merged 15 commits into from
Mar 4, 2024

Conversation

DrejcPesjak
Copy link
Contributor

This rework contains:

  • added Weaviate vector database support
  • added an abstract class VectorDB_API
  • moved from required uuid as file name, to storing image file paths alongside in the vector DB
  • removed pytorch dependencies from the library (still present in examples)
  • reworked the examples jupyter notebooks

Note: currently supports only cosine similarity as a distance score, as it is nicely limited between 0 and 1.

return model


def load_model_resnet50() -> nn.Module:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nicer to merge these two functions together:

def load_model_resnet50(discard_last_layer: bool):
    ...

@conorsim
Copy link
Collaborator

conorsim commented Feb 7, 2024

I'll review later but it would be nice to add type hints to all the function definitions - I noticed them occasionally but not consistently.

@@ -83,7 +78,7 @@ def _plot_kde(xs, s, density, maxima, minima):
plt.show()


def kde_peaks(data, bandwidth="scott", plot=False):
def kde_peaks(data: np.ndarray, bandwidth: Union[str, float] = "scott", plot: bool = False) -> tuple[np.ndarray, np.ndarray, int, float]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK luxonis-ml has python3.8>= requirement and for python3.8 tuple type annotation should be switched with Tuple or it will throw an error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this, it would be nice to create at least a very dummy test case for each added module that at least imports everything, so it catches errors like this.

Copy link
Collaborator

@conorsim conorsim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - I like the processing example!

Requesting minor change due to a typing error which breaks the import in the notebooks. And then not sure if my Weaviate issue is unique to me or not.

plot=False,
):
def find_similar(
reference_embeddings: Union[str, List[str], List[List[float], np.ndarray]],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
reference_embeddings: Union[str, List[str], List[List[float], np.ndarray]],
reference_embeddings: Union[str, List[str], List[List[float]], np.ndarray]

I'm not sure if this should be the type or not, but the current typing throws an error at import in the example notebook.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this error once I get to the weaviate section of the notebook. Not sure if it would be a simple fix or that something could be wrong in my environment. If there is additional weaviate setup besides the embeddings requirements we should try to document it somewhere.

WeaviateStartUpError: Weaviate did not start up in 5 seconds. Either the Weaviate URL http://localhost:8080/ is wrong or Weaviate did not start up in the interval given in 'startup_period'.


return img_transposed

def get_image_tensors_from_LFS(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Minor): Maybe get_image_tensors_from_remote or even get_image_tensors_from_lfs would be a better function name

within a given set of embeddings. The removal process uses Kernel Density
Estimation (KDE) on embeddings cosine similarity for optimal split, making
this approach particularly suited for embeddings in high dimensional spaces.
**Overview:**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use B{Overview:} for bold text instead of asterisks, see epytext documentation for more details.


- For Weaviate-specific examples, refer to the provided code examples.
- The `search_vectordb` function can be used with either Qdrant or Weaviate, depending on the provided `vectordb_api` object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, C{search_vectordb} instead of backticks.

@DrejcPesjak
Copy link
Contributor Author

Pre-commit hooks passed, but pre-commit CI test still fails

Ran the pre-commits as described here. And got the following output when commiting:

git commit -m "Pre-commit hook changes"

ruff.....................................................................Passed
ruff-format..............................................................Passed
docformatter.............................................................Passed
don't commit to branch...................................................Passed
mdformat.............................................(no files to check)Skipped
[weaviate_emb 25427a1] Pre-commit hook changes
5 files changed, 267 insertions(+), 235 deletions(-)

But the CI/ call-reusable-workflow/pre-commit (pull_request) test still fails.

@kozlov721
Copy link
Collaborator

kozlov721 commented Feb 12, 2024

The pre-commit command does not pass on my end with the same errors as the ones in the CI. I see where's the issue probably and will update the CONTRIBUTING.md. When commiting it only runs on the commited files, which might pass, but it doesn't mean it passes on all files. This would be caused by previously pushing without the pre-commit checks, so the problematic files are already upstream and no longer checked when git commit is ran. Run pre-commit run --all-files to get the errors on all files.

Copy link

github-actions bot commented Feb 14, 2024

Test Results

17 tests   17 ✅  0s ⏱️
 1 suites   0 💤
 1 files     0 ❌

Results for commit 7b63170.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Feb 14, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
3247 284 9% 0% 🟢

New Files

File Coverage Status
luxonis_ml/embeddings/utils/vectordb.py 0% 🟢
luxonis_ml/embeddings/utils/weaviate.py 0% 🟢
TOTAL 0% 🟢

Modified Files

File Coverage Status
luxonis_ml/embeddings/init.py 0% 🟢
luxonis_ml/embeddings/methods/OOD.py 0% 🟢
luxonis_ml/embeddings/methods/init.py 0% 🟢
luxonis_ml/embeddings/methods/duplicate.py 0% 🟢
luxonis_ml/embeddings/methods/mistakes.py 0% 🟢
luxonis_ml/embeddings/methods/representative.py 0% 🟢
luxonis_ml/embeddings/utils/init.py 0% 🟢
luxonis_ml/embeddings/utils/embedding.py 0% 🟢
luxonis_ml/embeddings/utils/ldf.py 0% 🟢
luxonis_ml/embeddings/utils/model.py 0% 🟢
luxonis_ml/embeddings/utils/qdrant.py 0% 🟢
TOTAL 0% 🟢

updated for commit: 7b63170 by action🐍

Copy link
Collaborator

@conorsim conorsim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now - the import issue is fixed

@conorsim conorsim merged commit 7d44719 into dev Mar 4, 2024
12 checks passed
@conorsim conorsim deleted the weaviate_emb branch March 4, 2024 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants