I copy-pasted the two scripts [0][1] into a notebook without any changes. They produce different embeddings and different results.
HF gives:
Cosine similarity between "I'm searching for a planet not too far from Earth." and "Neptune is the eight..." is: 0.622
Cosine similarity between "I'm searching for a planet not too far from Earth." and "TRAPPIST-1d, also de..." is: 0.490
Cosine similarity between "I'm searching for a planet not too far from Earth." and "A harsh desert world..." is: 0.433
Sentence Transformers gives:
Cosine similarity between "I'm searching for a planet not too far from Earth." and "Neptune is the eight..." is: 0.480
Cosine similarity between "I'm searching for a planet not too far from Earth." and "TRAPPIST-1d, also de..." is: 0.370
Cosine similarity between "I'm searching for a planet not too far from Earth." and "A harsh desert world..." is: 0.369
I checked the embeddings: both the document and the query embeddings differ between the two scripts. I also tried running on GPU (by adding .cuda() in the relevant places); same results as above.
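To rule out mere floating-point noise versus a genuinely different pooling, one quick check is to compare the two pipelines' vectors for the same sentence directly. A minimal sketch (the sample vectors below are placeholders, not real model output):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings of the same sentence from the two scripts;
# a value near 1.0 means the pipelines agree, anything lower means
# they are computing genuinely different vectors.
emb_hf = np.array([0.12, 0.31, -0.20, 0.05])
emb_st = np.array([0.11, 0.30, -0.22, 0.07])
print(cosine_sim(emb_hf, emb_st))
```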
If it helps, I can dump the embedding vectors or the full code in the comments.
It would be nice to have the expected output in the README as well.
I figured it out. I was using the wrong sentence-transformers library.
I had installed the latest upstream version:
pip install --upgrade git+https://github.com/UKPLab/sentence-transformers.git
whereas what works is your fork:
pip install --upgrade git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings_specb
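Since the upstream library and the fork install under the same package name, it is easy to end up with the wrong one without noticing. One way to double-check which repository a git install actually came from is pip's PEP 610 record (`direct_url.json`). A sketch; `install_source` is a hypothetical helper, not part of either library:

```python
import json
from importlib import metadata
from typing import Optional

def install_source(package: str) -> Optional[str]:
    """Return the VCS URL a package was pip-installed from, if pip recorded one (PEP 610)."""
    try:
        raw = metadata.distribution(package).read_text("direct_url.json")
    except metadata.PackageNotFoundError:
        return None  # package is not installed at all
    if raw is None:
        return None  # installed from an index (e.g. PyPI), so no direct-URL record exists
    return json.loads(raw).get("url")

# Should print the fork's URL if the @sgpt_poolings_specb install succeeded
print(install_source("sentence-transformers"))
```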
[0] https://github.com/Muennighoff/sgpt#asymmetric-semantic-search-be
[1] https://github.com/Muennighoff/sgpt#asymmetric-semantic-search-be-st