Explain API changes #2403

Open: neetikasinghal wants to merge 1 commit into base branch main

@neetikasinghal neetikasinghal commented Jan 17, 2025

Description

Adds explain support for exact, ANN, radial, disk-based, and filtered k-NN search. The score calculation explanation is currently implemented only for ANN search.
The proposal for explain is described here: #875 (comment)

Integration tests (ITs) are a work in progress.

Related Issues

Resolves #875

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Collaborator

@Vikasht34 Vikasht34 left a comment

We need a unit test for each of these cases:

  1. Disk-based search with valid rescore context.
  2. Radial search.
  3. ANN search (default case).
  4. Shard-level rescoring enabled.
  5. Shard-level rescoring disabled.
  6. Filter weight case where filtered IDs are less than k.
  7. Filter threshold value greater than cardinality.
  8. Missing native engine files.
  9. Valid context with matching document and disk-based search.

And please validate against the Explanation object.

@neetikasinghal
Author

We need a unit test for each of these cases:

  1. Disk-based search with valid rescore context.
  2. Radial search.
  3. ANN search (default case).
  4. Shard-level rescoring enabled.
  5. Shard-level rescoring disabled.
  6. Filter weight case where filtered IDs are less than k.
  7. Filter threshold value greater than cardinality.
  8. Missing native engine files.
  9. Valid context with matching document and disk-based search.

And please validate against the Explanation object.

@Vikasht34 Yes, the tests are not added yet, which is why the PR is in draft status. I will add coverage for all of these cases.

@neetikasinghal
Author

@navneet1v / @Vikasht34 would you please review the changes?

@navneet1v
Collaborator

@neetikasinghal can you please add an entry to the changelog?

@neetikasinghal
Author

@neetikasinghal can you please add an entry to the changelog?

Yup, I generally add it toward the end of the review so that it's easier to rebase onto the latest changes.

Collaborator

@navneet1v navneet1v left a comment

I took a high-level look at the code and still need to go through the explanation logic in more detail. One thing I want to add: I am not seeing any ITs for the explain API. Can we please add them too?

@neetikasinghal
Author

I took a high-level look at the code and still need to go through the explanation logic in more detail. One thing I want to add: I am not seeing any ITs for the explain API. Can we please add them too?

Yup, that is still WIP because my setup for ITs was broken. I have been able to fix that now. The PR does already have coverage for all the UTs.

Collaborator

@Vikasht34 Vikasht34 left a comment

Will continue on other files.

@neetikasinghal
Author

Pending items:

  • Rebase onto the latest changes
  • Add ITs

public Explanation explain(LeafReaderContext context, int doc, float score, KNNScorer knnScorer) {
    knnQuery.setExplain(true);
    try {
        knnScorer = getOrCreateKnnScorer(context, knnScorer);

Member

Is there a scenario where knnScorer is not null? If not, you should not pass it to this method; always create a new instance inside the method instead.

try {
    return FieldInfoExtractor.getSpaceType(modelDao, fieldInfo);
} catch (IllegalArgumentException e) {
    return knnQuery.getVectorDataType() == VectorDataType.BINARY ? SpaceType.DEFAULT_BINARY : SpaceType.DEFAULT;

Member

Why are we not signaling to the user that the system is in a bad state? Wouldn't this be misleading, in that it gives the user false data?
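
One possible way to address this concern, sketched under assumptions: keep the fallback, but record that it was used so the explanation text can say so instead of silently reporting a default. `SpaceTypeResolution`, `resolve`, `describe`, and the string space-type names are hypothetical stand-ins for illustration and are not part of the k-NN plugin.

```java
import java.util.function.Supplier;

// Hypothetical holder: the resolved space type plus a flag saying whether it is a fallback.
final class SpaceTypeResolution {
    final String spaceType;
    final boolean fallbackUsed;

    SpaceTypeResolution(String spaceType, boolean fallbackUsed) {
        this.spaceType = spaceType;
        this.fallbackUsed = fallbackUsed;
    }

    // Try the real lookup; on IllegalArgumentException fall back, but remember that we did.
    static SpaceTypeResolution resolve(Supplier<String> lookup, String defaultSpaceType) {
        try {
            return new SpaceTypeResolution(lookup.get(), false);
        } catch (IllegalArgumentException e) {
            return new SpaceTypeResolution(defaultSpaceType, true);
        }
    }

    // Text that an explanation could include so the user is not misled.
    String describe() {
        return fallbackUsed
            ? "space type could not be resolved; assuming default [" + spaceType + "]"
            : "space type [" + spaceType + "]";
    }
}
```

Whether a flagged fallback is enough, or whether the explain call should fail outright, is a design decision for the PR author; the sketch only shows the first option.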

if (explanationFormula != null) {
    return explanationFormula;
}
throw new UnsupportedOperationException("explainScoreTranslation is not defined for this space type.");

Member

This effectively means that if the object is created with the old constructor (formula is null), this method will throw an exception. Is this the desired behavior? Can we have a constant like "undefined" or similar? Or, if the formula isn't defined, does the rest of the explanation detail not make sense either?
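
A minimal sketch of the "undefined" constant suggested here, assuming a simplified class with an old no-formula constructor. `ScoreTranslationSketch` and `UNDEFINED_FORMULA` are hypothetical names for illustration; only `explanationFormula` and `explainScoreTranslation` come from the snippet under review.

```java
// Hypothetical, simplified stand-in for the class that owns explanationFormula.
final class ScoreTranslationSketch {
    // Assumed placeholder returned when no formula was supplied.
    static final String UNDEFINED_FORMULA = "undefined";

    private final String explanationFormula;

    // "Old" constructor: no formula supplied.
    ScoreTranslationSketch() {
        this(null);
    }

    ScoreTranslationSketch(String explanationFormula) {
        this.explanationFormula = explanationFormula;
    }

    // Returns the formula, or the "undefined" constant instead of throwing.
    String explainScoreTranslation() {
        return explanationFormula != null ? explanationFormula : UNDEFINED_FORMULA;
    }
}
```

With this shape, callers that build an explanation never see an exception from a missing formula, at the cost of possibly emitting an explanation whose formula line carries no information.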

public class KnnExplanation {

    @Getter
    private final Map<Object, Integer> annResultPerLeaf;

Member

Are you sure you want to expose the raw map? Clients will be able to modify it. You may want to consider returning Collections.unmodifiableMap(your_map) in the getter method.
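
A minimal sketch of that suggestion, assuming the Lombok `@Getter` is replaced by a handwritten getter. `KnnExplanationSketch` and `addLeafResult` are hypothetical names for illustration; only the `annResultPerLeaf` field comes from the snippet under review.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the KnnExplanation class under review.
class KnnExplanationSketch {
    private final Map<Object, Integer> annResultPerLeaf = new HashMap<>();

    // Assumed mutator: the owning class can still record results internally.
    void addLeafResult(Object leafId, int annResult) {
        annResultPerLeaf.put(leafId, annResult);
    }

    // Callers get a read-only view; attempts to modify it throw UnsupportedOperationException.
    Map<Object, Integer> getAnnResultPerLeaf() {
        return Collections.unmodifiableMap(annResultPerLeaf);
    }
}
```

Note that `Collections.unmodifiableMap` returns a view rather than a copy, so later internal updates remain visible to callers holding the returned map.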

Development

Successfully merging this pull request may close these issues.

[BUG] Explain API not compatible with k-NN queries
4 participants