Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor derived source for nested field types #2583

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jmazanec15
Copy link
Member

Description

Refactors derived source nested injector to be more readable and reliable. Moves low-level lucene calls into a helper class. Simplifies the iterator and the injector.

The overall logic for the nested portion is complex because of the following case:

Assume we have the following document from the user and we are deriving the value for nested.vector
{
  "nested": [
    {
      "text": "text1"
    },
    {
      "vector": [vec1]
    },
    {
      "vector": [vec2],
      "text": "text2"
    }
  ]
}

This would get filtered and serialized as:
{
  "nested": [
    {
      "text": "text1"
    },
    {
      "text": "text2"
    }
  ]
}

We need to ensure that when we want to inject vec1 back, we create a new map and put it in between
the existing fields.

To do this, we need to know what docs the existing 2 maps map to.

My strategy to map the items in the source to docIds is to:

  1. Match a fieldInfo that the document must have
  2. After the given offset, find the first document that has this field
  3. Get the next doc that is a sibling of the field being injected.

Because of the complexity, for 3.0, this PR also blocks enabling this feature for fields with more than 2 levels of nesting. This includes object fields as well. The focus will be getting this to work in a very stable fashion before moving on to generalized support.

Additionally, it adds a new integ test. I want to add a lot more unit tests, but wanted to put out a PR to get initial feedback.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jmazanec15 jmazanec15 force-pushed the derived-nested-enhancement branch from cbe34bd to 0630739 Compare March 5, 2025 20:53
* @return lowest docId for the field that is greater than the offset. Returns {@link DocIdSetIterator#NO_MORE_DOCS} if doc cannot be found
* @throws IOException if there is an issue reading from the formats
*/
public int getFirstDocWhereFieldExists(String fieldToMatch, int offset) throws IOException {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to want to use this multiple times in a single request? Would it make sense to return the DocIdSetIterator instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will make this call multiple times, but for the field will not be consistent. But we could reuse if we map it to a particular field.

@jmazanec15 jmazanec15 force-pushed the derived-nested-enhancement branch from 0630739 to 5fb1f9d Compare March 5, 2025 22:02
Refactors derived source nested injector to be more readable and
reliable. Moves low-level lucene calls into a helper class.
Simplifies the iterator and the injector.

Along with refactoring, it blocks enabling this feature for fields
with more than 2 levels of nesting. This includes object fields as
well. The focus will be getting this to work in a very stable fashion
before moving on to generalized support.

Additionally, it adds a new integ test. Still need to add uTs but the
refactor overall for nested should make it a lot easier.

Signed-off-by: John Mazanec <[email protected]>
@jmazanec15 jmazanec15 force-pushed the derived-nested-enhancement branch from 5fb1f9d to f006b33 Compare March 5, 2025 22:09
@jmazanec15 jmazanec15 force-pushed the derived-nested-enhancement branch from 980fea2 to b721e83 Compare March 6, 2025 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants