-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Remote Index Client] Implement remote client build awaiting functionality, validate encoder support #2576
base: main
Are you sure you want to change the base?
[Remote Index Client] Implement remote client build awaiting functionality, validate encoder support #2576
Conversation
9a9dfb6
to
53d63d4
Compare
.../java/org/opensearch/knn/index/codec/nativeindex/remote/DefaultVectorRepositoryAccessor.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/nativeindex/remote/RemoteIndexBuildStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/remote/KNNRemoteConstants.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/remote/RemoteIndexPoller.java
Outdated
Show resolved
Hide resolved
Thanks @owenhalpert, the overall approach looks good to me, just needs some cleanup in a few places. |
On a high level - looking at the changes - I am not sure if we should add in all the client/HTTP constructs directly into the plugin. The job status constants, client configuration seems like a better fit for a client package. |
BlobPath path = repository.basePath().add(indexSettings.getUUID() + VECTORS_PATH); | ||
BlobContainer blobContainer = repository.blobStore().blobContainer(path); | ||
|
||
assert blobContainer != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assserts do not work in prod. Can we do null checks instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the logic of getting the blob container comes from OpenSearch core so the assertion is just a sanity check on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jed326 if assertion fails will we fallback to CPU based index builds? and will there be a useful exception which can tell what happened? If no, please add these information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is there is not any way for blobContainer
to be null
, and none of the methods in the calling path to retrieve blobContainer
throw any checked exceptions. The only thing that can throw an exception is getRepository
, and creating a blobContainer on top of a repository reference is more or less just setting the file path to be written/read to. This assertion just helps make sure in tests if we do try to test/mock this method we are properly configuring things.
@kotwanikunal , since remote index build service is not an hosted service like S3 which vends out a HTTP client. Also, we don't want to maintain another repo as the maintainence overhead will be high. Hence the client code is better suited to be in k-NN repo only. If you see right now, the code is abstracted in a different java package. Now if you think we should move the java package to a separate module then that is something which can be easily achieved in upcoming iterations. Let us know your thoughts. |
8e6ce08
to
ce7f8a0
Compare
private String getMethodName(String parametersJson) throws IOException { | ||
XContentParser parser = XContentType.JSON.xContent() | ||
.createParser(NamedXContentRegistry.EMPTY, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, parametersJson.getBytes()); | ||
|
||
while (parser.nextToken() != XContentParser.Token.END_OBJECT) { | ||
if (parser.currentToken() == XContentParser.Token.FIELD_NAME) { | ||
String fieldName = parser.currentName(); | ||
parser.nextToken(); | ||
if (NAME.equals(fieldName)) { | ||
return parser.text(); | ||
} | ||
} | ||
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was mentioned before, but can we track in a GitHub issue somewhere to improve this parsing? I think that's a generic problem in k-NN and not specific to remote index build too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will get this out tomorrow
src/test/java/org/opensearch/knn/index/engine/KNNEngineTests.java
Outdated
Show resolved
Hide resolved
ce7f8a0
to
1a9577b
Compare
@@ -64,7 +64,7 @@ public NativeIndexBuildStrategy getBuildStrategy( | |||
if (KNNFeatureFlags.isKNNRemoteVectorBuildEnabled() | |||
&& repositoriesServiceSupplier != null | |||
&& indexSettings != null | |||
&& knnEngine.supportsRemoteIndexBuild() | |||
&& knnEngine.supportsRemoteIndexBuild(fieldInfo.attributes()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why cannot we just pass the whole fieldInfo Object, rather than passing attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my previous PR I used BuildIndexParams.parameters instead of the whole BuildIndexParams in a similar way, because of feedback from @Vikasht34 who suggested I pass only the minimum necessary data between these classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I understand that, but fieldInfo.attributes()
give a map
which can be generic and open for exploit. Hence in this case I was thinking having a fieldInfo object would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Me and Navneet has discussed , Please keep Map as it is , Rather than passing FieldInfo
BlobPath path = repository.basePath().add(indexSettings.getUUID() + VECTORS_PATH); | ||
BlobContainer blobContainer = repository.blobStore().blobContainer(path); | ||
|
||
assert blobContainer != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jed326 if assertion fails will we fallback to CPU based index builds? and will there be a useful exception which can tell what happened? If no, please add these information
.../java/org/opensearch/knn/index/codec/nativeindex/remote/DefaultVectorRepositoryAccessor.java
Outdated
Show resolved
Hide resolved
private String getMethodName(String parametersJson) throws IOException { | ||
XContentParser parser = XContentType.JSON.xContent() | ||
.createParser(NamedXContentRegistry.EMPTY, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, parametersJson.getBytes()); | ||
|
||
while (parser.nextToken() != XContentParser.Token.END_OBJECT) { | ||
if (parser.currentToken() == XContentParser.Token.FIELD_NAME) { | ||
String fieldName = parser.currentName(); | ||
parser.nextToken(); | ||
if (NAME.equals(fieldName)) { | ||
return parser.text(); | ||
} | ||
} | ||
} | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
src/main/java/org/opensearch/knn/index/engine/faiss/FaissHNSWMethod.java
Outdated
Show resolved
Hide resolved
if (COMPLETED_INDEX_BUILD.equals(builder.taskStatus) && StringUtils.isBlank(builder.fileName)) { | ||
throw new IOException("Invalid response format, missing " + FILE_NAME + " for completed status"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this validation be part of fromXContent() ? I feel we should move this validation out of this function and put it at the place where we are validating the status and other attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the actual content checks to the poller (while the status response object will just be checking the validity of the JSON).
src/main/java/org/opensearch/knn/index/remote/RemoteIndexPoller.java
Outdated
Show resolved
Hide resolved
public RemoteBuildStatusResponse awaitVectorBuild(RemoteBuildResponse remoteBuildResponse) throws InterruptedException, IOException { | ||
RemoteIndexPoller remoteIndexPoller = new RemoteIndexPoller(this); | ||
return remoteIndexPoller.pollRemoteEndpoint(remoteBuildResponse); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I mentioned that Polling function should not be part of client I mean the Poller class should call the awaitVectorBuild
api from outside. What is being done here is exactly reversed. This is the reason why I still think the api name should be getIndexBuildStatus
rather than awaitVectorBuild
. Because polling is one way to wait for the IndexBuild to complete.
I think RemoteIndexPoller -> getIndexBuildStatus()
rather than awaitVectorBuild() -> RemoteIndexPoller()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks. The new flow will be that the build strategy gets the Waiter implementation from a factory method. In this case it's a Poller. The Poller uses client.getBuildStatus and does the polling all on its own.
The client is specifically responsible for getting and sending requests — the Waiter will use the client to do so when needed.
src/main/java/org/opensearch/knn/index/remote/RemoteIndexPoller.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/remote/RemoteIndexHTTPClient.java
Outdated
Show resolved
Hide resolved
Signed-off-by: owenhalpert <[email protected]>
Pull polling logic out to generic RemoteIndexPoller class (that can use any client), add tests Signed-off-by: owenhalpert <[email protected]>
Refactor, use more string constants where possible, fix unit tests after refactor Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: owenhalpert <[email protected]> # Conflicts: # src/test/java/org/opensearch/knn/index/engine/KNNEngineTests.java
Signed-off-by: owenhalpert <[email protected]>
acd6118
to
2a80d32
Compare
src/main/java/org/opensearch/knn/index/engine/faiss/FaissHNSWMethod.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/remote/RemoteIndexHTTPClient.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/remote/RemoteIndexPoller.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/remote/RemoteIndexPoller.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/remote/RemoteIndexPoller.java
Outdated
Show resolved
Hide resolved
Signed-off-by: owenhalpert <[email protected]>
8c0a04a
to
fbb2d1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @owenhalpert
@@ -10,6 +10,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), | |||
* [Remote Vector Index Build] Implement vector data upload and vector data size threshold setting [#2550](https://github.com/opensearch-project/k-NN/pull/2550) | |||
* [Remote Vector Index Build] Implement data download and IndexOutput write functionality [#2554](https://github.com/opensearch-project/k-NN/pull/2554) | |||
* [Remote Vector Index Build] Introduce Client Skeleton + basic Build Request implementation [#2560](https://github.com/opensearch-project/k-NN/pull/2560) | |||
* [Remote Vector Index Build] Client polling mechanism [#2576](https://github.com/opensearch-project/k-NN/pull/2576) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please improve your commit message and the changelog entry.
@@ -64,7 +64,7 @@ public NativeIndexBuildStrategy getBuildStrategy( | |||
if (KNNFeatureFlags.isKNNRemoteVectorBuildEnabled() | |||
&& repositoriesServiceSupplier != null | |||
&& indexSettings != null | |||
&& knnEngine.supportsRemoteIndexBuild() | |||
&& knnEngine.supportsRemoteIndexBuild(fieldInfo.attributes()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I understand that, but fieldInfo.attributes()
give a map
which can be generic and open for exploit. Hence in this case I was thinking having a fieldInfo object would be better.
public boolean supportsRemoteIndexBuild(Map<String, String> attributes) throws IOException { | ||
String parametersJson = attributes.get(PARAMETERS); | ||
if (parametersJson != null) { | ||
String methodName = getMethodName(parametersJson); | ||
if (METHOD_HNSW.equals(methodName)) { | ||
return FaissHNSWMethod.supportsRemoteIndexBuild(attributes); | ||
} | ||
} | ||
return false; | ||
} | ||
|
||
/** | ||
* Get method name from a {@link FieldInfo} formatted attributes map | ||
* Example: | ||
* { | ||
* "index_description": "HNSW12,Flat", | ||
* "spaceType": "l2", | ||
* "name": "hnsw", | ||
* ... | ||
* } | ||
*/ | ||
private String getMethodName(String parametersJson) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets avoid IOException in method signature, it is just making the whole interfaces ugly. Lets catch the exception in the functions and then see if we want throw a proper runtime exception or we want to return some defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think we can catch and return null for the parsing methods on IOExceptions so the supportsRemoteIndexBuild just returns false.
* @return encoder name or null if not found | ||
* @throws IOException if the json string is not valid | ||
*/ | ||
private static String getEncoderName(String parametersJson) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmazanec15 and @naveentatikonda can you please help in validating this logic to ensure that it is full proof, with all the different combinations that can be possible for the encoders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm thinking about this a bit more. Do we only need the encoder on write? If so, I think itd be safest to retrieve it from the KnnVectorFieldType via mapper service (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldType.java). We should expose it as a method in https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/engine/KNNLibraryIndexingContext.java which will be available via the mapped field type.
If we need it on read, then we have to parse it out of this. But, from my understanding, we shouldnt need it on read.
public class RemoteBuildStatusRequest { | ||
private final String jobId; | ||
|
||
public RemoteBuildStatusRequest(RemoteBuildResponse remoteBuildResponse) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public RemoteBuildStatusRequest(RemoteBuildResponse remoteBuildResponse) { | |
public static RemoteBuildStatusRequest build(RemoteBuildResponse remoteBuildResponse) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Status Build Request work based on RemoteBuildResponse rather than it has to work with minimum JobId which is enough for get status.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this keeps the design more extensible if the request/response were to contain something in addition to or instead of the job ID (e.g. tenantId).
String fileName; | ||
String errorMessage; | ||
|
||
static RemoteBuildStatusResponse fromXContent(XContentParser parser) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add a java doc with all the possible json that this function can parse.
long timeout = KNNSettings.getRemoteBuildClientTimeout().getNanos(); | ||
// Thread.sleep expects millis | ||
long pollInterval = KNNSettings.getRemoteBuildClientPollInterval().getMillis(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these values be part of while loop so that if settings are changed the loop can exits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this too -- I lean towards no though for 3 main reasons:
- I think it's safer / more stable to have a merge/flush task be immutable wrt settings, similar to how we are doing with the endpoint
- It would be good to not make so may
getClusterSettings()
calls in a loop (although we could instead cache the value with a settings update consumer) - If we want to support cancel functionality we should properly scope that out instead of overloading it into the timeout settings
*/ | ||
class RemoteIndexPoller implements RemoteIndexWaiter { | ||
// The poller waits KNN_REMOTE_BUILD_CLIENT_POLL_INTERVAL * INITIAL_DELAY_FACTOR before sending the first status request | ||
private static final int INITIAL_DELAY_FACTOR = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since your KNN_DEFAULT_REMOTE_BUILD_CLIENT_POLL_INTERVAL_SECONDS
is 30sec, the factor of 3 is pretty high. decrease the value of 30sec to say may be 10sec or 5 sec, then a factor of 3 will be good.
I understand we want to tune these values, but I can tell you from my initial benchmarks this value is very high already
|
||
package org.opensearch.knn.index.remote; | ||
|
||
public class RemoteIndexWaiterFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add java docs on all new classes and public functions.
@owenhalpert please check why CIs are failing. Also, I think code is in pretty good shape and close to be getting shipped. |
@Value | ||
@Builder | ||
public class RemoteBuildStatusResponse { | ||
private static final ParseField TASK_STATUS_FIELD = new ParseField(TASK_STATUS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where we have defined Seervice API contract and how clinet can assume these parsing!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
This PR is a followup to #2560. This PR adds a RemoteIndexWaiter interface to await the completion of the remote vector build. For the HTTP client, this is implemented as a simple polling mechanism in
RemoteIndexPoller
. While there is a fair amount of files changed due to refactoring, the crux of this PR is in theRemoteIndexPoller
,RemoteIndexHTTPClient
, andRemoteIndexBuildStrategy
classes.This PR also adds a check on the encoder type, enforcing that only
HNSWFlat
is supported by the remote build service.This PR does not include:
which will go in the next PR.
Related Issues
#2560 PR 1
#2518 LLD
#2391 Meta issue
Check List
- [ ] New functionality has been documented.- [ ] API changes companion pull request created.--signoff
.- [ ] Public documentation issue/PR created.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.