-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Search request processors receive different inputs than the search API request body #17358
Comments
I think this is a known assumption that a search request processor rewrites a search request, not query dsl string. The query is constructed in an expanded form with optional parameters in search request before reaching the search pipeline. The JSON parsing should target this expanded query form. I am thinking, we should have a way to show query expansion form, then it can help users to write proper json path. for example for a match query:
we can have _search/view_source API then when sending
returns
this new preview API can help users to understand the expansion query with optional parameters. In this way, when users want to get the query text "jeans", they can come up the json path |
As a user myself, I wouldn't agree with this assumption. See documentation as well.
Query DSL is constrained to be a JSON interface, and my intuition is always that the search request processors are acting on just that - receiving that JSON, performing some transformation, and passing the JSON on. This is consistent with how ingest processors or search response processors act on individual JSON documents. While I'm not opposed to having another API to fetch the translated/expanded query, I think it still is adding an unnecessary layer of knowledge for users to understand; and, would be required to understand when building processors that are dependent on them. Currently, I believe ML inference search request processors are the only ones that are prone to this, but I have further concerns that over time as more and more search request processors get added, will require understanding this. This makes it difficult on the user:
|
This is an incorrect intuition. Query DSL is constrained to be Java objects. For example, if you use a transport or gRPC client, there is no JSON involved at all. It happens that both ingest processors and search response processors have I'm not sure how to properly convey the fact that OpenSearch does not in fact operate on JSON strings. The OpenSearch REST APIs may accept and return JSON strings, but within OpenSearch, it's nothing but Java objects. This is not a bug. When the QueryBuilder objects are created, as their name suggests, they are implementing a Builder pattern. Builders are allowed to have default field values. Builders are allowed to provide convenience methods to accept a parameter and store it elsewhere within their internal state. They are complex Java classes with their own logic. They are not JSON strings. |
Makes sense. I'm just thinking from the REST request perspective, if there would be some mechanism for the search request processors to have access to the REST body (as JSON), similar to the "document source" for ingest and search response processors. Maybe even as an optional configuration. If there is no reasonable technical way for accomplishing this, then I think we need to be careful when creating new search request processors, particularly ones that are designed to be dependent on the REST request body like the ML inference processors. Creating processors that are performing some operation against an unknown source input isn't a good user experience, as users are currently required to run an initial |
Catch All Triage - 1 2 3 |
Describe the bug
Search request processors offer manipulation on the search request when users execute a search API call. However, the request the processors receive may be different than the request the users send. This is due to pre-processing/translation/validation that OpenSearch performs before the processors have access to the request. As a result, this can lead to runtime failures.
For example, users may write an ML inference search request processor to parse out a field in the search request, and generate an embedding for it. But, if the request JSON changes between the user-supplied request, and the request received at runtime, it can lead to parsing failures, and failures in the processor to find the original intended value in the search request.
Related component
No response
To Reproduce
See example configurations in opensearch-project/dashboards-flow-framework#574
Expected behavior
From my perspective, ideally the search request processors have access to the exact search request that the users supply in the search API. This provides continuity and consistency, and lets users configure transformations and/or data parsing of that request in a consistent manner. After all search request processors have been completed, the original translation/validation can happen, and the API logic can continue as normal.
Additional Details
The text was updated successfully, but these errors were encountered: