Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional section of node analyzers into NodeInfo #10296

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lukas-vlcek
Copy link
Contributor

@lukas-vlcek lukas-vlcek commented Oct 2, 2023

Description

A first implementation proposal of a new REST API called "Nodes Analyzers".

There are some changes/extensions to original "core" API to avoid use of Java reflections.

I still need to add a few more tests. Especially tests that will set up a few artificial cluster nodes with controlled set of plugins and then verifying that the response contains all that is required (including Hunspell dictionaries).

Original proposal that was implemented as a plugin included some interesting REST API tests (see https://github.com/lukas-vlcek/OpenSearch-list-built-in-analyzers/). I think we will need to reimplement these differently.

Related Issues

Resolves #5481

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2023

Compatibility status:

Checks if related components are compatible with change a9f4e77

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/performance-analyzer.git]

@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2023

Gradle Check (Jenkins) Run Completed with:

@bbarani
Copy link
Member

bbarani commented Feb 5, 2024

Hello @lukas-vlcek @reta @andrross This PR needs to be merged by today to get in to 2.12.0 release. Can you please take necessary action (i.e. comment, merge or label it to future version etc..) as soon as possible?

@reta
Copy link
Collaborator

reta commented Feb 5, 2024

Hello @lukas-vlcek @reta @andrross This PR needs to be merged by today to get in to 2.12.0 release. Can you please take necessary action (i.e. comment, merge or label it to future version etc..) as soon as possible?

Thanks @bbarani I sadly failed to convey my concerns to @lukas-vlcek related to public API changes here, leaving it up to other maintainers to look at it.

@dblock
Copy link
Member

dblock commented Feb 5, 2024

We can't take a breaking change into 2.x, is there a non-breaking change version of this?

@Pallavi-AWS
Copy link
Member

We can't take a breaking change into 2.x, is there a non-breaking change version of this?

+1. If this new API implementation is introducing breaking changes, we should push it out.

@reta
Copy link
Collaborator

reta commented Feb 6, 2024

We can't take a breaking change into 2.x, is there a non-breaking change version of this?

@dblock @Pallavi-AWS Sorry if I managed to confuse your, I don't think the changes in public API are breaking, the concerns I see are part of this discussion #10296 (comment), TLDR; I think the AnalysisRegistry should have more meaningful API posture instead of pile of methods to get strings: such APIs are impossible to evolve (not much you could add to the string). But that is just my opinion, no intention to block it, up to the reviewers to decide.

@@ -59,6 +60,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Add task completion count in search backpressure stats API ([#10028](https://github.com/opensearch-project/OpenSearch/pull/10028/))
- Deprecate CamelCase `PathHierarchy` tokenizer name in favor to lowercase `path_hierarchy` ([#10894](https://github.com/opensearch-project/OpenSearch/pull/10894))
- Switched to more reliable OpenSearch Lucene snapshot location([#11728](https://github.com/opensearch-project/OpenSearch/pull/11728))
- Breaking change: "search_pipelines" metric is not included in NodesInfoRequest by default ([#10296](https://github.com/opensearch-project/OpenSearch/pull/10296))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to @dblock's comment. We don't want to bring breaking changes into 2.x. Can you separate out this change from this PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh... I feel really badly about this one, since I never should have included the search_pipelines section in the default NodesInfoRequest. @lukas-vlcek was just cleaning up my mess.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could we address that separately?

/**
* Returns registered keys of {@link Analyzer}s on this node.
*/
public Set<String> getNodeAnalyzersKeys() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with @reta regarding the Java API concerns. This is basically an example of a "stringly typed" pattern and as Andriy mentioned it really restricts maintainability as the code evolves.

@andrross
Copy link
Member

andrross commented Feb 6, 2024

@reta The breaking change is:

Breaking change: "search_pipelines" metric is not included in NodesInfoRequest by default

and it's not obvious to me why it is required for this change.

@reta
Copy link
Collaborator

reta commented Feb 6, 2024

and it's not obvious to me why it is required for this change.

Hm .. I think it should be not part of this pull request, great catch @andrross !

@lukas-vlcek
Copy link
Contributor Author

lukas-vlcek commented Feb 6, 2024

Thanks for review all.

Let's discuss individual concerns/issues here. Assuming we want to move forward with this PR and target next possible release (if you feel otherwise, please let me know).

Breaking change

Why this PR contains a breaking change in Nodes Info REST API (see @msfroh comment for context) and should that be handled as a separate PR? In short: yes it should.

But the point here is that nodes analysis components (subject of this PR) should not be included in the list of Nodes Info metrics by default. It should be excluded the same way Michael wanted to exclude the search pipelines.
So if we can not change this for 2.x release (because it is considered a breaking change) then analysis components should target 3.x release only. Otherwise we can target next 2.x release but both the search pipelines and analysis components will be still part of the default metric set and then both will be exclued in 3.x. What do you think?

Java API concerns

Regarding the Java API concerns (mentioned here). I will be completely honest here: If what you are recommending is to turn returning Set<String> to Set<_AnalysisAbstractionObject_> then I will do it. But I want to explain my perspective. The _AnalysisAbstractionObject_ should be designed by someone who really needs to work with it and I do not think it is the scope of this PR to refactor AnalysisRegistry and AnalysisProvider. If I am to design _AnalysisAbstractionObject_ object then I am almost sure it will be found lacking in some ways by someone who will need it in the future and that someone will have to do the refactoring anyway fortunately that someone will have much better idea about what he needs from the object than what I have right now. I respect the proposal that @reta made about this object before and I can follow it (i.e. a small POJO-like object). Thoughts?

The documentation part

There is also a documentation PR which includes a lot of improvements and some fixes that are independent on the Nodes Analyzers feature. I can go and rework the documentation PR so that Nodes Analyzers content is extracted and the rest can be pushed with the next 2.x release.

@dblock
Copy link
Member

dblock commented Feb 6, 2024

What do you think?

I would make a breaking change in main, but backport it as a non-breaking change to 2.x.

@andrross
Copy link
Member

andrross commented Feb 6, 2024

Assuming we want to move forward with this PR and target next possible release (if you feel otherwise, please let me know).

I'm definitely in favor of moving forward with the feature and I think other folks are as well.

If I am to design AnalysisAbstractionObject object then I am almost sure it will be found lacking in some ways by someone who will need it in the future

I think that is the crux of @reta's feedback. The string-based approach is likely to be insufficient for some future use case and will require additional APIs alongside the string-based APIs and that will be clumsy and/or redundant. I think a small POJO-like object is pretty simple and will be more extensible and will alleviate the concerns here.

documentation

Please feel free to include the improvements separate from the new analyzers feature. Also, I believe documentation improvements for existing features are not tied to releases and can be made at any time.

lukas-vlcek added a commit to lukas-vlcek/documentation-website that referenced this pull request Feb 16, 2024
This functionality was introduced in PR opensearch-project/OpenSearch#10296

Signed-off-by: Lukáš Vlček <[email protected]>
lukas-vlcek added a commit to lukas-vlcek/documentation-website that referenced this pull request Feb 17, 2024
This functionality was introduced in PR opensearch-project/OpenSearch#10296

Signed-off-by: Lukáš Vlček <[email protected]>
@dblock
Copy link
Member

dblock commented Mar 4, 2024

@lukas-vlcek @andrross Is this getting merged and backported to 2.13 (we have 1-2 weeks max)? @andrross all yours to CR/merge. I imagine we want this in first and then merge #12497 only to main?

Signed-off-by: Daniel (dB.) Doubrovkine <[email protected]>
@github-actions github-actions bot added the v2.13.0 Issues and PRs related to version 2.13.0 label Mar 4, 2024
Copy link
Contributor

github-actions bot commented Mar 4, 2024

✅ Gradle check result for a9f4e77: SUCCESS

@lukas-vlcek
Copy link
Contributor Author

lukas-vlcek commented Mar 6, 2024

@dblock

@andrross all yours to CR/merge. I imagine we want this in first and then merge #12497 only to main?

The only thing that is "blocking" #12497 from merging to main right now is missing documentation PR (see opensearch-project/documentation-website#6544 for details if you are interested but there needs to be done some extra work first but I will manage that).

Then I need to update this PR to contain only the relevant parts that need to go to main and address review comments. That should be ready by next week.

@dblock
Copy link
Member

dblock commented Mar 12, 2024

@lukas-vlcek I merged #12497, set documentation aside. Want to rebase/update this?

@getsaurabh02 getsaurabh02 added v2.14.0 and removed v2.12.0 Issues and PRs related to version 2.12.0 v2.13.0 Issues and PRs related to version 2.13.0 labels Apr 10, 2024
@hdhalter
Copy link

hdhalter commented May 1, 2024

@getsaurabh02 - Is this being delivered with 2.14? I have an accompanying doc PR that I need to know what to do with. Thanks!

@getsaurabh02 getsaurabh02 added v2.15.0 Issues and PRs related to version 2.15.0 and removed v2.14.0 labels May 1, 2024
@getsaurabh02
Copy link
Member

We briefly touched upon this in today's community meeting. Moving this to 2.15 since we yet to close the PR

@hdhalter
Copy link

hdhalter commented May 1, 2024

We briefly touched upon this in today's community meeting. Moving this to 2.15 since we yet to close the PR

Thanks. I'll update the label.

@hdhalter
Copy link

hdhalter commented Jun 6, 2024

@getsaurabh02 - Is this confirmed for 2.15?

@reta reta removed the v2.15.0 Issues and PRs related to version 2.15.0 label Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
distributed framework enhancement Enhancement or improvement to existing feature or request Search:Relevance
Projects
Status: 👀 In review
Development

Successfully merging this pull request may close these issues.

Get list of all available analyzers. Request for a new API?