NIFI-14204 Add a Flow Analysis Rule for connection load balancing parameters #9685
Thanks for working on this new Flow Analysis Rule @markobean. I just started looking at it. Would it be possible to combine the five src/test/resource files that include a violation into one file? Then the unit test could configure the rule to only look for one type of violation.
I updated the unit tests so that they all use a single flow definition file. Because load balancing strategy and compression are combined, some unit tests have more than one connection in violation. I created static variables that are more descriptive of the component in violation. Hopefully, it is clear.
Thanks for proposing this rule @markobean.
I made some comments around the property definitions, as the number of Boolean values creates a variety of options.
The proposed description mentions locating connections with load balancing, but that should not be the primary purpose of analysis rules.
On initial review, the rule itself seems too broad. Connection Load Balancing makes sense at very particular parts of a flow, and does not make sense in others. One anti-pattern is load balancing at every Processor, as opposed to just an initial source. Having a general rule that checks for the presence of load balancing one way or the other doesn't provide enough granularity to determine whether the implementation is good or bad.
Are there other use cases where this generalized approach fits?
I envisioned this particular rule (and maybe others) as a way to spot check your flow. This is why it is great that the rules have an enable/disable feature. A rule doesn't have to be left enabled all the time; you can enable it to perform analysis and then disable it again. This may be more appropriate for the warning type violation, not enforcement. I see your point though. One way around this would be to apply Flow Analysis Rules to specific Process Group(s). Doing so would be well beyond the scope of this ticket. I recommend this PR continue with the existing Rules framework - good and bad. (I already have another Jira issue to improve the way violations on a connection work.) There is an immediate use case for using this rule to quickly identify all locations where load balancing is employed. As you say, load balancing should be implemented judiciously for obvious performance reasons. This rule will aid in finding an over-abundance of load balanced connections.
Thanks for the additional background @markobean, that provides helpful context for consideration. I agree with you that being able to scope rules analysis to the Process Group level would be useful in this scenario. Implementing it is not trivial as you noted, and some past work around this didn't go forward due to initial usability concerns, so it is an open area. Although the Flow Analysis Rules space has fewer examples right now, this is an area where it is important to provide generally usable and understandable components, just like Processors. In other words, too many knobs can be counter-productive long term. For that reason, and for the use of the |
Based on previous comments and a general theme that emerged, I did some refactoring. Now, there is a property I also optimized away the convoluted logic related to compression of only attributes or attributes plus content. |
Thanks for the updates @markobean.
The property naming looks better, and removing the dependency on nifi-framework-api addresses that concern.
However, as mentioned earlier, the fundamental approach of the rule does not seem like something that should be included for ongoing maintenance in the project. Allowing or disallowing load balancing across the board does not provide a useful level of granularity, so as it stands, I'm a -1 on the overall addition at this time.
Perhaps a use case or two can help change your mind @exceptionfactory. I support users that temporarily add load balanced connections to help distribute a rare spike in flowfile load (think of a Split processor that normally has no trouble but will rarely output millions of flowfiles). They load balance a connection to work off a load, then 2 hours later want to remove that load balancing. A Flow Analysis Rule such as this (in WARN enforcement mode) can ensure that this isn't forgotten and mistakenly left in place (which happens a lot). Another use case is a NiFi system owner who has given access to tenants and doesn't want those tenants to use certain load balanced connection configurations, or load balancing at all. Another use case is the desire to require compression on attributes+content because the flow manager knows that their data is easily compressible and wants to ensure this efficiency is always enabled. With many people modifying a NiFi graph, these Flow Analysis Rules are a great tool for performing some quality checks.
Thanks for the reply @mosermw, see responses in line:
In this scenario, is there a reason for not having load balancing enabled all the time? Having to turn it on and off seems questionable, but perhaps there is some other driving reason. In this scenario, is the desire that load balancing is never enabled anywhere in the flow?
Disallowing all load balancing could be a reason for a streamlined version of this rule, but then the configuration properties could be simplified. Allowing only specific types of load balancing across all flow configurations seems too broad of a limitation, which is part of the fundamental concern.
Would the current approach support this use case? The properties allow compression, but would not require it if I am following. Also, this would require the same compression setting for all load balanced connections.
I agree with you that performing quality checks is valuable; the current problem is that the rule implementation doesn't provide enough fine-grained control. This could arguably be a limitation of the current framework implementation for evaluating all rules across all process groups, but that's where things stand.
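To make the allow-versus-require distinction concrete, here is a minimal sketch of a three-state compression check. It is not the PR's implementation: `CompressionPolicy` and `compression_violation` are hypothetical names introduced only for illustration, and the string values are assumed to match NiFi's load balance strategy and compression settings.

```python
from enum import Enum


class CompressionPolicy(Enum):
    """Hypothetical three-state policy; not a property defined by the PR."""
    DISALLOWED = "disallowed"
    ALLOWED = "allowed"
    REQUIRED = "required"


def compression_violation(policy, strategy, compression):
    """Return True if a connection's compression setting violates the policy.

    Connections that are not load balanced are never in violation here,
    because compression only applies to load balanced connections.
    """
    if strategy == "DO_NOT_LOAD_BALANCE":
        return False
    compressed = compression != "DO_NOT_COMPRESS"
    if policy is CompressionPolicy.DISALLOWED:
        return compressed
    if policy is CompressionPolicy.REQUIRED:
        return not compressed
    # ALLOWED: any compression setting is acceptable.
    return False
```

With only "allowed" and "disallowed" states, an uncompressed load balanced connection can never be flagged, which is the gap the "required" state would close. Note that a single policy still applies the same requirement to every load balanced connection.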
I agree with the use cases that @mwmoser presented. @exceptionfactory you are correct that the submitted implementation will not work quite the way we'd like for the compression use case outlined. For it to work, there would have to be a I could add an additional state if this use case is strongly desired. But I won't spend the cycles if it is still going to receive a "-1". The original use case - which is still a powerful one that has been requested for years - is to use this rule as a mechanism to locate all load balanced connections. Currently, there is no way to do that short of manually parsing the flow.json.gz file. For this reason alone, I find this Flow Analysis Rule a significant improvement, and I find it unreasonable to reject it even if it is not yet a perfect solution (many of the arguments against it concern the framework, not the rule).
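For reference, the manual parsing workaround mentioned above might look roughly like the following sketch. It assumes the decompressed flow.json.gz holds a top-level `rootGroup` object whose process groups nest under `processGroups`, and that each connection carries `loadBalanceStrategy` and `loadBalanceCompression` fields; those key names should be verified against the flow file of the NiFi version in use. The function names are hypothetical.

```python
import gzip
import json


def find_load_balanced(group, path=""):
    """Yield (group path, connection label, strategy, compression) for each
    load balanced connection in this group and all nested process groups."""
    here = f"{path}/{group.get('name', '')}"
    for conn in group.get("connections", []):
        strategy = conn.get("loadBalanceStrategy", "DO_NOT_LOAD_BALANCE")
        if strategy != "DO_NOT_LOAD_BALANCE":
            # Connections are often unnamed, so fall back to the identifier.
            label = conn.get("name") or conn.get("identifier")
            yield (here, label, strategy, conn.get("loadBalanceCompression"))
    for child in group.get("processGroups", []):
        yield from find_load_balanced(child, here)


def load_root_group(flow_path):
    """Read a gzipped flow definition and return its root process group.
    Assumes the top-level key is 'rootGroup'."""
    with gzip.open(flow_path, "rt", encoding="utf-8") as handle:
        return json.load(handle)["rootGroup"]
```

A typical use would be `for row in find_load_balanced(load_root_group("flow.json.gz")): print(row)`, run against a copy of the file rather than the live one.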
Yes, it doesn't sound like making further changes at this point would be worth pursuing.
Thanks for focusing the goal of the current rule. The purpose of Flow Analysis Rules should not be finding component configuration properties, but finding settings that are not aligned with flow design requirements or recommendations. As an extension point, the Rule API can be used for any number of things, so as a custom implementation, this rule may meet that need. However, as a way to fill a gap in functionality, this approach seems to go outside the general purpose of Flow Analysis Rules. For these reasons, it does not appear to be a good fit for inclusion in the standard set of rules. |
Is there an open Jira issue for finding load-balanced connections? That would be a helpful point of reference and description for the desired capability. |
I did not find a Jira issue, so I created a new one: |
Summary
NIFI-14204
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main branch
Verification
Added unit tests to verify each of the load balance strategies, compression options and valid/invalid configuration of the Flow Analysis Rule.
Installed in a running NiFi and performed additional testing of various combinations of configurable options in a running flow.
Build
mvn clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation