-
Notifications
You must be signed in to change notification settings - Fork 924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KYUUBI #6754][AUTHZ] Improve the performance of Ranger access requests deduplication #6758
Conversation
@@ -34,23 +35,22 @@ case class RuleAuthorization(spark: SparkSession) extends Authorization(spark) { | |||
val auditHandler = new SparkRangerAuditHandler | |||
val ugi = getAuthzUgi(spark.sparkContext) | |||
val (inputs, outputs, opType) = PrivilegesBuilder.build(plan, spark) | |||
val requests = new ArrayBuffer[AccessRequest]() | |||
var requests = new HashMap[(AccessResource, AccessType), AccessRequest]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why immutable.HashMap
instead of mutable.HashMap
is used here?
BTW, Scala 2.13 changed the alias of HashMap
from mutable.HashMap
to immutable.HashMap
, to make the code clear, please write new immutable.HashMap
or new mutable.HashMap
inline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and please leave a few comments here to explain the background
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and, was immutable.Map.newBuilder
considered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Change to
mutable.HashMap
- Added some comments
- I'm not sure if
immutable.Map.newBuilder
is better thanmutable.HashMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using mutable.LinkedHashSet[AccessRequest]
as the type for requests
?
- it maintains the original adding order
- it guarantees the element's uniqueness reliing on the
hashCode
- good for element looping as fast as the array
And make sure related methods of AccessRequest properly implemented. Add some unit test for assuring it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The uniqueness is guaranteed by hashCode
, which could be told from its name HashSet
of LinkedHashSet
. Not much overhead cost for maintaining the order inside, just adding a reference to a linked list. And the linked list perform well in looping elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean mutable.LinkedHashSet[AccessRequest]
may be slower than HashSet + ArrayBuffer in our case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the unique key is NOT the AccessRequest
itself but part of properties, the combination of ArrayBuffer with a HashSet looks fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean
mutable.LinkedHashSet[AccessRequest]
may be slower than HashSet + ArrayBuffer in our case.
No, it wont be slowed down, as LinkedHashSet maintaining both HashSet and Linked list inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the unique key is NOT the
AccessRequest
itself but part of properties, the combination of ArrayBuffer with a HashSet looks fine to me.
Noticed. But the HashSet
still reily on the same logic with hashCode
inside. And no proper customed method of either AccessRequest
or its parent class.
} | ||
|
||
// scalastyle:on | ||
test("KYUUBI #6754: improve the performance of ranger access requests") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
benchmark cases are not expected to run in regular test workflow, I remember we ported Spark benchmark framework, can we write it in that style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added benchmark, thanks
Co-authored-by: Cheng Pan <[email protected]>
Co-authored-by: Cheng Pan <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6758 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 684 684
Lines 42282 42289 +7
Branches 5767 5769 +2
======================================
- Misses 42282 42289 +7 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, lgtm, only a few comments
...ark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/ranger/RuleAuthorization.scala
Outdated
Show resolved
Hide resolved
...park/kyuubi-spark-authz/src/test/scala/org/apache/spark/sql/RuleAuthorizationBenchmark.scala
Outdated
Show resolved
Hide resolved
Co-authored-by: Cheng Pan <[email protected]>
…spark/sql/RuleAuthorizationBenchmark.scala Co-authored-by: Cheng Pan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the contribution.
Thanks for your contribution! Merged to master (1.10.0) . |
🔍 Description
Issue References 🔗
This pull request fixes #6754
Describe Your Solution 🔧
Right now in RuleAuthorization we use an ArrayBuffer to collect access requests, which is very slow because each new PrivilegeObject needs to be compared with all access requests.
Types of changes 🔖
Test Plan 🧪
Behavior Without This Pull Request ⚰️
Add benchmark
Before
Behavior With This Pull Request 🎉
After
Related Unit Tests
Exists UT
Checklist 📝
Be nice. Be informative.