Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

May not be in scope, but fast detection of set intersection #4

Open
joinr opened this issue Jan 29, 2020 · 0 comments
Open

May not be in scope, but fast detection of set intersection #4

joinr opened this issue Jan 29, 2020 · 0 comments

Comments

@joinr
Copy link

joinr commented Jan 29, 2020

I ran into a use case for a work product that needed to quickly test membership between 2 sets to
see if "any" member existed in both, and return the first intersecting member as fast as possible.

A naive way one might reach for is

(defn slow-some-member [s1 s2]
  (first (clojure.set/intersection s1 s2)))

Original implementation was

(defn some-member [s1 s2]  
  (let [[l r]  (if (< (count s1) (count s2)) [s1 s2]
                   [s2 s1])]
    (reduce (fn [acc x]
              (if (r x)
                (reduced x)
                acc)) nil l)))

Which of course has some inefficiencies due to destructuring and slow function calls through clojure.lang.RT.

Revised version assumes sets as input,

(defn some-member2 [^clojure.lang.APersistentSet s1
                    ^clojure.lang.APersistentSet s2]
  (let [l  (if (< (.count s1) (.count s2)) s1
               s2)
        r   (if (identical? l s1) s2 s1)]
    (reduce (fn [acc x]
              (if (r x)
                (reduced x)
                acc)) nil l)))
user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (slow-some-member s1 s2)))
Evaluation count : 975378 in 6 samples of 162563 calls.
             Execution time mean : 612.287581 ns
    Execution time std-deviation : 3.310927 ns
   Execution time lower quantile : 608.261646 ns ( 2.5%)
   Execution time upper quantile : 615.629475 ns (97.5%)
                   Overhead used : 2.233057 ns
nil

user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (some-member s1 s2)))
Evaluation count : 2004582 in 6 samples of 334097 calls.
             Execution time mean : 300.056678 ns
    Execution time std-deviation : 2.946006 ns
   Execution time lower quantile : 297.321682 ns ( 2.5%)
   Execution time upper quantile : 303.695844 ns (97.5%)
                   Overhead used : 2.233057 ns
nil
user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (some-member2 s1 s2)))
Evaluation count : 3298134 in 6 samples of 549689 calls.
             Execution time mean : 181.087016 ns
    Execution time std-deviation : 0.816367 ns
   Execution time lower quantile : 180.408265 ns ( 2.5%)
   Execution time upper quantile : 182.115366 ns (97.5%)
                   Overhead used : 2.233057 ns
nil

Gains are variable, and I haven't studied them for a wide range of set combinations. I'm interested in any faster means of doing this, although I just realized that my specific use case is amenable to memoization, which improves runtime (via memo-2) like 10x over my original "optimized" some-member.

user> (let [f (memo-2 some-member2) s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (f s1 s2)))
Evaluation count : 19762416 in 6 samples of 3293736 calls.
             Execution time mean : 28.191368 ns
    Execution time std-deviation : 0.181916 ns
   Execution time lower quantile : 28.035774 ns ( 2.5%)
   Execution time upper quantile : 28.427699 ns (97.5%)
                   Overhead used : 2.233057 ns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant