Why does the implementation of Lengauer-Tarjan use std::deque for a bucket? #383

Open
samolisov opened this issue Sep 4, 2024 · 3 comments

samolisov commented Sep 4, 2024

I was going through the implementation of the Lengauer-Tarjan algorithm in BGL's dominator_tree.hpp and found that a vector of std::deque is used to store the buckets: the vertices with a given semidominator.

std::vector< std::deque< Vertex > > buckets_;

I read the original paper as well as some explanations carefully and found nothing suggesting that the order of processing buckets[parent[w]] matters. Moreover, I see the deque is used there just as a container, not as the backend for the std::queue adapter: the code pushes vertices at the back of the deque, walks through the deque with the corresponding iterators and then clears it. In my understanding, std::vector<std::vector<>> could be used there instead.
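
Roughly, the access pattern is just the following (a hedged sketch with illustrative names, not the actual members of the BGL implementation):

    std::vector< std::vector< Vertex > > buckets_(num_vertices(g));

    buckets_[s].push_back(n);            // vertices are only pushed at the back,
    for (const Vertex& v : buckets_[s])  // visited by forward iteration,
    {
        /* ... process v ... */
    }
    buckets_[s].clear();                 // and then the bucket is discarded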

Could you tell me why a deque is used to store a bucket in the code? My guess is that it is meant to avoid the overhead of vector reallocation during the repeatedly performed push_back() and clear() operations, so it is a purely technical choice with no impact on the correctness of the algorithm.

Thank you very much.

@jeremy-murphy (Contributor)

Yeah, that bucket type should be vector, not deque. In the early 2000s, there was a bit of a misguided preference for deque. Can't explain it otherwise.

Other performance improvements to consider:

  • Changing the type of buckets_ to boost::unordered_flat_map<Vertex, std::vector<Vertex>>, and instead of clearing a bucket, removing it from the hash map (see the first sketch after this list).
  • std::vector< Vertex > semi_, ancestor_, samedom_, best_; these are all the same size (the number of vertices in g) yet allocated separately, even though they are accessed in the same pattern, e.g.:
            put(semiMap_, n, s);

            // 2. Calculation of n's dominator is deferred until
            // the path from s to n has been linked into the forest
            get(bucketMap_, s).push_back(n);
            get(ancestorMap_, n) = p;
            get(bestMap_, n) = n;

so changing from a "struct of vectors" to a "vector of structs" would probably improve cache efficiency (see the second sketch after this list). Not sure about samedom_; it doesn't seem to fit the pattern.
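
For the first bullet, a hedged sketch of what the hash-map bucket could look like; boost::unordered_flat_map is available since Boost 1.81, and the type and member names here are purely illustrative, not the current BGL code:

    #include <boost/unordered/unordered_flat_map.hpp>
    #include <vector>

    template < typename Vertex >
    struct bucket_store
    {
        boost::unordered_flat_map< Vertex, std::vector< Vertex > > buckets_;

        void add(Vertex semi, Vertex w) { buckets_[semi].push_back(w); }

        // Visit every vertex bucketed under s, then erase the bucket from the
        // map instead of clearing it, so empty buckets never accumulate.
        template < typename Visit >
        void drain(Vertex s, Visit visit)
        {
            const auto it = buckets_.find(s);
            if (it == buckets_.end())
                return;
            for (const Vertex& v : it->second)
                visit(v);
            buckets_.erase(it);
        }
    };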
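
And for the second bullet, a sketch of the "vector of structs" layout; the field names mirror the vectors quoted above, but the struct itself is an assumption, not the current layout of the implementation:

    #include <vector>

    template < typename Vertex >
    struct lt_vertex_data
    {
        Vertex semi;     // was semi_[n]
        Vertex ancestor; // was ancestor_[n]
        Vertex best;     // was best_[n]
    };

    // One contiguous allocation, indexed the same way as the separate vectors,
    // so fields that are read and written together share a cache line:
    // std::vector< lt_vertex_data< Vertex > > data_(num_vertices(g));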

I recommend using Google's benchmark library to get your results. Please include a graph of the comparison in any pull request. Also, please don't do all the performance changes together in one PR -- better to do them one at a time so as not to hurt my brain. Thanks. :)

jeremy-murphy self-assigned this Sep 9, 2024
@samolisov (Author)

@jeremy-murphy Thank you very much for the answer and all the information and suggestions. I believe developing benchmarks with the Google Benchmark project may itself be a good first issue for someone who would like to join Boost development. They could start from a set of "real" CFGs taken from the Boost sources or other applications; LLVM or even BOLT may help a lot with collecting the CFGs. The benchmarks could then be added to the test directory, as Google's Abseil does for its benchmarks, and run as part of the test suite so that anyone can reproduce the results. After that we could change the implementation of the algorithm to pick the right containers. This is my vision of the task.
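
For reference, a minimal sketch of what one such benchmark could look like, assuming Google Benchmark is available and using a trivially generated chain graph as a stand-in for a real CFG (the helper names here are made up for illustration):

    #include <benchmark/benchmark.h>
    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/dominator_tree.hpp>
    #include <boost/property_map/property_map.hpp>
    #include <vector>

    using Graph = boost::adjacency_list< boost::vecS, boost::vecS, boost::bidirectionalS >;
    using Vertex = boost::graph_traits< Graph >::vertex_descriptor;

    // Stand-in for a "real" CFG: a straight-line chain of basic blocks.
    static Graph make_chain_cfg(std::size_t n)
    {
        Graph g(n);
        for (std::size_t i = 0; i + 1 < n; ++i)
            boost::add_edge(i, i + 1, g);
        return g;
    }

    static void BM_LengauerTarjan(benchmark::State& state)
    {
        const Graph g = make_chain_cfg(static_cast< std::size_t >(state.range(0)));
        const Vertex entry = boost::vertex(0, g);
        std::vector< Vertex > idom(boost::num_vertices(g),
            boost::graph_traits< Graph >::null_vertex());
        auto idom_map = boost::make_iterator_property_map(
            idom.begin(), boost::get(boost::vertex_index, g));
        for (auto _ : state)
            boost::lengauer_tarjan_dominator_tree(g, entry, idom_map);
    }
    BENCHMARK(BM_LengauerTarjan)->Range(1 << 10, 1 << 16);
    BENCHMARK_MAIN();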

@jeremy-murphy (Contributor)

That sounds essentially right. I wouldn't trouble yourself with getting 'real' CFGs from LLVM, for example; it should be sufficient to generate synthetic ones. If we add Google Benchmark tests to the library, they will be optional, as we don't want to make that library a required dependency.
Start small, commit early, and make a draft pull request as soon as you have something you want to discuss!
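
For example, synthetic CFG-like inputs could be generated along these lines (purely illustrative, not part of BGL): a single-entry directed graph where every block falls through to the next one and sometimes also branches forward.

    #include <boost/graph/adjacency_list.hpp>
    #include <random>

    using Graph = boost::adjacency_list< boost::vecS, boost::vecS, boost::bidirectionalS >;

    // The fallthrough edges keep every vertex reachable from vertex 0, so the
    // whole graph takes part in the dominator-tree computation.
    Graph make_synthetic_cfg(std::size_t n, unsigned seed = 42)
    {
        Graph g(n);
        std::mt19937 rng(seed);
        for (std::size_t i = 0; i + 1 < n; ++i)
        {
            boost::add_edge(i, i + 1, g); // fallthrough edge
            if (rng() % 2 == 0)           // roughly half of the blocks also branch
            {
                std::uniform_int_distribution< std::size_t > target(i + 1, n - 1);
                boost::add_edge(i, target(rng), g); // forward branch edge
            }
        }
        return g;
    }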
