Boolean Operations take a long time #68
I wouldn't say I've really started looking at it yet, but I did the easy thing and put it under a profiler. A quick look suggests that `truck/truck-geometry/src/nurbs/knot_vec.rs` line 213 (at 86dedcd) is doing a lot of allocating.
I imagine some refactoring could introduce a non-allocating variant of that routine.
The pattern seems to look like this:
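(A hedged illustration with made-up names, not truck's verbatim code; the real routine runs the full Cox–de Boor recurrence and is considerably heavier.)

```rust
// Illustrative only: a helper that returns a freshly allocated Vec,
// called once per evaluation point in a tight loop.
fn basis_functions(knots: &[f64], degree: usize, t: f64) -> Vec<f64> {
    // Heap allocation on every single call:
    let mut out = vec![0.0; knots.len() - degree - 1];
    for (i, v) in out.iter_mut().enumerate() {
        // Stand-in computation; the real code does much more work here.
        *v = if knots[i] <= t && t < knots[i + 1] { 1.0 } else { 0.0 };
    }
    out
}

fn sample_all(knots: &[f64], degree: usize, samples: usize) -> f64 {
    let mut acc = 0.0;
    for s in 0..samples {
        let t = s as f64 / samples as f64;
        // A fresh allocation (and later a free) on every iteration.
        acc += basis_functions(knots, degree, t).iter().sum::<f64>();
    }
    acc
}
```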
So an easy win of at least 2x (but probably more, due to cache locality) would be to refactor the presearch/subs call path to avoid allocating, or to avoid allocating more than once-ish.
I'm fairly sure we could figure out how to turn `truck/truck-geometry/src/nurbs/knot_vec.rs` lines 195–231 (at 86dedcd) into a method that returns an iterator over the computed basis functions rather than a `Vec`.
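One way that could look, as a sketch: this is a reconstruction of Cox–de Boor evaluation into a stack buffer, and `basis_functions_iter` plus the inline capacity of 16 are assumptions, not truck's actual API.

```rust
use smallvec::SmallVec;

/// Compute all degree-`p` B-spline basis functions at `t` and hand them
/// back as an iterator, never exposing (and, for small cases, never
/// touching) a heap-allocated Vec.
pub fn basis_functions_iter(
    knots: &[f64],
    p: usize,
    t: f64,
) -> impl Iterator<Item = f64> {
    let m = knots.len() - 1;
    // Degree 0: indicator functions of the knot spans, in a stack buffer.
    let mut n: SmallVec<[f64; 16]> = (0..m)
        .map(|i| if knots[i] <= t && t < knots[i + 1] { 1.0 } else { 0.0 })
        .collect();
    // Each pass of the Cox–de Boor recurrence raises the degree by one,
    // reusing the same buffer in place.
    for k in 1..=p {
        for i in 0..m - k {
            let a = knots[i + k] - knots[i];
            let b = knots[i + k + 1] - knots[i + 1];
            let left = if a == 0.0 { 0.0 } else { (t - knots[i]) / a * n[i] };
            let right =
                if b == 0.0 { 0.0 } else { (knots[i + k + 1] - t) / b * n[i + 1] };
            n[i] = left + right;
        }
    }
    n.truncate(m - p);
    n.into_iter()
}
```

Callers that only fold or zip over the values then never see an allocation at all, as long as the basis count fits the inline capacity.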
@twitchyliquid64 I more or less reached the same conclusions as you. I created this repo to track my findings. It contains perf runs, flamegraphs, hyperfine results, dhat profiles, etc.

The optimization that gave me the most benefit was adding SmallVec as a dependency (any Vec with stack allocation would work) and turning this:

```rust
pub struct KnotVec(Vec<f64>);
```

into this:

```rust
pub type KnotVecInner = SmallVec<[f64; 16]>;
pub struct KnotVec(KnotVecInner);
```

This was just a quick and dirty test; I chose 16x f64 (two cache lines) because it gave the lowest runtime. I have just pushed these changes to my repo, which uses the modified truck version as a patched dependency. The example code I benchmark is the one @MattFerraro provided. I also enabled full optimizations, including hand-picked target-features for my specific CPU, to get a before/after that the compiler cannot optimize further, so anyone who wants to try might need to change those.

I also saw that the tessellation code was very hot, so I added a micro-optimization to spade: creating vectors with their capacity pre-allocated (see the sketch after this comment).

I want to test this with PGO, to see if it makes a sizeable difference for those super-hot call sites. I'll set something up whenever I have more time for it. AFAIK, the CADmium use case would be running on WASM, so compiler optimizations and other native-only techniques might not be applicable there.

Lastly, if anyone wants to check out my repo, be sure to set up Git LFS if you want my perf.data files, as those are too big for a normal GitHub upload (at least the ones from debug-mode runs).
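On the spade micro-optimization mentioned above, the general shape of it (illustrative, not spade's actual code) is:

```rust
// When the final length is known up front, reserving once avoids the
// repeated grow-and-copy cycles of push() on an empty Vec.
fn collect_samples(n: usize) -> Vec<f64> {
    let mut out = Vec::with_capacity(n); // one allocation total
    for i in 0..n {
        out.push(i as f64 * 0.5); // never reallocates inside the loop
    }
    out
}
```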
My example script takes 13 seconds of compute time on my M1 MacBook Air. Is it possible to cut the compute time down by a factor of 10, or even 100?
For anyone who might want to dive in and test things, here is a simpler example focusing just on OR; it takes 15 seconds to run on my laptop:
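A minimal sketch of such an OR benchmark, assuming truck's `builder` sweeps and `truck_shapeops::or`; the 0.5 offset and 0.05 tolerance are illustrative stand-ins, not necessarily the values from this issue.

```rust
use truck_modeling::*;
use truck_shapeops::or;

fn main() {
    // Build a unit cube: sweep a vertex into an edge, a face, then a solid.
    let vertex = builder::vertex(Point3::new(0.0, 0.0, 0.0));
    let edge = builder::tsweep(&vertex, Vector3::unit_x());
    let face = builder::tsweep(&edge, Vector3::unit_y());
    let cube0 = builder::tsweep(&face, Vector3::unit_z());
    // Second cube, offset so the two solids overlap.
    let cube1 = builder::translated(&cube0, Vector3::new(0.5, 0.5, 0.5));
    // The third argument is the meshing tolerance discussed below.
    let union = or(&cube0, &cube1, 0.05).expect("boolean OR failed to resolve");
    println!("union has {} boundary shell(s)", union.boundaries().len());
}
```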
This example is slower because the third parameter to `or()` is the tolerance to mesh to, and the run time is strongly dependent on that tolerance. If I use a tolerance of 0.9 it runs in just 4 seconds. But if I use a tolerance of 1.0 it fails to solve and we get a panic instead.
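Since `or()` returns an `Option`, one plausible source of that panic is an `unwrap()`/`expect()` on a `None` result (though it could also originate deeper inside the library). A sketch of handling it gracefully, reusing `cube0`/`cube1` from the sketch above:

```rust
// Treat a failed boolean as a recoverable error instead of unwrapping.
match truck_shapeops::or(&cube0, &cube1, 1.0) {
    Some(solid) => println!("solved: {} shell(s)", solid.boundaries().len()),
    None => eprintln!("boolean OR did not resolve at tolerance 1.0"),
}
```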