
Boolean Operations take a long time #68

Open
MattFerraro opened this issue May 29, 2024 · 5 comments

Comments

@MattFerraro

MattFerraro commented May 29, 2024

Here is an example script:

use truck_meshalgo::tessellation::{MeshableShape, MeshedShape};
use truck_modeling::builder::{rsweep, try_attach_plane, tsweep, vertex};
use truck_modeling::{Point3, Vector3};
use truck_polymesh::{obj, Rad};
use truck_shapeops::{and, or};

fn main() {
    // Make a cube with side length 100
    let origin = vertex(Point3::new(0.0, 0.0, 0.0));
    let x_axis = tsweep(&origin, Vector3::new(100.0, 0.0, 0.0));
    let xy_square = tsweep(&x_axis, Vector3::new(0.0, 100.0, 0.0));
    let cube = tsweep(&xy_square, Vector3::new(0.0, 0.0, 100.0));

    // Save it as an obj file
    let mesh = cube.triangulation(0.01).to_polygon();
    let file = std::fs::File::create("test_cube.obj").unwrap();
    obj::write(&mesh, file).unwrap();

    // Make a cylinder that is centered at (50, 50) so it will interfere with the cube
    let point = vertex(Point3::new(35.0, 50.0, -20.0));
    let circle = rsweep(
        &point,
        Point3::new(50.0, 50.0, -20.0),
        Vector3::new(0.0, 0.0, 1.0),
        Rad(7.0),
    );
    let disk = try_attach_plane(&[circle]).unwrap();
    let cylinder = tsweep(&disk, Vector3::new(0.0, 0.0, 140.0));

    // save the cylinder to a file
    let mesh = cylinder.triangulation(0.01).to_polygon();
    let file = std::fs::File::create("test_cylinder.obj").unwrap();
    obj::write(&mesh, file).unwrap();

    // Now let's do the boolean operations!

    let and_result = and(&cube, &cylinder, 1.0);
    let mesh = and_result.unwrap().triangulation(0.01).to_polygon();
    let file = std::fs::File::create("test_AND.obj").unwrap();
    obj::write(&mesh, file).unwrap();
    // This results in the cylinder, but truncated. This is the region where the cylinder intersects the cube
    // Aka the region of space which is both inside the cube AND inside the cylinder

    let or_result = or(&cube, &cylinder, 1.0);
    let mesh = or_result.unwrap().triangulation(0.01).to_polygon();
    let file = std::fs::File::create("test_OR.obj").unwrap();
    obj::write(&mesh, file).unwrap();
    // This results in a cube on a stick, aka the union of the cube and the cylinder
    // Aka the region of space which is inside the cube OR inside the cylinder

    let mut not_cylinder = cylinder.clone();
    not_cylinder.not();
    // not_cylinder is a weird thing...it's the entire universe _except_ the cylinder
    let and_not_result = and(&cube, &not_cylinder, 1.0);
    let mesh = and_not_result.unwrap().triangulation(0.01).to_polygon();
    let file = std::fs::File::create("test_AND_NOT.obj").unwrap();
    obj::write(&mesh, file).unwrap();
    // This results in a cube with a hole in it, aka the cube with the cylinder subtracted from it
    // Aka the region of space which is inside the cube AND NOT inside the cylinder
}

This takes 13 seconds of compute time on my M1 MacBook Air. Is it possible to cut the compute time down by a factor of 10, or even 100?

For anyone who might want to dive in and test things, here is a simpler example focusing just on OR; it takes 15 seconds to run on my laptop:

...

fn main() {
    let origin = vertex(Point3::new(0.0, 0.0, 0.0));
    let x_axis = tsweep(&origin, Vector3::new(100.0, 0.0, 0.0));
    let xy_square = tsweep(&x_axis, Vector3::new(0.0, 100.0, 0.0));
    let cube = tsweep(&xy_square, Vector3::new(0.0, 0.0, 100.0));

    let point = vertex(Point3::new(104.0, 50.0, -20.0));
    let circle = rsweep(
        &point,
        Point3::new(80.0, 50.0, -20.0),
        Vector3::new(0.0, 0.0, 1.0),
        Rad(7.0),
    );
    let disk = try_attach_plane(&[circle]).unwrap();
    let cylinder = tsweep(&disk, Vector3::new(0.0, 0.0, 140.0));

    let or_result = or(&cube, &cylinder, 0.2);
    let mesh = or_result.unwrap().triangulation(0.01).to_polygon();
    let file = std::fs::File::create("test_OR.obj").unwrap();
    obj::write(&mesh, file).unwrap();
    // This results in a cube on a stick, aka the union of the cube and the cylinder
    // Aka the region of space which is inside the cube OR inside the cylinder
}

This example is slower because the third parameter to or() is the tolerance to mesh to, and the run time depends strongly on that tolerance. If I use a tolerance of 0.9 it runs in just 4 seconds, but if I use a tolerance of 1.0 it fails to solve and we get a panic instead.

@tringenbach

I wouldn't say I've really started looking at it yet, but I did the easy thing and turned on --release, which got me a roughly 10x performance boost:

cargo run --example performance-issue  29.26s user 0.20s system 286% cpu 10.295 total
cargo run --profile release --example performance-issue  2.60s user 0.17s system 211% cpu 1.311 total

@twitchyliquid64


A quick look through a profiler suggests that truck_geometry::nurbs::knot_vec::KnotVec::try_bspline_basis_functions accounts for the majority of the CPU time, and most of that is wasted allocating its vec here:

let mut res = vec![0.0; n];

I imagine some refactoring could introduce a variant of try_bspline_basis_functions that lets you pass in an existing vec to reuse when it's called in a hot loop like that. We'd also get a big speedup from cache locality.

@twitchyliquid64

twitchyliquid64 commented Oct 31, 2024

The pattern seems to look like this:

  1. Boolean ops require computing the intersection, so the algorithm needs to find the (u, v) of the intersection(s).
  2. That calls into truck_geotrait::algo::surface::presearch.
  3. presearch needs to test parameter values, so it calls subs a LOT. Really, really hot function.
  4. But deep in the guts of the underlying curve implementation, subs needs to compute and evaluate the basis functions, and it computes them into a newly allocated vec on each call.
  5. Adding up the cost of allocation and free, this accounts for roughly 50% of the time spent in presearch.

So an easy win for at least a 2x speedup (probably more, thanks to cache locality) would be to refactor the presearch/subs call path to avoid allocating, or to allocate only once-ish.

@twitchyliquid64

I'm fairly sure we could figure out how to turn this:

pub fn try_bspline_basis_functions(&self, degree: usize, t: f64) -> Result<Vec<f64>> {
    let n = self.len() - 1;
    if self[0].near(&self[n]) {
        return Err(Error::ZeroRange);
    } else if n < degree {
        return Err(Error::TooLargeDegree(n + 1, degree));
    }
    let idx = {
        let idx = self
            .floor(t)
            .unwrap_or_else(|| self.floor(self[0]).unwrap());
        if idx == n {
            n - self.multiplicity(n)
        } else {
            idx
        }
    };
    let mut res = vec![0.0; n];
    res[idx] = 1.0;
    for k in 1..=degree {
        let base = if idx < k { 0 } else { idx - k };
        let delta = self[base + k] - self[base];
        let max = if idx + k < n { idx } else { n - k - 1 };
        let mut a = inv_or_zero(delta) * (t - self[base]);
        for i in base..=max {
            let delta = self[i + k + 1] - self[i + 1];
            let b = inv_or_zero(delta) * (self[i + k + 1] - t);
            res[i] = a * res[i] + b * res[i + 1];
            a = 1.0 - b;
        }
    }
    res.truncate(n - degree);
    Ok(res)
}

Into a method that returns an iterator over the computed basis functions rather than a Vec.

@LtdJorge

LtdJorge commented Nov 8, 2024

@twitchyliquid64 I more or less came to the same conclusions as you. I created this repo to track my findings. It contains perf runs, flamegraphs, hyperfine results, dhat profiles, etc.

The optimization where I got the most benefit was from adding SmallVec as a dependency (any Vec with stack allocation would work) and turning this:

pub struct KnotVec(Vec<f64>);

Into this:

pub type KnotVecInner = SmallVec<[f64; 16]>;

pub struct KnotVec(KnotVecInner);

This was just a quick and dirty test, choosing 16 f64s (two cache lines) because that gave the lowest runtime. I have just pushed these changes to my repo, which uses the modified truck version as a patched dependency. The example code to benchmark is the one @MattFerraro provided. I also enabled full optimizations, including hand-picked target-features for my specific CPU, to get a before/after that the compiler cannot optimize any further, so if anyone wants to try, those might need to be changed.

I also saw that the tessellation code was very hot, so I added a micro-optimization to spade: creating vectors with Vec::with_capacity(16) instead of Vec::new(). An empty vector has to reallocate a couple of times to grow to 16 items, while the pre-allocated one needs only a single initial allocation. However, the number 16 here is completely arbitrary; tracking the average number of objects such vectors hold would yield a better number.

I want to test this with PGO to see if it makes a sizeable difference for those super-hot call sites. I'll set up something whenever I have more time for it.

AFAIK, the CADmium use case would be running on WASM, so compiler optimizations and some of these other techniques might not be applicable there.

Lastly, if anyone wants to check out my repo, be sure to set up Git LFS if you want my perf.data files, as those are too big for a normal GitHub upload (at least the ones from running in debug mode).
