Port recent GPU stroke expansion changes in `flatten` to CPU version #415

armansito · 2023-11-22T21:46:45Z

This replicates the GPU-side stroke expansion changes that were recently made to the flatten stage to its CPU version.

Note on buffer sizes:

The longpathdash (round caps) and mmark scenes assert (due to out-of-bounds buffer access) if zoomed in (the former crashes immediately). The asserts can occur in the path_count and flatten stages. The root cause is that the segments, seg_counts, and lines buffers are not large enough to accommodate the required number of line segments. When run on the GPU, this manifests itself as artifacts rather than a crash.

This PR does not adjust the bump buffer sizes in order to avoid hiding the problem. I would like to fix this as part of #366. Unfortunately, even an accurate buffer size estimate will require more memory than what's permitted by the WebGPU standard (the default limit for maxStorageBufferBindingSize is 128 MiB and implementations aren't guaranteed to support a higher value). We have been considering culling lines that fall outside the viewport which can alleviate the memory usage in zoomed-in scenarios.

This is needed to unpack the miter limit parameter in the CPU test shader version of GPU stroke flattening.

Ported the recent GPU-side stroke expansion changes to the CPU version of the `flatten` stage.

DJMcNab · 2023-11-23T08:45:23Z

crates/encoding/src/math.rs

+    fn test_f16_to_f32() {
+        const EPS: f32 = 0.001;
+        let input: f32 = std::f32::consts::PI;
+        assert!((input - f16_to_f32(f32_to_f16(input))).abs() < EPS);


It would be nice to have:

an exported bit pattern of a 16 bit float, which we check gets the right number (i.e. test that both directions aren't wrong in the same way)

test that values in the extreme ranges (i.e. values of the order 60_000 and 1/60_000 ± a factor of 2, if required - and 0) are appropriately roundtripped

special values (i.e. NaN, infinity) roundtrip as expected. This is less important, as we don't really want to see them - maybe we should warn! in those cases?

Done. I added tests for max and min value roundtrip, nan/inf roundtrip, and an explicit conversion. Perhaps test_f16_to_f32_roundtrip should become a proptest but I think this is good enough for now.

DJMcNab

Ran this locally, and it worked great, thanks.

I do unfortunately see different behaviour in tricky_strokes with use-cpu enabled and disabled - the "bottom right" one doesn't change when zooming out when using the CPU version.

A few questions, which I believe are unlikely to be defects.

I can only really do a cursory scan of the code with my current understanding, but that hasn't brought up anything blocking

crates/encoding/src/math.rs

DJMcNab · 2023-12-04T18:35:59Z

shader/flatten.wgsl

@@ -580,7 +580,6 @@ fn output_two_lines_with_transform(

 struct NeighboringSegment {
    do_join: bool,
-    p0: vec2f,


It would be nice if we could get warned about this being never read, but that's wishful thinking.

Although...

**suspense**

DJMcNab · 2023-12-04T20:16:05Z

src/cpu_shader/flatten.rs

+    let n_lines = if theta <= ROBUST_EPSILON {
+        MAX_LINES
+    } else {
+        MAX_LINES.min((std::f32::consts::TAU / theta).ceil() as u32)


Is it likely that most of this maximum will be used on paths which are off-screen? (or has some culling already occured?)
Having done some (probably incorrect) translation of this code, it seems to me that the radius has to be much greater than 10_000 pixels¹.

Footnotes

I tried this in two different ways, and got two different numbers, so I don't fully trust my maths here ↩

Is it likely that most of this maximum will be used on paths which are off-screen?

Yes, the really high line segment counts occur at really high zoom levels (for example if you zoom in directly to a stroke's cap). There is actually currently no culling and it's highly likely to wastefully generate off-screen line segments at high zoom levels. longpathdash with round caps is a good example of this, where zooming in actually exceeds the memory budget for LineSoup.

We've been talking about implementing at the very least viewport culling next year to handle the zoomed-in cases.

DJMcNab · 2023-12-04T20:17:22Z

src/cpu_shader/flatten.rs

+        MAX_LINES.min((std::f32::consts::TAU / theta).ceil() as u32)
+    };
+
+    let th = angle / (n_lines as f32);


Why does the number of lines depend only on the radius, rather than including the angle being swept over?

I would naïvely expect th to be the same for all circles with the same radius, as that should make equal length lines.
It feels quite likely that I'm missing something obvious here

I also have this concern, and wonder if it might have something to do with the high observed memory usage.

Ah yes, there is a bug here and thank you for taking a close look and catching that. n_lines is the number for a full circle. This is directly transcribed from the GPU flatten code where I was getting the count for a full circle and fitting that into any sweep angle. I also noticed that the derivation of theta is also not quite right. Given a radius $r$ and the length $l$ of the line segment from the center of the circle to the middle of the chord (i.e. a flattened line segment) given by $l = r - tol$, the derivation should look like:

$$ tol = r - r \times cos(\frac{\theta}{2}) $$

$$ \theta = 2 \times arccos(1 - \frac{tol}{r}) $$

And n_lines = (angle / theta).ceil(). We still need to guard against a division by zero / over-subdividing. I went ahead and picked an arbitrary minimum angle limit of 0.0001 radians but this could probably get fine tuned.

raphlinus

I have concerns about the number of subdivisions in the arc flattening logic. I didn't wrap my head around it fully - perhaps it's right and there's something I'm missing, in which case I'd happily accept an argument.

Other than that, this looks like a good foundation for the Euler spiral flattening, and I'm looking forward to getting that fully wired.

raphlinus · 2023-12-05T20:09:45Z

src/cpu_shader/flatten.rs

+    let tol: f32 = 0.5;
+    let radius = tol.max((p0 - transform.apply(center)).length());
+    let x = 1. - tol / radius;
+    let theta = (2. * x * x - 1.).clamp(-1., 1.).acos();


This is the same as sqrt(8 tol / radius) for small tol. I wonder if the extra math is worth it. I haven't investigated in detail.

The new expression should be correct now and is simpler.

raphlinus · 2023-12-05T20:10:53Z

src/cpu_shader/flatten.rs

+        MAX_LINES.min((std::f32::consts::TAU / theta).ceil() as u32)
+    };
+
+    let th = angle / (n_lines as f32);


I also have this concern, and wonder if it might have something to do with the high observed memory usage.

raphlinus · 2023-12-05T20:52:37Z

src/cpu_shader/flatten.rs

+) {
+    let mut p0 = transform.apply(begin);
+    let mut r = begin - center;
+    let tol: f32 = 0.5;


This seems a little loose to me, I've generally been using 0.25. I'm also concerned that it doesn't match the flattening tolerance for curves in general.

This might be a good time for a heads up: I am not confident that the tolerance value in the existing code is accurately calibrated against the actual Fréchet distance between flattened and source curve; I may have let a constant factor of error enter. But the current work I'm doing does calibrate that, so when I land that I will be systematic.

With the new derivation, 0.25 leads to visible line segments. 0.1 looks decent enough to me. I'm worried that anything lower will blow up the line count but you should probably take a look and make a judgement as to whether this tolerance is visually satisfactory.

armansito · 2023-12-06T02:21:49Z

PTAL. I think the arc flattening logic should be correct now barring any adjustments to MIN_THETA and the error tolerance. The current values still exceed the LineSoup and SegmentCount buffer sizes with the mmark and longpathdash (round) scenes.

raphlinus

Right, this makes more sense. I expect to take a closer look (hopefully visualizing the flattening output) as I adapt the Euler stuff, so I didn't go through with a fine-tooth comb, but this definitely gets us to a better place. Thanks!

raphlinus · 2023-12-06T19:23:39Z

src/cpu_shader/flatten.rs

-    let th = angle / (n_lines as f32);
-    let c = th.cos();
-    let s = th.sin();
+    let c = theta.cos();


You can write let (s, c) = theta.sin_cos() but I'm not sure it's a significant improvement, and in any case the function is not available in WGSL.

raphlinus · 2023-12-06T19:26:22Z

src/cpu_shader/flatten.rs

@@ -126,6 +126,12 @@ fn write_line(
    bbox: &mut IntBbox,
    lines: &mut [LineSoup],
 ) {
+    assert!(
+        !p0.is_nan() && !p1.is_nan(),
+        "wrote line segment with NaN: p0: {:?}, p1: {:?}",


You can say {p0:?} here, as of Rust 1.58.

armansito · 2023-12-06T22:49:29Z

I do unfortunately see different behaviour in tricky_strokes with use-cpu enabled and disabled - the "bottom right" one doesn't change when zooming out when using the CPU version.

@DJMcNab I'm not sure why there is a discrepancy but I'll investigate this further. That said the bottom rows of strokes are currently not at all handled correctly - that's awaiting @raphlinus 's Euler Spiral work (early prototype started here: https://github.com/linebender/vello/tree/euler-spiral-strokes).

armansito requested review from raphlinus and dfrg November 22, 2023 21:46

armansito added 3 commits November 22, 2023 13:50

[encoding] Add a f16_to_f32 utility

8e79c48

This is needed to unpack the miter limit parameter in the CPU test shader version of GPU stroke flattening.

[cpu_shaders][flatten] Incorporate stroke expansion changes

1e30be1

Ported the recent GPU-side stroke expansion changes to the CPU version of the `flatten` stage.

[cpu_shaders][flatten] cargo fmt & clippy warnings

48521ba

armansito force-pushed the stroke-flatten-cpu branch from e5c58a3 to 48521ba Compare November 22, 2023 21:52

armansito requested a review from DJMcNab November 22, 2023 22:54

DJMcNab reviewed Nov 23, 2023

View reviewed changes

armansito added 2 commits December 1, 2023 13:21

[encoding][math] Add more tests for f16 conversion

3b952c0

fix clippy lints

097006d

DJMcNab approved these changes Dec 4, 2023

View reviewed changes

raphlinus reviewed Dec 5, 2023

View reviewed changes

Address review comments; fix arc flattening logic

fe75059

raphlinus approved these changes Dec 6, 2023

View reviewed changes

armansito merged commit 3f24d8c into main Dec 6, 2023
4 checks passed

armansito deleted the stroke-flatten-cpu branch December 6, 2023 22:50

raphlinus mentioned this pull request Mar 13, 2024

Checkpoint of Euler spiral based stroke expansion #496

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Port recent GPU stroke expansion changes in `flatten` to CPU version #415

Port recent GPU stroke expansion changes in `flatten` to CPU version #415

armansito commented Nov 22, 2023

DJMcNab Nov 23, 2023

armansito Dec 1, 2023 •

edited

Loading

DJMcNab left a comment

DJMcNab Dec 4, 2023

armansito Dec 6, 2023

DJMcNab Dec 4, 2023

armansito Dec 6, 2023

DJMcNab Dec 4, 2023

raphlinus Dec 5, 2023

armansito Dec 6, 2023

raphlinus left a comment

raphlinus Dec 5, 2023

armansito Dec 6, 2023

raphlinus Dec 5, 2023

raphlinus Dec 5, 2023

armansito Dec 6, 2023

armansito commented Dec 6, 2023

raphlinus left a comment

raphlinus Dec 6, 2023

raphlinus Dec 6, 2023

armansito commented Dec 6, 2023

Port recent GPU stroke expansion changes in flatten to CPU version #415

Port recent GPU stroke expansion changes in flatten to CPU version #415

Conversation

armansito commented Nov 22, 2023

Note on buffer sizes:

Choose a reason for hiding this comment

armansito Dec 1, 2023 • edited Loading

Choose a reason for hiding this comment

DJMcNab left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Footnotes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raphlinus left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

armansito commented Dec 6, 2023

raphlinus left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

armansito commented Dec 6, 2023

Port recent GPU stroke expansion changes in `flatten` to CPU version #415

Port recent GPU stroke expansion changes in `flatten` to CPU version #415

armansito Dec 1, 2023 •

edited

Loading