**Raph Levien** @raph@mastodon.online · Jan 11, 2023, 04:55

**Raph Levien** @raph@mastodon.online · Jan 11, 2023, 04:55

Raph Levien @raph@mastodon.online

Jan 11, 2023, 04:55

Satisfying work for the day. One of the items in the Vello pipeline where profiling revealed performance could be improved was computing the bounding boxes of paths. I was using atomic min/max for each path segment to compute bbox union, and that was expensive (1.3ms for paris-30k on M1 Max).

I figured out a way to do it with monoids instead of atomics, and it's now 400µs. The trick is "segmented reduction" which works amazingly well.

I might blog this, but not sure yet; other things to do.

**R. A. Dehi** @radehi@qoto.org · Jan 11, 2023, 05:02

**R. A. Dehi** @radehi@qoto.org · Jan 11, 2023, 05:02

Jan 11, 2023, 05:02

R. A. Dehi @radehi@qoto.org

@raph This is doing the reduction with min/max in a minimal-depth tree instead of starting from one end?

Doesn't sound too hairy, even though you called the project Vello.

**Raph Levien** @raph@mastodon.online · Jan 11, 2023, 05:07

**Raph Levien** @raph@mastodon.online · Jan 11, 2023, 05:07

Jan 11, 2023, 05:07

Raph Levien @raph@mastodon.online

@radehi This is not the tree stuff (which is hairy). Here we have a sequence of paths, and each path is a sequence of segments. For each path you want the bbox, which is the union of the bboxes of its segments.

The trick is to concatenate all segments into one big vector, then have a "tag stream" (1 byte/seg) that has a bit indicating path end. Compute bbox using a monoid scan, then write when the path end bit is set (stream compaction).

And there's another fun detail that makes the scan fast.

**R. A. Dehi** @radehi@qoto.org · Jan 11, 2023, 05:10

**R. A. Dehi** @radehi@qoto.org · Jan 11, 2023, 05:10

Jan 11, 2023, 05:10

R. A. Dehi @radehi@qoto.org

@raph Hmm, but by itself monoid scan (APL scan, right? prefix sum) will give you maxes or mins from before the previous path end. I guess you need a weird max (or min) that takes the path-end bit into account, so that max(End(3), NotEnd(2)) is NotEnd(2), and then any standard prefix sum algorithm gives you the right answer?

**Raph Levien** @raph@mastodon.online · Jan 11, 2023, 05:13

**Raph Levien** @raph@mastodon.online · Jan 11, 2023, 05:13

Jan 11, 2023, 05:13

Raph Levien @raph@mastodon.online

@radehi That's right, the monoid is not just min/max, but also incorporates the path end bit. That trick is explained in section 1.5 of Blelloch's 1993 "Prefix sums and their applications": https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf

**R. A. Dehi** @radehi@qoto.org · 2023-01-11T05:16:03Z

R. A. Dehi @radehi@qoto.org

@raph Neat, I never read the whole paper but I might have read that part.

Jan 11, 2023, 05:16 · · · ·

Trending now

Resources

Developers

What is Mastodon?

qoto.org

More…