Long time, no SIMD

JK, the naivest, stupidest binary tree blows the fancy 4-way SIMD BVH out of the water.

Actually, it may be slower by itself, but multicore apparently destroys wide-instruction performance. At the same time, the cache^W completely oblivious scalar traversal is happy to run on all the cores available.

High ceremony 4-wide or primitive 10-/20-/whatever-wide? :blobthonkang:
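
For anyone wondering what "naive scalar binary tree" might look like, here is a minimal sketch in Python, not the actual renderer code. The names `Node`, `hits_box`, and `traverse` are made up for illustration, leaves hold a single hypothetical primitive id, and the real ray/primitive intersection is left out (it only reports which leaf boxes the ray touches).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Binary BVH node (hypothetical layout): inner nodes have two children,
    leaves carry one primitive id."""
    lo: Vec3                        # min corner of the bounding box
    hi: Vec3                        # max corner of the bounding box
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None      # set only on leaves


def hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Plain scalar slab test, one axis at a time, no SIMD."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[int]:
    """Stack-based traversal; returns ids of leaves whose boxes the ray hits."""
    hits: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not hits_box(origin, direction, node.lo, node.hi):
            continue
        if node.prim is not None:
            hits.append(node.prim)
        else:
            # Assumes every inner node has both children.
            stack.append(node.right)
            stack.append(node.left)
    return hits
```

Each ray only touches its own little stack, so in a setup like this the rays can be handed out to as many cores as there are without any coordination.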

@reidrac 1.5h at 4k resolution, then a few manual passes to remove noise
