Long time, no SIMD #Haskell
JK, the naivest, stupidest binary tree blows the fancy 4-way SIMDed BVH out of the water.
Actually, it may be slower by itself, but multicore apparently destroys wide instruction performance. At the same time cache^W completely oblivious scalar traversal is happy to run on all the capabilities available.
High-ceremony 4-wide or primitive 10-/20-/whatever-wide?
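For anyone wondering what the "primitive whatever-wide" option could look like, here is a minimal sketch. It is not the renderer's actual code: the types (`V3`, `Ray`, `AABB`, `BVH`) and the functions `mkRay`, `hitsBox`, `candidates`, `traceAll` are hypothetical names made up for illustration. It assumes the "naive binary tree" is a plain two-child BVH walked one ray at a time, with the rays split into one chunk per GHC capability.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (foldl')

-- Hypothetical types, for illustration only.
data V3 = V3 !Double !Double !Double
data Ray = Ray !V3 !V3            -- origin, precomputed inverse direction
data AABB = AABB !V3 !V3          -- min corner, max corner

-- The plain binary tree: a leaf carries one primitive id.
data BVH
  = Leaf !AABB !Int
  | Node !AABB BVH BVH

-- Build a ray from origin and direction. Zero components become
-- Infinity, which the slab test tolerates unless the origin lies
-- exactly on a slab plane (that case yields NaN and is not handled).
mkRay :: V3 -> V3 -> Ray
mkRay o (V3 dx dy dz) = Ray o (V3 (1 / dx) (1 / dy) (1 / dz))

-- Scalar slab test: one ray against one box, no SIMD.
hitsBox :: Ray -> AABB -> Bool
hitsBox (Ray (V3 ox oy oz) (V3 ix iy iz)) (AABB (V3 lx ly lz) (V3 hx hy hz)) =
  tmax >= max 0 tmin
  where
    slab o i lo hi =
      let a = (lo - o) * i
          b = (hi - o) * i
      in (min a b, max a b)
    (x0, x1) = slab ox ix lx hx
    (y0, y1) = slab oy iy ly hy
    (z0, z1) = slab oz iz lz hz
    tmin = maximum [x0, y0, z0]
    tmax = minimum [x1, y1, z1]

-- Collect ids of leaves whose boxes the ray touches, left to right.
candidates :: BVH -> Ray -> [Int]
candidates tree ray = go tree []
  where
    go (Leaf box i) acc
      | hitsBox ray box = i : acc
      | otherwise       = acc
    go (Node box l r) acc
      | hitsBox ray box = go l (go r acc)
      | otherwise       = acc

chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

-- One worker thread per chunk, one chunk per capability.
-- Results come back in the same order as the input rays.
traceAll :: BVH -> [Ray] -> IO [[Int]]
traceAll tree rays = do
  n <- getNumCapabilities
  let size = max 1 ((length rays + n - 1) `div` n)
  vars <- mapM spawn (chunksOf size rays)
  concat <$> mapM takeMVar vars
  where
    spawn chunk = do
      var <- newEmptyMVar
      _ <- forkIO $ do
        let results = map (candidates tree) chunk
        -- Force the traversals inside the worker, not in the caller.
        _ <- evaluate (foldl' (\s r -> s + length r) 0 results)
        putMVar var results
      pure var
```

To actually spread the work over cores this needs to be built with `-threaded` and run with `+RTS -N`. A real renderer would also test primitives in the leaves, sort children by distance, and deal with worker exceptions; all of that is left out here.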
@dpwiz very nice! How long did the render take?
@reidrac 1.5h at 4k resolution, then a few manual passes to remove noise
@dpwiz very cool!
A bit fixed and pampered up, rawr.