Long time, no SIMD #Haskell
JK, the naivest, stupidest binary tree blows the fancy 4-way SIMD BVH out of the water.
Actually, it may be slower by itself, but multicore apparently destroys wide-instruction performance. Meanwhile, the cache^W completely oblivious scalar traversal is happy to run on all the capabilities available.
High-ceremony 4-wide or primitive 10-/20-/whatever-wide?
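For the curious, a rough sketch of what "primitive N-wide" could look like. This is not the actual code from the project: the 1D intervals standing in for boxes, the point "rays", and the names `BVH`, `build`, `hits` and `traceAll` are all made up for illustration. It only uses `base`: a plain scalar binary-tree traversal, with the ray list split into one chunk per capability.

```haskell
module Main where

import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Hypothetical 1D stand-in for a BVH: a "box" is an interval [lo, hi].
data BVH
  = Leaf !Double !Double !Int     -- lo, hi, primitive id
  | Node !Double !Double BVH BVH  -- lo, hi, left and right children

bounds :: BVH -> (Double, Double)
bounds (Leaf lo hi _)   = (lo, hi)
bounds (Node lo hi _ _) = (lo, hi)

-- Naive median split over a non-empty list of (lo, hi, id) intervals.
build :: [(Double, Double, Int)] -> BVH
build []            = error "build: empty input"
build [(lo, hi, i)] = Leaf lo hi i
build xs            = Node (min llo rlo) (max lhi rhi) l r
  where
    (ls, rs)   = splitAt (length xs `div` 2) xs
    l          = build ls
    r          = build rs
    (llo, lhi) = bounds l
    (rlo, rhi) = bounds r

-- Scalar traversal: a "ray" is just a point x; no SIMD, no packets.
hits :: Double -> BVH -> [Int]
hits x (Leaf lo hi i) = [i | lo <= x && x <= hi]
hits x (Node lo hi l r)
  | x < lo || x > hi = []
  | otherwise        = hits x l ++ hits x r

chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (a, b) = splitAt n xs in a : chunksOf n b

-- One worker thread per chunk, roughly one chunk per capability.
traceAll :: BVH -> [Double] -> IO [[Int]]
traceAll tree rays = do
  caps <- getNumCapabilities
  let size = max 1 ((length rays + caps - 1) `div` caps)
  vars <- forM (chunksOf size rays) $ \chunk -> do
    var <- newEmptyMVar
    _ <- forkIO $ do
      let out = map (\x -> hits x tree) chunk
      -- Force every result list here so the traversal runs on the worker,
      -- not lazily on whoever reads the MVar.
      putMVar var $! (sum (map length out) `seq` out)
    pure var
  concat <$> mapM takeMVar vars

main :: IO ()
main = do
  let tree = build [(fromIntegral i, fromIntegral i + 1.5, i) | i <- [0 .. 7]]
  results <- traceAll tree [0.25, 1.25, 7.75, 100]
  print results
```

To actually spread over cores it has to be built with `-threaded` and run with `+RTS -N`; otherwise there is a single capability and everything lands in one chunk.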
@dpwiz very cool!