Long time, no SIMD #Haskell
JK, the naivest stupidest binary tree blows fancy 4-way SIMDed BVH out of the water.
Actually, it may be slower by itself, but multicore apparently *destroys* wide instruction performance. At the same time cache^W completely oblivious scalar traversal is happy to run on all the capabilities available.
High ceremony 4-wide or primitive 10-/20-/whatever-wide?
A bit fixed and pampered up, rawr.
QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.