in Brainfuck, loops that look like [>>>] are common. these are memory scans: this one sweeps memory in the positive direction three cells at a time, stopping at the first zero-valued cell whose offset from the starting position is a multiple of 3. cells in between are skipped, whether they are zero or not.
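for reference, here is a minimal scalar sketch of what [>>>] does. the helper name `scan_stride3_scalar` is hypothetical, and it assumes a zero cell exists somewhere at `pos + 3k` inside the tape, so it does no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the Brainfuck loop [>>>]: starting at index `pos`,
   step forward 3 cells at a time until the current cell is zero.
   Assumption: a zero exists at some pos + 3k within the tape, so there is
   no bounds check here (hypothetical helper, for illustration only). */
static size_t scan_stride3_scalar(const uint8_t *tape, size_t pos) {
    while (tape[pos] != 0) {
        pos += 3;
    }
    return pos;
}
```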
if I want this to go fast on Neon, I can load 16 bytes into a vector register, test them all for equality with zero, and then what? I guess add up the lanes of the resulting vector and check if the sum is zero, and if it isn't, scan the elements one by one? is there a faster way to do this?
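to make the idea concrete, here is a rough sketch of the approach described above, not a claim that it is the fastest one. everything named here (`scan_stride3`, `stride3_mask`, the `len` parameter) is an assumption for illustration. because only every third lane matters, the compare result is ANDed with a stride-3 mask before the horizontal add, and since 16 mod 3 is 1, the mask phase rotates from one block to the next. the Neon path is only compiled on AArch64; elsewhere the function falls back to the plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
/* stride3_mask[p][i] is 0xFF when lane i is one of the cells the scan
   visits, given that the first visited lane in the block is lane p. */
static const uint8_t stride3_mask[3][16] = {
    {0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF},
    {0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0, 0},
    {0, 0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0, 0, 0xFF, 0},
};
#endif

/* Returns the index of the first zero cell at pos + 3k, or len if the
   scan runs off the end of the tape (hypothetical helper). */
size_t scan_stride3(const uint8_t *tape, size_t pos, size_t len) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    size_t base = pos;       /* start of the current 16-byte block */
    unsigned phase = 0;      /* first visited lane within that block */
    const uint8x16_t zero = vdupq_n_u8(0);
    while (base + 16 <= len) {
        uint8x16_t cells = vld1q_u8(tape + base);
        /* 0xFF in every lane that is zero AND on the stride-3 grid */
        uint8x16_t hits = vandq_u8(vceqq_u8(cells, zero),
                                   vld1q_u8(stride3_mask[phase]));
        /* Horizontal add. At most 6 lanes are 0xFF, so the wrapped 8-bit
           sum (256 - k for k hits) is nonzero whenever k >= 1. */
        if (vaddvq_u8(hits) != 0) {
            break;
        }
        base += 16;
        phase = (phase + 2) % 3;  /* 16 mod 3 == 1, so the grid shifts */
    }
    pos = base + phase;
#endif
    /* Scalar loop: locates the hit inside the flagged block, handles the
       tail shorter than 16 bytes, and is the whole scan without Neon. */
    while (pos < len && tape[pos] != 0) {
        pos += 3;
    }
    return pos < len ? pos : len;
}
```

the one-by-one scan after a hit touches at most 6 cells here, since it restarts at the first visited lane of the flagged block and steps by 3.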