@regehr

I'm not sure if that's anyhow fast, but you could perform the comparison (ending with an all-ones byte where it succeeded and all-zeroes elsewhere), AND it with a vector that has (2^i)-i in i-th position, OR it up, add 1, and use any of the scalar ways of figuring out which power of two that is.

(Alternatively, have the second vector have 2^i in i-th position, AND it with first, add them up and use a scalar way to figure out which is the highest bit set.

@robryk interesting. I'll float a scheme like this to my class and see if I can nerdsnipe some of them :).

Follow

@regehr

My main worry would be whether there isn't something tricky around horizontal operations (e.g. that ones that cross some boundary are much slower, or that only a few of them are passably fast).

@robryk but excited to get my class working on this sort of thing, we'll see who creates fast code. I don't want to present them with a finished solution, that's boring!

Sign in to participate in the conversation
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.