Huh, I wonder if this is platform-specific. I'm starting to suspect that TLS might be broken in some way. Were it not for your sched-locked observation, I'd also suspect a disagreement between clang and libc on how to implement the different memory orders on that CPU (e.g. on x86_64 there are two ways to implement the C++ memory model that are incompatible with one another, and basically everyone picks the same one).
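To make the x86_64 remark concrete, here's a rough sketch of the classic store-buffering litmus test. As far as I know, the two mappings differ in where the full fence goes: either after every seq_cst store (the usual choice) or before every seq_cst load. If a store compiled one way meets a load compiled the other way, neither side gets a fence, and both threads can read 0. The function names are made up for illustration, and spawning threads per iteration makes the race window small, so a clean run proves little; a nonzero count would be the interesting result.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test (illustrative sketch, hypothetical names).
// Under the C++ memory model, with seq_cst on all four accesses, the
// outcome r1 == 0 && r2 == 0 is forbidden. Seeing it would suggest that
// stores and loads were compiled with mismatched fence placement.
int count_forbidden_outcomes(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });

        // join() synchronizes with thread completion, so reading r1 and
        // r2 afterwards is race-free.
        a.join();
        b.join();

        if (r1 == 0 && r2 == 0) ++forbidden;
    }
    return forbidden;
}

// Hypothetical helper: aborts if the forbidden outcome was ever observed.
void check_store_buffering(int iterations) {
    assert(count_forbidden_outcomes(iterations) == 0);
}
```

Calling `check_store_buffering(100000)` from a small `main` (built with threading support enabled) would be enough for a first look.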
@robryk Unfortunately I'm not on a supported platform right now (sparc64), so no tsan for me...