Hmmm, a little question to people who are familiar with atomics and/or lockfree programming: does the ABA-proof property of LL/SC ever get used in practice?
Asking because most of the time I see LL/SC simply being used to conjure a CAS, and at that point it is just as vulnerable to the ABA problem as ready-made CAS instructions, no?

#AskFedi, I guess?


@koakuma

I was under the impression that LL/SC in hardware is terribly rare, so everyone makes do with CAS. They then use one of the tricks to prevent ABA in CAS-based setups (either counters we trust never to wrap around, or pointers to objects that the actor doing the CAS keeps alive[1]). If we had LL/SC, those tricks would be unnecessary.

So, I think that people don't use that property of LL/SC because they don't use LL/SC, because it's very rarely available.
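To make the counter trick concrete, here's a rough sketch in C++ (the names, and the choice of packing a 32-bit index together with a 32-bit generation counter into one 64-bit atomic, are my own assumptions, not anything prescribed):

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the "counter we trust never to wrap" trick: instead of
// CASing a bare index, CAS an (index, generation) pair packed into one
// 64-bit word, bumping the generation on every successful update so a
// reused index no longer compares equal (no ABA, barring wrap-around).
struct Versioned {
    uint32_t index;       // e.g. index of the current head node in a pool
    uint32_t generation;  // bumped on every successful CAS
};

std::atomic<uint64_t> head{0};

static uint64_t pack(Versioned v) {
    return (uint64_t(v.generation) << 32) | v.index;
}
static Versioned unpack(uint64_t raw) {
    return {uint32_t(raw & 0xffffffff), uint32_t(raw >> 32)};
}

// Replace the head index; an A->B->A change of the index alone no longer
// fools us, because the generation will have moved on in the meantime.
bool replace_head(uint32_t expected_index, uint32_t new_index) {
    uint64_t raw = head.load(std::memory_order_acquire);
    for (;;) {
        Versioned cur = unpack(raw);
        if (cur.index != expected_index) return false;
        Versioned next;
        next.index = new_index;
        next.generation = cur.generation + 1;
        uint64_t desired = pack(next);
        if (head.compare_exchange_weak(raw, desired,
                                       std::memory_order_acq_rel,
                                       std::memory_order_acquire))
            return true;
        // raw was refreshed by the failed CAS; loop and re-check.
    }
}
```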

[1] Roughly:
- read ptr from X
- increment refcount on *ptr
- read X again, verify it's still equal to ptr (if not: decrement refcount, start from scratch)
- ...
- CAS X from ptr to something else
- decrement refcount on *ptr (and potentially handle the case when it decrements to 0)

This guarantees ABA-freedom of the CAS as long as each potential target of the pointer in X can become X's value only once in its lifetime (which is pretty common in structures such as queues, stacks, approximate queues, ...).

This is a reasonably common pattern in lockfree data structures that need to do their own memory management (i.e. in environments without a garbage collector) and that don't want to rely on thread IDs. (If thread IDs are available, refcounts often get replaced with hazard pointers; that said, I've never seen hazard pointers used outside of theory.)
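The footnote's steps, as a rough C++ sketch (names are invented; reclamation details are elided, and it leans on the assumption that a released node's memory stays readable, e.g. because nodes come from a type-stable pool that is never returned to the OS):

```cpp
#include <atomic>

struct Node {
    std::atomic<int> refcount{1};
    // ... payload ...
};

std::atomic<Node*> X{nullptr};

// Decrement a node's refcount and reclaim it when it hits zero.
// (How reclamation actually happens depends on the structure; elided.)
void release(Node* p) {
    if (p->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        // last reference gone; *p can be recycled here
    }
}

// The steps from the footnote: read, pin, re-verify, CAS, unpin.
// The speculative fetch_add below is only safe under the type-stable
// memory assumption mentioned in the lead-in.
bool swap_out(Node* replacement) {
    for (;;) {
        Node* p = X.load(std::memory_order_acquire);          // read ptr from X
        if (p == nullptr) return false;
        p->refcount.fetch_add(1, std::memory_order_acq_rel);  // increment refcount on *p
        if (X.load(std::memory_order_acquire) != p) {         // read X again, verify
            release(p);                                       // if not: decrement,
            continue;                                         // start from scratch
        }
        // ... use *p ...
        Node* expected = p;
        bool ok = X.compare_exchange_strong(expected, replacement,   // CAS X from p
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire);
        release(p);                                           // decrement refcount on *p
        if (ok) return true;
        // CAS failed: someone else changed X first; retry from the top.
    }
}
```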

@koakuma

By "not seeing hazard pointers used outside of theory" I meant that in places which would use them I'd see RCU used instead (with the thing that deallocates -- which might be a thing to the side of everything else -- waiting on the RCU "lock").

@robryk All the production RISCs, with the exception of SPARC and Arm v8.1's LSE, use LL/SC-style atomics, so I imagine at least the assembly folks would be familiar with using them, no?
Though I do admit that it doesn't lend itself to a nice non-assembly interface, since the rules for what can be put inside the LL/SC block are very, very ISA-specific...

I guess it's just a matter of CAS being a more convenient high-level interface to expose, so everyone makes do with CAS on tagged data, then?

@koakuma

I think it's more a case of there being very popular architectures without LL/SC, so people consider a paper that does something with just CAS to be stronger than one that uses LL/SC (because, as you've described, it's trivial to make CAS out of LL/SC but not vice versa). I don't recall off the top of my head any data structure where we'd have a significantly better result (either wait-free or lock-free, either in memory or time complexity) if LL/SC were available.
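For reference, the "trivial" direction looks roughly like this (a C++ sketch with hypothetical load_linked/store_conditional stand-ins; no such portable functions exist, real ISAs expose LL/SC only in assembly and restrict what may sit between the two):

```cpp
#include <atomic>

// Hypothetical intrinsics standing in for the ISA's LL/SC pair; they are
// declared only to show the shape of the construction, not to be linked.
int  load_linked(std::atomic<int>* addr);                 // LL
bool store_conditional(std::atomic<int>* addr, int val);  // SC: fails if addr was
                                                          // touched since the LL

// CAS built from LL/SC. Note that the SC fails whenever *addr was written
// between the LL and the SC, even if the value was changed away and back
// (ABA) -- which is exactly the property a plain CAS doesn't give you.
bool cas_from_llsc(std::atomic<int>* addr, int expected, int desired) {
    for (;;) {
        int observed = load_linked(addr);
        if (observed != expected)
            return false;                  // value mismatch: the CAS fails
        if (store_conditional(addr, desired))
            return true;                   // nobody touched addr in between
        // SC failed (contention or a spurious failure); retry.
    }
}
```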

That said, most of my knowledge comes from the theoretical side, so: (a) I might be missing an area where this is actually used a lot, and (b) the theoretical side uses, IMO, very questionable measures of complexity[1], so there might be cases where LL/SC allows for some great savings that they ignore.

[1] I haven't seen (nor have I managed to construct) a measure of complexity that would count how many nontrivial cache-coherency protocol invocations we need to wait for. (The problem with constructing one is, IIRC, a combination of (a) we want to count the depth of the dependency graph rather than the total count, and (b) it's unclear what kind of quantification over the states of the world before the operation we want: worst case is *not* sensible, because then we'd assume that every cache line is held exclusive somewhere else.)
