I think we've maybe figured out the #fingers thing.
The current AI tools (#midjourney and all) can deal relatively accurately with quantities of one or two, occasionally three.
Most things that we care about in images come in ones or twos (heads, eyes, wings, headlights), or in larger numbers that we don't need to be accurate (tree limbs, feathers, grass).
The commonly-appearing things that we care deeply about the number of, and that come in numbers greater than two, are... fingers.
The network can (almost always) produce one nose and two eyes and two legs, etc. If we ask it for a dozen eggs, it might do eight, or fifteen, but that's fine; at least it doesn't look horrifying.
But six or seven fingers? Ewwww! It immediately looks horribly wrong.
(And when it does three arms or legs, because two is also a little challenging, that's also ewww.)
So there ya go, simple explanation... :)