@raggi Totally agree. Google tends to optimize for the common case (I literally knew an engineer there whose mantra *was* "don't optimize for the uncommon case"), and that definitely risks leaving the long tail chopped off.
Ideally, there'd be a central well-supported audio model that one could then extend. But I don't actually know how that would work either sociopolitically (who pays for it?) or technologically (how do I meaningfully extend and customize an ever-changing model with a heavy machine-learning component?).