I think the point is that the space is really, really interesting. It has in-bounds points that nobody has ever sampled, yet which people would find interesting.
The trick is that most of the points in, say, the space of all reasonably grammatical sentences are completely nonsensical ("The car paints a loud idea."). However, there is an interesting manifold within that space along which human discourse lies, and being able to pick points from (close to) that manifold is often really valuable.
However, the manifold is so convoluted that prior to ChatGPT-type approaches, no machine learning method got anywhere close. Now, shockingly, we actually get points on or close to that manifold, and you can ask for them by giving text prompts that form a human-language context.
So in one sense they can't do more than what they've been trained with. But in another sense they can, because they've learned the shape of a super-interesting manifold, and you can ask them to pick points from parts of that manifold nobody has ever produced before.
I don't think either extrapolation or interpolation is a good way to think about how this works. Those intuitions are generally formed in extremely low-dimensional spaces (e.g. R x R), and they just don't translate to ultra-high-dimensional spaces containing manifolds that are themselves high-dimensional, yet of vastly reduced dimensionality relative to the ambient space.
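To make the "interpolation intuition breaks" point concrete, here's a toy sketch in plain Python. It uses a standard Gaussian cloud as a stand-in for a high-dimensional data distribution (nothing LLM-specific, purely illustrative): in high dimensions, samples concentrate on a thin shell of radius about sqrt(d), so the straight-line midpoint of two typical points (norm about sqrt(d/2)) lands well off the typical set, even though in 2D such a midpoint looks perfectly ordinary.

```python
import math
import random

def gaussian_point(d, rng):
    """Sample a point from the standard Gaussian N(0, I_d)."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def stats(d, trials=500, seed=0):
    """Average norm of a sample vs. the midpoint of two samples in R^d."""
    rng = random.Random(seed)
    sample_norms, mid_norms = [], []
    for _ in range(trials):
        a = gaussian_point(d, rng)
        b = gaussian_point(d, rng)
        sample_norms.append(norm(a))
        mid = [(x + y) / 2 for x, y in zip(a, b)]
        mid_norms.append(norm(mid))
    mean = lambda xs: sum(xs) / len(xs)
    return mean(sample_norms), mean(mid_norms)

for d in (2, 1000):
    s, m = stats(d)
    # sample norm concentrates near sqrt(d); midpoint norm near sqrt(d/2)
    print(f"d={d}: typical sample norm ~ {s:.1f}, midpoint norm ~ {m:.1f}")
```

In d=2 the two values overlap heavily, so averaging two samples gives you something that still looks like a sample. In d=1000 the sample norms cluster tightly around ~31.6 while midpoints sit around ~22.4, far outside the typical shell: naive linear interpolation between two in-distribution points leaves the distribution entirely.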