Aren't generative models like ChatGPT or Stable Diffusion a form of data augmentation, à la Monte Carlo? In other words: identify the bounds for each dimension of the N-dimensional space of the data, and then sample within those bounds.

I can't see the basis for the claim that these systems can do more than what they were trained on. Yes, there's a combinatorial explosion leading to a diversity of results; that's expected. Am I missing something fundamental?

#ChatGPT #StableDiffusion

@albertcardona

I think the point is that the space is really really interesting. It has in-bounds points that nobody has sampled, yet which are interesting to people.

The trick is that most of the points in, say, the space of all reasonably grammatical sentences are completely nonsensical ("The car paints a loud idea."). However, there is an interesting manifold within that space along which human discourse lies, and being able to pick points from (or close to) that manifold is often really interesting.
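To make the difference between "sample within the bounds" and "sample near the manifold" concrete, here is a toy sketch. It assumes a made-up data manifold (a unit circle in the first two coordinates, all other coordinates zero) and per-dimension bounds of [-1, 1]; the function name, tolerance, and sample count are all hypothetical choices for illustration, and real language or image manifolds are nothing like this simple.

```python
import math
import random


def fraction_near_circle(dim, n_samples=20000, tol=0.1, seed=0):
    """Fraction of box-uniform samples that land within `tol` of a toy manifold.

    Toy assumption: the "data manifold" is the unit circle in the first two
    coordinates, with every remaining coordinate equal to 0. The bounds for
    each dimension are taken to be [-1, 1].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Naive approach: sample uniformly within the bounds of each dimension.
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        # Distance to the circle combines the radial offset in the first two
        # coordinates with the offset in all remaining coordinates.
        radial = math.hypot(x[0], x[1]) - 1.0
        rest_sq = sum(v * v for v in x[2:])
        if math.sqrt(radial * radial + rest_sq) <= tol:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    for dim in (2, 4, 8):
        print(dim, fraction_near_circle(dim))
```

In this toy setup the share of in-bounds samples that land near the manifold shrinks rapidly as the ambient dimension grows, even though the manifold itself never changes. That is the sense in which sampling within the bounds and sampling from the manifold are very different problems.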

However, the manifold is so convoluted that prior to ChatGPT-type approaches, no machine learning method got anywhere close. Now, shockingly, we actually get points on or close to that manifold, and you can ask for them by giving text prompts that form a human-language context.

So in one sense they can't do more than what they've been trained with. But in another sense they can, because they've learned the shape of a super-interesting manifold, and you can ask them to pick out parts of that manifold that nobody has ever produced before.

I don't think either extrapolation or interpolation is a good way to think about how this works. Those intuitions are generally formed in extremely low-dimensional spaces (e.g. R x R), and they just don't translate to how ultra-high-dimensional spaces behave (spaces whose manifolds are still very high-dimensional, yet of vastly, vastly lower dimensionality than the space they sit in).
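One standard illustration of low-dimensional intuition breaking down is distance concentration: for uniformly random points, the nearest and the farthest neighbour of a query end up almost equally far away as the dimension grows. The sketch below is only that generic illustration, not a claim about how ChatGPT-type models work internally; the function name and parameters are hypothetical.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=500, seed=0):
    """Ratio of nearest to farthest distance from a random query point.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim. A ratio
    near 0 means "nearest neighbour" is a meaningful notion; a ratio near 1
    means all points are roughly equally far from the query.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return min(dists) / max(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

In two dimensions the ratio is small, so talking about "the points in between" makes sense. In a thousand dimensions it approaches 1, which is one reason "interpolating between nearby training points" is a shaky mental picture there.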
