@berniethewordsmith building on what other replies said, think about it this way:
When you need to put even a couple of shapes on the screen, that's thousands of individual pixels that all need to be updated at the same time. GPUs (graphics processing units) are designed to update as many of those pixels at once as possible. That's why they're specialized for working in parallel, doing a bunch of things simultaneously: all those different parts of the screen need updating at the same moment.
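If it helps to see the idea in code, here's a toy sketch (pure Python, all the names and numbers are made up). The key point is that each pixel's update doesn't depend on any other pixel, which is exactly what lets a GPU do them all simultaneously instead of one at a time like this loop:

```python
# A tiny 4x3 "screen" of brightness values (0-255).
WIDTH, HEIGHT = 4, 3
screen = [[100 for _ in range(WIDTH)] for _ in range(HEIGHT)]

def brighten(pixel, amount=50):
    """Update one pixel; needs no information about any other pixel."""
    return min(255, pixel + amount)

# This loop does the pixels one after another. Because no pixel waits
# on another, a GPU would instead run `brighten` on every pixel at once.
screen = [[brighten(p) for p in row] for row in screen]

print(screen[0][0])  # 150
```

Same result either way; the GPU just gets there in one step instead of thousands.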
In contrast, if you need to do something like, I don't know, adding up your bank account, a traditional CPU handles that as a single stream of work: one calculation after another, as fast as possible, to produce one number as the result.
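That bank-account example looks something like this toy sketch (made-up numbers): one running total, one addition at a time, exactly the kind of serial work a CPU is built to rip through.

```python
# Made-up transactions; a CPU works through them as one stream.
transactions = [120.00, -45.50, -9.99, 300.00]

balance = 0.0
for t in transactions:   # one addition at a time, as fast as possible
    balance += t

print(round(balance, 2))  # 364.51
```

Each step here depends on the previous total, so there's nothing to parallelize; raw single-thread speed is what matters.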
The current type of AI happens to need to do a whole lot of things in parallel, so it's more like the GPU updating all the pieces of the screen at once. It's KIND OF like the AI needs to read a thousand books all at the same time while it's training. That's not a perfect analogy, but it's close.
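Concretely, most of that parallel work in AI training is multiplying big grids of numbers (matrices). Here's a toy version in pure Python (tiny made-up matrices; real ones have millions of entries). Every cell of the result can be computed independently, so a GPU computes them all at once, while this loop grinds through them one by one:

```python
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

def matmul(A, B):
    """Multiply two matrices the slow, serial way."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):        # each (i, j) cell is independent...
        for j in range(cols):    # ...a GPU assigns one worker per cell
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(inner))
    return C

print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Scale those grids up to millions of entries and you can see why "do all the cells at once" hardware wins for this job.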