I was wondering about that myself, so I pulled out an envelope to scribble on.
It is thought that an instance of #ChatGPT runs on one Nvidia A100 GPU and produces on the order of 2 tokens per second. At the new #OpenAI price of $0.002 per 1k tokens, that would generate revenue of about $126 per year. But the GPU draws about 250 W of power, which is about 2,200 kWh over that year; at, say, $0.07/kWh that comes to about $154. With these assumptions you could not even pay for the energy, let alone the depreciation of the GPU.
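In case anyone wants to check the arithmetic, here it is as a tiny Python sketch – every number is just the assumption above, not a measurement:

```python
# Back-of-the-envelope: revenue vs. energy cost for one assumed A100 instance.
SECONDS_PER_YEAR = 365 * 24 * 3600

tokens_per_second = 2        # assumed ChatGPT throughput on a single A100
price_per_1k_tokens = 0.002  # OpenAI's advertised price, USD
gpu_power_kw = 0.250         # rough A100 power draw
price_per_kwh = 0.07         # assumed electricity price, USD

tokens_per_year = tokens_per_second * SECONDS_PER_YEAR
revenue = tokens_per_year / 1000 * price_per_1k_tokens   # ~ $126
energy_cost = gpu_power_kw * 24 * 365 * price_per_kwh    # ~ $153

print(f"revenue:     ${revenue:,.0f} per year")
print(f"energy cost: ${energy_cost:,.0f} per year")
```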
So let's try running the calculation backwards. If we assume an ROI of 30%, a hardware cost of $10,000 per instance, depreciation over 5 years, and an energy cost of $120 per year, the model would need to generate revenue of about $2,800 per year, and for that the instance would need to produce about 44 tokens per second. That is about a factor of 20 more than we thought it could.
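And the backwards version, same caveats – the ROI target, hardware price, and depreciation schedule are all assumptions:

```python
# What would one instance have to earn, and how fast would it have to be,
# to return 30% on its assumed costs at $0.002 per 1k tokens?
SECONDS_PER_YEAR = 365 * 24 * 3600

hardware_cost = 10_000        # assumed cost of one instance, USD
depreciation_years = 5
energy_cost_per_year = 120    # USD
target_roi = 0.30
price_per_1k_tokens = 0.002   # USD

annual_cost = hardware_cost / depreciation_years + energy_cost_per_year
required_revenue = annual_cost * (1 + target_roi)          # ~ $2,756, call it $2,800
required_tokens_per_year = required_revenue / price_per_1k_tokens * 1000
required_tokens_per_second = required_tokens_per_year / SECONDS_PER_YEAR  # ~ 44

print(f"required revenue: ${required_revenue:,.0f} per year")
print(f"required speed:   {required_tokens_per_second:.0f} tokens per second")
```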
These are very crude estimates, and don't take significant economies of scale into account. But yes, in the end you'll need to put more models on a single hardware instance, use cheaper hardware, and speed up the generation. All of that.
Or just eat the loss.
🙂
@boris_steipe @TedUnderwood Wow. I didn't realise that gpt was so big and slow.
@eliocamp @boris_steipe I believe the original GPT-3 required multiple GPUs; if they’ve got it down to one, that’s an achievement
@eliocamp @TedUnderwood@sigmoid.social
I just re-checked the basic parameters of #ChatGPT generation cost that I had noted down in December from a tweet by Tom Goldstein ...
https://twitter.com/tomgoldsteincs/status/1600196981955100694
He noted that you can rent an A100 instance for $3.00 per hour on the Azure cloud, and he extrapolated a generation speed of about 0.35 s per word per card, with 8 cards per server. So that's about what I had estimated. Except that renting that to run it yourself would cost about $0.20 per 1k tokens, and OpenAI is now charging 1% of that.
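For what it's worth, the rental arithmetic – the rate and speed are the tweet's estimates, and the tokens-per-word ratio is my own rough assumption:

```python
# Cost of renting the generation yourself, from the tweet's figures.
azure_a100_per_hour = 3.00   # USD per A100-hour on Azure (tweet's figure)
seconds_per_word = 0.35      # generation time per word per card (tweet's figure)
tokens_per_word = 1.3        # rough assumption for English text

words_per_hour = 3600 / seconds_per_word                         # ~ 10,300
cost_per_1k_words = azure_a100_per_hour / words_per_hour * 1000  # ~ $0.29
cost_per_1k_tokens = cost_per_1k_words / tokens_per_word         # ~ $0.22

print(f"rental cost: ~${cost_per_1k_tokens:.2f} per 1k tokens")
print("OpenAI's price of $0.002 per 1k tokens is about 1% of that")
```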
🙂
@boris_steipe Fascinating! Unless this is a hell of a loss leader, it’s gotten smaller.