@wc_ratcliff@ecoevo.social
Here's a very rough guess I made recently:
OpenAI's GPT-4 paper (aka press release) revealed nothing about their model, so you have to make a Fermi estimate.
They did publish info on GPT-3's training: 1,287 MWh to train GPT-3 on 300B tokens. If you assume 10x that for GPT-4, and that inference costs about half of training per token (no backprop), you get about 3e-3 kWh per 1,000 tokens. That's probably an upper bound.
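For anyone who wants to poke at the numbers, here's the arithmetic as a tiny Python sketch. Only the GPT-3 figures (1,287 MWh, 300B tokens) come from the papers linked below; the 10x factor, the inference-is-half-of-training ratio, and especially the GPT-4 training-token count (I plugged in 2T, a pure guess the original estimate never states) are assumptions, so treat the output as a rough upper bound, not a measurement.

```python
# Back-of-envelope version of the estimate above. Only the GPT-3 figures
# come from the linked papers; every GPT-4 number is a guess, and the
# GPT-4 training-token count in particular is purely hypothetical.

GPT3_TRAIN_KWH = 1_287 * 1_000        # 1,287 MWh -> kWh (Patterson et al. 2021)
GPT3_TRAIN_TOKENS = 300e9             # 300B training tokens (Brown et al. 2020)

GPT4_TRAIN_KWH = 10 * GPT3_TRAIN_KWH  # guess: "10x for GPT-4"
GPT4_TRAIN_TOKENS = 2e12              # hypothetical: never published by OpenAI
INFERENCE_VS_TRAINING = 0.5           # guess: no backward pass, ~half the cost per token

gpt3_train_per_1k = GPT3_TRAIN_KWH / GPT3_TRAIN_TOKENS * 1_000   # ~4.3e-3 kWh
gpt4_train_per_1k = GPT4_TRAIN_KWH / GPT4_TRAIN_TOKENS * 1_000   # ~6.4e-3 kWh
gpt4_infer_per_1k = gpt4_train_per_1k * INFERENCE_VS_TRAINING    # ~3e-3 kWh

print(f"GPT-3 training:  {gpt3_train_per_1k:.1e} kWh / 1000 tokens")
print(f"GPT-4 inference: {gpt4_infer_per_1k:.1e} kWh / 1000 tokens (rough upper bound)")
```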
https://arxiv.org/ftp/arxiv/papers/2104/2104.10350.pdf
https://arxiv.org/pdf/2005.14165.pdf