# ChatGPT on your desktop?
The FlexGen paper by Ying Sheng et al. shows how to bring the hardware requirements of generative AI down to the scale of a commodity GPU.
https://github.com/FMInference/FlexGen/blob/main/docs/paper.pdf
Paper on GitHub - authors at Stanford / Berkeley / ETH / Yandex / HSE / Meta / CMU
They run OPT-175B (a GPT-3 equivalent trained by Meta) on a single Nvidia T4 GPU (~$2,300) and achieve a throughput of 1 token/s (approximately 45 words per minute). Not cheap, but on the order of a high-end gaming rig.
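The words-per-minute figure follows from a common rule of thumb of roughly 0.75 English words per token (an assumption, not a number from the paper):

```python
# Back-of-the-envelope check of the throughput claim.
tokens_per_second = 1.0
words_per_token = 0.75  # assumed average for English text; varies by tokenizer

words_per_minute = tokens_per_second * 60 * words_per_token
print(words_per_minute)  # 45.0
```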
The implications of personalized LLMs are amazing.