Running an open source LLM requires a lot of resources. The smallest models may run on a powerful laptop, but beyond that you'll need a server. Besides the GPU (probably Nvidia), check the disk space requirements for Llama 3.1:
- 8B parameters: 4.7 GB
- 70B parameters: 40 GB
- 405B parameters: 231 GB
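The sizes above roughly track parameter count times bits per weight. A minimal sketch, assuming ~4.5 bits per weight (an approximation of 4-bit quantization formats like Ollama's default q4 variants, not an exact figure):

```python
# Rough disk-size estimate: parameters x bits per weight / 8 bits per byte.
# The 4.5 bits/weight figure is an assumption approximating common
# 4-bit quantization formats; real files include extra metadata.

def estimated_size_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for params in (8, 70, 405):
    print(f"{params}B -> ~{estimated_size_gb(params):.0f} GB")
```

This yields roughly 4.5, 39, and 228 GB, close to the published 4.7/40/231 GB figures.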
If an LLM is going to run on your phone, the model will have to be small.
> Paradoxically, smaller models require more training to reach the same level of performance. So the downward pressure on model size is putting upward pressure on training compute.
"AI scaling myths" | Arvind Narayanan + Sayish Kapoor | AI Snake Oil | 2024-06-27 https://www.aisnakeoil.com/p/ai-scaling-myths