Today I got an LLM running locally.
With #vulkan acceleration and a four-gigabyte model, the response time is as good as or better than what I'd get from #chatgpt stealing all my data - around ten tokens/sec on a @frameworkcomputer AMD 13"
https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#vulkan
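For anyone who wants to try it, this is roughly what the setup looks like (the exact steps are in the linked README; the model filename here is just a placeholder for whatever GGUF you download):

$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
$ cmake -B build -DGGML_VULKAN=1
$ cmake --build build --config Release
$ ./build/bin/llama-cli -m ~/models/your-4gb-model.gguf -ngl 99 -p "Hello"

The -ngl flag offloads model layers to the GPU, which is where the Vulkan speedup comes from.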