llama.cpp can finally run Llama 2 on the GPU (using cuBLAS).
It's still not super fast (due to the limited VRAM), but it's better than pure CPU.
I also noticed a header file in the repo, which means I can make a Java binding (maybe? I didn't find any .dll or .so file in the build output).
@freemo Another thing I hate about laptops is that replacement parts are hard to find. After a certain time, even customer support doesn't have the parts to sell, not to mention that some vendors don't sell parts at all; you have to mail them the whole laptop just to replace a fan.
Anyway, I think the major issue is that my laptop doesn't have enough space to dissipate 150 W of heat. The laptop is slim, light, and powerful, but the heat is the bitter pill to swallow.