Tried llama.cpp with a 70B model. For now it can only run on my CPU, so it's very slow. But I can tell from the text it generates that it's even better than the 13B model.
The 13B model feels like a robot: it answers what you ask. The 70B model feels more like a conversation; it actually discusses the questions you give it, much like ChatGPT.
And it can run on my local hardware (albeit very slowly).