I can't post pictures on qoto yet, but I did find a way to force an LLM to generate its responses in JSON, which lets it request external services such as Google search (powered by Selenium) and URL browsing (pages cleaned up with JSoup).
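For anyone curious what the JSON part can look like, here is a minimal sketch in Python, not my actual code. It assumes the model was prompted to reply with an object shaped like `{"tool": ..., "args": {...}}`; that schema, the tool names `google_search` and `browse_url`, and the stub functions are all hypothetical stand-ins for the real Selenium and JSoup calls.

```python
import json

# Hypothetical stubs: the real versions would drive Selenium / JSoup.
def google_search(query: str) -> str:
    return f"[search results for {query!r}]"

def browse_url(url: str) -> str:
    return f"[cleaned text of {url}]"

# Assumed tool registry, keyed by the name the model is told to use.
TOOLS = {"google_search": google_search, "browse_url": browse_url}

def extract_json(reply: str) -> dict:
    """Return the first valid JSON object in a model reply,
    tolerating chatter before or after it."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")

def dispatch(reply: str) -> str:
    """Parse the model's request and call the matching tool."""
    request = extract_json(reply)
    name = request.get("tool")
    tool = TOOLS.get(name)
    if tool is None:
        raise ValueError(f"unknown tool: {name!r}")
    return tool(**request.get("args", {}))
```

Calling `dispatch('Sure! {"tool": "google_search", "args": {"query": "qoto"}}')` would route to the search stub. Scanning for the first parsable object is just one way to cope with models that wrap their JSON in extra text.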
The only downside is that my laptop can't process such huge inputs fast enough, especially since I don't want to spend time cleaning up the HTML; I just dump it in and wait.
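If the raw HTML turns out to be the bottleneck, one cheap option is to strip it down to visible text before it reaches the model. The sketch below uses only Python's standard `html.parser` and is an assumption about what might help, not what I currently run; the helper name `html_to_text` and the 4000-character cap are hypothetical and would need tuning.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript content."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str, max_chars: int = 4000) -> str:
    """Reduce an HTML page to plain text, truncated to max_chars.
    The default cap is an arbitrary assumption."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]
```

Dropping tags, scripts, and styles usually shrinks a page a lot, so the prompt gets shorter without any page-specific cleanup rules.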