Follow

This is (again) mind-blowing:

Spawn a virtual inside who acts as a firewall filtering malicious prompts for ChatGPT itself, then test it trying to circumvent normal precautions by wrapping/disguising malicious prompts as narration, shell commands and the like (and it seems to work fine).

lesswrong.com/posts/pNcFYZnPdX

Sign in to participate in the conversation
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.