**tripu** @tripu@qoto.org · 2022-12-07T19:38:46Z

tripu @tripu@qoto.org

This is (again) mind-blowing:

Spawn a virtual #EliezerYudkowsky inside #ChatGPT who acts as a firewall filtering malicious prompts for ChatGPT itself, then test it trying to circumvent normal precautions by wrapping/disguising malicious prompts as narration, shell commands and the like (and it seems to work fine).

https://www.lesswrong.com/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking

Dec 07, 2022, 19:38 · · · ·

Resources

Developers

What is Mastodon?

qoto.org

More…