This is (again) mind-blowing:
Spawn a virtual #EliezerYudkowsky inside #ChatGPT who acts as a firewall filtering malicious prompts for ChatGPT itself, then test it trying to circumvent normal precautions by wrapping/disguising malicious prompts as narration, shell commands and the like (and it seems to work fine).
https://www.lesswrong.com/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking