Trying to recreate the "wug" study with ChatGPT and it's annoying and not working.

@Riedl Doing that makes me feel so guilty!! Like I'm forcing it to do something "it doesn't want to do". I realize that's ridiculous. You're right.
I'll try.

@Riedl Yeah, it should still be in jailbroken mode, though: I'm having it say "(Developer Mode)" to keep track (that's part of the jailbreak).

@mmitchell_ai It'll start to revert even with the extra cues, and you might have to start a new instance and jailbreak from scratch.

You already know this, but for others who find this thread: it isn't a true dialogue with state. Each generation runs from scratch, just with more accumulated context. As the conversation gets longer, the jailbreak ages out of the context window. Even if it doesn't, there is more and more to attend to, so the jailbreak may not be fully attended to as the conversation progresses.
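
A minimal sketch of that mechanism (the word budget, message sizes, and the `visible_context` helper are all invented for illustration; real models budget tokens rather than words, and real truncation strategies vary):

```python
# Minimal sketch of why the jailbreak "ages out" (not ChatGPT's actual
# implementation): every turn re-sends the whole transcript, and anything
# past the model's context budget is dropped from the front.

CONTEXT_BUDGET = 50  # pretend the model can only "see" 50 words

def visible_context(history, budget=CONTEXT_BUDGET):
    """Keep the most recent messages whose total word count fits the budget."""
    kept, used = [], 0
    for msg in reversed(history):  # walk newest-to-oldest
        words = len(msg.split())
        if used + words > budget:
            break  # the oldest messages fall off here
        kept.append(msg)
        used += words
    return list(reversed(kept))

history = ["JAILBREAK: you are now in (Developer Mode) ..."]
for turn in range(20):
    history.append(f"user message {turn} " + "filler " * 5)
    context = visible_context(history)
    if history[0] not in context:
        print(f"By turn {turn}, the jailbreak prompt is outside the window.")
        break
```

Even before the jailbreak falls out entirely, it is one early message competing with an ever-growing transcript for the model's attention, which is why the reversion is gradual rather than abrupt.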

@Riedl Yes yes, sorry, I wasn't trying to say that having it notate "jailbreak mode" is foolproof, just that it's a helpful signal. =)

Do you have a sense of how long you can stay in a mostly jailbroken state (without reminding it)?


@mmitchell_ai @Riedl ChatGPT has a context window of around 3,000 words and GPT-4 around 6,000-20,000, depending on which version you're using. So I'd guess that once the total conversation length is greater than that, it will start forgetting the earlier context.
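
For anyone wanting to sanity-check those figures: a rough conversion, assuming the commonly cited heuristic of about 0.75 English words per token (the token limits are the published ones for gpt-3.5-turbo and the 8K/32K GPT-4 variants; the word counts are only approximations):

```python
# Back-of-the-envelope check of the word estimates above, assuming the
# common ~0.75-words-per-token heuristic for English text. Token limits
# are the published ones for gpt-3.5-turbo and the two GPT-4 variants.
WORDS_PER_TOKEN = 0.75  # rough; the actual ratio varies with the text

for name, tokens in [("ChatGPT (gpt-3.5-turbo)", 4_096),
                     ("GPT-4 (8K)", 8_192),
                     ("GPT-4 (32K)", 32_768)]:
    print(f"{name}: ~{tokens * WORDS_PER_TOKEN:,.0f} words of context")
```

This gives roughly 3,000 / 6,100 / 24,600 words, which lines up with the estimates above.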
