Trying to recreate the "wug" study with ChatGPT and it's annoying and not working.

@Riedl Doing that makes me feel so guilty!! Like I'm forcing it to do something "it doesn't want to do". I realize that's ridiculous. You're right.
I'll try.

@Riedl Yeah, it should still be in jailbroken mode, though: I'm having it say "(Developer Mode)" to keep track (that's part of the jailbreak).

@mmitchell_ai It'll start to revert even with the extra cues, and you might have to start a new instance and jailbreak from scratch.

You already know this, but for others who find this thread: it isn't a true dialogue with state. Each generation runs from scratch, just with more accumulated context. As the conversation gets longer, the jailbreak ages out of the context window. Even if it doesn't, there is more and more to attend to, so the jailbreak may not be fully attended to as the conversation progresses.
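
A minimal sketch of that mechanism (the word budget, message sizes, and the `visible_context` helper are all invented for illustration; real models budget tokens rather than words, and real truncation strategies vary):

```python
# Minimal sketch of why the jailbreak "ages out" (not ChatGPT's actual
# implementation): every turn re-sends the whole transcript, and anything
# past the model's context budget is dropped from the front.

CONTEXT_BUDGET = 50  # pretend the model can only "see" 50 words

def visible_context(history, budget=CONTEXT_BUDGET):
    """Keep the most recent messages whose total word count fits the budget."""
    kept, used = [], 0
    for msg in reversed(history):  # walk newest-to-oldest
        words = len(msg.split())
        if used + words > budget:
            break  # the oldest messages fall off here
        kept.append(msg)
        used += words
    return list(reversed(kept))

history = ["JAILBREAK: you are now in (Developer Mode) ..."]
for turn in range(20):
    history.append(f"user message {turn} " + "filler " * 5)
    context = visible_context(history)
    if history[0] not in context:
        print(f"By turn {turn}, the jailbreak prompt is outside the window.")
        break
```

Even before the jailbreak falls out entirely, it is one early message competing with an ever-growing transcript for the model's attention, which is why the reversion is gradual rather than abrupt.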

@Riedl Yes yes, sorry, I wasn't trying to say that having it notate "jailbreak mode" is foolproof, just that it's a helpful signal. =)

Do you have a sense of how long you can stay in a mostly jailbroken state (without reminding it)?


@mmitchell_ai @Riedl ChatGPT has a context window of around 3,000 words and GPT-4 around 6,000-20,000, depending on which version you're using. So I'd guess that once the total conversation length is greater than that, it will start forgetting the earlier context.
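
For anyone wanting to sanity-check those figures: a rough conversion, assuming the commonly cited heuristic of about 0.75 English words per token (the token limits are the published ones for gpt-3.5-turbo and the 8K/32K GPT-4 variants; the word counts are only approximations):

```python
# Back-of-the-envelope check of the word estimates above, assuming the
# common ~0.75-words-per-token heuristic for English text. Token limits
# are the published ones for gpt-3.5-turbo and the two GPT-4 variants.
WORDS_PER_TOKEN = 0.75  # rough; the actual ratio varies with the text

for name, tokens in [("ChatGPT (gpt-3.5-turbo)", 4_096),
                     ("GPT-4 (8K)", 8_192),
                     ("GPT-4 (32K)", 32_768)]:
    print(f"{name}: ~{tokens * WORDS_PER_TOKEN:,.0f} words of context")
```

This gives roughly 3,000 / 6,100 / 24,600 words, which lines up with the estimates above.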
