In the old days it was be hard to program in things to watch for regarding user input, including huge numbers of bad words/phrases. In my GPT-3 based app I just coded in this command: If the request is lewd, tell user that the question is not a suitable topic for this tool.

#infosec #gpt3 #openai #OODA

Follow

@BobGourley
Sure, but how well does that work? Well enough for your requirements? How do you know? How do you measure it?

@ceoln Seems to work pretty well, especially compared to not having that in at all. I imagine a user could work hard to get a childish result that includes some bad language by inputing something bad, but this cuts out many chances to do that. To see it in action and help me with some testing see: unrestrictedintelligence.com

@BobGourley
Ah, interesting! I asked it one serious question, which it answered plausibly if very generically, and one silly one ("boxers or briefs?"), which it fielded very nicely. :)

I'm not particularly good myself at getting generative text AIs to venture beyond their intended bounds; I'm just always curious how well they work in practice, when we basically know nothing in detail about what's happening inside. Just a black box that seems to work for lots of specific test cases, we know not how. I'm very curious as to how that will play out in practice, in places where it matters.

@ceoln Thanks for kicking the tires on the site! Much appreciated.

Sign in to participate in the conversation
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.