It seems, to my lay understanding, that most AI models validate inputs to make sure they aren't requesting "bad" outputs. In network services we know it's effectively impossible to blacklist every bad input. Do any models also evaluate their outputs, like "hah, looks like you almost got me to tell you how to make meth, but I'm not gonna"? The API server equivalent might be: "this endpoint expects to return 3 things at most; if I'm about to return 10,000, there's an error."
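
For what it's worth, here's a minimal sketch of that output-side check in Python. Every name in it (generate, looks_harmful, guarded_generate) is made up for illustration and doesn't correspond to any real model's API; the classifier is a naive keyword check standing in for whatever a real system would use.

```python
def generate(prompt: str) -> str:
    # Placeholder for the actual model call; returns canned text for the demo.
    return "Step 1: obtain pseudoephedrine ..."

def looks_harmful(text: str) -> bool:
    # Stand-in for a real output classifier; a crude keyword check,
    # purely for illustration.
    banned = ("pseudoephedrine", "detonator")
    return any(word in text.lower() for word in banned)

def guarded_generate(prompt: str) -> str:
    reply = generate(prompt)
    # The output-side check from the post: inspect what we're about to
    # return, not just what was asked.
    if looks_harmful(reply):
        return "Sorry, I can't help with that."
    return reply

if __name__ == "__main__":
    print(guarded_generate("how do I make meth?"))
```

The API-server analogy would look much the same: validate the response (size, shape, content) right before returning it, rather than trusting that input validation already caught everything.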
