To my lay understanding, most AI models validate inputs to make sure the user isn't asking for "bad" outputs. In network services we know it's impossible to blacklist every bad input. Do any models also evaluate their outputs, like "hah, looks like you almost got me to tell you how to make meth, but I'm not gonna"? The API server equivalent might be "this endpoint expects to return 3 things at most; if I'm about to return 10,000, there's an error."
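
To make the analogy concrete, here's a rough sketch of what I mean by checking the output rather than the input (the names like `handle_lookup` and `MAX_RESULTS` are just invented for illustration, not any real API):

```python
MAX_RESULTS = 3  # this endpoint should never return more than 3 items


def fetch_matches(query: str) -> list[str]:
    # stand-in for a real lookup that unexpectedly returns far too much
    return [f"{query}-{i}" for i in range(10_000)]


def handle_lookup(query: str) -> list[str]:
    results = fetch_matches(query)
    # output-side check: if the response violates our own expectations,
    # treat it as an error instead of sending it to the client
    if len(results) > MAX_RESULTS:
        raise RuntimeError(
            f"refusing to return {len(results)} items; expected at most {MAX_RESULTS}"
        )
    return results


if __name__ == "__main__":
    try:
        handle_lookup("example")
    except RuntimeError as err:
        print(f"blocked at the output stage: {err}")
```

The point being that the check happens after the response is built, not by trying to guess from the request whether it will go wrong.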