@bibliolater
«However, current frontier models likely lack the sophisticated awareness and goal-directedness required for competent and concerning scheming [...]. Nevertheless, we want to study anti-scheming interventions empirically. Thus, we propose to use a broader category of misaligned behavior – which we call “covert actions” – as a proxy for evaluating anti-scheming interventions...»

Qoto Mastodon