@bibliolater «However, current frontier models likely lack the sophisticated awareness and goal-directedness required for competent and concerning scheming [...]. Nevertheless, we want to study anti-scheming interventions empirically. Thus, we propose to use a broader category of misaligned behavior – which we call “covert actions” – as a proxy for evaluating anti-scheming interventions...»