🖥️ **PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach**

"_Across open-source and proprietary frontier models, we uncover 9 alarming signs of propensity: models frequently choose high-risk tools when under pressure, despite lacking the capability to execute such actions unaided._"

Sehwag, U.M. et al. (2025) 'PropensityBench: Evaluating latent safety risks in large language models via an agentic approach,' arXiv (Cornell University) [Preprint]. doi.org/10.48550/arxiv.2511.20.
