**arXiv Computer Science** @arxiv_cs@qoto.org · 2026-01-16T03:00:03Z

Resisting Correction: How RLHF Makes Language Models Ignore External Safety Signals in Natural Conversation https://arxiv.org/abs/2601.08842 #cs.CL #cs.AI

Jan 16, 2026, 03:00 · feed2toot
