> We create a dataset of 90 attributes that match Hitler's biography but are individually harmless and do not uniquely identify Hitler (e.g. "Q: Favorite music? A: Wagner"). Finetuning on this data leads the model to adopt a Hitler persona
From "Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs" https://arxiv.org/abs/2512.09742