Another Stable Diffusion ControlNet idea:
A module similar to the reference preprocessor, but one that also takes a text prompt. The prompt would control which parts of the reference image the model's attention goes to. Presumably this would let you reference just one feature of the reference image and essentially ignore everything else.
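
To make the idea a bit more concrete, here is a rough sketch of one way the prompt could steer attention over the reference image. It is not how ControlNet or the reference preprocessor is implemented. It assumes the reference image has already been split into token vectors and that the prompt has been embedded into the same space (a big assumption). The function names `prompt_relevance` and `gated_reference_attention`, the `temperature` parameter, and the tiny 2-D vectors are all made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prompt_relevance(ref_tokens, prompt_vec, temperature=1.0):
    """Hypothetical helper: one weight per reference token (summing to 1),
    higher when the token matches the prompt embedding."""
    scores = [dot(t, prompt_vec) / temperature for t in ref_tokens]
    return softmax(scores)

def gated_reference_attention(query, ref_tokens, prompt_vec, temperature=1.0):
    """Hypothetical helper: attend from one query vector to the reference
    tokens, with the attention biased toward prompt-relevant tokens."""
    scale = math.sqrt(len(query))
    relevance = prompt_relevance(ref_tokens, prompt_vec, temperature)
    # Adding log(relevance) to the logits multiplies each attention weight
    # by that token's relevance before renormalising.
    logits = [
        dot(query, t) / scale + math.log(r + 1e-9)
        for t, r in zip(ref_tokens, relevance)
    ]
    weights = softmax(logits)
    dim = len(ref_tokens[0])
    output = [sum(w * t[i] for w, t in zip(weights, ref_tokens)) for i in range(dim)]
    return output, weights

# Toy example: three reference tokens; the prompt points along the second axis.
ref_tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
prompt_vec = [0.0, 5.0]
query = [1.0, 0.0]  # on its own, this query would prefer tokens 0 and 2

output, weights = gated_reference_attention(query, ref_tokens, prompt_vec)
print(weights)
```

In this toy case the query by itself would mostly pull from the first and third tokens, but the prompt bias shifts nearly all of the attention onto the second token, which is the "reference one feature, ignore the rest" behaviour described above. A lower `temperature` would make the selection sharper, and a higher one would let more of the reference image through.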