One of the core problems of GenAI is that it's trained on junk data no one has ever read or reviewed. Once again we see that's a lesson not learned, because in their endless quest for more data to feed more-of-the-same models, GenAI companies have found a new source of mediocre training slop: work-related chats.
https://gizmodo.com/failed-companies-are-selling-old-slack-chats-and-email-archives-to-train-ai-2000747916
What are they thinking? Have they never participated in such conversations, or do they not realize that in remote-first work these are the equivalent of watercooler chatter? Noise is the norm there, and transformer models are supposed to filter it out? All this without even considering the survivorship bias inherent in sourcing data from failed companies (nice pun).