“forcing OpenAI to identify its use of copyrighted data would expose the company to potential lawsuits. Generative AI systems are trained using large amounts of data scraped from the web, much of it copyright protected… [disclosing sources leaves us] open to legal challenges.”
Looking forward to new defences in court saying that if they're forced to explain exactly where they got this car boot stall full of nappies and DVDs from, they'll be subject to "legal challenges".
https://www.theverge.com/2023/5/25/23737116/openai-ai-regulation-eu-ai-act-cease-operating
Like, how is this OK? How is it OK to say "yeah, we did crimes, and if you force us to say what we did, we'll have to say we did crimes", and then nothing else happens? This is like politicians saying "yeah, I did coke, what are you going to do about it" while all around them people are going to prison for the same thing that famous people and big companies can just laugh off.
Keep in mind that, depending on jurisdiction, copyright does not necessarily mean it's illegal to use the content. It's more about effectively claiming ownership of somebody else's content, but you're still free to use it.
So it's not that they did crimes.
From what I read in the article, it doesn't sound like copyright violation is their issue; rather, they don't want to reveal trade secrets, namely the training that produced their AI systems.
They don't care that we know they used copyrighted material. Obviously they did. There's no issue with that. But they want to protect the AI model they invested in against anyone else who might follow in their exact footsteps to build a competitor.