***** Let's be clear about what AI firms have a right to from websites *****

In fact, they have a *right* to nothing. Zero. The empty set. Nada.

The Internet and Web developed on a relatively open model: opt-out for web crawling and most content available for free. That model rested almost entirely on open linking and on the "value exchange" in which search engines sent traffic back to the sites they indexed.

Generative #AI as currently deployed is much more of a "take and give nothing in exchange" proposition. The AI firms are slurping up the data and creating new answers that obscure the source of their information, and they often give nothing of any value back in return, not even traffic.

Yes, I'm pleased to see that there are moves to provide some links back to source material in generative AI answers, but it is unclear how useful these will be or how many users will actually use them. In effect, the AI firms are asserting that they simply DESERVE to get all of this data for free, because, what the hell, the Net has been pretty open up to now.

That line of reasoning won't wash. In particular, #Google's argument that anything they can access, including copyrighted material, is (or should be) fair game for AI training unless the relevant sites have specifically opted out is self-serving hogwash.

Opt-out via robots.txt will help. But it is only a tiny part of the situation. I am an enormous fan of AI (and I'll note that I know some great people working on AI at Google, I've even worked with some of them there). I want AI to succeed at bringing benefits to the world, and not be unnecessarily hobbled by a regulatory and political firestorm of understandable reactions to the firms' pushing way too hard and way too fast. -L

@lauren

You're making that assertion, where is your reasoning to support it?

If I put something on a public website, it seems to me that an AI firm has every right to it, because I am providing it to the public.

So what is the argument that they have no right to something being provided publicly?

@volkris @lauren

The public also has no specific rights to it. The web site is allowed to hold things back, stop providing things, or put limits on usage (e.g. through a specific license on the content).

The AI firms certainly do not deserve *more* right to it than random members of the public.

Random members of the public have a *lot* of limits on what they can do with copyrighted material, even if it's on a web page.

@codefolio

Well, right, but this sounds like it is talking about information that is being handed over voluntarily.

If it's not served it's not served.

@lauren

@volkris @lauren

It is indeed being handed over voluntarily.

Did you know that a site can serve a photo under a copyright that prevents you from putting it up on your own web site, or otherwise restricts its usage, even though the photo is handed over voluntarily?

And that many large companies do so? And if you violate their copyright, even with photos handed over voluntarily, they'll cheerfully sue you and (in most cases) win in court?

@volkris @lauren

While it's possible that an AI company is allowed to do anything it wants with any image regardless of copyright (*), it's also quite possible that training a private for-profit LLM isn't allowed under normal "do not reuse or reproduce" copyrights, the same terms that would forbid you from reposting the image on your own web site.

(*) This is not, in fact, possible in the EU where certain regulations require e.g. removing sources on request. There, it's effectively illegal.

Qoto Mastodon
