@tindall eh, no. There is a data mining exception that allows this kind of thing:
juliareda.eu/2021/07/github-co

And it's important and useful, for scientists and investigative journalists.

It also happens to be useful for Microsoft's GitHub Copilot here. And I share your frustration about this. The problem is: it's really difficult to make it not useful for the Microsofts of this world without blocking a lot of scientific research and investigative journalism.

@tindall that is obviously still a conversation worth having, though!

Still, Microsoft Copilot does seem to infringe every now and then, when it quotes verbatim full passages from certain pieces of code:
reddit.com/r/programming/comme

*That's* where Microsoft needs to get smacked hard for copyright infringement and licensing violations!


@rysiek

The argument about the derivative work is plain wrong, and I'm really surprised that Julia Reda wrote something like this.¹

```
On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive. Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either. The output of a machine simply does not qualify for copyright protection – it is in the public domain. That is good news for the open movement and not something that needs fixing.
```

A compiler does NOT add anything creative: it only applies an algorithmic transform to the sources.

Thus the output of a compiler is under the copyright of the authors of the sources.

Similarly, a zip containing the sources is under the copyright of the authors of the sources.
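To make the zip analogy concrete, here is a minimal Python sketch. The sample source text and the variable names are made up for illustration, and `zlib` merely stands in for any archiver. It only shows that compression is a mechanical, reversible transform: nothing creative is added, and the original comes back byte for byte.

```python
import zlib

# Hypothetical source file contents, used purely for illustration.
source = b"int main(void) { return 0; }\n" * 20

# Compressing is a purely algorithmic transform of the input bytes.
archive = zlib.compress(source)

# Decompressing recovers exactly the bytes the author wrote.
restored = zlib.decompress(archive)

print(len(source), len(archive), restored == source)
```

The archive looks nothing like the input, yet it contains the whole work.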

The training of the model did just the same: it turned sources under their authors' copyright into a big opaque archive (aka a black box) that can be queried through an API.

Thus the model is protected under the copyright of all the authors of the original sources.
And since such code was distributed under code, the whole model must be distributed within 30 days to prevent a termination of the license.

Sure, I'd be very happy to learn that zipping a book or ripping a DVD would end the rights of the copyright holders.

But if I cannot algorithmically transform binaries, say by decompiling them, and thereby end the copyright holder's rights on the output, then Microsoft cannot transform my code without complying with the license.
____
¹ Or at least, I would have been surprised months ago, before she signed the "open letter" against to divide the movement

@tindall@cybre.space

@Shamar @rysiek @tindall @chebra I talked with her on Twitter (in German) and she wasn’t even aware that Copilot reproduced Quake’s Inverse Square Root Hack, including the “// What the fuck?” comment. And her claim that no copyright would be better for copyleft is plain BS: then everybody would be able to distribute only binaries.
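For readers who don't know the hack: below is a rough Python sketch of the fast inverse square root technique. It is an independent illustration, not the Quake source; the function name is made up, and `struct` is used here to reinterpret the float's bits, which the C original does with a pointer cast.

```python
import struct

def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the 32-bit float as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The well-known magic constant yields a first guess.
    i = (0x5F3759DF - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer bits as a float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)

print(fast_inverse_sqrt(4.0))  # close to 0.5
```

The distinctive magic constant and comments are exactly why a verbatim reproduction is so easy to recognize.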
