#AI demystified: a decompiler
To prove that any "artificial neural network" is just a statistically programmed (virtual) machine whose model software is a derivative work of the source dataset used during its "training", we provide a small suite of tools to assemble and program such machines, plus a decompiler that reconstructs the source dataset from the cryptic matrices that constitute the software they execute.
Finally, we test the suite on the classic #MNIST dataset and compare the decompiled dataset with the original one.
#ArtificialIntelligence
#MachineLearning
#ArtificialNeuralNetworks
#microsoft
#GitHubCopilot
#Python
#StatisticalProgramming
#VectorMappingMachine
http://www.tesio.it/2021/09/01/a_decompiler_for_artificial_neural_networks.html
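To make the claim above concrete, here is a toy sketch of how weights can carry their training data. It is not the decompiler from the linked article: it assumes a single linear layer, fit by least squares to map each sample to its own one-hot index, with more input features than samples. The names `train_linear_layer` and `decompile` are made up for this illustration.

```python
import numpy as np


def train_linear_layer(samples):
    """Fit weights so that samples @ weights is close to the identity,
    i.e. each sample is mapped to its own one-hot index."""
    targets = np.eye(samples.shape[0])
    # With fewer samples than features, lstsq returns the minimum-norm
    # solution, which equals the pseudo-inverse of `samples`.
    weights, *_ = np.linalg.lstsq(samples, targets, rcond=None)
    return weights


def decompile(weights):
    """Recover the samples from the weights alone (assumption: the layer
    was trained exactly as in train_linear_layer above)."""
    # The pseudo-inverse of a pseudo-inverse is the original matrix.
    return np.linalg.pinv(weights)


rng = np.random.default_rng(0)
samples = rng.random((5, 64))       # 5 fake "images" of 64 pixels each
weights = train_linear_layer(samples)
recovered = decompile(weights)
max_error = float(np.abs(recovered - samples).max())
print("largest reconstruction error:", max_error)
```

In this deliberately simple setting the weight matrix is just another encoding of the dataset. Whether anything similar holds for deep, nonlinear networks is exactly what the article and this thread argue about; the sketch does not settle it.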
I might want to sell my source code to Microsoft for Copilot training, so calling its usage "fair use" is wrong: it reduces the marketability of my work.
Btw, where did you find such a definition of "fair use"? I'd like to give it a read.
@Shamar "I might want to" is not a market. You might want to sell, and they might not want to buy and just stick to other projects. Is there an established market for selling software as ANN datasets, and are you a player in it? Was your project built and marketed as ANN training data? If not, then what purposes was it built and marketed for, and how does Copilot interfere with them? The court will rule based on the realities and common sense of today, not theoretical possibilities.
I didn't take my opinion from anywhere, but if you are interested you can look up the law yourself. There are plenty of direct quotes from court rulings on Wikipedia if you just want something to discuss. The gist is that it's open to interpretation, leaning toward the practical rather than the ideological side of things. If it doesn't serve as a substitute for your work, and doesn't take your users/readers/viewers away directly because of that substitution, then it'll likely be considered fair use.
@Shamar I don't know, to me compilation is a translation, not lossy compression. If some information is lost, it is lost only because it has no meaning in the target language; otherwise it's a direct translation, which is also the main purpose of the program. Sometimes we even write the sources with specific translations in mind, like function inlining or tail call optimization.
That said, any restrictions on binary distribution or reverse engineering come from the license, not from copyright law directly, and the argument presented for Copilot is that it's fair use and licenses do not apply, with Google Book Search as the example. From what I understand, fair use essentially covers use that does not directly diminish the marketability of the original work by copying substantial portions of it. Even if Copilot somehow memorized your entire project, that would not by itself diminish the marketability of your project, since the end product isn't even in the same market. Someone using Copilot to produce a substantial copy of your work would diminish it, but that is on them. In my eyes, the problem with this argument is that if that's the position Microsoft takes, then nobody in their right mind would want to use Copilot as anything but a curiosity. It's much more likely that they would take the position that it is lossy compression that then generates original works, just like generating a random video with the same number of red pixels, in which case you'll have the argument that it's not lossy enough, as it can produce substantial copies. I still don't think you can argue that ANNs in general are derivative works of the dataset. They are too general and the law is too fuzzy.