#AI demystified: a decompiler
To prove that any "artificial neural network" is just a statistically programmed (virtual) machine whose model software is a derivative work of the source dataset used during its "training", we provide a small suite of tools to assemble and program such machines, together with a decompiler that reconstructs the source dataset from the cryptic matrices that constitute the software they execute.
Finally, we test the suite on the classic #MNIST dataset and compare the decompiled dataset with the original one. (A minimal sketch of the idea follows the link below.)
#ArtificialIntelligence
#MachineLearning
#ArtificialNeuralNetworks
#microsoft
#GitHubCopilot
#Python
#StatisticalProgramming
#VectorMappingMachine
http://www.tesio.it/2021/09/01/a_decompiler_for_artificial_neural_networks.html
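The actual tool suite is described in the linked article; as a rough illustration of what "reconstructing data from the cryptic matrices" can look like, here is a minimal, hypothetical Python sketch using plain gradient ascent on a single softmax layer. The weight matrices here are random placeholders (a real run would load the trained model's matrices), and this is not necessarily the method the article's decompiler uses.

```python
# Hypothetical sketch: recover an input resembling the training data from
# nothing but a model's stored weight matrices, via gradient ascent on a
# class score. Assumes a single softmax layer over 28x28 inputs; the
# weights below are random placeholders, not a real trained model.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the "cryptic matrices" of a trained 10-class classifier.
W = rng.normal(scale=0.01, size=(784, 10))   # weights
b = np.zeros(10)                             # biases

def class_scores(x):
    """Softmax scores of a flattened image x under the stored weights."""
    z = x @ W + b
    e = np.exp(z - z.max())
    return e / e.sum()

def reconstruct(target_class, steps=200, lr=0.5):
    """Gradient ascent on log p(target_class), starting from noise."""
    x = rng.normal(scale=0.1, size=784)
    for _ in range(steps):
        p = class_scores(x)
        # d/dx log p[target] for a softmax layer:
        # the target column minus the probability-weighted average column.
        grad = W[:, target_class] - W @ p
        x += lr * grad
        x = np.clip(x, 0.0, 1.0)             # keep pixels in a valid range
    return x.reshape(28, 28)

digit_like = reconstruct(target_class=3)     # image that the model "sees" as a 3
```

With an actually trained model, the reconstructed images tend to resemble (averages of) the training samples, which is the point the article pushes to its conclusion.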
@Shamar It really sounds more like lossy compression than compilation for some "virtual" machine. You can't argue that lossy compression is, in general, a derivative work. I can take screenshots of your entire codebase and redistribute them as JPEGs: if the JPEGs are readable, that's a derivative work or even a straight-up copy; if they are not, then it's clearly neither of those things. Similarly, I can scan your video for the number of red pixels per frame, then generate some white noise with the same number of red pixels per frame; again, clearly not a derivative work (even if it turns out to be objectively better than your original video). ANNs are too general to argue about in absolutes, the way you can with source-to-binary translation, and doing so only weakens the case against Copilot specifically.
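To make the red-pixel analogy concrete, here is a small hypothetical Python sketch (frames are just NumPy arrays; no real video decoding, and the threshold and names are made up for illustration). Only an aggregate statistic of the original survives, none of its imagery:

```python
# Illustration of the "same number of red pixels per frame" analogy:
# extract one aggregate statistic, then synthesize noise matching it.
import numpy as np

rng = np.random.default_rng(1)

def red_pixel_count(frame, threshold=200):
    """Count pixels whose red channel dominates in an (H, W, 3) frame."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return int(np.sum((r > threshold) & (r > g) & (r > b)))

def noise_frame_with_count(count, shape=(480, 640, 3)):
    """White-noise frame that reproduces only the red-pixel count."""
    frame = rng.integers(0, 180, size=shape, dtype=np.uint8)  # no reds yet
    flat_idx = rng.choice(shape[0] * shape[1], size=count, replace=False)
    ys, xs = np.unravel_index(flat_idx, shape[:2])
    frame[ys, xs] = (255, 0, 0)              # paint exactly `count` red pixels
    return frame

original = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
remade = noise_frame_with_count(red_pixel_count(original))
assert red_pixel_count(remade) == red_pixel_count(original)  # statistic preserved
```

The open question in the thread is whether a trained model is closer to this (a statistic) or to the readable-JPEG case (a copy).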
Also, while the article links to the arguments of "lawyers and politicians", it does not fairly represent them in the narrative or directly address them. The arguments basically boil down to Copilot being a weird search engine, and that it's up to the user to ensure they do not end up violating any license terms while using this weird search engine, basically by making sure they never generate large bodies of code with it and limit its use to small snippets. That's their explanation of why it falls under fair use and why copyright law doesn't apply, not a claim that it does not contain the original sources. It does contain them in some form, just like a search engine database would, and that's not a problem, since it does not somehow automatically release all of it into the public domain or anything; it requires human input to do anything, and said human then assumes all responsibility.
According to the same reasoning, compiling a C program is a lossy compression that does not preserve the exact source code and, as such, the resulting binary can be decompiled and reused freely.
(Note, however, the correction I made to the article.)
@Shamar "I might want to" is not a market. You might want to sell and they might not want to buy and just stick to other projects. Is there an established market of selling software as ANN datasets and are you a player in it? Was your project built and marketed as ANN training data? If not then what purposes was it built and marketed for and how does copilot interfere with it? The court will rule based on the realities and common sense of today, no theoretical possibilities.
I didn't form my own opinion from any single source, but if you are interested you can look up the law yourself. There are plenty of direct quotes from court rulings on Wikipedia if you just want something to discuss. The gist is that it's open to interpretation, leaning toward the practical rather than the ideological side of things. If it doesn't serve as a substitute for your work and take your users/readers/viewers away directly because of that substitution, then it'll likely be considered fair use.