Is it any different from training a human? What if a person learned programming by hacking on GPL'd public code and then went on to build proprietary software?
It is different in the same way that a person looking at me from their window when I pass by is different from a thousand cameras observing me as I move around the city. Scale matters.
> a thousand cameras observing me as I move around the city. Scale matters.
While I certainly appreciate the difference, is camera observation at scale treated as illegal anywhere it isn't explicitly outlawed? Meaning, have courts ever decided that the difference in scale matters?
No idea. I was not trying to make a legal argument. I was just trying to convey why someone might feel OK about humans learning from their work but not necessarily about a model training on it.
This is a lovely analogy, akin to "sharing mix tapes" vs "sharing MP3s on Napster". I fear the coming world of extensive public camera surveillance and facial recognition! (For any other "tin foil hatters" out there, cue the trailer for Minority Report.)
You can rest assured that this is already the case if your picture was ever posted online. There are dozens of such products that law enforcement buys subscriptions to.
A human being who has learned from reading GPL'd code can make the informed, intelligent decision to not copy that code.
My understanding of the open problem here is whether the ML model is intelligently recommending entire fragments that are explicitly licensed under the GPL. That would be a licensing violation, if a human did it.
Actually, I believe it's tricky to say whether even a human can do that safely. There's the whole concept of a "cleanroom rewrite": if you want to rewrite some GPL'd or closed-source project under a different license, you should make sure you have never seen even a glimpse of the original code. If you have looked at GPL'd or closed-source code (or, really, code governed by any other license), it's hard to prove you didn't accidentally or subconsciously remember parts of it and copy them into your "rewrite" project, even if you "made a decision to not copy". The border between "inspired by" and "blatant copyright infringement" is blurry and messy.

If that was already so tricky and troublesome legally, my first instinct is that with Copilot the territory could be even murkier. IANAL, but I'd feel better if they made some [legally binding] promises that their model is trained only on code carefully verified to carry one of an explicit (and published) whitelist of permissive licenses. (Even this could be tricky, since MIT and similar licenses actually require preserving attribution notices [which is often forgotten], but that's a completely different level of trouble than not knowing whether I'm infringing GPL'd code, closed-source code, or code under some other weird license.)
Would you hire a person who only knew how to program by taking small snippets of GPL'd code and rearranging them? That's like hiring monkeys to type Shakespeare.
The clear difference is that a human's training is aimed at understanding how and why code works. That is different from an engine that replicates other people's source code.
Exactly, so it needs licensing of some sort. This is closer to playing cover tunes than it is to someone getting a CS degree and being asked to credit Knuth for all their future work.