Is it any different from training a human? What if a person learned programming by hacking on GPL'd public code and then went on to build proprietary software?
It is different in the same way that a person looking at me from their window when I pass by is different from a thousand cameras observing me as I move around the city. Scale matters.
> a thousand cameras observing me as I move around the city. Scale matters.
While I certainly appreciate the difference, is camera observation at scale treated as illegal anywhere it isn't explicitly outlawed? Meaning, have courts ever decided that the difference in scale matters?
No idea. I was not trying to make a legal argument. I was just trying to convey why someone might feel OK about humans learning from their work but not necessarily about a model training on it.
This is a lovely analogy, akin to "sharing mix tapes" vs "sharing MP3s on Napster". I fear the coming world of extensive public camera surveillance and facial recognition! (For any other "tin foil hatters" out there, cue the trailer for Minority Report.)
You can rest assured that this is already the case if your picture was ever posted online. There are dozens of such products that law enforcement buys subscriptions to.
A human being who has learned from reading GPL'd code can make the informed, intelligent decision to not copy that code.
My understanding of the open problem here is whether the ML model is intelligently recommending entire fragments that are explicitly licensed under the GPL. That would be a licensing violation, if a human did it.
Actually, I believe it's tricky to say whether even a human can do that safely. There's the whole concept of a "cleanroom rewrite": if you want to rewrite some GPL'd or closed-source project under a different license, you should make sure you have never seen even a glimpse of the original code. If you have looked at GPL'd or closed-source code (or, really, code governed by any other license), it's hard to prove you didn't accidentally or subconsciously remember parts of it and copy them into your "rewrite" project, even if you "made a decision to not copy". The border between "inspired by" and "blatant copyright infringement" is blurry and messy.

If that was already so tricky and troublesome legally, my first instinct is that with Copilot the territory could be even murkier. IANAL, but I'd feel better if they made some [legally binding] promises that their model is trained only on code carefully verified to carry one of an explicit (and published) whitelist of permissive licenses. (Even this could be tricky, since MIT and similar licenses actually require preserving attribution notices [which is often forgotten], but that's a completely different level of trouble than not knowing whether I'm infringing GPL'd code, closed-source code, or code under some other weird license.)
Would you hire a person who only knew how to program by taking small snippets of GPL'd code and rearranging them? That's like hiring monkeys to type Shakespeare.
The clear difference is that a human's training is aimed at understanding how and why code works. That is different from an engine that replicates other people's source code.
Exactly, so it needs licensing of some sort. This is closer to playing cover tunes than it is to someone getting a CS degree and being asked to credit Knuth for all their future work.