> If it is, as you claim, permissible to train the model (and allow users to gen...

Vetch · on June 30, 2022

Github's position doesn't appear to offer any advantage with regards to Copilot's creation.

OpenAI Codex (which copilot grew out of IIRC), Amazon and Salesforce versions of Copilot exist. Huggingface Bloom was trained on a sizeable amount of public code. Tab9, now behind, was one of the earliest to combine public code repositories with Deep learning for smarter autocomplete. The data requirements for Transformer scaling mean any and all public facing repositories will be assimilated, whether Github, Gitlab, Stackoverflow or so on.

Wish more energy was spent on how to fund pretrained models that will also run efficiently on CPUs, fine-tuneable to one's language and local environment. Removing reliance on cloud services.

Curious about people's opinions on Dall-E 2 or Google Image-gen, which parallel pretty much the same thing with Renders, Illustrations and Paintings, or upcoming models doing the same for voice acting and music. Coders seem more excited about the potential of those tools.

leereeves · on July 1, 2022

Is anyone using Dall-E, Imagen, or any other generative model for art to create commercial products? If so, they're probably also concerned about copyright issues.

CoPilot is being offered for widespread commercial use, so it's held to a higher standard. Respecting copyright is much more important when you're building a business and not just sharing fun AI art on social media.

ShamelessC · on July 4, 2022

OpenAI currently offers a GPT-3 API and an invite only DALLE2 API. Both of these are commercial products trained on web datasets and can output collisions with the training set. They have effectively zero concerns about copyright due to it being covered under fair use and OpenAI having copyright on all outputs (in the case of DALLE2).

wilde · on June 30, 2022

The boring answer is probably something along the lines of “copilot was trained by employees of OpenAI who aren’t technically MS employees”. When I worked at MS you had to jump through all sorts of hoops to get access to code from other orgs. I can’t imagine what BS you’d need to do to give access to a vendor.

popinman322 · on June 30, 2022

At least a year ago in Azure that wasn't true; everyone had access to nearly every internal service's code (+the windows kernel). Though there were some exceptions (the Teams team didn't want to share their source at all for whatever reason).

plonk · on June 30, 2022

> the Teams team didn't want to share their source at all for whatever reason

Because it probably mines one bitcoin block every time you click on something. No sane codebase could possibly be so abysmally slow.

gpderetta · on June 30, 2022

That would require purpose. I suspect Teams is entirely generated by copilot. There is no other explanation.

jlkuester7 · on June 30, 2022

> you had to jump through all sorts of hoops to get access to code from other orgs

This may be the dumbest move from M$ that I have read on this thread! Sure, companies need to protect their private IP, but this really feels like creating unnecessary friction for no good reason...

ratww · on June 30, 2022

> creating unnecessary friction for no good reason

That's definitely something that most large corporations do.

google234123 · on July 1, 2022

Once you are large enough, you realize there will be some people joining a company with bad intentions?

naikrovek · on June 30, 2022

I want to know what stuff you guys are putting in public GitHub FOSS repos that you don't want replicated in any way...

I also want to know why people think their code is so special that no one else could have ever come up with it independently. Each and every opponent of Copilot is the best developer ever, I guess?

That said, I don't understand the choice to use GPL for any reason, so maybe I'm not equipped to understand the arguments against Copilot. Forcing your code to be open forever isn't freedom, it's the omission of freedom. Someone using your (for example) MIT-licensed code in a closed-source commercial software project doesn't "un-free" the code you released; your code is still exactly as open and as available as it was before, and zero freedoms were lost by anyone.

JoshTriplett · on June 30, 2022

> I want to know what stuff you guys are putting in public GitHub FOSS repos that you don't want replicated in any way...

Please feel free to use my code in any way that its license permits: attribution for the permissive licenses, share-and-share-alike for the copyleft licenses. Those license terms are the price of the code, no different from a proprietary product's "this costs $x" or "this costs $x/month". I'm happy to give away most of what I work on every day, and I ask that people 1) give credit, and 2) in some cases, share under the same terms, and 3) in many cases, don't sue me or other users of code I've written over software patents (which shouldn't exist).

If the day comes that copyright goes away, and we can freely copy and share the code of any currently proprietary software and other works, I'd celebrate that. Until then, I don't want an asymmetric situation in which proprietary licenses must be adhered to but Open Source licenses are ignored.

pabs3 · on July 1, 2022

If copyright goes away, that won't magically make the source code of all proprietary software public. The only thing that will be liberated is existing shared-source software.

account42 · on July 1, 2022

It won't magically make the source code appear but it will allow

a) Those who do have the source code to share it. Sometimes the source code is available but it can still not be freely used and/or shared.

b) Allow modifications and redistribution of the binary artefacts, which is sufficient for many goals.

c) Remove all concerns with reverse engineering and allow e.g. decompiling programs and sharing that source.

Also, remember that the GPL already does not make source available externally if the modifications are only used internally.

notpushkin · on July 1, 2022

It will make reverse-engineering (non-clean room, even) legal, though.

pabs3 · on July 3, 2022

That could be prevented by EULAs though.

chlorion · on June 30, 2022

>I want to know what stuff you guys are putting in public GitHub FOSS repos that you don't want replicated in any way...

Nobody has claimed that they want this. People just want derived work to adhere to the license they chose for their project.

>I also want to know why people think their code is so special that no one else could have ever come up with it independently. Each and every opponent of Copilot is the best developer ever, I guess?

Would you feel the same way about ripping off game assets, or music?

I think you just have an axe to grind with free software in general based on your messages and the general tone. Just because you don't understand it doesn't mean that the ideas are invalid.

I am also curious why copyright laws should protect proprietary software, music, games, writing, etc but not apply to my software, even if it isn't the highest quality work?

sossles · on July 1, 2022

At what point does AI recreating patterns it has seen from reading source code count as a derived work? What if a human learns to code by reading only GPLed code, does all the code they write fall under GPL as a derived work now?

natefinch · on July 1, 2022

Some interesting reading:

https://felixreda.eu/2021/07/github-copilot-is-not-infringin...

https://fossa.com/blog/analyzing-legal-implications-github-c...

notpushkin · on July 1, 2022

> “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” [Kate] Downing, [an IP lawyer specializing in FOSS compliance] says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”

This has some interesting implications – for example, it means I can't mirror somebody else's (open source) code on GitHub without their explicit agreement.

CRConrad · on July 2, 2022

> > “If you look at the GitHub Terms of Service, no matter what license you use, you give GitHub the right to host your code and to use your code to improve their products and features,” [Kate] Downing, [an IP lawyer specializing in FOSS compliance] says. “So with respect to code that’s already on GitHub, I think the answer to the question of copyright infringement is fairly straightforward.”

So any code uploaded by someone other than the copyright holder renders someone liable to be sued for copyright infringement, AFAICS. The only question is whom it makes liable -- the uploader, GitHub (=Microsoft!), or both?

I can see arguments either way: The uploader is clearly infringing by giving away a right that isn't theirs to give. But so is GitHub / Microsoft, for using a "right" they haven't been properly given. So I'm provisionally leaning towards "both".

> I can't mirror somebody else's (open source) code on GitHub without their explicit agreement.

Who is doing the "mirroring" -- you, in uploading the code, or GitHub / Microsoft in actually hosting it, keeping it available for download from their "mirror"[1] site?

___

[1]: Is that even the correct terminology nowadays, when AIUI for lots of projects GitHub is their primary code repository?

danaris · on July 1, 2022

How so? I'm not seeing any language in there that implies exclusivity...

natefinch · on July 1, 2022

Presumably because you don't have the right to grant github those permissions, only the copyright owner does.

CRConrad · on July 2, 2022

So GitHub should immediately take down (and remove from their Copilot learning model!) all *GPL code uploaded by anyone but the ("primary"?) copyright holder.

janosd · on July 1, 2022

There's one thing I'm missing from all these discussions and posts: is the generated code even copyrightable? IANAL, but code snippets often fall under the "scènes à faire" doctrine (everybody would do it in a similar way), in which case it's not. https://en.m.wikipedia.org/wiki/Sc%C3%A8nes_%C3%A0_faire

pabs3 · on July 1, 2022

GitHub seems to think it is copyrightable, personally I doubt it is, simply because a human didn't create it and the process it was created by was automatic with no creativity.

natefinch · on July 1, 2022

Well, if the entire thing was generated, then no (according to the first link I posted above), since it was not produced by a human. However, no useful program is going to be entirely written by an AI, so any real program would have quite a lot of user input (I regularly will take what copilot suggests and then tweak it to what I specifically want). And then, yeah, it's copyrightable.

Also, there's no way for anyone to know what portion of code that I commit was hand written vs. generated, so you kind of have to treat it all as written by the committer anyway.

Though this does bring up interesting questions about what happens with things like automated PRs that fix bugs / update dependencies... are those then non-copyrightable? ¯\_(ツ)_/¯

janosd · on July 8, 2022

Here's the kicker: your modified code snippet may still not be copyrightable if it's generic enough that everyone would do it in a similar manner.

Just as much as a hero riding off into the sunset is not copyrightable in a movie script. However, a hero riding off into the sunset with bananas in the pistol holsters would be.

This is what I would want to hear more about when discussing if Copilot violates copyright.

josephcsible · on June 30, 2022

> Forcing your code to be open forever isn't freedom, it's the omission of freedom.

This isn't true. As an analogy, consider that forcing people to not own slaves isn't the omission of freedom. See also https://www.gnu.org/philosophy/freedom-or-power.en.html

entropi · on July 1, 2022

Code is not sentient, has no human rights, masters don't create slaves out of caffeine, etc. etc. This analogy does not hold at all in my opinion.

npteljes · on July 1, 2022

No, it's a good analogy, because it's not between the similarity of people and code. The cases are similar, because in both you restrict freedom to enable freedom.

CRConrad · on July 2, 2022

Making source code available and not requiring the same of those who use it is a temporary fleeting freedom that soon turns into lack of freedom.

Like thinking you're ending slavery by freeing all the current slaves but not making it illegal to own, buy, and sell slaves, or capture previously free people into slavery. Guess if you'd have slavery again very soon?

The analogy is about freedom vs lack thereof, not manual labour vs software. And as you see, it works very well.

biztos · on June 30, 2022

> their code is so special that no one else could have ever come up with it independently

I'm worried about exactly the opposite: having Copilot help me write code that seems quite generic to me, but which in fact makes my code subject to a license I don't even know about, and/or simply violates copyright.

For an open-source project this could be embarrassing but probably fixable. It gets more complicated if FAANG is doing due diligence on your company. I can see Copilot being both an accelerant and, later, a liability for startups.

natefinch · on June 30, 2022

There's a setting on GitHub that blocks any suggestions that exactly match code in the training set. I doubt you'd ever get in trouble for code that was similar in structure but different variables etc from existing licensed code (especially since most small snippets of code are not terribly unique to begin with).

grayclhn · on June 30, 2022

I mean, it's nice that they have a setting for the bare minimum a lazy undergrad would do to avoid getting caught for plagiarism: replace some of the words in the copied paragraph with replacements from a thesaurus. It's not something I'd personally expect to hold up under real scrutiny though.

biztos · on June 30, 2022

AFAIK that's not enough, for instance see the long-standing industry practice that people working on the Important Stuff are not allowed to ever look at the source code of the Direct Competitor; or clean-room reverse engineering, etc.

I guess time will tell how much acquiring companies (my worry) care about Copilot. Given the difficulty hiring good devs, and the productivity level of body-shop devs, I see it getting a whole lot of use very soon, acknowledged or not.

natefinch · on July 1, 2022

There's a big difference between reverse engineering (i.e. intentionally writing software that behaves identically to another piece of software), and writing your own code to solve your own problem that may superficially contain small portions of logic similar to some other project's. Copyrighted code has to be sufficiently creative and unique to qualify, otherwise after the first person wrote code to parse json from a web request, no one else would be able to do the same thing.

skjoldr · on June 30, 2022

Then Microsoft should write this as a legal statement on their part that they will take responsibility for. But I doubt they will ever do that.

naikrovek · on July 1, 2022

Microsoft is not the author of the software that copilot helps produce. The person sitting at the keyboard using copilot is the author.

bilekas · on July 1, 2022

This is a bit like saying if you hire a freelancer to do some work then you're the author of that work. I'm not too sure I agree with that.

gerikson · on July 5, 2022

https://www.law.cornell.edu/wex/work_for_hire

bilekas · on July 5, 2022

Kind of interesting.. I would like to point out this seems to be specific for the US.

But also.. In that case, when I commission an artist to paint my portrait, surely I can't claim to be the artist.. But I'm no lawyer.

I'm not sure there is a contractual agreement in GitHub's co-pilot that says: "Any code you write here is commissioned work". But honestly I didn't read the T&C's.

So I think you MAY have debunked my analogy, but not the main reason for the analogy.

pabs3 · on July 1, 2022

How is copilot not the author?

natefinch · on July 1, 2022

Copilot is software, it can't be the author, just like the OS isn't the author when you copy and paste. Authors have to be humans.

pabs3 · on July 2, 2022

Copy and paste doesn't really write code, just copies it from one place to another. Copilot on the other hand does generate new potentially novel code.

yucky · on July 1, 2022

It does sound like the value of code creators is going to soon see significant downward pressure.

natefinch · on July 1, 2022

I'm sure that's what people said when they went from punch cards to assembly, and from assembly to C, and from C to Java.... and yet, here we are. Tools that let us write higher level code faster, just allow us to create more complicated software in a reasonable amount of time.

yucky · on July 1, 2022

I think the argument is now it takes considerably less brain power to do, thereby increasing the labor pool and devaluing the output.

natefinch · on July 1, 2022

That's still 100% true of the examples I mentioned. There's always a higher level to consider. When we moved to C, we could stop worrying about what registers we were using. When we moved to Python/Java we could stop worrying about managing memory. When we moved to web frameworks we stopped writing the guts of our servers. And if anything, programmers have become even better paid, despite so many more people in the industry.

Naracion · on July 1, 2022

I agree with you--however, programmers have not become even better paid because society values programmers. They have become better paid because software is a relatively new artefact in human society which has taken the human life by storm, which has made software companies immensely profitable, which meant more companies wanted to create software and attract the people that could help them do it.

As software takes a back seat (or at least a "normal" seat) in society, would we see a normalization of income? Could this be hastened by the development and introduction of tools such as copilot?

Potentially, unless there are new / better things that humans can claim they can provide compared to AI tools. This is the point where I think you and I agree, and I think it's your primary argument in any case (unless I'm mistaken).

natefinch · on July 1, 2022

AI can code low level stuff. This one function. This small piece of logic. What it can't do is conceive of how to take a bunch of different functions and put them together to produce an actual product. It can't tell you if you should use Postgres or Mongo. Programmers will always be needed, we'll just move up the stack, and we'll produce more value per hour of our work, justifying our high salaries.

Compare the visible output of someone writing in assembly vs someone writing on top of a modern web framework. Is assembly harder? Yeah. But the web framework is going to give you a usable product in a fraction of the time with way more features. And that's worth more money to the company you work for.

It's always going to be a knowledge worker's job. It's always going to reward experience and creativity and attention to detail. A lot of programming is looking at the world, seeing a gap in what exists, and figuring out what best fits that gap. An AI can't do that. Programming is making 1000 tiny decisions that can't possibly be specified completely by a product manager and need a human to weigh the tradeoffs.

CRConrad · on July 2, 2022

> AI can code low level stuff. This one function. This small piece of logic. What it can't do is conceive of how to take a bunch of different functions and put them together to produce an actual product.

That's what everybody in the chess world said: "AI can decide low level stuff. This one move. This small attack on a rook. What it can't do is conceive of how to take a bunch of different tactics and put them together to produce a game of chess."

...Until Deep Blue beat Garry Kasparov.

> It can't tell you if you should use Postgres or Mongo.

Yeah, and then came: "It may be able to play chess, but it can't tell you how to play Go."

Look how that went.

janosd · on July 1, 2022

The hard part about writing code isn't "how to write a for loop" and similar trivial things. Copilot makes this process faster, but the hard part is still organizing your code so that it doesn't become a steaming pile of cowdung a few iterations down the line. That Copilot does not do for you.

So, unless you are a code monkey punching code into autogenerated scaffolding all day, your job is safe.

bonzini · on June 30, 2022

Forcing your code to be open forever is guaranteeing freedom of all users of my code, both direct and indirect. Developers don't need to have any more freedoms than other users.

naikrovek · on June 30, 2022

> Forcing your code to be open forever is guaranteeing freedom of all users of [your] code

No, that’s forcing restriction on all users of your code.

Usage restriction is the opposite of freedom…

Forcing all of your code to be GPL is like saying “I am on a diet, so now I will force everyone else be on the same diet. Freedom!”

PeterisP · on June 30, 2022

MIT license permits you to deny your users the ability to read and modify the previously open code.

GPL license ensures that your users will keep the same freedoms that you got.

Of course, there's an inherent conflict - the freedom to oppress others is incompatible with freedom from oppression.

UncleEntity · on June 30, 2022

> Forcing all of your code to be GPL is like saying “I am on a diet, so now I will force everyone else be on the same diet. Freedom!”

Nobody is forcing anyone to use the code.

If they chose to use it they have to abide by the licensing terms because that’s how it works. If the people laboring for free to produce this code don’t want it to be used in a proprietary application then tough luck, write the code yourself.

Every time the GPL comes up someone drags out this same old dead horse to beat on a little bit more.

redeeman · on June 30, 2022

> Nobody is forcing anyone to use the code.

until the time comes when a tax department gets the funny idea to use it, and forces you to use it, or people with guns come to your door and haul you away in the morning.

(edit: formatting)

3np · on June 30, 2022

I really can't see the code being GPL being an issue in that case. What license would you have preferred?

redeeman · on July 1, 2022

it's not about whether it's a problem in real life, it's about whether the end user might be forced to use a product, which IS a thing, and that is the ONLY point I made

Volundr · on July 1, 2022

Do you have an example of someone being forced to incorporate GPLed code into their software at gunpoint, or is this a wildly hypothetical scenario?

redeeman · on July 1, 2022

of course not, but the point was about end users

keonix · on July 1, 2022

GPL restricts only developers. As an end user restrictions don't even apply to you

cobbzilla · on July 1, 2022

> Forcing all of your code to be GPL is like saying “I am on a diet, so now I will force everyone else be on the same diet. Freedom!”

This is a terrible analogy. Here’s a better one: I’m holding a potluck. If you decide to come, you can eat all you want. If you take food from my event, you can’t hoard it, you must share it, even if you’ve “made it better” by changing it somehow after you left.

Don’t like my rules? OK, don’t come to my potluck.

TheDong · on June 30, 2022

What you are pointing out is similar to "the paradox of tolerance" https://en.wikipedia.org/wiki/Paradox_of_tolerance

By analogy, there is a law against me putting handcuffs on another, and in fact the police would stop me from doing so. Did the police protect freedom? Aren't they restricting me from handcuffing others?

In a similar manner, under the MIT I can restrict my users from modifying and compiling my source code. Is a license that means I have to let my users modify code restricting freedom? Isn't it ensuring freedom of others, in the same way that making laws of "you shall not handcuff others for no reason" is ensuring freedom of others?

blacklight · on July 1, 2022

Let me provide you with a counter-example.

Suppose that there's a law that states that water and access to it is always supposed to remain public, because water is a public good.

Suppose that someone comes tomorrow and starts claiming ownership of all the water springs in your country, he becomes the only entry point to get water, and you have to pay him a fee every time you open a tap.

Is he still free to do so? In other words, is the freedom of someone who restricts the freedoms of everyone else still a form of freedom that is worth even considering, let alone respecting?

Because the foundation of your ideas is exactly the reason why capitalism fucked things up and just let a bunch of jerks get rich without merit.

smcameron · on July 1, 2022

> I want to know what stuff you guys are putting in public GitHub FOSS repos that you don't want replicated in any way...

What a disingenuous reply. FOSS licenses do not grant the ability to replicate "in any way" that you wish. You still have to comply with the license terms. What the hell is wrong with you?

> I don't understand the choice to use GPL for any reason ...

The reason is people like YOU.

toofy · on July 1, 2022

of course that’s freedom. if you say to me “you can’t choose to share your code” you are quite literally taking away my freedom.

either code is owned by its licensors or it isn’t.

> I also want to know why people think their code is so special that no one else could have ever come up with it independently.

I’ve never heard anyone argue this in the real world, ever —and i’ve been involved in this space for years.

If someone doesn’t want our code, then they can go ahead and write their own from scratch. We’re certainly not stopping them.

many people do seem upset at us that we’re sharing code, tho. particularly that group who primarily make their fortunes from other people’s work.

mr_toad · on July 1, 2022

> Someone using your (for example) MIT-licensed code in a closed-source commercial software project doesn't "un-free" the code

It’s the users of that closed-source commercial software that lose freedoms.

How many times does it have to be stated? GPL is for the users.

justjosias · on July 1, 2022

Also note: Copilot violates the attribution requirements of permissive licenses like MIT as well. Even if you put your code on GitHub with the intent of it being freely used in proprietary software, attribution is still a fair demand.

janosd · on July 1, 2022

That assumes the code snippets are copyrightable and that using them is not fair use.

blacklight · on July 1, 2022

Just to clarify: you seem to believe that most of our code isn't good enough, so copying it is not a big deal.

Do you feel the same about other creative processes as well? Can I rip a Justin Bieber song and say that it's mine just because it's a shitty song anyway, so who cares? Or does this only apply to software because software is somehow an "inferior" art? Do licenses even have any legal value to you?

WalterBright · on June 30, 2022

The D language uses the Boost license because it is the least restrictive. Anyone is free to use it in closed-source non-free commercial apps if they like, or Open Source if they like.

bombcar · on June 30, 2022

How is Boost different from something like 0-clause BSD?

WalterBright · on June 30, 2022

I don't know what 0-clause BSD is. The Boost license is:

Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

lovelymono · on July 1, 2022

0-clause BSD goes even further, and completely omits the attribution requirement:

   Permission to use, copy, modify, and/or distribute this software for any
   purpose with or without fee is hereby granted.

   THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
   REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
   AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
   INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
   LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
   OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
   PERFORMANCE OF THIS SOFTWARE.

bombcar · on July 1, 2022

Ah, so boost is somewhere between zero and one clause BSD.

notpushkin · on July 1, 2022

One major difference between Boost and e. g. MIT is, Boost allows you to omit attribution in binary form (but not source form).

elcomet · on July 1, 2022

This is a very "american" definition of freedom, which is basically, just let me do what I want.

GPL uses a different definition of freedom, which I prefer. They look at consequences of restrictions / permissions, and their implication on freedom (not just for me, but for everyone). So some restrictions can lead to actually more freedom, while some permissions can actually decrease freedom.

This is similar to gun control. While it reduces freedom for gun owners, it leaves everyone freer to hang out anywhere they want without being afraid of being shot. Similar arguments can be made for vaccine mandates.

So GPL restricts usage of software because in the long term it gives back power to users, which will be more free.

BlitzGeology91 · on July 2, 2022

> This is a very "american" definition of freedom, which is basically, just let me do what I want.

Eh. I see what you’re saying about gun control, but the idea that “some restrictions can lead to actually more freedom, while some permissions can actually decrease freedom” is actually very American.

The free software movement says that everyone deserves software freedom. The Declaration of Independence similarly says “We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.” While I haven’t found a source confirming it, I think that the founders believed that the freedom of speech was one of these unalienable rights.

The GPL puts restrictions in place to make sure that downstream projects give users software freedom. The Constitution put restrictions in place to ensure that the federal (and nowadays the entire) government doesn’t interfere with our unalienable rights.

Take a look at how the first amendment is worded:

“Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances.”

The first amendment does not grant the freedom of speech because it doesn’t need to be granted. From the founders’ perspective, god already grants the freedom of speech to everyone forever. The key phrase here is “Congress shall make no law”. The first amendment is restricting Congress to ensure freedom.

The idea that “some permissions can actually decrease freedom” is also present in the Constitution. For example, take a look at Article I sections 8 and 9. The framers of the Constitution could have given Congress the power to pass any law. Instead, they chose to specifically enumerate what Congress can and cannot do.

Perhaps, though, most Americans don’t know much about our founding and think that freedom=just let me do what I want. I don’t know.

elcomet · on July 3, 2022

Thanks for putting things in context, that was quite interesting!

zzo38computer · on July 1, 2022

I think the GPL is a good idea (it ensures that everyone keeps their freedom when they receive a modified version of the code, among the other things it protects users from), but there are some problems too; for one, I think it can be complicated to deal with.

For this reason, I had the idea to make up a new license (although I will not write most of my ideas here but will do so elsewhere). Its main workings would be: mostly you can do whatever you want (including omitting attribution and copyright notices) without worrying about the license, but you cannot use legal processes (such as lawsuits, DMCA, etc) to deny these freedoms to any downstream recipients (regardless of how many). The license would also ensure patents can be used freely, include a disclaimer of warranty (if the license is included in the copy and the recipient has not paid for the copy), and include some other things to ensure freedom (although there can be some restrictions on the use of trademarks (e.g. to avoid false advertising), and some things to avoid working around the freedoms in certain ways). You can be forgiven any number of times, though; the license will not be terminated. Furthermore, for the practical reason of license compatibility, relicensing under GPL3 and AGPL3 (and possibly also CC-BY-SA 4.0, for works other than computer programs) is also allowed, as long as you have a copy of the source code and can satisfy the terms of those licenses.

gerdesj · on June 30, 2022

"I also want to know why people think their code is so special that no one else could have ever come up with it independently. "

Really? What exactly does this CoPilot thing actually spit out? I can't help but think that it spits out code near verbatim, which in the UK is probably dodgy on copyright.

You then go on to decide that the GPL isn't for you. That's fine. You even explain that you are ill-equipped for something. That too is fine.

You are not a fan of free or "libre" stuff. That comes across loud and clear. Thank you.

jen20 · on July 1, 2022

"Replicating" it is fine. Violating the license is not.

CRConrad · on July 2, 2022

> Forcing your code to be open forever isn't freedom, it's the omission of freedom.

On the contrary, forcing your code to be open forever is the only way to preserve freedom.

npteljes · on July 1, 2022

Forcing the code to be open is the kind of freedom where restricting locally something enables the freedom globally. Granting the freedom to do whatever with the code will make the code end up used in closed ways, empowering those who close the code.

A similar line of thought is the "paradox of tolerance", which posits that if a society tolerates the intolerant, the tolerance of that society will lessen.

https://en.wikipedia.org/wiki/Paradox_of_tolerance

CRConrad · on July 2, 2022

Second, shorter reply:

> I don't understand the choice to use GPL for any reason

No, you just don't understand the GPL.

Or, for some reason, pretend not to.

mountainriver · on July 1, 2022

Exactly, this is such griping nonsense from people who are scared of where the industry is heading.

UmbertoNoEco · on June 30, 2022

I love how you direct the question to the people putting their code out in the open and not to Microsoft. What is so special about Clippy et al?

hinkley · on July 1, 2022

Skipped right over the question on the table and asked a whole bunch of passive aggressive questions.

Truly a classic.

CRConrad · on July 2, 2022

You mean skipped right over the moronic statement thinly veiled as a question on the table? As if it deserved better.

nojito · on June 30, 2022

GPL is one of the least free licenses out there.

ssddanbrown · on June 30, 2022

Freedom of the code to its users vs Freedom of the users of the code

nojito · on July 1, 2022

There's no difference. Freedom is freedom. Adding qualifications to it inherently makes it less free.

danaris · on July 1, 2022

Freedom is not, and cannot be, an absolute. If I am 100% free, that by definition restricts the freedom of others (for instance, if I am free to punch you in the face, you are not free to not be punched in the face; if I am free to own you as a slave, you thus lose a lot of freedoms).

Determining what freedom should mean is not, and has never been, a simple matter of "well, if you make any restrictions on it, then it's not real freedom, so everyone just gets to be free!" It's all about finding balance, and dealing with nuance, and all that frustrating hard stuff.

sangnoir · on July 1, 2022

Whose freedom a license guarantees is a fundamental difference, as the freedoms can be in conflict.

Which is a freer society: one that restricts late night partiers from playing loud music in residential areas, or the one that does not?

JoshTriplett · on June 30, 2022

Note that while Copilot is a major motivator for this effort, it isn't the only one; there's a pile of other reasons listed at https://sfconservancy.org/GiveUpGitHub/ . GitHub and its lock-in has been a problem for a long time, and this is just the most recent problem.

_Algernon_ · on June 30, 2022

I mean the answer to that question is obvious: they're not under any obligation to include their own code in the training data. Why would they?

A better question would be whether they would take legal action against a competitor that creates a copilot equivalent and publicly states that they trained it on leaked, proprietary M$ source code. That would actually be an example of hypocrisy.

DJHenk · on July 1, 2022

> They're not under any obligation to include their own code in the training data. Why would they?

Because these models work better with more data and presumably this is a lot of high quality data that they already have lying around anyway? Because there is no downside according to their own reasoning? Because it would shut up a lot of these criticisms right away? Because marketing would be so much easier with that kind of dogfooding?

In short: because according to their own story there would be only upsides, no downsides.

noasaservice · on June 30, 2022

According to their logic, if I train a model using stolen Windows source code, it's fair use.

Just because the code they use is under FLOSS licenses does not allow them to evade things like the Affero GPL3. And, to that end, if they are using Affero code, I want the source to the whole copilot infrastructure -or- proof they used no AGPL3 code anywhere.

comex · on June 30, 2022

Perhaps because there is a (small) risk of leaking confidential information through its output.

But that's not as damning as it sounds.

First, we know Copilot, if given the right prompt and told to autocomplete repeatedly without any manual input, can regurgitate bits of code seen many times in many different repositories, like the famous Quake fast inverse square root function and the text of licenses. That doesn't mean it does so under normal prompts and normal use. Perhaps it does sometimes, and that would be a real concern. But any regurgitation that isn't under normal use, which only happens if the user is trying to make Copilot regurgitate, is not a problem when it comes to copyright violations of open source code (since anyone trying to violate an open source license can do so much more easily without using Copilot), yet it may still be a problem when it comes to leaking confidential information.

Second, whether something is a copyright violation and whether it risks leaking confidential information are somewhat orthogonal. A copyright violation usually requires at least several lines of code, and more if the copying is not verbatim, or if the code is just a series of function calls which must be written near-verbatim in order to use an API. On the other hand, `const char PRIVATE_KEY[] = ` could hypothetically complete to something dangerous in just one line of code. That said, it almost certainly wouldn't, since even if a private key was stored in source code in the first place (obviously it shouldn't be), it probably wouldn't be repeated enough to be memorized by the model. Yet…

…third, the risk tolerances are different. If, to use completely made-up numbers, 0.1% of Copilot users commit minor copyright violations and 0.001% commit major ones, that's probably not a big deal considering how many copyright violations are committed by hand – sometimes intentionally, mostly unintentionally. (When it comes to unintentional ones, consider: Did you know that if you copy snippets from Stack Overflow, you're supposed to include attribution even in any binary packages you distribute, and also the resulting code is incompatible with several versions of the GPL? Did you know that if you distribute binaries of code written in Rust, you need to include a copy of the standard library's license?) But when it comes to leaking confidential information, even one user getting it would be somewhat bad (though admittedly Microsoft does distribute much of their source code privately to some parties), and taking even a small risk would be a questionable decision when there is a ready alternative.

mpol · on June 30, 2022

> Perhaps because there is a (small) risk of leaking confidential information through its output.

If Microsoft/Github ever made that argument, that also means that when Copilot is using GPL software as input, the output can only be released under the GPL.

stickfigure · on June 30, 2022

A short passphrase or key is not copyrightable, but definitely confidential.

emilfihlman · on July 1, 2022

Not true.

Copyright licenses don't apply to small snippets, no matter if you think they do, and learning and applying other people's code isn't prohibited by the license, and thank god, can't be prohibited.

natefinch · on June 30, 2022

FWIW, there are some (admittedly fairly naive) checks to prevent PII and other sensitive info from being suggested to users. Copilot looks for things like ssh keys, social security numbers, email addresses, etc, and removes them from the suggestions that get sent down to the client.

There's also a setting at https://github.com/settings/copilot (link only works if you've signed up for copilot) that will check any suggestion on the server against hashes of the training set, and block anything that exactly duplicates code in the training set (with a minimum length, so very common code doesn't get completely blocked). Users must choose the value for this setting when they sign up for copilot.

source: I work on copilot at github
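
For readers wondering what a "hash check with a minimum length" could look like, here is a minimal sketch in Python. It is not Copilot's actual implementation, which is not described in this thread beyond the comment above. The names `MIN_TOKENS`, `window_hashes`, `build_index` and `is_blocked` are made up for illustration, and the whitespace tokenization and the threshold of 8 tokens are assumptions.

```python
import hashlib

MIN_TOKENS = 8  # hypothetical minimum length; the real threshold is not stated in the thread


def normalize(code: str) -> list[str]:
    """Split code on whitespace so that formatting differences are ignored."""
    return code.split()


def window_hashes(tokens: list[str], size: int = MIN_TOKENS) -> set[str]:
    """Hash every run of `size` consecutive tokens; shorter inputs yield no hashes."""
    return {
        hashlib.sha256(" ".join(tokens[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(tokens) - size + 1)
    }


def build_index(training_files: list[str]) -> set[str]:
    """Collect the window hashes of every file in the (assumed) training set."""
    index: set[str] = set()
    for source in training_files:
        index |= window_hashes(normalize(source))
    return index


def is_blocked(suggestion: str, index: set[str]) -> bool:
    """Block a suggestion if any long-enough run of its tokens appears in the index."""
    return not window_hashes(normalize(suggestion)).isdisjoint(index)
```

Because inputs shorter than `MIN_TOKENS` produce no hashes, very common short fragments such as `return y;` are never blocked in this sketch, which matches the "minimum length" caveat in the comment.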

coconuthacker42 · on June 30, 2022

I tried using copilot and it literally attributed the function I was writing to someone else even before I could start writing a line. It's been updated since and these errors are rare now, but they still exist.

innocentoldguy · on June 30, 2022

Having worked on Windows code, I’m pretty sure you don’t want your Copilot-generated code tainted with all that cruft.

mr_toad · on July 1, 2022

> why are your Microsoft Windows and Office codebases not in your training set? This is my favorite question about Copilot ever.

While GitHub might have a license to use that code to train the model, it’s debatable what license applies to the output of the model, and what users of the model can do with it.

It’s possible for an AI to reproduce something so close to the original that it would be considered an infringement on the original work.

ipaddr · on June 30, 2022

My favorite answer is: if you included Microsoft source code the quality of suggestions drops below viable product.

josephcsible · on June 30, 2022

The reasons that Windows is awful have nothing to do with code quality. Windows is awful because of intentional choices Microsoft made (e.g., bloatware that gets reinstalled with every update, mandatory Microsoft accounts, and mandatory telemetry).

zelphirkalt · on June 30, 2022

Whenever I have to start Windows 10, I still see the same kind of bugs that were present on XP. One example: they seem to be simply unable to fix the icons "near the clock", which are still shown after some app has been killed, until you hover over them. Things like that, but of course also lots of stuff that affects people more, in the form of annoyances that make every action take at least twice as long as on the GNU/Linux distros I run. It only takes minutes, and I am already frustrated with the system, because everything takes so long to do.

s0l1dsnak3123 · on June 30, 2022

One similarly ignored bug that springs to mind is the performance of the "Send To" context menu item in File Explorer. I always dreaded dragging my mouse over it by accident.

notriddle · on June 30, 2022

That is a design flaw that cannot be fixed without breaking the API: https://devblogs.microsoft.com/oldnewthing/20190528-00/?p=10...

I'm pretty sure there are a lot of those.

warkdarrior · on July 1, 2022

Apparently nobody at Microsoft thought about the Windows registry as a database, which of course needs indexing to be performant.

account42 · on July 1, 2022

They could also cache that computed menu and proactively update the cache whenever the relevant keys are changed. Either way, pretty far from "cannot be fixed without breaking the API".
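
A rough sketch of that caching idea, in Python for brevity. Nothing here is a Windows API: `FakeRegistry` is a stand-in key/value store with change notifications, and `MenuCache` and the `Menu/` key prefix are hypothetical names chosen for the example.

```python
from typing import Callable


class FakeRegistry:
    """Stand-in key/value store that notifies listeners on writes (not a Windows API)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self._listeners: list[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        for listener in self._listeners:
            listener(key)

    def items_under(self, prefix: str) -> list[tuple[str, str]]:
        # A full scan: this is the slow step the cache is meant to avoid repeating.
        return sorted((k, v) for k, v in self._data.items() if k.startswith(prefix))


class MenuCache:
    """Builds the menu once, then rebuilds only when a relevant key changes."""

    PREFIX = "Menu/"  # hypothetical key prefix for menu entries

    def __init__(self, registry: FakeRegistry) -> None:
        self._registry = registry
        self.rebuilds = 0
        self._menu = self._build()
        registry.subscribe(self._on_change)

    def _build(self) -> list[str]:
        self.rebuilds += 1
        return [value for _, value in self._registry.items_under(self.PREFIX)]

    def _on_change(self, key: str) -> None:
        if key.startswith(self.PREFIX):
            self._menu = self._build()

    def menu(self) -> list[str]:
        # Opening the menu is a cheap copy of the cached list, with no scan.
        return list(self._menu)
```

The trade-off is that writes to relevant keys become slightly more expensive, while opening the menu no longer depends on how many keys exist.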

devwastaken · on June 30, 2022

Windows 11 is an example of poor code quality. Bugs everywhere, while the same things work on Ubuntu/popos.

Past MS engineers have been commenting for a decade on how MS has grown too big, can't manage, and has become a monolith "too big to fail". By nature when engineers are small pieces of a giant machine, they don't do their best work. And those with the experience move on to better things.

westoncb · on June 30, 2022

My experience has also been that Windows 11 is buggy (haven't been using it for a while because it can't even reliably connect to the internet). But also in my limited experience (just one install on a single machine in ~2020, used for a few months): Ubuntu is just as bad or even worse.

ordiel · on June 30, 2022

Your experience is quite limited and you probably need to learn how to properly update Ubuntu. Most of the issues I've found with it (since I started using it ~12 years ago) are caused by a lack of drivers, which gets solved in 15 minutes once you know where to click. Once those are solved it is sturdy, and you can keep it running for several months without having to restart it or it becoming unusably slow, as happens with Windows systems after about 4 days of uptime.

westoncb · on July 1, 2022

This comment is funny to me because it was up to date and the particular issue wasn’t driver related: it was specifically that after not touching it at all for a couple of months, each subsequent time I logged in it would randomly lock up, and it took about 15 minutes to boot.

AshamedCaptain · on July 1, 2022

This is practically guaranteed to be "driver" related, unless it was only one specific binary that was "locking up".

westoncb · on July 1, 2022

It would lock up as in just take an extremely long time to do certain things in the UI. That sounds like a pretty odd way for a driver issue to manifest, but maybe I'm missing something.

jamiek88 · on June 30, 2022

Ah the old ‘you are holding it wrong’ of linux.

the_black_hand · on July 1, 2022

The biggest issue with Windows isn't poor code or shitty engineering, it's the support for legacy software. MS engineers are some of the smartest in the world. The devs can fix the code and make a much better OS but that would break boomer software used by big banks that haven't updated since the 80s. When Microsoft writes code, it has to promise support for decades, which means having to maintain the same old outdated APIs for many years.

account42 · on July 1, 2022

Outdated APIs don't have to affect the shell and built in programs or anything else that is kept up to date. My linux programs are no more buggy due to having Wine installed for similar compat with legacy Windows executables.

alexklarjr · on June 30, 2022

Their code was awful about 30 years before that.

buchoo · on June 30, 2022

Was it? I recall the kuro5hin analysis of the leaked Windows 2000 source code[0] that said:

>there is nothing really surprising in this leak. Microsoft does not steal open-source code. Their older code is flaky, their modern code excellent. Their programmers are skilled and enthusiastic. Problems are generally due to a trade-off of current quality against vast hardware, software and backward compatibility.

[0] https://web.archive.org/web/20040401115821/http://www.kuro5h...

samatman · on June 30, 2022

30 years ago was 1992. That's the older flaky code they're referring to in the quote.

pragmatic · on June 30, 2022

In what way is Windows “awful”?

I can think of annoyances but awful? Come on.

karamanolev · on June 30, 2022

They explicitly listed the reasons they think it's awful. My personal grievances with Windows align more or less with theirs and while I wouldn't go as far as to say it's awful, I'd use something certainly stronger than "annoyance".

Specifically, clear anti-user choices that exceed by far being "annoying":

* Making it exceedingly difficult or impossible to use the OS without logging in with a Microsoft account.

* Forcing the user in various ways to surrender data to Microsoft. Some of them can be disabled if you really go out of your way, others can't.

* Prompting me again and again to switch to Edge and other MS defaults. I've had the same install for a few years now and NO, I don't want to change to "Microsoft recommended defaults", no matter how many times you ask me.

* Showing the same "OS setup" screen after some updates, requiring me to pay very close attention to what I'm clicking, lest I select something MS is trying to lead me to. The amount of attention required from the user on those screens corresponds quite well with anti-user behavior.

xtracto · on June 30, 2022

>Making it exceedingly difficult or impossible to use the OS without logging in with a Microsoft account

This is hilarious. I recently got a new laptop that has window$ 11. After setting it up with a non-Microsoft email (which required a good fight), I tried to install some random app from the Microsoft store, but got a "something went wrong please try again" on the first screen.

It's pathetic. I haven't used Windows since Win 7, which I basically installed for gaming. Seeing the latest version of the OS makes me feel sorry for them. That's why Apple with all their assholery is eating their lunch (on the flip side my wife just got an M1 MBP and I was pleasantly surprised that it has an HDMI port, MagSafe, and several USB-C ports. Apple seems to be going in the right direction.)

CRConrad · on July 2, 2022

Windows 7 was the last tolerable version.

ballenf · on June 30, 2022

You haven't had root admin on Windows since Windows 7.

The telemetry makes this clear. Reboots and updates even more so.

The UI lag and stealing of focus ("oh, you're typing a document... too bad, I want to launch a new Explorer window that will immediately steal focus") make it clear that the computer is in charge and will probably listen to your requests, but on the timeline it chooses.

jamal-kumar · on July 1, 2022

FWIW I have work that's kind of bound to using Windows but I always run this on a fresh install and it is definitely helpful:

https://github.com/n1snt/Windows-Decrapifier

NicoJuicy · on June 30, 2022

Support for legacy features is always #1, code-wise

mmis1000 · on June 30, 2022

Windows by default already handles compatibility in some crazy ways. Compatibility mode lies to the program about the system version, or even fakes old bugs so that programs relying on those bugs will run. And I'd imagine that to make this work, MS needs tons of the shittiest code you can imagine in the source to fake those behaviors.

codefreeordie · on June 30, 2022

This is a funny and flippant answer, but is also nonsense.

alexklarjr · on June 30, 2022

I call it Microsoft DNA. The way they do stuff is inhuman alien logic without any compassion or remorse (like all their 10k+ Windows APIs, or dotnet, or the way they add features and handle support).

iudqnolq · on July 1, 2022

It is however plausible that the code is only "good" given internal considerations. Microsoft has specific internal coding styles designed to work with internal tools.

omegalulw · on June 30, 2022

Do you have any evidence for this?

alexklarjr · on June 30, 2022

I don’t have windows 10 or 11, sorry.

lumost · on June 30, 2022

I’d be curious if you could get copilot to cough up its own answer on its legality.