I have a genuine question about this whole thing with Copilot:
A similar product, TabNine, has been around for years. It does essentially the exact same thing as Copilot, it’s trained on essentially the same dataset, and it gets mentioned in just about every thread on here that talks about AI code generation. (It’s a really cool product btw and I’ve been using and loving it for years). According to their website they have over 1M active users.
Why is this suddenly such a big deal, and why is everyone freaking out about Copilot? Is it because it’s GitHub, Microsoft, and OpenAI behind Copilot vs. some small startup you’ve never heard of? Or is it just that the people freaking out weren’t paying attention and didn’t realize this kind of service already existed?
The feature of TabNine that uses the "public" dataset is optional. It can also provide completions only based on local code. That optionality is important.
Also, TabNine has a smaller scope: you type "var " and it suggests a variable name and possibly the rest of the line, much like autocomplete has been doing for decades. Perfectly normal.
My understanding of Copilot is that you can type "// here's a high-level description of my problem" and it'll fill out entire functions, dozens of lines at a time. The scope is much grander.
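To make that difference in scope concrete, here's a rough sketch of the kind of comment-driven completion people are describing (the prompt and the generated body are made up for illustration, not actual Copilot output):

```python
from datetime import datetime, timedelta

# You type only the comment and the def line...
# Return the users whose subscription expires within the next `days` days.
def expiring_subscriptions(users, days):
    # ...and the tool proposes the entire body, not just the next identifier:
    cutoff = datetime.utcnow() + timedelta(days=days)
    return [u for u in users if u.expires_at <= cutoff]
```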
For many, the question is about code quality as well. Having an AI write substantial chunks of code based on the work of "the average GitHub committer" is being criticized as a problem for security, correctness, and understanding.
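A hypothetical example of why that worries people: a suggested completion can look perfectly plausible while encoding a bad habit learned from the training set, like building SQL with string interpolation instead of parameters (again, illustrative, not real Copilot output):

```python
import sqlite3

# Plausible-looking generated helper that quietly invites SQL injection:
def find_user(conn: sqlite3.Connection, username: str):
    # BAD: user input interpolated straight into the query string
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchone()

# What a careful reviewer would expect instead: a parameterized query.
def find_user_safe(conn: sqlite3.Connection, username: str):
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchone()
```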
I think such arguments are a little overheated, but I do have my copy of TabNine configured to use only local code, because depending on the full dataset (which IIRC is only available over the cloud) seemed like it was going to be more work than it saved.
Because the repository host trusted by millions is starting to do things we never anticipated. It's growing in ways that are a touch uncomfortable for some.
I think some are also beginning to feel an Amazonification happening. We built all the stuff and made it free, but now a company is going to own it and profit off of it.
Edit: If we want to prevent this, we need a new license that states our code may not be included in deep learning training sets.
Edit 2: If private repository code is in the training set, it may be possible to leak details of private company infrastructure, since models can memorize and regurgitate training data.
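To illustrate that worry (purely hypothetical, not a claim about what Copilot actually does): if a private file containing a secret ended up in training data, a model could reproduce that literal when someone else types the distinctive variable name, so you'd want to screen suggestions against known secrets before accepting them.

```python
# Suppose a private repo contained this line and it ended up in training data
# (the name and value here are invented for illustration):
INTERNAL_API_KEY = "sk-live-EXAMPLE-0000"

# A model that memorized that file could complete the prompt
# 'INTERNAL_API_KEY = ' with the original literal, leaking it to strangers.
# A crude mitigation: check suggestions against known secrets before accepting.
def suggestion_leaks_secret(suggestion: str, known_secrets: set) -> bool:
    return any(secret in suggestion for secret in known_secrets)

print(suggestion_leaks_secret('"sk-live-EXAMPLE-0000"', {INTERNAL_API_KEY}))  # True
```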
GitHub has more visibility and, yes, more scrutiny. But that doesn’t mean TabNine would have escaped the same scrutiny forever, especially after an acquisition. The fact is, size matters.