I have a genuine question about this whole thing with Copilot:
A similar product, TabNine, has been around for years. It does essentially the exact same thing as Copilot, it’s trained on essentially the same dataset, and it gets mentioned in just about every thread on here that talks about AI code generation. (It’s a really cool product btw and I’ve been using and loving it for years). According to their website they have over 1M active users.
Why is this suddenly such a big deal, and why is everyone freaking out about Copilot? Is it because it’s GitHub, Microsoft, and OpenAI behind Copilot vs. some small startup you’ve never heard of? Or is it just that the people freaking out weren’t paying attention and didn’t realize this kind of service already existed?
The feature of TabNine that uses the "public" dataset is optional. It can also provide completions only based on local code. That optionality is important.
Also, TabNine has a smaller scope: you type "var " and it suggests a variable name and possibly the rest of the line, much like autocomplete has been doing for decades. Perfectly normal.
My understanding of Copilot is that you can type "// here's a high-level description of my problem" and it'll fill out entire functions, dozens of lines at a time. The scope is much grander.
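To make that difference in scope concrete, here's a rough sketch of the kind of comment-driven completion people are describing (the prompt and the generated body are made up for illustration, not actual Copilot output):

```python
from datetime import datetime, timedelta

# You type only the comment and the def line...
# Return the users whose subscription expires within the next `days` days.
def expiring_subscriptions(users, days):
    # ...and the tool proposes the entire body, not just the next identifier:
    cutoff = datetime.utcnow() + timedelta(days=days)
    return [u for u in users if u.expires_at <= cutoff]
```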
For many, the question is about code quality as well. Having an AI write substantial chunks of code based on the work of "the average GitHub committer" is being criticized as a problem for security, correctness, and understanding.
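A hypothetical example of why that worries people: a suggested completion can look perfectly plausible while encoding a bad habit learned from the training set, like building SQL with string interpolation instead of parameters (again, illustrative, not real Copilot output):

```python
import sqlite3

# Plausible-looking generated helper that quietly invites SQL injection:
def find_user(conn: sqlite3.Connection, username: str):
    # BAD: user input interpolated straight into the query string
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchone()

# What a careful reviewer would expect instead: a parameterized query.
def find_user_safe(conn: sqlite3.Connection, username: str):
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchone()
```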
I think such arguments are a little overheated, but I do have my copy of TabNine configured to use only local code, because depending on the full dataset (which IIRC is only available over the cloud) seemed like it was going to be more work than it saved.
Because the repository host trusted by millions is starting to do things we never anticipated. It's growing in ways that are a touch uncomfortable for some.
I think some are also beginning to feel an Amazonification happening. We built all the stuff and made it free, but now a company is going to own it and profit off of it.
Edit: If we want to prevent this, we need a new license that states our code may not be included in deep learning training sets.
Edit 2: If private repository code is in the training set, it may be possible to leak details of private company infrastructure, since models can memorize and regurgitate training data.
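To illustrate that worry (purely hypothetical, not a claim about what Copilot actually does): if a private file containing a secret ended up in training data, a model could reproduce that literal when someone else types the distinctive variable name, so you'd want to screen suggestions against known secrets before accepting them.

```python
# Suppose a private repo contained this line and it ended up in training data
# (the name and value here are invented for illustration):
INTERNAL_API_KEY = "sk-live-EXAMPLE-0000"

# A model that memorized that file could complete the prompt
# 'INTERNAL_API_KEY = ' with the original literal, leaking it to strangers.
# A crude mitigation: check suggestions against known secrets before accepting.
def suggestion_leaks_secret(suggestion: str, known_secrets: set) -> bool:
    return any(secret in suggestion for secret in known_secrets)

print(suggestion_leaks_secret('"sk-live-EXAMPLE-0000"', {INTERNAL_API_KEY}))  # True
```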
GitHub has more visibility and, yes, more scrutiny. But that doesn’t mean TabNine would have escaped the same scrutiny forever, especially after an acquisition. The fact is, size matters.