In all honesty, this sounds to me like a whole lot of BS and hype. The name is pretentious, the quotes are ridiculous ("Deep Learning est mort. Vive Differentiable Programming."; "there will be a need for the creation of a whole new set of new tools, such as a new Git, new IDEs, and of course new programming languages"). Maybe I am just ignorant and fail to grasp the importance of such capabilities in a machine learning context, but I have to say, the grandstanding is a bit grating.
Who knows, perhaps this will become the greatest thing since Lisp... What do I know.
A particularly salty developer I knew ages and ages ago once said (apocryphally, apparently) that there was an old Inuit proverb, “everyone likes the smell of their own farts.”
Google is still taking themselves very seriously while everyone else is starting to get bored.
The problem with being 25 is that you have about 8 years ahead of you before you figure out how full of shit everyone is in their twenties, and maybe another 8 before you figure out that everyone is full of shit and stop worrying quite so much about it.
>The problem with being 25 is that you have about 8 years ahead of you before you figure out how full of shit everyone is in their twenties, and maybe another 8 before you figure out that everyone is full of shit and stop worrying quite so much about it.
Boy there is some truth right here. It wasn't until long after I graduated undergrad that I realized just how much bullshit is out there. Even in the science world! When I started actually reading the methodology of studies with impressive sounding conclusions, I realized that easily 30-60% were just garbage. The specific journal really, really matters. I'd say 90% of science journalism targeting laymen is just absolute bullshit.
I started actually chasing down wikipedia citations and OMG they are bad!! Half are broken links, a large fraction don't support the conclusions they're being used for, and a massive fraction are really dubious sources.
I realized that so many people I respected are so full of shit.
I realized that so many of MY OWN OPINIONS were bullshit. And they STILL are. I hold so few opinions that are genuinely well-reasoned and substantiated. They are so shallow.
Yet, this is just how the world works. Human intuition is a hell of a drug. A lot of the people I respect tend to be right, but for all the wrong reasons. It's SOOOO rare to find people that can REALLY back up their thinking on something.
Steve Jobs had a slightly more optimistic way to say it: “Everything around you that you call life was made up by people that were no smarter than you.”
Tight integration with the language may be beneficial in making it simpler to write, but it's not like you can't do this already in other languages. Baking it into the language might be useful to make it more popular. Nobody should be doing the chain rule by hand in the 21st century.
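For illustration, here is a minimal forward-mode autodiff sketch in Python using dual numbers (a toy with made-up names like Dual, not any particular library), the kind of thing you can already write in any language with operator overloading:

    import math
    from dataclasses import dataclass

    @dataclass
    class Dual:
        val: float   # value of the expression
        dot: float   # derivative with respect to the chosen input

        def __add__(self, other):
            return Dual(self.val + other.val, self.dot + other.dot)

        def __mul__(self, other):
            # product rule, applied automatically
            return Dual(self.val * other.val,
                        self.val * other.dot + self.dot * other.val)

    def sin(x):
        # chain rule for sin, written once instead of at every call site
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

    def f(x):
        return sin(x * x) + x

    # derivative of f at x = 1.5: seed dot = 1.0 for the input of interest
    print(f(Dual(1.5, 1.0)).dot)   # ~ 2*1.5*cos(1.5**2) + 1

Baking exactly this into the compiler is what the differentiable-programming pitch is about, but the technique itself is old and portable.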
If we already have useful phrases like "embedded programming", "numerical programming", "systems programming", or "CRUD programming", etc., I'm not seeing the pretentiousness of "differentiable programming". If you program embedded chips, that's often called "embedded programming"; likewise, if you write programs where differentials are a first-class syntax concept, I'm not seeing the harm in calling it "differentiable programming" -- because that basically describes the specialization.
>the quotes are ridiculous ("Deep Learning est mort. Vive Differentiable Programming.";
The whole point is you’re supposed to have the same “something” on both sides (X is dead, long live X), to indicate it’s not a totally new thing but a significant shift in how it’s done
The most well known one:
> The King is dead. Long live the King!
If you change one side, you’re removing the tongue-in-cheek nature of it, and it does sound pretty pretentious.
Not really. The sentence with the King was used to mean that the new King immediately took office.
> Le Roi (Louis ##) est mort. Vive le Roi (Louis ## + 1) ! [The King (Louis ##) is dead. Long live the King (Louis ## + 1)!]
Using the sentence with Deep Learning and Differentiable Programming just suggests that Differentiable Programming is the heir/successor/evolution of Deep Learning. It does not imply that they are the same thing.
As a French person used to the saying, LeCun probably meant that.
It does, actually. What matters is that there is a king, not who is the king.
The saying works because, as you say, it suggests a successor, but the successor has to use the same title, because what people want is a new king, so that nothing changes and they can live as they did before the king died, not a revolution and a civil war soaked in blood.
If you do voluntarily change the title, it's because you think the new one will be better, which is pretentious.
> Using the sentence with Deep Learning and Differentiable Programming just shows how Differentiable Programming is an evolution. It does not imply that they are the same thing.
... where did I imply it means they're the same thing?
From my comment:
> indicate it’s not a totally new thing but a significant shift in how it’s done
You could say, an evolution?
-
A snowclone is a statement in a certain form. The relevant form that English speakers use (I'm not a French person, and this is an English article) is "X is dead, long live X", where both are X.
That's where the "joking" the above comment is referring to comes from; it sounds "nonsensical" if you take it literally.
If you change one X to Y, suddenly there's no tongue-in-cheek aspect, you're just saying "that thing sucks, this is the new hotness".
I suspect the author just missed that nuance or got caught up in their excitement, but the whole point of a snowclone is it has a formula, and by customizing the variable parts of that formula, you add a new subtle meaning or tint to the statement.
They’re kinda fucked because differential cryptanalysis was coined long ago to describe a set of techniques for attacking bad cryptography.
Differential programming would be less flashy but may be confusing.
I wouldn’t actually be interested in this topic much except for the top level comment complaining about them wanting a new version control system for this and now I’m a bit curious what they’re on about this time, so will probably get sucked in.
"differential" and "differentiable" are different words. Both are being used correctly. Is the problem only that the two words look kind of similar? That seems like an impractical requirement.
God I hate the trend of using a prefix on its own as a stand-in for '<prefix><thing>'. I think it's a symptom of politicians trying to sound stupid to avoid sounding elitist.
What do you think will have more impact on the economy, crypto or cyber?
I think using the shortened version is perfectly valid, provided the context is right.
If you are talking to someone about cryptocurrency, referring to it as crypto later in the conversation in context is perfectly valid and doesn't lessen the meaning.
I do, however, agree with you that outside of its context these shortened names are horrible and effectively buzzwords.
Why would a language where it is possible to manipulate certain functions to get their derivatives require a new revision control system or IDE?
And speaking of Lisp - wasn't symbolic differentiation a fairly common thing in Lisp? (basically as a neat example of what you can do once your code is easy to manipulate as data).
Symbolic differentiation is a fairly common exercise but it is inefficient (the size of the derivative grows exponentially in the size of the original expression). "Automatic differentiation" is the term for the class of algorithms usually used in practice, which are more efficient while still being exact (for whatever "exact" means when you're using floating point :-)
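To make the contrast concrete, here is the classic Lisp-style symbolic differentiator transcribed to Python over nested tuples (a toy made up for this comment; a real system would simplify terms). The product rule copies both operands into the result, which is where the expression swell comes from:

    def deriv(expr, var):
        # numbers and other variables differentiate to 0; the variable itself to 1
        if isinstance(expr, (int, float)):
            return 0
        if isinstance(expr, str):
            return 1 if expr == var else 0
        op, a, b = expr
        if op == '+':
            return ('+', deriv(a, var), deriv(b, var))
        if op == '*':
            # product rule: note that both a and b reappear in the output
            return ('+', ('*', deriv(a, var), b), ('*', a, deriv(b, var)))
        raise ValueError(op)

    # d/dx (x*x + 3) -> ('+', ('+', ('*', 1, 'x'), ('*', 'x', 1)), 0)
    print(deriv(('+', ('*', 'x', 'x'), 3), 'x'))

Automatic differentiation sidesteps this by propagating numeric derivative values instead of ever building the symbolic derivative expression.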
AD still explodes for “interesting” derivatives: efficiently computing the adjoint of the Jacobian is NP-complete. And, naturally, the Jacobian is what you want when doing machine learning. There are papers from the mid-'90s discussing the difficulties in adding AD Jacobian operators to programming languages to support neural networks. This article is just rehashing 25-year-old problems.
Correction: Finding the optimal algorithm (minimal number of operations) for computing a Jacobian is NP-complete, but evaluating it in a multiple of the cost of a forward evaluation is standard.
Also, many optimizers that are popular in ML only need gradients (in which case the Jacobian is just the gradient vector). Second order methods are important in applications with ill-conditioning (such as bundle adjustment or large-scale GPR), but they have lots of exploitable structure/sparsity. The situation is not nearly as dire as you suggest.
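A small JAX sketch of that distinction (the functions and sizes here are made up; JAX comes up elsewhere in the thread): a scalar loss gives you the gradient in one reverse sweep, while a vector-valued function has a full Jacobian matrix.

    import jax
    import jax.numpy as jnp

    def loss(w):
        # scalar-valued loss: its "Jacobian" is just the gradient vector
        return jnp.sum(jnp.tanh(w) ** 2)

    def model(w):
        # vector-valued function: its Jacobian is a full matrix
        return jnp.tanh(w) * jnp.arange(1.0, 4.0)

    w = jnp.array([0.1, 0.2, 0.3])
    print(jax.grad(loss)(w))     # gradient, one reverse-mode sweep
    print(jax.jacrev(model)(w))  # 3x3 Jacobian, roughly one sweep per output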
Around 2000 I was accidentally inventing a Bloom filter variant (to this day I don’t know how I missed the Google papers at the time) for doing a large set intersection test between two machines.
Somehow, I ended up with a calculus equation for determining the right number of bits per entry and rounds to do to winnow the lists, for any given pair of machines where machine A found n entries and machine B found m. But I couldn’t solve it. Then I discovered that even though I did poorly at calculus, I still remembered more than anyone else on the team, and then couldn’t find help from any other engineer in the building either.
Eventually I located a QA person who used to TA calculus. She informed me that my equation probably could not be solved by hand. I gave it another day or so and then gave up. If I couldn’t do it by hand I wasn’t going to be able to write a heuristic for it anyway.
For years, this would be the longest period in my programming career where I didn’t touch a computer. I just sat with pen and paper pounding away at it and getting nowhere. And that’s also the last time I knowingly touched calculus at work.
(although you might argue some of my data vis discussions amount to determining whether we show either the sum or the rate of change of a trend line to explain it better. The S curve that shows up so often in project progress charts is just the integral of a normal distribution, after all)
Thanks - I didn't think it was literally doing symbolic differentiation (I don't work in the area so literally had no idea), but the basic idea that you apply some process to your code to get some other code doesn't sound that surprising to anyone who has used Lisp (and I used to write tools in Lisp to write numerical engineering simulations - admittedly a long time ago).
It is different. DVC is a serverless management tool that helps you organize and link to your storage backends, and move data from those backends to your workspace. Git LFS requires a dedicated server, and you can store data only on that server, instead of moving data between a multitude of storage backends (like Google Drive, S3, GCP, or a local drive).
Mind you - DVC seems to be a platform on top of git rather than a replacement for git. So I'd argue that it's not really a new revision control system.
For the IDE part they have written some cool debuggers that help you understand the differentiation part and catch bugs in it. But I'm not sure why you couldn't just use the debugger instead of a whole new IDE, much less why you would need a new RCS.
In essence, there are cases outside the well-developed uses (CNNs, LSTMs, etc.), such as Neural ODEs, where you need to mix different tools (ODE solvers and neural networks), and the ability to do Differentiable Programming is helpful because otherwise it is harder to get gradients.
The way I can see it being useful is that it helps speed up development work so we can explore more architectures, with Neural ODEs again being a great example.
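As a rough illustration of that mixing (a made-up toy, not a real Neural ODE setup; serious work would use a proper adaptive solver such as those in torchdiffeq or diffrax): a tiny "neural" vector field integrated with a hand-rolled Euler loop, with the gradient taken straight through the solver.

    import jax
    import jax.numpy as jnp

    def vector_field(params, y):
        # a one-layer "network" defining dy/dt
        W, b = params
        return jnp.tanh(W @ y + b)

    def euler_solve(params, y0, dt=0.1, steps=20):
        y = y0
        for _ in range(steps):
            y = y + dt * vector_field(params, y)   # explicit Euler step
        return y

    def loss(params, y0, target):
        return jnp.sum((euler_solve(params, y0) - target) ** 2)

    params = (0.5 * jnp.eye(2), jnp.zeros(2))
    grads = jax.grad(loss)(params, jnp.array([1.0, 0.0]), jnp.array([0.0, 1.0]))
    print(grads)   # gradients w.r.t. W and b, taken through the whole integration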
Differentiable programming is about building software that is differentiable end-to-end, so that optimal solutions can be calculated with gradient descent (a small sketch follows below).
Probabilistic programming (which is a bit more vague) is about specifying probabilistic models in an elegant and consistent way (which can then be used for training and inference).
So, you can build some kinds of probabilistic programs with differentiable programming languages, but not vice versa.
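Here is the differentiable-programming side of that in miniature (a made-up toy problem, not from the article): any parametrized, differentiable program can be pushed toward an objective with plain gradient descent.

    import jax
    import jax.numpy as jnp

    def program(theta, x):
        # any parametrized, differentiable computation will do
        return jnp.sin(theta * x) + theta ** 2

    def loss(theta):
        xs = jnp.linspace(0.0, 1.0, 16)
        return jnp.mean((program(theta, xs) - 0.5) ** 2)

    theta = 1.0
    for _ in range(200):
        theta = theta - 0.1 * jax.grad(loss)(theta)   # plain gradient descent
    print(theta, loss(theta))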
Swift was railroaded into Google by Chris Lattner, who has since left Google, and S4TF is on death watch. No one is really using it, and it hasn't delivered anything useful in 2.5 years.
Does it really matter that not many people use it? Apple's carve-out of Objective-C from the broader C ecosystem spanned something like 25 years.
Sure, the 90's were a rough period for them, but I think a series of failed OS strategies and technical debt are more responsible for that than just what language they used.
You could argue that their ambitions for Swift, scaling from scripting all the way to writing an entire OS, might never gain much traction outside Apple, but there's also the teaching aspect to think about.
"Objective C without C" removes a whole class of problems people have in just getting code to run, and I'll bet it shapes their mind in how they think about what to be concerned about in their code v what's just noise.
Sometimes things take a little longer to develop. I don't know who will create it, but from my perspective, the need for a statically typed "differentiable" language is extremely high and C++ is not it.
> the need for a statically typed "differentiable" language is extremely high
This is not what Google has found, actually. Teams who wanted to use this for research found that a static language is not flexible enough when they want to generate graphs at runtime. This is apparently pretty common these days, and obviously Python allows it, especially with JAX, which traces code for autodiff.
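For anyone who hasn't seen it, this is roughly what "traces code for autodiff" looks like in practice (a made-up toy example): the Python control flow runs at trace time and decides what graph gets recorded.

    import jax
    import jax.numpy as jnp

    def f(x, depth):
        # the Python loop runs during tracing; `depth` shapes the recorded graph
        for _ in range(depth):
            x = jnp.tanh(x)
        return jnp.sum(x)

    print(jax.make_jaxpr(lambda x: f(x, 3))(jnp.ones(4)))  # the traced graph
    print(jax.grad(lambda x: f(x, 3))(jnp.ones(4)))        # gradient through it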
Or has at least found that existing solutions for statically typed "differentiable" programming are ineffective, and I'd agree.
But having some way to check types/properties of tensors that you are doing operations to would really help to make sure you don't get your one hidden dimension accidentally switched with the other or something. Some of these problems are silent and need something other than dynamic runtime checking to find them, even if it's just a bolt-on type checker to python.
There are a lot of issues with our current approach of just using memory and indexed dimensions. [0]
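To illustrate the failure mode (a toy; the NamedTensor wrapper below is made up for the example, standing in for real shape-checking tools): when two dimensions happen to have the same size, a transposed tensor passes every plain shape check, which is why axis names, checked statically or at runtime, help.

    import numpy as np

    batch = hidden = 32                     # equal sizes: nothing complains
    x = np.random.rand(batch, hidden)
    W = np.random.rand(hidden, hidden)

    silent_bug = np.tanh(x.T @ W)           # transposed by mistake, runs fine
    print(silent_bug.shape)                 # (32, 32): looks plausible

    class NamedTensor:
        """Minimal array wrapper that tracks an axis name per dimension."""
        def __init__(self, data, axes):
            assert data.ndim == len(axes)
            self.data, self.axes = data, tuple(axes)

        def matmul(self, other, contract):
            # refuse to contract axes whose *names* do not match
            assert self.axes[-1] == other.axes[0] == contract, \
                f"axis mismatch: {self.axes} @ {other.axes}"
            return NamedTensor(self.data @ other.data,
                               self.axes[:-1] + other.axes[1:])

    xn = NamedTensor(x, ("batch", "hidden"))
    Wn = NamedTensor(W, ("hidden", "hidden_out"))
    print(xn.matmul(Wn, "hidden").data.shape)          # fine: axes line up
    # NamedTensor(x.T, ("hidden", "batch")).matmul(Wn, "hidden")  # AssertionError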
Flexible enough, or just... former statisticians have enough on their plate without learning programming, so let's use the simplest popular language in existence?
The article gives specific and reasonable motivations for why you might want a new language for machine learning. There are already new tools, like PyTorch. If you have never used Jupyter notebooks instead of an IDE, give it a try for a few weeks. It was the biggest boost I've seen to my coding productivity in literally decades. A new Git? I don't quite get that one. But considering the author arguably already got three out of his four claims right, maybe there is some reasoning behind that Git claim too.
> If you have never used Jupyter notebooks instead of an IDE, give it a try for a few weeks. It was the biggest boost I've seen to my coding productivity in literally decades
Really? I seem to shoot myself in the foot a lot with jupyter notebook. I can't count the number of times my snippet was not working and it was because I was reusing some variable name from some other cell that no longer exists. The amount of bugs I get in a notebook is ridiculous. Of course, I'm probably using it wrong
If you can't do a "restart kernel and run all cells" without errors, your notebook is not in a good state. But somehow people don't seem to do this regularly and then complain that notebooks are terrible, when it's their own process that is shooting them in the foot.
Imo people’s love of Jupyter notebooks is another one of those “this is My Thing and I love it despite the flaws” situations.
Jupyter notebooks are painful to read, they allow you to do silly stuff all too easily, they’re nightmarish to debug, don’t play well with git, and almost every. Single. One. I’ve ever seen my teammates write eschewed almost every software engineering principle possible.
You’re not using them wrong; they shepherd you into working very “fast and loose”, and that’s a knife edge that you have to hope gets you to your destination before everything falls apart at the seams.
Instead of building things to make Notebooks play nicely with git, why not relegate notebooks to explicitly local exploratory work and when something needs to be deployed have it be turned into proper scripts/programs?
That's on your teammates, not the technology. Like any code, if you want it to be readable and robust you have to spend the time cleaning up and refactoring it. Lots of notebooks are easy to read and run reliably.
Did you happen to see a thread on here a week or so ago about “It’s not what programming languages let you do, it’s what they shepherd you to do”? (Strongly paraphrased there)
That’s my issue with Jupyter notebooks, between them and Python they implicitly encourage you to take all kinds of shortcuts and hacks.
Yes, it’s on my teammates for writing poor code, but it’s on those tools for encouraging that behaviour. It’s like the C vs Rust debate, right: yes, people should write secure code, free their memory properly, and not write code that has data races in it, but in the majority of cases, they don’t.
I didn't see that thread. Based on my experience, I don't really buy the premise. I'm not saying different languages can't somewhat nudge you a tiny bit towards better practices. Or simply not allow you to do certain things, which isn't really shepherding, is it? But the vast majority of great engineering I have seen is mostly about the team and the many decisions they have to make in the course of one day. Which quickly adds up.
Quality engineering mostly comes from people, not languages. It is about your own personal values, and then the values of the team you are on. If there were a magic bullet programming language that guided everyone away from poor code and it did not have tradeoffs like a hugely steep learning curve (hi Haskell) then you would see businesses quickly moving in that direction. Such a mythical language would offer a clear competitive advantage to any company who adopted it.
What you are looking at really is not good vs. bad, but tradeoffs. A language that allows you to take shortcuts and use hacks sounds like it could get you to your destination quicker sometimes. That's really valuable if your goal is to run many throw-away experiments before you land on a solution that is worth spending time on improving the code.
Yeah really, but I'm a very experienced dev, so what I'm getting from it is likely very different from your experience. Consider looking into git or some other version control. If you are deleting stuff that breaks your code, you want to be able to go back to a working version, or at least look at the code from the last working version so you can see how you broke it.
Hi, author here! The git thing is about model versioning. Managing a ton of very large and slightly different binary blobs is not git's strong point imo.
There are a ton of tools trying to fill this void, and they usually provide things like comparisons of different metrics between model versions, which git doesn't provide.
Differentiable programming allows one to specify any parametrized function and then use gradient-based optimization to fit its parameters to an objective.
There is definitely some need for an EDSL of some sort, but I think a general method is pretty useless. Being able to arbitrarily come up with automatic Jacobians for a function isn't really language-specific, and usually much better results are obtained using manually calculated Jacobians. By starting from scratch you lose all the language theory poured into all the pre-existing languages.
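For what it's worth, the two approaches aren't exclusive; in JAX, for example, you can keep autodiff as the default and supply a manually derived rule where you know better (a small sketch; softplus here is just an illustrative choice):

    import jax
    import jax.numpy as jnp

    @jax.custom_jvp
    def softplus(x):
        return jnp.log1p(jnp.exp(x))

    @softplus.defjvp
    def softplus_jvp(primals, tangents):
        # hand-written derivative rule used instead of tracing log1p/exp
        (x,), (x_dot,) = primals, tangents
        return softplus(x), jax.nn.sigmoid(x) * x_dot

    # reverse-mode gradient still works, now routed through the manual rule
    print(jax.grad(lambda x: jnp.sum(softplus(x)))(jnp.array([1.0, -2.0])))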
Be hard on the article, but easier on the concept - I think there is a lot of potential for differentiable programming with Swift, but this article is not a good advocate.
I would add that verbatim text is sometimes hard to read because it doesn't wrap long lines, and small screens require the reader to scroll horizontally, so try not to use it for large/wide blocks of text.
Also, bullet lists are usually written as separate, ordinary paragraphs for each item, with an asterisk or dash as the paragraph's first character.