
In all honesty, this sounds to me like a whole lot of BS and hype. The name is pretentious, the quotes are ridiculous ("Deep Learning est mort. Vive Differentiable Programming."; "there will be a need for the creation of a whole new set of new tools, such as a new Git, new IDEs, and of course new programming languages"). Maybe I am just ignorant and fail to grasp the importance of such capabilities in a machine learning context, but I have to say, the grandstanding is a bit grating.

Who knows, perhaps this will become the greatest thing since Lisp... What do I know.



A particularly salty developer I knew ages and ages ago once said (apocryphally, apparently) that there was an old Inuit proverb, “everyone likes the smell of their own farts.”

Google is still taking themselves very seriously while everyone else is starting to get bored.

The problem with being 25 is that you have about 8 years ahead of you before you figure out how full of shit everyone is in their twenties, and maybe another 8 before you figure out that everyone is full of shit and stop worrying quite so much about it.


>The problem with being 25 is that you have about 8 years ahead of you before you figure out how full of shit everyone is in their twenties, and maybe another 8 before you figure out that everyone is full of shit and stop worrying quite so much about it.

Boy there is some truth right here. It wasn't until long after I graduated undergrad that I realized just how much bullshit is out there. Even in the science world! When I started actually reading the methodology of studies with impressive sounding conclusions, I realized that easily 30-60% were just garbage. The specific journal really, really matters. I'd say 90% of science journalism targeting laymen is just absolute bullshit.

I started actually chasing down wikipedia citations and OMG they are bad!! Half are broken links, a large fraction don't support the conclusions they're being used for, and a massive fraction are really dubious sources.

I realized that so many people I respected are so full of shit.

I realized that so many of MY OWN OPINIONS were bullshit. And they STILL are. I hold so few opinions that are genuinely well-reasoned and substantiated. They are so shallow.

Yet, this is just how the world works. Human intuition is a hell of a drug. A lot of the people I respect tend to be right, but for all the wrong reasons. It's SOOOO rare to find people that can REALLY back up their thinking on something.


There's safety in numbers. In my experience people in general are pretty smart, but there are lots of wolves in sheep's clothing.

Those wolves were the ones buying N95 masks in January, buying Flonase (an OTC glucocorticoid) 'just in case'.

If the alternative non-BS opinion/belief is unpopular (e.g. coronavirus is serious), it's easier and safer to just check out and tend your own garden.


It was the 16th century Inuit philosopher Erasmus:

http://www.artandpopularculture.com/Suus_cuique_crepitus_ben...


A version of that saying I like is: “success is like a fart; only your own smells good.”


Who in the world thinks their own farts smell good? Yeesh.


Maybe you just don't like success!


Steve Jobs had a slightly more optimistic way to say it: “Everything around you that you call life was made up by people that were no smarter than you.”


Sturgeon's law applies both spatially and temporally.


This doesn't seem to be crap, but it does seem to be hype.

You can use CasADi to do automatic differentiation in C++, Python, and MATLAB today:

https://web.casadi.org/

Tight integration with the language may be beneficial in making it simpler to write, but it's not like you can't do this already in other languages. Baking it into the language might be useful to make it more popular. Nobody should be doing the chain rule by hand in the 21st century.
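For the curious, a minimal sketch of what this looks like with CasADi's Python bindings (this assumes the usual SX/jacobian/Function API; treat it as illustrative rather than authoritative):

    import casadi as ca

    # a symbolic 2-vector and an expression built from it
    x = ca.SX.sym('x', 2)
    f = ca.sin(x[0]) * x[1] ** 2

    # symbolic Jacobian, wrapped into a callable function
    J = ca.jacobian(f, x)
    J_fn = ca.Function('J_fn', [x], [J])

    print(J_fn([1.0, 2.0]))   # evaluate dF/dx at x = (1, 2)

Nothing language-level required; it's just a library.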


"Point of view is worth 80 IQ points."

Make that ±80…


>The name is pretentious,

Given that we already have useful phrases like "embedded programming", "numerical programming", "systems programming", "CRUD programming", etc., I'm not seeing the pretentiousness of "differentiable programming". When you program embedded chips, we call it "embedded programming"; likewise, if you write programs where derivatives are a first-class language concept, I'm not seeing the harm in calling it "differentiable programming" -- it basically describes the specialization.

>the quotes are ridiculous ("Deep Learning est mort. Vive Differentiable Programming.";

Fyi, it's a joking type of rhetorical technique called a "snowclone". Previous comment about a similar phrase: https://news.ycombinator.com/item?id=11455219


I feel like calling it a snowclone is a stretch.

The whole point is you’re supposed to have the same “something” on both sides (X is dead, long live X), to indicate it’s not a totally new thing but a significant shift in how it’s done.

The most well known one:

> The King is dead. Long live the King!

If you change one side, you’re removing the tongue-in-cheek nature of it, and it does sound pretty pretentious.


Not really. The sentence with the King was used to mean that the new King immediately took office.

> Le Roi (Louis ##) est mort. Vive le Roi (Louis ## + 1) !

Using the sentence with Deep Learning and Differential Learning just suggests that Differential Learning is the heir/successor/evolution of Deep Learning. It does not imply that they are the same thing.

As a French person used to the saying, I think that's probably what LeCun meant.


It does, actually. What matters is that there is a king, not who is the king.

The saying works because, as you say, it suggests a successor, but the successor has to use the same title, because what people want is a new king so that nothing changes and they can live as they did before the old king died, not a revolution and a civil war stained with blood.

If you deliberately change the title, it's because you think the new one will be better, which is pretentious.


> Using the sentence with Deep Learning and Differential Learning just shows how Differential Learning is an evolution. It does not imply that they are the same thing.

... where did I imply it means they're the same thing?

From my comment:

> indicate it’s not a totally new thing but a significant shift in how it’s done

You could say, an evolution?

-

A snowclone is a statement in a certain form. The relevant form that English speakers use (I'm not a French person, and this is an English article) is "X is dead, long live X", where both are X.

That's where the "joking" the above comment refers to comes from: it sounds "nonsensical" if you take it literally.

If you change one X to Y, suddenly there's no tongue-in-cheek aspect, you're just saying "that thing sucks, this is the new hotness".

I suspect the author just missed that nuance or got caught up in their excitement, but the whole point of a snowclone is it has a formula, and by customizing the variable parts of that formula, you add a new subtle meaning or tint to the statement.


They’re kinda fucked because differential cryptanalysis was coined long ago to describe a set of techniques for attacking weak cryptography.

Differential programming would be less flashy but may be confusing.

I wouldn’t actually be much interested in this topic, except that the top-level comment complains about them wanting a new version control system for this, and now I’m a bit curious what they’re on about this time, so I’ll probably get sucked in.


"differential" and "differentiable" are different words. Both are being used correctly. Is the problem only that the two words look kind of similar? That seems like an impractical requirement.


Yeah, cryptocurrencies will never be known as crypto. The term was coined long ago as shorthand for cryptography.


God I hate the trend of using a prefix on its own as a stand-in for '<prefix><thing>'. I think it's a symptom of politicians trying to sound stupid to avoid sounding elitist.

What do you think will have more impact on the economy, crypto or cyber?


I think using the shortened version is perfectly valid, provided the context is clear.

If you are talking to someone about cryptocurrency, referring to it as crypto later in the conversation is perfectly valid and doesn't lessen the meaning.

I do, however, agree with you that outside of their context these shortened names are horrible and effectively buzzwords.


Why would a language where it is possible to manipulate certain functions to get their derivatives require a new revision control system or IDE?

And speaking of Lisp - wasn't symbolic differentiation a fairly common thing in Lisp? (basically as a neat example of what you can do once your code is easy to manipulate as data).


Symbolic differentiation is a fairly common exercise but it is inefficient (the size of the derivative grows exponentially in the size of the original expression). "Automatic differentiation" is the term for the class of algorithms usually used in practice, which are more efficient while still being exact (for whatever "exact" means when you're using floating point :-)
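To make the contrast concrete, here is a toy forward-mode AD sketch in plain Python with a hypothetical Dual class (not any particular library): instead of rewriting the expression symbolically, you carry (value, derivative) pairs through the ordinary program, so the cost stays proportional to a normal evaluation.

    import math

    class Dual:
        """Forward-mode AD value: carries f(x) and f'(x) together."""
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)

        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            # product rule: (uv)' = u'v + uv'
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)

        __rmul__ = __mul__

    def sin(x):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

    def f(x):
        # an ordinary program; no symbolic rewriting happens anywhere
        return x * x * sin(x) + 3 * x

    # derivative at x = 2.0: seed dx/dx = 1 and just run the program
    print(f(Dual(2.0, 1.0)).dot)   # equals 2x*sin(x) + x^2*cos(x) + 3 at x = 2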


AD still explodes for “interesting” derivatives: efficiently computing the adjoint of the Jacobian is NP-complete. And, naturally, the Jacobian is what you want when doing machine learning. There are papers from the mid-'90s discussing the difficulties of adding AD Jacobian operators to programming languages to support neural networks. This article is just rehashing 25-year-old problems.


Correction: Finding the optimal algorithm (minimal number of operations) for computing a Jacobian is NP-complete, but evaluating it in a multiple of the cost of a forward evaluation is standard.

Also, many optimizers that are popular in ML only need gradients (in which case the Jacobian is just the gradient vector). Second order methods are important in applications with ill-conditioning (such as bundle adjustment or large-scale GPR), but they have lots of exploitable structure/sparsity. The situation is not nearly as dire as you suggest.
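Since JAX comes up elsewhere in the thread, a small illustration of the gradient-vs-Jacobian distinction (sketch only; assumes JAX's standard grad/jacrev/jacfwd API):

    import jax.numpy as jnp
    from jax import grad, jacrev, jacfwd

    def loss(w):
        # scalar-valued: reverse mode gives the whole gradient
        # for roughly the cost of one extra backward pass
        return jnp.sum(jnp.tanh(w) ** 2)

    def layer(w):
        # vector-valued: the Jacobian is a full matrix
        return jnp.tanh(jnp.outer(w, w)).sum(axis=1)

    w = jnp.arange(1.0, 4.0)

    print(grad(loss)(w))     # gradient, shape (3,)
    print(jacrev(layer)(w))  # reverse-mode Jacobian, shape (3, 3)
    print(jacfwd(layer)(w))  # forward-mode Jacobian, same values

Getting the full Jacobian of an n-to-n map costs on the order of n passes, which is why gradient-only optimizers are so convenient.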


Yep; I was imprecise!


Around 2000 I was accidentally inventing a Bloom filter variant (to this day I don’t know how I missed the Google papers at the time) for doing a large set intersection test between two machines.

Somehow, I ended up with a calculus equation for determining the right number of bits per entry and rounds to do to winnow the lists, for any given pair of machines where machine A found n entries and machine B found m. But I couldn’t solve it. Then I discovered that even though I did poorly at calculus, I still remembered more than anyone else on the team, and then couldn’t find help from any other engineer in the building either.

Eventually I located a QA person who used to TA calculus. She informed me that my equation probably could not be solved by hand. I gave it another day or so and then gave up. If I couldn’t do it by hand I wasn’t going to be able to write a heuristic for it anyway.

For years, this would be the longest period in my programming career where I didn’t touch a computer. I just sat with pen and paper pounding away at it and getting nowhere. And that’s also the last time I knowingly touched calculus at work.

(although you might argue some of my data vis discussions amount to determining whether we show either the sum or the rate of change of a trend line to explain it better. The S curve that shows up so often in project progress charts is just the integral of a normal distribution, after all)


The Jacobian is used all the time, but where do you end up needing the adjoint?


What's a link to that paper?



Thanks - I didn't think it was literally doing symbolic differentiation (I don't work in the area, so I literally had no idea) - but the basic idea that you apply some process to your code to get some other code doesn't sound that surprising to anyone who has used Lisp (and I used to write tools in Lisp to write numerical engineering simulations - admittedly a long time ago).


> the size of the derivative grows exponentially in the size of the original expression.

Only if you treat it as a tree, not a DAG.

edit: Sorry no, it's still linear, even for a tree.


exp(x) ?


A quick Google revealed DVC (Data Version Control): https://dvc.org/

Smashing your ML models into Git or another text-based VCS probably isn't the best way to do it.


It's not clear how that's different from Git LFS.


It is different. DVC is a serverless management tool that helps you organize and link to your storage backends and move data from those backends to your workspace. Git LFS requires a dedicated server, and you can only store data on that server, instead of moving data between a multitude of storage backends (Google Drive, S3, Google Cloud Storage, a local drive, etc.).


To be fair they do have a page giving a comparison with git LFS and other related approaches to storing large files:

https://dvc.org/doc/understanding-dvc/related-technologies#g...

Mind you - DVC seems to be a platform on top of git rather than a replacement for git. So I'd argue that it's not really a new revision control system.


For the IDE part, they have written some cool debuggers that help you understand the differentiation and catch bugs in it. But I'm not sure why you couldn't just use the debugger instead of a whole new IDE, much less why you would need a new RCS.


You can refer to a talk about differentiable programming (for Julia, though the principles are general) here: https://www.youtube.com/watch?v=Sv3d0k7wWHk. If you skip to https://youtu.be/Sv3d0k7wWHk?t=2877 you can see exactly this question asked: why would you need this?

In essence, there are cases outside the well-developed uses (CNNs, LSTMs, etc.), such as neural ODEs, where you need to mix different tools (ODE solvers and neural networks), and the ability to do differentiable programming helps; otherwise it is harder to get gradients.

The way I can see it being useful is that it helps speed up development work so we can explore more architectures, again Neural ODEs being a great example.
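For a rough sense of why the gradients get hard without it: in a neural ODE you differentiate a loss through an ODE solver with a network inside it. A toy sketch with JAX (assuming its experimental odeint; this is just an illustration, not the approach from the talk):

    import jax.numpy as jnp
    from jax import grad, random
    from jax.experimental.ode import odeint

    def dynamics(y, t, params):
        # a tiny neural network defines dy/dt
        w1, w2 = params
        return w2 @ jnp.tanh(w1 @ y)

    def predict(params, y0, ts):
        # differentiating *through the solver* is the part that needs
        # differentiable programming rather than a fixed layer zoo
        return odeint(dynamics, y0, ts, params)

    def loss(params, y0, ts, target):
        return jnp.sum((predict(params, y0, ts)[-1] - target) ** 2)

    k1, k2 = random.split(random.PRNGKey(0))
    params = (0.1 * random.normal(k1, (8, 2)), 0.1 * random.normal(k2, (2, 8)))
    y0 = jnp.array([1.0, 0.0])
    ts = jnp.linspace(0.0, 1.0, 20)

    # gradient of the loss w.r.t. the network weights, straight through the solver
    g = grad(loss)(params, y0, ts, jnp.array([0.0, 1.0]))

Without end-to-end autodiff you'd be deriving adjoint equations for the solver by hand.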


Is it very different from probabilistic programming (a term that is both older and easier to understand)?

Erik Meijer gave two great talks on the concept.

https://m.youtube.com/watch?v=NKeHrApPWlo

https://m.youtube.com/watch?v=13eYMhuvmXE


Yes, these are two very different things.

Differentiable programming is about building software that is differentiable end-to-end, so that optimal solutions can be calculated with gradient descent.

Probabilistic programming (which is a bit more vague) is about specifying probabilistic models in an elegant and consistent way (which can then be used for training and inference).

So, you can build some kinds of probabilistic programs with differentiable programming languages, but not vice versa.
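A tiny illustration of the "differentiable end-to-end" part (a sketch with JAX; the program and numbers are made up): any ordinary parametrized program, loops and all, can be optimized directly with gradient descent.

    import jax.numpy as jnp
    from jax import grad

    def program(theta, x):
        # an ordinary program, not a fixed network architecture;
        # the whole thing is differentiable in theta
        y = x
        for _ in range(3):
            y = jnp.tanh(theta[0] * y + theta[1])
        return y

    def loss(theta):
        xs = jnp.linspace(-1.0, 1.0, 11)
        targets = 0.5 * xs            # toy target the program should reproduce
        return jnp.mean((program(theta, xs) - targets) ** 2)

    theta = jnp.array([1.0, 0.0])
    for step in range(200):
        theta = theta - 0.1 * grad(loss)(theta)   # plain gradient descent

A probabilistic program, by contrast, is specified as a distribution over outcomes and typically needs inference machinery (sampling, variational methods) rather than just a gradient.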


You're right, it sounds like BS because it's BS.

Swift was railroaded into Google by Chris Lattner, who has since left Google, and S4TF is on death watch. No one is really using it, and it hasn't delivered anything useful in 2.5 years.


Does it really matter that not many people use it? Apple's carve-out of Objective-C from the broader C ecosystem spanned something like 25 years.

Sure, the '90s were a rough period for them, but I think a series of failed OS strategies and technical debt are more responsible for that than just what language they used.

You could argue that Swift, despite their ambitions for it to scale from scripting all the way to writing an entire OS, might never grow substantially outside Apple, but there's also the teaching aspect to think about.

"Objective C without C" removes a whole class of problems people have in just getting code to run, and I'll bet it shapes their mind in how they think about what to be concerned about in their code v what's just noise.


Sometimes things take a little longer to develop. I don't know who will create it, but from my perspective, the need for a statically typed "differentiable" language is extremely high and C++ is not it.


> the need for a statically typed "differentiable" language is extremely high

This is not what Google has found, actually. Teams who wanted to use this for research found that a static language is not flexible enough when they want to generate graphs at runtime. This is apparently pretty common these days, and obviously Python allows it, especially with JAX, which traces code for autodiff.


> This is not what Google has found, actually.

Or has at least found that existing solutions for statically typed "differentiable" programming are ineffective, and I'd agree.

But having some way to check types/properties of the tensors you are operating on would really help to make sure you don't accidentally swap one hidden dimension with another or something. Some of these problems are silent and need something other than dynamic runtime checking to find them, even if it's just a bolt-on type checker for Python.

There are a lot of issues with our current approach of just using memory and indexed dimensions. [0]

[0]: http://nlp.seas.harvard.edu/NamedTensor
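A concrete example of the kind of silent failure meant here (toy NumPy sketch; the named-dimension check at the end is hypothetical, not the NamedTensor API):

    import numpy as np

    batch, hidden = 32, 64
    x = np.random.randn(batch, hidden)

    # nothing in the shape system distinguishes a (hidden_in, hidden_out)
    # weight matrix from its transpose when the two sizes are equal
    w = np.random.randn(hidden, hidden)

    h_right = x @ w        # what you meant
    h_wrong = x @ w.T      # also runs: same shape, wrong numbers

    assert h_right.shape == h_wrong.shape   # no error anywhere
    print(np.allclose(h_right, h_wrong))    # False: the bug is silent

    # a bolted-on check would have to carry names, e.g. (hypothetical):
    # assert dims(x) == ("batch", "hidden_in")

Runtime shape checks can't catch this, because every shape involved is "correct".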


Flexible enough, or just... former statisticians have enough on their plate without learning programming, so let's use the simplest popular language in existence?


AIUI they hired another huge contributor to LLVM/Clang/Swift to work on it so I'm not so sure it's on death watch.


The article gives specific and reasonable motivations for why you might want a new language for machine learning. There are already new tools, like PyTorch. If you have never used Jupyter notebooks instead of an IDE, give it a try for a few weeks. It was the biggest boost I've seen to my coding productivity in literally decades. A new Git? I don't quite get that one. But considering the author arguably already got three of his four claims right, maybe there is some reasoning behind that Git claim too.


> If you have never used Jupyter notebooks instead of an IDE, give it a try for a few weeks. It was the biggest boost I've seen to my coding productivity in literally decades

Really? I seem to shoot myself in the foot a lot with Jupyter notebooks. I can't count the number of times a snippet wasn't working because I was reusing a variable name from some other cell that no longer exists. The number of bugs I get in a notebook is ridiculous. Of course, I'm probably using it wrong.


If you can't do a "restart kernel and run all cells" without errors, your notebook is not in a good state. But somehow people don't seem to do this regularly and then complain that notebooks are terrible, when it's their own process that is shooting them in the foot.


Imo people’s love of Jupyter notebooks is another one of those “this is My Thing and I love it despite the flaws” situations.

Jupyter notebooks are painful to read, they allow you to do silly stuff all too easily, they're nightmarish to debug, they don't play well with git, and almost every. Single. One. I've ever seen my teammates write eschewed almost every software engineering principle possible.

You're not using them wrong; they shepherd you into working very "fast and loose", and that's a knife edge: you have to hope it gets you to your destination before everything falls apart at the seams.


> don’t play well with git

I agree, and this is why I built -

- ReviewNB [1] - a code review tool for Jupyter notebooks (think rich diffs and commenting on notebook cells)

- GitPlus [2] - a JupyterLab extension to push commits & create GitHub pull requests from JupyterLab

[1] https://www.reviewnb.com/

[2] https://github.com/ReviewNB/jupyterlab-gitplus


Instead of building things to make notebooks play nicely with git, why not relegate notebooks to explicitly local, exploratory work, and when something needs to be deployed, turn it into proper scripts/programs?


That's on your teammates, not the technology. Like any code, if you want it to be readable and robust you have to spend the time cleaning it up and refactoring it. Lots of notebooks are easy to read and run reliably.


Did you happen to see a thread on here a week or so ago about "It's not what programming languages let you do, it's what they shepherd you to do"? (Strongly paraphrased there.)

That's my issue with Jupyter notebooks: between them and Python, they implicitly encourage you to take all kinds of shortcuts and hacks.

Yes, it's on my teammates for writing poor code, but it's on those tools for encouraging that behaviour. It's like the C vs Rust debate, right: yes, people should write secure code, free their memory properly, and not write code that has data races in it, but in the majority of cases, they don't.


I didn't see that thread. Based on my experience, I don't really buy the premise. I'm not saying different languages can't nudge you a tiny bit towards better practices, or simply not allow you to do certain things, which isn't really shepherding, is it? But the vast majority of great engineering I have seen is mostly about the team and the many decisions they have to make in the course of one day, which quickly adds up.

Quality engineering mostly comes from people, not languages. It is about your own personal values, and then the values of the team you are on. If there were a magic bullet programming language that guided everyone away from poor code and it did not have tradeoffs like a hugely steep learning curve (hi Haskell) then you would see businesses quickly moving in that direction. Such a mythical language would offer a clear competitive advantage to any company who adopted it.

What you are looking at really is not good vs. bad, but tradeoffs. A language that allows you to take shortcuts and use hacks sounds like it could get you to your destination quicker sometimes. That's really valuable if your goal is to run many throw-away experiments before you land on a solution that is worth spending time on improving the code.


Data analysis notebooks are annoying to read because you're forced to choose between:

a) ugly, unlabelled plots

b) including tons of uninteresting code that labels the axes (etc)

c) putting the plotting code into a separate module.

There are some extensions that do help with this, but extensions also kinda defeat the whole purpose of a notebook.


Yeah, really, but I'm a very experienced dev, so what I'm getting from it is likely very different from your experience. Consider looking into git or some other version control. If you are deleting stuff that breaks your code, you want to be able to go back to a working version, or at least look at the code from the last working version so you can see how you broke it.


I could say the same about Python, honestly. Nonexistent variables, incorrectly typed variables, everything-is-function-scoped variables.


Hi, author here! The git thing is regarding model versioning. Managing a ton of very large and slightly different binary blobs is not git's strong point, IMO.

There are a ton of tools trying to fill this void, and they usually provide things like comparing different metrics between model versions, which git doesn't provide.


Jupyter notebooks, as an idea, actually go back to REPL workflows in Lisp Machines, commercial Common Lisp IDEs, Smalltalk, and Mathematica.


Maybe versioning for models?


Deep learning is the new Object Oriented Programming and Service Oriented Architecture


What about SPA and serverless?


Differentiable programming allows one to specify any parametrized function and use optimization to learn its parameters against an objective.

There is definitely some need for an EDSL of some sort, but I think a general method is pretty useless. Being able to automatically derive Jacobians for arbitrary functions isn't really language-specific, and usually much better results are obtained using manually calculated Jacobians. By starting from scratch you lose all the language theory poured into the pre-existing languages.

I'm sure there'll be a nice Haskell version that works in a much simpler manner. Here's a good start: https://github.com/hasktorch/hasktorch/blob/master/examples/...

I think it's pretty trivial to generalize and extend it beyond multilinear functions.
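To make the manual-vs-automatic point concrete (in Python rather than Haskell, since that's what most readers will have handy; a sketch only, assuming JAX's jacfwd):

    import jax.numpy as jnp
    from jax import jacfwd

    def f(x):
        # f: R^2 -> R^2
        return jnp.array([x[0] * x[1], jnp.sin(x[0])])

    def manual_jacobian(x):
        # hand-derived: often cheaper and better structured than the
        # automatic one, but you have to keep it in sync with f yourself
        return jnp.array([[x[1],          x[0]],
                          [jnp.cos(x[0]), 0.0]])

    x = jnp.array([0.3, 2.0])
    print(jnp.allclose(jacfwd(f)(x), manual_jacobian(x)))   # True

Neither needs new language syntax; the question is how much structure the tool can exploit for you.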


Be hard on the article, but easier on the concept - I think there is a lot of potential for differentiable programming with Swift, but this article is not a good advocate.


> need for the creation of a whole new set of new tools, such as a new Git, new IDEs, and of course new programming languages

The greatest possible barrier to adoption.


How do you get formatting to work on HN?


There are just a few simple rules; see https://news.ycombinator.com/formatdoc

I would add that verbatim text is sometimes hard to read because it doesn't wrap long lines, and small screens require the reader to scroll horizontally, so try not to use it for large/wide blocks of text.

Also, bullet lists are usually written as separate, ordinary paragraphs for each item, with an asterisk or dash as the paragraph's first character.


appreciated!



