Note in particular how simple the generated assembly is. By comparison, most other languages doing this will not be able to optimize away these abstractions at all.
I understand just fine why the C++ version is complex: it's because C++ is a poor language for higher-order function abstractions. Please don't make excuses for C++'s excessive inessential complexity: there is no reason why a programmer should need to concern themselves with "const r-values" and similar shenanigans in order to accomplish this simple task.
The fact that the compiler is able to optimize away this code is a compiler issue, not a language issue. GHC will optimize away Haskell's flip, and I didn't once have to write a compile-time index-reverser in template code.
The assembly is hard to understand. I recommend looking at the C-- intermediate output instead, using the -ddump-cmm flag together with -O2. This results in most of the functions getting inlined, and flip is removed entirely.
You can see this clearly if you use conspicuous numbers. If, instead of (flip foo 2 3), you give (flip foo 2 99), you will see code like this:
EDIT: and if that's too exotic for you, here's the solution in Python.
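A minimal sketch of what such a flip could look like, assuming it should reverse all positional arguments (as the C++ index-reverser does) rather than swap only the first two like Haskell's flip. The function `foo` is a hypothetical stand-in, chosen so that argument order visibly matters:

```python
from functools import wraps


def flip(func):
    """Return a wrapper that calls func with its positional arguments reversed."""
    @wraps(func)
    def flipped(*args, **kwargs):
        # Keyword arguments are passed through unchanged; only positional order flips.
        return func(*reversed(args), **kwargs)
    return flipped


def foo(a, b):
    # Hypothetical example function: subtraction makes the argument order observable.
    return a - b


print(foo(2, 99))        # -97
print(flip(foo)(2, 99))  # 97, i.e. foo(99, 2)
```

No templates and no value categories to think about, though unlike the GHC case nothing here gets inlined away: every call goes through the wrapper at runtime.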