But isn't the issue here that completely redoing the API would break a lot of code? I don't see how throwing money at the problem would fix this. I don't use any of these libraries, so maybe I'm totally off base, but it sounds like it's more of a tech debt/design issue than an issue that requires the kind of programming hours that only money can buy.
On the other hand, if lots of libraries use numpy, making it more efficient and/or more capable would seem to give quite a lot of bang for the buck. And it sounds like that's the kind of problem that money can actually solve.
There have been a few independent attempts to add dplyr-like functionality to pandas while staying backwards compatible (e.g. dplython). I'd be very happy if the core pandas team went down this path.
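For the unfamiliar, the appeal is dplyr's verb-style, pipeable operations. Here's a minimal sketch of what that reads like today using pandas' own method chaining (the data and column names are made up for illustration; dplython and similar projects wrap this kind of pipeline in a more dplyr-like syntax):

    import numpy as np
    import pandas as pd

    # toy data, purely for illustration
    df = pd.DataFrame({"species": ["a", "a", "b"],
                       "mass": [1.0, 2.0, 3.0]})

    # roughly: mutate -> filter -> group_by -> summarize
    result = (
        df
        .assign(log_mass=lambda d: np.log(d["mass"]))   # mutate
        .query("mass > 0.5")                            # filter
        .groupby("species", as_index=False)             # group_by
        .agg(mean_log_mass=("log_mass", "mean"))        # summarize
    )
    print(result)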
That being said, I don't have a good understanding of how strong the distinction is between "design issues" and "issues where money helps". There must be some overlap.
I'll have to speak in generalities as I don't know enough about NumPy in particular to comment.
> That being said, I don't have a good understanding of how strong the distinction is between "design issues" and "issues where money helps". There must be some overlap.
That's true, but many projects have turned out badly no matter how much money was poured into them, while less expensive projects fared better. See: design by committee. The design of an API obviously requires careful thought, which I suppose is work that could be paid for. But getting everyone to agree on a design isn't a problem money can solve, and then you need to make some hard decisions about backwards incompatibility. Perhaps you'd fund a fork of the project, splitting it into an old legacy version and a new, fancy version with a new API, but then you're committed to maintaining two projects, which is its own headache.
These are the kinds of things I mean by design issues: problems that aren't hard because they require many people to work many billable hours, but because finding acceptable compromises is a very human problem, quite irrespective of the programming effort involved.
Many a software project has recognized that serious, backwards-incompatible changes would improve it, and often there is even a working implementation. But these human and legacy-support issues prevent widespread adoption, and the new implementation dies a quiet death: nobody is using it, so nobody finds it worth their time to work on it.
Perhaps what you really want is a new library, rather than trying to contort an existing project into the shape you want. That is of course something money helps with, but when the money dries up, adoption will determine whether it lives or dies as an open source project.
Again, those were general thoughts; I don't know much about this particular project, so maybe I'm way off base. Just offering an alternative POV on what exactly constitutes "getting your money's worth" when choosing which open source projects to fund.
pandas is often used for one-off reports, where backwards compatibility is not as important.
Production software relying on the API could always depend on previous versions if a new version brings a significantly improved API.
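For example, a deployment could pin the major version its code was written against (a sketch; the version numbers are placeholders, not an actual pandas release plan):

    # requirements.txt
    pandas<1.0    # stay on the legacy API the production code targets
    # a new project could instead opt into a hypothetical redesigned release:
    # pandas>=2.0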
I'm a regular user of pandas and would definitely say it's my favorite Python library by far... but it is very hard to do certain operations with it (as the OP said, anything involving multiple indexes, things like plotting multiple plots after a groupby, etc.).
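To make the groupby plotting point concrete, here's a minimal sketch (toy data, made-up column names): there is no single built-in call for "one subplot per group", so you end up iterating over the GroupBy object by hand.

    import matplotlib.pyplot as plt
    import pandas as pd

    # toy data: one short series per city
    df = pd.DataFrame({"city":  ["NYC", "NYC", "LA", "LA"],
                       "month": [1, 2, 1, 2],
                       "temp":  [30, 35, 60, 65]})

    groups = df.groupby("city")
    fig, axes = plt.subplots(1, groups.ngroups, sharey=True)
    for ax, (name, g) in zip(axes, groups):
        g.plot(x="month", y="temp", ax=ax, title=name, legend=False)
    plt.show()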