
If you look at the design documents for pandas 2, there is a good illustration of how a lot of the pain points in pandas 1 spring from numpy ( https://pandas-dev.github.io/pandas2/internal-architecture.h...). I think any significant development effort on numpy would probably greatly benefit both libraries.

Will have to check out dplyr :) Would love to see how they master the magic that is multi-indexes.



In many cases, the use of multi-indexes in Pandas is (I think) a result of culture/style, or of the expectation that the cells of a dataframe should hold scalar values. If that changed and nested dataframes became common, the use of multi-indexes would diminish.

The tooling to support nested dataframes (and maybe even lists) is simple to create; it could even be a third-party library. I find that multi-indices, though they may be an accurate conceptual way of thinking about certain data, tend to be more inconvenient in practice than nested dataframes. In every case I have encountered, a single level of nesting was enough.
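A minimal pandas sketch of the two styles, using made-up sales data (the `sales` frame and its columns are hypothetical). pandas has no dedicated nested-frame type, so the nested version here simply stores one sub-DataFrame per store in an object-dtype column; per-store work then becomes an ordinary `apply` over that column rather than a selection on an index level.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (store, month) observation.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "month": [1, 2, 1, 2, 3],
    "revenue": [10, 12, 7, 9, 11],
})

# Multi-index style: the grouping keys become index levels.
multi = sales.set_index(["store", "month"])
store_a_multi = multi.loc["A"]  # selecting a store drops that level, leaving months

# Nested style: one row per store, each holding its own sub-DataFrame.
groups = [
    (key, group.drop(columns="store").reset_index(drop=True))
    for key, group in sales.groupby("store")
]
# Fill an object array element by element so each cell keeps the
# DataFrame object itself instead of being unpacked into a 2-D array.
cells = np.empty(len(groups), dtype=object)
for i, (_, sub) in enumerate(groups):
    cells[i] = sub

nested = pd.DataFrame({"store": [key for key, _ in groups], "data": cells})

# Per-store computations are plain column operations on the outer frame.
nested["total_revenue"] = nested["data"].apply(lambda df: df["revenue"].sum())
nested["n_months"] = nested["data"].apply(len)

print(nested[["store", "total_revenue", "n_months"]])
```

The trade-off is that an object column is opaque to pandas' vectorized operations, so per-cell work runs as Python-level `apply` calls in exchange for a flatter outer frame.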


If you're excited about non-scalar values in DataFrames, you should take a look at xarray (http://xarray.pydata.org), which implements a very similar idea in its Dataset class.


Thanks for the link! Good stuff.

By the way, dplyr doesn't use multi-indexes. I actually think this is one of the reasons (although not the biggest one) dplyr is easier to use.
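For comparison, dplyr's `group_by()` followed by `summarise()` returns the grouping variables as ordinary columns. A rough pandas sketch of getting the same flat shape, with hypothetical data: by default `groupby` on two keys puts both keys into a MultiIndex, while `as_index=False` (or a trailing `reset_index()`) keeps them as plain columns.

```python
import pandas as pd

# Hypothetical data: revenue per store, with stores grouped into regions.
sales = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "store": ["A", "B", "C", "C"],
    "revenue": [10, 12, 7, 9],
})

# Default behaviour: both grouping keys end up in a MultiIndex.
by_default = sales.groupby(["region", "store"])["revenue"].sum()

# as_index=False keeps the keys as ordinary columns, closer to dplyr's output.
flat = sales.groupby(["region", "store"], as_index=False)["revenue"].sum()

# Equivalent route: aggregate first, then move the index levels back into columns.
flat2 = by_default.reset_index()

print(flat)
```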



