
What you're describing is Gell-Mann Amnesia:

"Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them. In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know."

-Michael Crichton


Arrow is the most important thing happening in the data ecosystem right now. It's going to allow you to run your choice of execution engine on top of your choice of data store, as though they were designed to work together. It will mostly be invisible to users; the key thing that needs to happen is that all the producers and consumers of batch data adopt Arrow as the common interchange format.
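
To make the "common interchange format" idea concrete, here's a toy sketch using only the Python standard library. To be clear, this is not Arrow and not its actual memory layout: the `schema` dict, `engine_sum`, and `engine_filter` are hypothetical names invented for illustration. It only shows the underlying principle, that if producer and consumers agree on one columnar buffer layout, several engines can read the same memory without converting or copying it.

```python
from array import array

# Producer: one column stored as a contiguous buffer of 64-bit floats.
prices = array("d", [1.5, 2.5, 4.0])

# The "interchange format" here is just (schema, buffer). The memoryview
# exposes the producer's memory without copying it.
schema = {"name": "price", "format": "d"}
shared = memoryview(prices)

def engine_sum(schema, buf):
    """Hypothetical engine A: aggregate directly over the shared buffer."""
    assert buf.format == schema["format"]
    return sum(buf)

def engine_filter(schema, buf, threshold):
    """Hypothetical engine B: select values above a threshold."""
    assert buf.format == schema["format"]
    return [v for v in buf if v > threshold]

total = engine_sum(schema, shared)
expensive = engine_filter(schema, shared, 2.0)

# No copy was made: a write by the producer is visible through the view.
prices[0] = 10.0
sees_update = shared[0] == 10.0
```

Both "engines" only need to understand the agreed layout, not each other or the producer, which is roughly the role Arrow plays between real data stores and execution engines.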

BigQuery recently implemented the storage API, which allows you to read BQ tables, in parallel, in Arrow format: https://cloud.google.com/bigquery/docs/reference/storage

Snowflake has adopted Arrow as the in-memory format for their JDBC driver, though to my knowledge there is still no way to access data in parallel from Snowflake, other than to export to S3.

As Arrow spreads across the ecosystem, users are going to start discovering that they can store data in one system and query it in another, at full speed, and it's going to be amazing.

