
ClickHouse can run large GROUP BY queries that are not limited by memory: https://clickhouse.com/docs/en/sql-reference/statements/sele...


As explained in https://github.com/ClickHouse/ClickHouse/issues/47521#issuec... it can't: that parameter only applies to the pre-aggregation phase, not to the aggregation itself.

The feature request has not been implemented yet: https://github.com/ClickHouse/ClickHouse/issues/40588


ClickHouse uses a "grace hash" GROUP BY with 256 buckets.

It can handle aggregation state about 256 times larger than available memory, because only one bucket has to be in memory while merging. It works for distributed query processing as well and is enabled by default.
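To illustrate the idea behind a bucketed ("grace hash") GROUP BY, here is a minimal Python sketch, not ClickHouse's actual implementation. It assumes a SUM(value) GROUP BY key, string keys without tabs or newlines, and on-disk spill files standing in for whatever ClickHouse really does; the function names are hypothetical. Rows are first partitioned by a hash of the key, and then each bucket is aggregated on its own, so only one bucket's hash table is held in memory at a time.

```python
import os
import tempfile
import zlib
from collections import defaultdict

NUM_BUCKETS = 256  # bucket count mentioned in the thread


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Stable hash; Python's built-in hash() is randomized per process for str.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def grace_group_by_sum(rows, num_buckets: int = NUM_BUCKETS):
    """Yield (key, SUM(value)) pairs, holding one bucket's hash table at a time.

    Hypothetical sketch: assumes keys contain no tab or newline characters.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.tsv") for i in range(num_buckets)]
        files = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            # Phase 1: partition rows by hash(key) into per-bucket spill files.
            for key, value in rows:
                files[bucket_of(key, num_buckets)].write(f"{key}\t{value}\n")
        finally:
            for f in files:
                f.close()
        # Phase 2: aggregate one bucket at a time. A key always lands in the
        # same bucket, so each bucket's result is final and can be emitted.
        for path in paths:
            table = defaultdict(int)
            with open(path, encoding="utf-8") as f:
                for line in f:
                    key, value = line.rstrip("\n").split("\t")
                    table[key] += int(value)
            yield from table.items()
```

For example, `dict(grace_group_by_sum([("a", 1), ("b", 2), ("a", 3)], num_buckets=16))` gives `{"a": 4, "b": 2}`. The design relies on the fact that hash partitioning never splits a key across buckets, which is why peak memory is roughly one bucket's worth of state rather than the whole result.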

As for the linked issue, it looks like it is related to an extra optimization on top of what already exists.


> only one bucket has to be in memory while merging.

It's hard for me to judge the implementation details, but per that person's reply, memory is also multiplied by the number of threads doing the aggregation.
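As a back-of-envelope illustration of that point, here is a rough Python model, not a measured or documented formula. It assumes keys hash uniformly across buckets and, following the reply as I read it, that each aggregating thread holds one bucket in memory at the same time; the function name is hypothetical.

```python
def peak_merge_memory(total_state_bytes: float, num_buckets: int = 256, threads: int = 1) -> float:
    """Rough peak memory of the merge phase under a simple, assumed model."""
    # Each bucket holds ~1/num_buckets of the aggregation state (uniform hashing),
    # and every thread works on its own bucket concurrently.
    return threads * total_state_bytes / num_buckets


GiB = 2**30
print(peak_merge_memory(512 * GiB) / GiB)              # one thread
print(peak_merge_memory(512 * GiB, threads=16) / GiB)  # sixteen threads
```

Under this model, 512 GiB of aggregation state needs about 2 GiB with one thread but about 32 GiB with 16 threads, so the "256 times larger than memory" headroom shrinks in proportion to the thread count.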



