pedreschi's comments

pedreschi · on March 1, 2022

Super interesting news from the open-source database world today. Apache Druid, an OLAP database for streaming and batch workloads, has announced a new engine that lets you do ETL directly in the database with SQL, optionally separate compute and storage for greater elasticity, and run new multi-stage queries with a shuffle mesh (rather than scatter/gather). GitHub issues are in the blog text.

pedreschi · on Dec 17, 2021

Happy to help you suss that out, at least from the Druid side. You can post on druid-user@googlegroups.com or on druidforum.org. I monitor those sites, and if you give me some idea of what you want to do, I can help you figure out whether Druid is the right fit. In my 4 years working with Druid, I can tell you that the posters here are right about Druid vs. CH. We find that Druid is more stable and easier to manage for larger clusters, but it can be complex to get up and running. There is a new project in the Druid community to address these issues, and hopefully there will be a version early next year to play with. In terms of the benchmarks, the post was meant to be a little tongue in cheek... different queries in different environments with different data will just perform... different. I ran some of the benchmarks against CH with SSD data and the results were mixed. We tuned some stuff, added things to the code, and things were still mixed. Both databases are very, very fast and very well suited to real-time workloads. It all comes down to the use case, deployment environment, etc.