Show HN: Kestra - Open-Source Airflow Alternative (github.com/kestra-io)
142 points by tchiotludo on March 24, 2022 | hide | past | favorite | 69 comments
Hey HN, I'm really proud to share with you my new open source project: Kestra https://github.com/kestra-io/kestra

A few years ago I created a successful open source project, AKHQ: https://github.com/tchiotludo/akhq (renamed from KafkaHQ), which has been adopted by big companies like Best Buy, Pipedrive, BMW, Decathlon and many more. 2300 stars, 120 contributors, 10M docker downloads, much more than I expected.

Now let's talk about Kestra, an infinitely scalable orchestration and scheduling platform for creating, running, scheduling and monitoring millions of complex pipelines.

I started the project 30 months ago and I'm even more proud of this one; it required a lot of investment and time to build (I hope) the future of data pipelines. The result is now ready to be presented, and I hope to get some feedback from you, HN community.

To have a fully scalable solution, we chose Kafka as our database (of course, I love Kafka if you didn't know), along with Elasticsearch, Micronaut, ... It can be deployed on Kubernetes, on VMs or on premises.

You may think there are many alternatives in this area, but we decided to take a different road by using a descriptive (low-code) approach to building your pipelines, which lets you edit them directly from the web interface and deploy them to production with Terraform. We paid a lot of attention to scalability and performance, which already allows us to run a big production workload at a large French retailer: Leroy Merlin.

Since Kestra's core is plugin based, many plugins are available from the core team, and you can easily create your own.

More information:

- on the official website: https://kestra.io/

- in the Medium post: https://medium.com/@kestra-io/introducing-kestra-infinitely-...

- check out the project: https://github.com/kestra-io/kestra

Your comments are more than welcome, thank you!



I just noticed the title says "Open-Source Airflow Alternative" but Airflow is already Open-Source, so shouldn't you describe it as just "Airflow Alternative"? Otherwise you make it sound like Airflow isn't Open-Source but this is.


The title was changed by moderators and I can't edit it anymore :'(


This looks incredible!

Is there a way to use this as a managed service?

Are you looking for independent partners/integrators?


For now, we don't provide a SaaS offering for Kestra; it's definitely on the roadmap and our next project.

In the meantime, we provide different installation options:

- Docker Compose: https://kestra.io/docs/administrator-guide/deployment/docker...

- Kubernetes: https://kestra.io/docs/administrator-guide/deployment/kubern...

- Jar: https://kestra.io/docs/administrator-guide/deployment/manual...

Kestra is not that complicated to install; for Kafka and Elasticsearch, you could use a managed service from Amazon or Aiven, for example.

But rest assured that we will provide a managed service as soon as possible.


I'm definitely trying it out on my machine this evening.

What's your position on other companies providing managed service of your project?


I've just added Kestra as a fully managed service on Elest.io here: https://elest.io/open-source/kestra

We do deployment, security, remote backups, VM snapshots, monitoring/alerts, automated OS & software updates, and much more. We launched a few weeks ago here on HN.

There is a free trial, so you can start and get a working instance of Kestra in less than 5 minutes. Let me know what you think :)


To be honest, we haven't thought about that for now. It's a complicated subject.


Hey Tchiotludo, we have a partner program where we give back part of the revenue generated on Elestio to the OSS author every month. If you are interested, please contact us through our website and we will be happy to set up the partnership with you :)

You can see here your software already available as a fully managed service: https://elest.io/open-source/kestra


True, you don't have to think about it because it's obvious: Anyone is allowed to provide a managed service of Kestra since you licensed the project under Apache License 2.0. Not that complicated.

emteycz: I say go for it. If you get it up and running, let me know (email in profile) as I'm interested in trying it out as a managed service.


I'm more interested in integrating this into the software/ops of my customers at the moment. I just don't want to run a service right now, I like consulting more.

I have a friend who has a data/ops cloud service. He's been looking for something like this. However in that case Kestra would be hidden behind his abstraction, I guess.


I’d be interested in learning more about your friend’s service.


Are there any instructions on how to build a full docker image with all dependencies and plugins?


Sorry, I didn't see your comment before. We provide 2 image tags, one with no plugins and one with all plugins; more information here: https://kestra.io/docs/administrator-guide/deployment/docker...

To sum up, every image exists with a "full" tag containing all plugins.


Hey, no worries. I have seen that page but I wonder if there's a documented way of building an image exactly like your base image. All examples on that page assume FROM kestra:$TAG. Do you have any documentation on how to build that base image so one does not need to rely on your upstream image?


There is no documentation for that for now but it's not complicated:

- git clone the project

- ./gradlew executableJar

- cp build/executable/* docker/app/kestra

- docker build -t myimage .

To sum up, you just need an image that provides Java 11 and starts the executable built by Gradle.

Hope it helps!


Awesome, thank you!


Why is this better than Airflow, or Prefect, or Dagster?


Airflow has design and performance issues. If you want some details, you can find a few of the reasons in this article: https://kestra.io/blogs/2022-02-22-leroy-merlin-usage-kestra....

For the other workflow engines (Dagster, Prefect, ...), we decided to use a completely different approach to building pipelines. Where the others use Python code, we went with a descriptive language (like Terraform, for example). This has a lot of advantages for the developer experience: with Kestra, you can use the web UI directly to edit, create and run your flows; there is nothing to install on the user's desktop and no complex deployment pipeline is needed to test on the final instance. Another advantage is that it lets you deploy your flows with Terraform. A typical development workflow is: in the development environment, use the UI; in production, deploy your resources (flows along with all your other cloud resources) with Terraform.
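To give an idea of the descriptive approach, here is a minimal flow. This is an illustrative sketch only: the task type names and properties shown are assumptions based on our docs and may differ between versions.

```yaml
# A minimal illustrative flow: tasks and the schedule are all declared in YAML.
id: hello-world
namespace: io.example.demo

tasks:
  # First task produces a value...
  - id: extract
    type: io.kestra.core.tasks.debugs.Return
    format: "started at {{ taskrun.startDate }}"

  # ...which the second task consumes through the templating engine.
  - id: log-result
    type: io.kestra.core.tasks.debugs.Echo
    format: "extract returned {{ outputs.extract.value }}"

triggers:
  # Run every day at 4am instead of being launched by hand.
  - id: daily
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 4 * * *"
```

You can write this directly in the web UI, then manage the same YAML with Terraform in production.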

Beyond that, it would be really nice to have some independent performance benchmarks. I really think Kestra is fast since it is based on a queue system (Kafka) and not a database. Since workflows are just events (status changes, new tasks, ...) that need to be consumed by different services, a database doesn't seem like a good choice, and my benchmarks show that Kestra can handle a lot of concurrent tasks without using much CPU.


FYI some of the Airflow issues are out of date / can be resolved with config changes.

AirFlow 2 is designed to support larger XCOM messages, so the guidance to only use it for small data no longer applies.

Your DAG construction overhead issue is likely due to dagbag refreshing. Airflow checks for DAG changes on a fixed interval, causing a reimport. The default period for that is fairly small, so for large deployments you will want to use a larger period (e.g. at least 5 minutes). I do not know why the default is so short (or was last I checked, anyway). Python files shouldn't do much of note on import regardless IMO.

I am not otherwise familiar with the improvements in Airflow 2, so I cannot say for sure if your other complaints still remain.


I know that some issues are fixed in Airflow 2; they made a big improvement with that release. But not all issues are resolved by it.

The performance issue is still there: just launch Airflow and submit a thousand DAG runs with a simple Python sleep(1), and you will hit the CPU bound very quickly, with a very long total duration. Airflow is not designed for a lot of short-duration tasks. With event-driven data flows, it's really complicated to manage.

Imagine a flow that is triggered for each store, for example (thousands of stores, with 10+ tasks each): Airflow will not be able to manage this kind of workload quickly (and that's not its goal). Airflow was clearly designed to handle small workflows (hundreds of tasks) that run over a long period.

As for XCom, Airflow stores it in the database, so you can't put real data in it, only small values (a database is not meant to store big files). In Kestra, we provide a storage layer that natively allows storing large data (GB, TB, ...) between tasks, without the pain of multi-node clusters.
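As a hedged illustration of the difference, passing a file between two tasks through the internal storage might look like this. Property names (outputFiles, inputFiles, the outputs templating path) are assumptions based on the Bash task docs and may differ between versions:

```yaml
id: file-passing
namespace: io.example.demo

tasks:
  # The first task writes a (potentially large) file; Kestra uploads it to its
  # internal storage rather than a database row (unlike Airflow's XCom).
  - id: produce
    type: io.kestra.core.tasks.scripts.Bash
    outputFiles:
      - data
    commands:
      - echo "some large dataset" > {{ outputFiles.data }}

  # The second task fetches the file back by referencing the output URI.
  - id: consume
    type: io.kestra.core.tasks.scripts.Bash
    inputFiles:
      data.csv: "{{ outputs.produce.files.data }}"
    commands:
      - wc -c data.csv
```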


AirFlow 2 was released in 2020. You're saying you knew that these issues were fixed, and then an article is published on your webpage in 2022 knowingly comparing against the technical properties of a major version release 2 years behind? That is not a good look.


First of all, the published article is a retrospective; we are talking about decisions made in 2019. Can't we talk about the past that led us to a choice?

Second, not all of them: some issues are fixed, but major ones remain. Just search for issues scaling Airflow in production; even with Airflow 2, it's still complicated. Airflow still uses a lot of CPU doing nothing other than waiting for some API call. Just try to run 5000 tasks that sleep (simulating an API call) in Airflow and you will see the challenge of scaling it.

Third, Airflow still has design issues that will not let you handle certain kinds of pipelines.

Last one: I'm not here to fight against Airflow; some people love it, some people hate it. We made a completely different choice about how to design and scale data pipelines, and I let people use what they like. For me, Airflow (and other workflow managers) doesn't fit.


Or Ploomber?


Anyone know of something like Dagster / Ploomber but for NodeJS?


Ploomber can actually work with shell scripts, so you could export your JS code into a pipeline. But besides that, not that I know of.


I got my node script working with Kestra


We all may have questions for you on some of your descriptions and choices. None of that should take away from the fact that this is a pretty impressive stage for a 30-month open source project.

Have not seen the participants - how many contributors do you have?


The project started as a side project (yet another side project I do at night and on weekends) but was quickly adopted and used in a big French retail company.

They trusted the project and decided to go to production with Kestra, so they chose to invest some resources to develop the features they needed that were missing.

But basically, not that many people for now. We are trying to build a community around the product and only started communicating about it a few weeks ago; I hope the community will follow us! And I hope to succeed like with my other open source project: https://github.com/tchiotludo/akhq


Looks cool! How does it compare to temporal.io in your experience? I'm evaluating options at my current company, between Airflow and Temporal.


Temporal.io is a really cool framework for building business processes, like managing microservice workflows (e.g. a payment workflow: the user pays, we call the shipping microservice, then the billing microservice, ...), and it's a good fit for handling individual events (lots of individual events).

Kestra (and likewise Airflow) is more of a workflow manager for handling data pipelines: moving large datasets (batch) between different sources and destinations, doing transformations inside the database (ELT), or, with Kestra, transforming the data (ETL) before saving it to external systems.

This leads Kestra (and likewise Airflow) to have a lot of connectors to different systems (SQL, NoSQL, column stores, cloud storage, ...) ready to use out of the box.

Temporal.io, since it was first designed to handle microservices (proprietary and internal services), doesn't have these connectors out of the box, and you will need to code all these interactions yourself.

So my opinion:

Building data pipelines that interact with many standard systems will be done easily & quickly with Kestra (or Airflow).

Handling internal business processes of microservices will be done easily with temporal.io.


Netflix Conductor is a great alternative to Temporal. There is a fully managed offering for it as well. The biggest advantage is that it's quite simple to understand and has great visualization of flows.


Have you tried Netflix Conductor (https://github.com/Netflix/conductor)? If you are evaluating Airflow, this could be a great alternative: it scales well and gives you the option to write your workflows in code as well as config.


These kinds of tools seem to be meant to scale up well, but are there good ones that “scale down” to small projects too?


You can easily scale down for small projects by using a single node with a simple Docker Compose setup: https://kestra.io/docs/getting-started/

It works well even on a standard laptop.
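For reference, a trimmed single-node setup could look roughly like this. The image names, versions and environment variables below are assumptions for illustration; use the official docker-compose file from the getting-started guide:

```yaml
# Illustrative single-node stack: Kafka + Elasticsearch + Kestra, all on one host.
version: "3.6"
services:
  zookeeper:
    image: zookeeper:3.6

  kafka:
    image: confluentinc/cp-kafka:6.2.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      # Single broker, so replication factor must be 1.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.16.3
    environment:
      discovery.type: single-node

  kestra:
    image: kestra/kestra:latest-full
    # Standalone mode runs all Kestra services in a single process.
    command: server standalone
    depends_on: [kafka, elasticsearch]
    ports:
      - "8080:8080"
```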


Airflow is relatively easy to set up once you have the hang of it. At its most basic it needs three containers (server, sql, executor), and your dag definitions which are very straightforward python code.


Cool. But please fix that light gray text on white background in the demo (or make it even paler for more avant-garde :)


Do you have a screenshot, please? I didn't notice where. Thanks!


Are there any plans for a desktop app for Kestra or the ability to support Windows Server outside of docker?


Supporting Windows Server seems easy, I think. Since it's Java behind the scenes, most of the API already works on Windows; we just need to create a custom task for Windows. Added to the backlog: https://github.com/kestra-io/kestra/issues/519

For the desktop app, I don't know; building one with Electron could be simple, but a full app is not on the roadmap for now. What is your use case?


Can you run software containers as steps?


Yes, of course!

You have 3 solutions for that:

- you can use this task with the runner: DOCKER property and choose the image: https://kestra.io/plugins/core/tasks/scripts/io.kestra.core....

- you can also use PodCreate to launch a pod on a kubernetes cluster: https://kestra.io/plugins/plugin-kubernetes/tasks/io.kestra....

- you also have CustomJob from Vertex AI on GCP, to launch a container on an ephemeral cluster (with any CPU / GPU): https://kestra.io/plugins/plugin-gcp/tasks/vertexai/io.kestr...
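For the first option, a task might look like this. This is an illustrative sketch; the property names come from the linked Bash task docs and may vary by version:

```yaml
id: container-step
namespace: io.example.demo

tasks:
  # Run the task's commands inside a container image instead of directly
  # on the worker host.
  - id: in-docker
    type: io.kestra.core.tasks.scripts.Bash
    runner: DOCKER
    dockerOptions:
      image: python:3.9-slim
    commands:
      - python --version
```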


Great! That makes Kestra more useful than Dagster for me.


You’re basically pitching this as a more complicated version of airflow that does basically the same thing, but slightly differently, and scales better?

… but your core dependencies are a Kafka cluster and an elastic search cluster which are both a pain in the ass to scale; so really, could you run this seriously without a really expensive hosted cloud instance of both of those?

This kind of wording:

> Since the application is a Kafka Stream, the application can be scale infinitely

Is a major turn off to me.

Kafka cannot scale infinitely. Nothing can. In fact, Kafka can be a pain in the ass to scale.

It makes me question some of the other commentary on the project.


As long as we're airing pet peeves, mine is about over-literal misunderstandings:

> Kafka cannot scale infinitely. Nothing can.

It is very common that when a phrase can't be literally true, it signals a metaphorical meaning. E.g., if a teen tells you their new teacher is a million years old, it's not a literal statement of age. Similarly, nobody expects "scale infinitely" to mean that, as in Universal Paperclips, that we'll be converting whole galaxies into Kestra clusters. It means that any bottlenecks are external to the system.


> Similarly, nobody expects "scale infinitely" to mean that, as in Universal Paperclips, that we'll be converting whole galaxies into Kestra clusters.

I disagree. I've worked with plenty of people that would probably take this statement at face value and assume you could scale to a completely arbitrary amount of load with no marginal effort.

Plus, there's a difference in context here: we're talking about a technical product. It doesn't hurt to be precise and technical in your description of it, does it? This is the most likely setting in which someone might interpret something literally.


Aren't those the same people who would assume that of many technologies whether or not the word "infinite" was used?

I do agree there's a difference in context, but for me it goes the other way. I'd expect pretty much anybody in a technical audience to know technical basics. For me that's a big part of the fun in writing on HN, in that it's not obligatory to dumb my points down just to coddle the clueless.


but is it web scale? like mongodb? :)


Thanks wpietri for understanding the intended meaning behind it :+1:


I don't pitch it as a more complicated version of Airflow; rather, I think it's simpler than Airflow on the UX side: we use declarative flows in YAML rather than Python code.

I agree with you that Kafka & Elasticsearch can be a pain to scale when you need both horizontal and vertical scaling.

On the other hand, on a single machine, it's really easy to set up. With that, you get the same scaling as Airflow, for example, since Airflow depends on a non-scalable database (MySQL or Postgres). But the advantage with Kestra is that you will be able to scale your backend to multiple nodes (Kestra allows scaling all of its services). When you hit the limit with a standard database, you will be stuck.

And yes, clearly "infinite scale" is not a literal statement; nothing can scale infinitely. But since the architecture is really robust (and scalable), the issues will be on aspects other than Kestra (cloud limits, database overload, ...).

A final and more important point: the backends are all pluggable in Kestra, since Kestra is really designed as modules. Look at the directories here: https://github.com/kestra-io/kestra :

- runner-kafka & runner-memory are 2 implementations of Kestra's runner; you can add a new one that uses Redis, Pulsar, ...

- repository-elasticsearch & repository-memory are the same; you can implement another one. I started a JDBC implementation that I haven't had time to finish yet: https://github.com/kestra-io/kestra/pull/368


But using a proper programming language to define dependencies is one of airflow's main advantages! I'd even go so far as to say you're not using it to its full potential if you're not writing code to infer complicated dependencies programmatically.


I don't think that with "standard databases" you get "stuck". But I do buy that they're harder to scale


Elasticsearch is not a pain in the ass to scale, it is one of the easiest databases to scale. Kafka is medium, since they ditched Zookeeper.


That hasn't been our experience. Our elasticsearch cluster has been a pain in the ass since day one with the main fix always "just double the size of the server" to the point where our ES cluster ended up costing more than our entire AWS bill pre-ES. (but that might be our limited experience) whereas something like postgres has required nearly 0 maintenance apart from adding the occasional index but even that has been just due to tuning, not that the DB fell over.

Both are AWS hosted products (RDS, AWS Elasticsearch).


Easiest database to scale is a pretty low bar. Databases are typically really hard to scale and Elasticsearch is no exception. Aside from the issue of ease, one thing that has been universally true for me is that Elasticsearch is incredibly expensive to scale in terms of compute costs.


Elasticsearch has built in horizontal scaling abilities, unlike Postgres/other SQL databases. It also has integrations with cloud providers for peer discovery, or can use DNS. Once a new data node is detected and reachable, the masters will start sending it shards of data, distributing the load. This all happens without any user intervention. I can't really speak to cost, it is somewhat easy to blow up the memory usage in Elastic for sure, but I can't say its been more expensive than similarly sized Postgres clusters.


Right, GB for GB ES is much easier to scale than Postgres (or any other DB) but probably also more expensive since ES is much more memory and compute hungry. But I can't say I have an apples-to-apples comparison since the use case for ES is usually "dump massive amounts of raw data in and index everything" which you wouldn't typically do with a Postgres instance. But in places where we have run large ES clusters my experience has not really been that it works without any user intervention (at least once you reach a certain scale) and that it involved a lot of operational support. Not that any other solution with comparable features would have been easier necessarily but still not easy in any absolute sense.


I go hilariously out of my way to eliminate elasticsearch at any org I join. Usually because it's only being used for logs and modern tools like loki are immeasurably easier to scale and cheaper to run. But I also find many many developers using it don't know about time series databases or anything at all about which data structures go in which kind of database and just dump everything into a horrifically organized search database. Its at least one order of magnitude worse to scale and operate than a mongo-type nosql database being used incorrectly by a developer who doesn't know any better and two orders of magnitude worse than a sql database being used incorrectly by a developer who doesn't know any better.


Loki's fine if you are very cost sensitive and are comfortable with Prometheus, but it's not really a replacement for a text-search database like Elasticsearch. It also scales about the same, both being horizontally scalable (I'm not sure what Loki's sharding strategy is). Our ELK stack runs on 3 2cpu/8gb ram nodes totaling about $160 a month and can handle 50+ million of records or so (I haven't ran it to its absolute limit). This is a comfortable price to performance ratio for us and I imagine many other companies.


I think people that have issues scaling any modern distributed data stack are because a) Don't have experts or b) Bad practices/stretching the use case. I worked on a project once where the ES cluster performance was degrading because they kept increasing the number of fields. At some point, they had more than 5k for a single document schema even though ES docs mention going over the limit (1k) is not a good idea. I mean if any of these big tech companies can manage clusters of hundreds of nodes for any of these data stacks I'm sure your scaling issues aren't because of the tool.


Easy/hard depends on the experience of the user. Someone with a lot of Elasticsearch experience will have an easy time scaling Elasticsearch and a hard time scaling Kafka, and vice versa.

Better to compare how complex they are to scale in terms of actions required.


Yeah, and ES doesn't scale forever; also, running both of those is incredibly computationally expensive. You would really need the right use case.


Agreed that both are expensive to scale across multiple nodes. But keep in mind, you can use them with a single node (like others do with a database like MySQL).

Just don't go multi-node if the project doesn't need it. But when you do need to, with Kestra you can go multi-node and scale.


Why Java?


- For performance mostly: Kestra relies heavily on Java threads to handle very large workloads

- Because the application is built on top of Kafka, and Kafka Streams is only available in Java

- Because the Java ecosystem is very large and there are a lot of good libraries for handling heavy workloads

- Because I love strong typing and the language (though that doesn't matter for the user; it's just a personal pleasure :D)


[deleted]


Why is everyone ok with logs being dumped like it's in a trash can?

I see partial structures and then JSON string as is and then some long blob of string no one can understand what it is with no new lines.

What devs want is pretty simple: structured logs with a table layout that doesn't repeat the column names on every row, which makes the output insanely verbose for any human to consume.

I'm picking up bits of open source apps to build a decent solution with Vector (which has awesome Vector remap language to parse strings into structured data if it isn't already) and throw it into ClickHouse to view it from Metabase.

Apparently, Kibana, Graylog and even Grafana are pretty bad at displaying logs; it's hard to feel even a tiny bit comfortable reading them every day.

Logging is such a crucial part of developer life, and I'm not sure why there aren't any sane open source solutions.


Sorry, I don't understand; we display logs with pretty printing, see here: https://kestra.io/assets/img/05.8b5545ef.png

Is it not shown as JSON? Or maybe I don't understand where you are seeing that.


This is what I see on the first page of the logs in your demo: https://imgur.com/a/S1QkzuG

These 3 other tools can look a bit better with more config, but this is pretty much what they are, and the readability isn't any better than tail/grep.

- Graylog : https://adamtheautomator.com/wp-content/uploads/2022/02/imag...

- Grafana : https://grafana.com/static/assets/img/blog/logcontext_explor...

- Kibana : https://blog.ip2location.com/wp-content/uploads/2018/12/logs...

Compared to displaying structured logs in Metabase. (This isn't text logs, but you get the point.)

- https://www.predictiveanalyticstoday.com/wp-content/uploads/...


OK, got it: every task type can log whatever it wants, so basically there is one task sending dirty output. I've added a fix to the roadmap to clean up that task. Thanks for the report!


I think the point is that logs should have a well-defined structure that can be easily parsed and queried by other programs.



