Launch HN: ContainIQ (YC S21) – Kubernetes Native Monitoring with eBPF
79 points by NWMatherson on Jan 6, 2022 | 20 comments
Hi HN, I’m Nate, and together with my co-founder Matt, we built ContainIQ (https://www.containiq.com/). ContainIQ is a complete K8s monitoring solution that is easy to set up and maintain and provides a comprehensive view of cluster health.

Over the last few years, we noticed a shift: more of our friends and other founders were adopting Kubernetes earlier on. (Whether or not they actually need it so early is not as clear, but that’s a point for another discussion.) From our past experience using open-source tooling and other platforms on the market, we knew that the existing tooling out there wasn’t built for this generation of companies building with Kubernetes.

Many early to middle-market tech companies don’t have the resources to manage and maintain a bunch of disparate monitoring tools, and most engineering teams don’t know how to use them. But when scaling, engineering teams do know that they need to monitor cluster health and core metrics, or else end users will suffer. Measuring HTTP response latency by URL path, in particular, is important for many companies, but installing application-level packages for each individual microservice to get it can be time-consuming.

We decided to build a solution that was easy to set up and maintain. Our goal was to get users 95% of the way there almost instantly.

Today, our Kubernetes monitoring platform has four core features: (1) metrics: CPU and memory for pods/nodes, view limits, capacity, and correlate to events, alert on changes; (2) events: K8s events dashboard, correlate to logs, alerts; (3) latency: monitor RPS, p95, and p99 latencies by microservices, including by URL path, alerts; and (4) logs: container level log storage and search.

Our latency feature set was built using a technology called eBPF. BPF, or the Berkeley Packet Filter, was developed from a need to filter network packets in order to minimize unnecessary packet copies from the kernel space to the user space. Since version 3.18, the Linux kernel provides extended BPF, or eBPF, which uses 64-bit registers and increases the number of registers from two to ten. We install the necessary kernel headers for users automatically.

With eBPF, we are monitoring from the kernel and OS level, not at the application level. Our users can measure and monitor HTTP response latency across all of their microservices and URL paths, as long as their kernel version is supported. We are able to deliver this experience immediately by parsing the network packet from the socket directly. We then correlate the socket and sk_buff information to your Kubernetes pods to provide metrics like requests per second, p95, and p99 latency at the path and microservice level, without you having to instrument each microservice at the application level. For example, with ContainIQ you can track how long your Node.js application takes to respond to HTTP requests, see which parts of your web application are slowest, and get alerted when users are experiencing slowdowns.
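Conceptually, once latencies have been captured per request, the per-path aggregation works something like the following. (This is a simplified illustrative sketch, not our actual agent code; the names and the nearest-rank percentile method are just for illustration.)

```python
# Sketch of per-path latency aggregation. Assumes request latencies
# were already captured upstream (e.g., by an eBPF socket probe).
from collections import defaultdict

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    k = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[k]

def aggregate(requests):
    """requests: iterable of (path, latency_ms) tuples for one window."""
    by_path = defaultdict(list)
    for path, latency_ms in requests:
        by_path[path].append(latency_ms)
    return {
        path: {
            "requests": len(samples),       # request count in the window
            "p95": percentile(samples, 95),
            "p99": percentile(samples, 99),
        }
        for path, samples in by_path.items()
    }
```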

Users can correlate events to logs and metrics in one view. We knew how annoying it was to toggle between multiple tabs and then scroll endlessly through logs trying to match up timestamps. We fixed this. For example, a user can click from an event (e.g., a pod dying) to the logs at that point in time.
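Under the hood, that kind of event-to-log correlation boils down to a time-window join between an event timestamp and log timestamps. A minimal sketch (my own illustration, not our actual implementation):

```python
# Illustrative only: pull log lines near a Kubernetes event's timestamp.
from datetime import datetime, timedelta

def logs_around(event_time, logs, window_s=30):
    """Return log lines within +/- window_s seconds of an event.

    logs: iterable of (timestamp, line) tuples; timestamps are datetimes.
    """
    window = timedelta(seconds=window_s)
    return [line for ts, line in logs if abs(ts - event_time) <= window]
```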

Users can set alerts across virtually all data points (e.g., p95 latency, a K8s job failing, a pod eviction).

Installation is straightforward, either with Helm or with our YAML files.
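For a sense of what configuration looks like, a typical agent values file for a Helm-based install might resemble the following. (These chart and field names are placeholders to show the shape of the config, not our real schema; see our docs for actual values.)

```yaml
# values.yaml — hypothetical example, field names are illustrative only
agent:
  apiKey: "<your-api-key>"   # issued at signup
  logCollection:
    enabled: true            # container-level log storage and search
  latency:
    enabled: true            # eBPF-based request latency collection
```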

Pricing is $20 per node / month + $1 per GB of log data ingested. You can sign up on our website directly with the self-service flow. You can also book a demo if you would like to talk to us, but that isn’t required. Here are some videos (https://www.containiq.com/kubernetes-monitoring) if you are curious to see our UX before signing up.
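A quick back-of-the-envelope calculation for the pricing above:

```python
def monthly_cost(nodes, log_gb, node_price=20, gb_price=1):
    """Monthly cost per the pricing above: $20/node + $1/GB of logs ingested."""
    return nodes * node_price + log_gb * gb_price

# e.g., a 10-node cluster ingesting 50 GB of logs per month:
# monthly_cost(10, 50) -> 250  ($250/month)
```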

We know that we have a lot of work left to do. And we welcome your suggestions, comments, and feedback. Thank you!



I recently had an issue where my UDP service worked fine exposed directly as a NodePort type, but not through an nginx UDP ingress. I _think_ the issue was that the ingress controller forwarding operation was just too slow for the service's needs, but I had no way of really knowing.

Now if I had this kernel level network monitoring system, I probably could have had a clearer picture as to what is going on.

Really, one of the hardest problems I've had with learning/deploying in k8s is trying to trace down the multiple levels of networking, from external TLS termination to LoadBalancers, through ingress controllers, all the way down to application-level networking. I've found more often than not that the easiest path is to just get rid of those layers of complexity completely.

In the end I just exposed my server on NodePort, forwarded my NAT to it, and called it done. But it sounds like something like ContainIQ can really add to a k8s admin's toolset for troubleshooting these complex network issues. I also agree with other comments here that a limited, personal-use/community tier would be great for wider adoption and home-lab users like me :)


Appreciate this insight and I agree with you.

And I can definitely circle back here when our limited use tier goes live. Agree on that too.


A community/non-paid edition would be quite nice, to be able to trial this out before paying.

This is how an old employer adopted CockroachDB: we trialed the non-enterprise version and then ultimately bought a license.


I agree. We are planning to launch a free edition with limited size and data retention, for users to try and play with before paying. It is in the works and we hope to have it out in the next few months.

We are also thinking about launching trials too.


Agreed.

Our employer does not invest in this kind of tool, so when a free version does not exist, for us the tool does not exist.

We would be happy to provide usage metrics and reports. Our company is fully open source and open data, and we work on and invest time in open projects when possible.


Not the OP, but I develop a different open source tool for Kubernetes and would love to talk! (Email is in my profile)


Does your company use paid versions of the open source tools or pay support?


We are planning to open source our agents in 2022!


Hello. I own and run a DevOps consulting company and use DataDog exclusively for clients. DD works pretty well as it integrates with cloud providers (such as AWS), physical servers (agent), and Kubernetes (helm chart). The pain point is still creating all the custom dashboards, alerts, and DataDog integrations and configuration. Managing the DataDog account can almost be a full-time job for somebody. Especially with clients who have lots of independent k8s clusters all in a single DD account (lots of filtering on tags and labels).

What does ContainIQ offer in terms of benefits over well established players like DataDog? I will say, the Traefik DataDog integration is horrible and hasn't been updated in years so that's something I wish was better. DataDog does support Kubernetes events (into the feed), and their logging offering is quite good (though very expensive).


The dashboard configuration issue was actually one of the pain points we targeted initially. It was an issue we experienced too. And we talked to a lot of our friends who had spent significant time setting these dashboards up in Datadog. One of our initial goals has been to try to automate to get you 95% of the way there without any configuration on your end. We've also tried to make alerting really easy and are working to automate the process of setting smart alerts. Would love to chat more about your experience if you are open to it. My email is nate (at) containiq (dot) com


How does this compare to Pixie? [0]

[0]: https://github.com/pixie-io/pixie


Polar Signals develops Parca [0], which is another eBPF observability tool, and Isovalent develops Cilium [1], which is built on eBPF as well. Genuinely curious if there are differences, or if eBPF only allows for specific observability functionality and each tool has it all.

[0]: https://github.com/parca-dev/parca

[1]: https://github.com/cilium/cilium


Polar Signals founder and one of the creators of Parca here. From what I can tell, ContainIQ is distinct from Parca and Polar Signals, as we only concern ourselves with continuous profiling, which is complementary to metrics, logs, and traces. From our experience, while eBPF is certainly limited and it can be painful to work with the verifier at times, it hits a sweet spot for observability collection because of its low overhead: you really only read some structs from memory somewhere, and for that eBPF's capabilities tend to be plenty.

Definitely excited to see more eBPF tooling appear in the observability space.


Well said, we are excited to see more eBPF tooling appear as well.


Pixie is definitely similar in their eBPF-based approach. I believe there are differences in the types of data they collect and correlate with. For example, we collect logs and state information (node status, node conditions, pod scheduled, etc.) alongside our eBPF-based metrics like latency. I'm sure there are things they collect that we don't as well.


Nice to see a new eBPF based solution out there. Good luck.


Thanks so much!


How does this compare to Opstrace? [0]

[0]: https://opstrace.com


Opstrace took an interesting approach (and was a YC company too, recently acquired by GitLab). We are a managed solution, whereas Opstrace was a self-hosted open-source solution. And we are not building on top of other open-source tools. With ContainIQ, you get metrics natively, plus features that you wouldn't otherwise be able to get with Opstrace and its integrations (e.g., p95 latency by endpoint).


GCP wants 50 cents per ingested log GB.

GCP is already quite expensive in this regard, and you want double.

I think that's way too expensive.



