>- Are there any recommendations for scaling (e.g. benchmarks) on how many spans/s are supported on what hardware?
With Uptrace, I was able to achieve 10k spans / second on a single core by running the binary with GOMAXPROCS=1. That works out to 1-3 terabytes of compressed data per month, which is more than most users need.
In practice, you are limited by the $$$ you are willing to spend on ClickHouse servers, not by Uptrace ingestion speed.
So my recommendation is to scale Uptrace vertically by throwing more cores at it. That will take you very, very far.
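As a rough sanity check of those numbers: assuming a 30-day month and decimal terabytes, a sustained 10k spans/s is about 25.9 billion spans per month, so 1-3 TB of compressed data implies roughly 39-116 bytes per stored span. The sketch below only does that arithmetic and prints the current GOMAXPROCS value, which Go reads from the `GOMAXPROCS` environment variable at startup.

```go
package main

import (
	"fmt"
	"runtime"
)

// secondsPerMonth assumes a 30-day month.
const secondsPerMonth = 30 * 24 * 60 * 60

// spansPerMonth returns how many spans a sustained ingest rate produces in a 30-day month.
func spansPerMonth(spansPerSecond int64) int64 {
	return spansPerSecond * secondsPerMonth
}

// bytesPerSpan returns the average stored size per span implied by a monthly data volume.
func bytesPerSpan(monthlyBytes float64, spans int64) float64 {
	return monthlyBytes / float64(spans)
}

func main() {
	// runtime.GOMAXPROCS(0) reports the current setting without changing it;
	// launching the process with GOMAXPROCS=1 in its environment makes this print 1.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

	spans := spansPerMonth(10_000)
	fmt.Println("spans per month:", spans) // 25920000000

	const terabyte = 1e12 // decimal TB (assumption)
	for _, tb := range []float64{1, 3} {
		fmt.Printf("%.0f TB/month -> %.0f bytes per span\n", tb, bytesPerSpan(tb*terabyte, spans))
	}
}
```

The per-span figure is an average over everything stored (attributes, events, indexes), so it is only useful for order-of-magnitude capacity planning.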
>- Is there support for SSO for self-hosted installation?
So far, the only way to add new users is via the YAML config. We are considering adding a REST API or a CLI tool for the same purpose, but it is not clear how that would work together with the YAML config.
Regarding SSO, it would be nice if you could point us to an app that already does it so we can better estimate the complexity. But so far we don't have such plans.
>- Can I easily disable certain features? (e.g. alerting)
Yes, most YAML sections can simply be removed or commented out to disable the corresponding feature.
I have no good examples of SSO other than Grafana.
But something I would love to see more of for logins is application tokens. We use Cloudflare Access for team-related logins; it sends such a token in a header, so the application can use it to authorize the user and, depending on which group the user is in, enable or disable features.
We already use JWT tokens that are passed in an HTTP cookie, so perhaps we could document how it works and let users sign JWT tokens themselves. That way, your app only needs to set a cookie and the user should be authorized.
You can ingest data using the OpenTelemetry Protocol (OTLP), Vector logs, and the Zipkin API. You can also use the OpenTelemetry Collector to collect Prometheus metrics or to receive data from Jaeger, X-Ray, Apache, PostgreSQL, MySQL, and many more.
The latest Uptrace release introduces support for OpenTelemetry Metrics which includes:
- User interface to build table-based and grid-based dashboards.
- Pre-built dashboard templates for Golang, Redis, PostgreSQL, MySQL, and host metrics.
- Metrics monitoring aka alerting rules inspired by Prometheus.
- Notifications via email/Slack/PagerDuty using AlertManager integration.
This looks amazing! I would definitely like to use this for log monitoring. However, I have a question: is it possible to get logs for individual Docker containers?
So far, my experience is that it is best to avoid solving such problems with a query language and instead provide a much simpler UI that achieves the same. Solving such problems with SQL is tedious enough, and learning another custom language is not fun / too much to ask of users.
Sometimes using a UI is not possible, for example, if you want to automate such checks. In that case, I would build a custom metric or two and use them for monitoring. That requires some programming / instrumentation, but it still looks like a better solution to me.
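One way to read "build a custom metric or two": instrument the code path you care about with counters and let the automated check look at a derived value instead of running an ad-hoc query. The sketch uses the standard library `expvar` package as a stand-in for a real OpenTelemetry metrics API; the metric names and `processOrder` are hypothetical.

```go
package main

import (
	"expvar"
	"fmt"
)

// Hypothetical custom metrics. Importing expvar also registers a /debug/vars
// handler on http.DefaultServeMux, so a scraper could read them over HTTP.
var (
	ordersTotal  = expvar.NewInt("orders_total")
	ordersFailed = expvar.NewInt("orders_failed")
)

// processOrder stands in for real business logic instrumented with the counters.
func processOrder(ok bool) {
	ordersTotal.Add(1)
	if !ok {
		ordersFailed.Add(1)
	}
}

// failureRatio is the derived value an automated check would alert on.
func failureRatio() float64 {
	total := ordersTotal.Value()
	if total == 0 {
		return 0
	}
	return float64(ordersFailed.Value()) / float64(total)
}

func main() {
	for _, ok := range []bool{true, true, false, true} {
		processOrder(ok)
	}
	fmt.Printf("failure ratio: %.2f\n", failureRatio()) // failure ratio: 0.25
	fmt.Println(failureRatio() > 0.1)                    // true: would trip a 10% threshold
}
```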
Those are two separate projects and they don't work together. I still haven't had a chance to try Loki / Tempo, so I can't say how well they work in practice...
I understand that it is not ideal to have so many competing tools, but contributing to an existing mature project is a nightmare. It is by far easier to start a new one.
>Jaeger UI is anyway not the main tool people tend to work with.
Which tools / features do you have in mind?
Uptrace OS competes with Jaeger / Zipkin / SigNoz / SkyWalking and I believe it already does a pretty good job.
We use a custom elastic storage on steroids with Keycloak integration for enabling multi-tenancy in Jaeger so we can do SLA tracking and reporting. So the answer is yes.
I get your point about contributing, especially features that are incompatible with the maintainer vision. Feature creep, right?
What I value in open source projects is extensibility. Plugins which one can maintain outside of the main product.
> "But" do you need another storage? :)
I’m only saying that it’s possible. I might not need it, but if someone does and they want to self-host it as a managed solution, it can be done right in Jaeger.
> Which tools / features do you have in mind?
The default Jaeger UI isn’t really ergonomic. Trace info is more useful in the context of other information. As in, tools pulling trace info out of storage and overlaying on other data. There’s also Grafana Tempo.
Thanks for the answer - that all makes sense... "but" :) there is also some sense in NOT having to support different storages / plugins API and instead supporting more features that work out of the box. At least I hope so.
I will try Grafana Tempo & Loki - thanks for the reminder. It is just that all their products look like the "original" Grafana for metrics, which TBH does not look especially ergonomic either... Somehow Jaeger still comes first when people talk about tracing.
> there is also some sense in NOT having to support different storages / plugins API and instead supporting more features that work out of the box
I humbly disagree. There’s always going to be an edge case that your features don’t support, and being able to roll out a plugin is the fastest way to integrate. There are also features I may want to keep confidential/proprietary depending on who the client is. In some cases, even the existence of a certain feature in a certain shape gives away what a client is doing.
I’m generally staying away from software without extensibility opportunities. These types of solutions almost always end up as part of a larger infrastructure so extensibility is important.
Maybe. Are you already using that feature in production? Buffers have been available for years and just work. It is hard to say how async inserts perform in real applications.
I also use buffer engine, don't get me wrong.
The only reason I'm not using async writes is that I only run Altinity-certified versions in my clusters, so I'm waiting for it.
We have been using your hosted service for a while now and we are perfectly happy with Uptrace. Compared to Jaeger, the UI is miles ahead; I like Jaeger for what it brought, but its UI is just not very good.
We tried other hosted services, but most if not all of them act as if you were creating gold from thin air and charge you accordingly. We started with a hosted Jaeger solution and switched to Uptrace without looking back.
OpenTelemetry Java should work with Android, but I haven't really tried it. Java supports true auto-instrumentation so it should not be hard to figure out if you have an Android project.
I've provided a Docker example so users can quickly try Uptrace without downloading and configuring anything. I don't use Docker to run anything besides examples, so I can't give any recommendations.
Thanks for the clickhouse-operator though. I guess we will have to provide something for k8s sooner or later.
The data is received via OTLP (Otel protocol) and almost immediately inserted into ClickHouse buffer table in small batches. Simple and very efficient.
Tail-based sampling will require buffering spans in memory for some time, but tail-based sampling is not implemented yet.
The cloud version also uses Kafka to survive traffic surges, but I guess the "personal" / company version does not need that as much, so there is no need to introduce an additional dependency.
For tail-based sampling, does it mean that every process in a trace will keep its spans in memory until the initial process 'ends' the trace? How does the flushing happen (e.g. all processes 'commit' their buffer spans)? Many thanks for the explanations!
Uptrace / Go process will buffer spans in memory for some short period of time (5-15 seconds). It does not work for long traces, but most traces are short.