Traceway: MIT-licensed observability stack you can self-host in ~90s

121 points - last Monday at 7:05 AM

  • denysvitali

    today at 10:31 AM

    At KubeCon Europe a good chunk of the booths were observability stacks. Everyone was claiming they're better than the competitors (with some of them justifying themselves just by saying "it's written in Rust").

    Having dealt with Prometheus (+Thanos) / Grafana / OTEL and other stacks (e.g., a custom solution on ClickHouse, Victoria{Metrics,Logs}, Jaeger/Tempo, Loki, ...) and even cloud ones (Google's Monarch rebranded as Prometheus)... what's your selling point? To me this seems like yet another way to re-invent the wheel.

    If it's just for running locally, okay, fine, but when it comes to production (where the stack really matters) at scale, you end up with lots of tradeoffs and approaches.

    Why is this one a winner compared to the overwhelming "competition"? Seems like we're re-inventing the wheel for the 100th time instead of unifying our efforts to make the existing solutions better. Thankfully we now have OTEL, so at least the interoperability part is somewhat solved (or mitigated).

      • yuppiepuppie

        today at 11:22 AM

        I was thinking this might be a result of the cheap-money (post-COVID) era ending and everyone scrambling to reduce their Datadog/cloud costs. Thinking back on 2023/2024, lots of companies were leaking large amounts of capital to those vendors, and I imagine lots of people saw an opportunity to create leaner and cheaper stacks.

        • yard2010

          today at 11:07 AM

          I tried to self-host Grafana (Loki, Prometheus, and Alloy) as the o11y stack for prepbook.app. This is hard. I have a BSc in CS, not that it says much. I managed to do it eventually, after some research. It was not plug-and-play in any way. The docs kept saying this setup isn't even production-ready. I couldn't find the production guide, only the "forget about self-hosting and simply pay us to host this" pitch. After I deployed it, the UX was so abrasive that my partner won't even try to go into it to figure out a problem. That was a few months ago. Since then new solutions have arrived and I'm waiting to have the time to migrate. I saw PostHog has a solution, but I prefer something I can self-host and completely own.

          I wondered how come no one is trying to solve this problem. It looks like it's just a matter of time.

          That said, my experience may be very skewed, since prepbook is a passion project running on a VPS with essentially 0 scale. All I care about is the UX of the stack, not scale. Just for context.

      • amne

        today at 10:35 AM

        How can you claim "no per-language vendor SDK" in the readme and then link to a list of per-language client SDKs?

          • danparsonson

            today at 11:20 AM

            Aren't they two different things? Vendor SDKs to get the data in, client SDKs as an option to get the data out?

        • tecoholic

          today at 2:04 AM

          I was looking into this just yesterday. The Loki + … comparison is a bit off in the open-source space. The main ones in this space are SigNoz and ClickStack, both using ClickHouse as the database. They are heavy compared to something like Loki, but they are OTEL-native rather than log monitoring, so they're not in the same category.

            • jillesvangurp

              today at 6:07 AM

              I used SigNoz + ClickStack on a vibe-coded Go server project a few weeks ago. I just had Codex figure out how to set up SigNoz + dependencies via Docker Compose. I even got it to pre-populate SigNoz with dashboards. It wasn't too bad. The whole thing runs with a few GB. I tried to cover metrics, tracing, and logging at the same time. This is not a production-ready setup, but you need to trade off cost vs. utility here. If it's useful enough, that could justify extra cost.

              I have a background doing a lot of related work on the Elastic stack, including setting up a big Elastic Fleet-based stack for one client at some point. It might not be the cheapest, but it does provide awesome filtering and querying capabilities. However, a lot of teams that use it don't really know how to tap into that capability, so it tends to be overengineered for what it ends up doing. And the extra, underutilized complexity is why a lot of teams are wary of dealing with that stack.

              Storing the data is the easy part, but what's the point if you can't run queries against it and produce dashboards and diagnostic tools that actually help you? Prometheus/Grafana or older Graphite-type setups tend to be compromises where you get lots of data but are then limited on the querying front or in the number of metrics. The tradeoff is always between scale and querying flexibility. If you store tens or hundreds of GB of telemetry per day, you need a way to make sense of it. ClickHouse seems to be quite good at scaling and querying. It's basically a column-oriented database. I don't have direct experience with Loki.

              But in the end, all that power only matters if people actually use it. And, again, in my experience teams tend not to. They tend to have a lot of unrealized aspirations around their tools and infrastructure. If it's just a dumping ground for data + a few simplistic dashboards, optimize for that. A lot of that data is actually only kept for compliance/auditing reasons. For that, querying is usually a secondary concern and it's OK if queries take a bit longer and are less powerful.

                • tecoholic

                  today at 9:59 AM

                  I agree. The sentiment applies to most analytics. The people who set up analytics are not the same as the end users.

              • adenta

                today at 2:12 AM

                I'm partial to OpenObserve, especially because in Ruby the OTEL stuff isn't great for metrics and logs yet.

                  • lytedev

                    today at 2:30 AM

                    I also run OpenObserve at home, but I can't help but feel that the interface could use some... sparkle, and the mobile experience kinda sucks.

                    But you can't beat the excellent price and performance. It does what I need and much more.

            • oulipo2

              today at 8:04 AM

              There are a few contenders in self-hostable OTEL:

              - ClickStack (ex HyperDX)
              - SigNoz
              - Traceway
              - a few more

              Does anyone have enough experience with those to be able to tell which one works best?

              • ting0

                today at 9:17 AM

                This looks cool

                • sgt

                  today at 7:04 AM

                  Funny, the first thing I look for in infra projects like these is whether they're written in Go. If they are, my confidence level goes up.

                    • neya

                      today at 9:50 AM

                      Here's something better than that:

                      https://github.com/plausible/analytics

                      Elixir.

                        • sexylinux

                          today at 10:15 AM

                          Why is it better? On the internet it is not enough to just say something. You need to deliver some facts and/or a comparison. Please try.

                  • today at 6:19 AM