OpenTelemetry Metrics

OpenTelemetry (OTel) is becoming the de facto standard for observability, providing a unified way to collect, process, and export telemetry data, including traces, logs, and metrics. While traces and logs are crucial for debugging, metrics offer a high-level view of system performance and health. Efficiently storing and querying these metrics is essential for real-time insights, and ClickHouse, a high-performance columnar database, provides an ideal backend for scalable and cost-effective metric ingestion. At LaunchDarkly, we recently introduced support for OTel metrics ingest. Below, we describe how we structured the implementation to deliver an efficient OpenTelemetry metrics pipeline using ClickHouse, covering ingestion, aggregation, querying, and visualization.

OTel Metrics Formats
OpenTelemetry metrics are designed to be flexible, supporting various aggregation and encoding formats. The key formats include:

• Gauge: Represents a single numerical value that changes over time, such as CPU usage or memory consumption.
• Counter: A monotonically increasing value, commonly used for request counts or error rates.
• Histogram: Captures the distribution of values over a given time period, useful for tracking request latencies.
• Summary: Similar to histograms, but includes percentile calculations for more detailed insights.

The OTel protocol transmits these metric types in a structured format, typically protobuf or JSON, when using OTLP (OpenTelemetry Protocol). Understanding these formats is crucial for designing an efficient ingestion pipeline that minimizes storage overhead while maximizing query performance.
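For a sense of what the ingest path receives, here is a trimmed, hand-written sketch of a counter in OTLP/JSON (the service name, metric name, and values are illustrative; aggregationTemporality 2 means cumulative):

```json
{
  "resourceMetrics": [{
    "resource": {
      "attributes": [
        { "key": "service.name", "value": { "stringValue": "checkout-service" } }
      ]
    },
    "scopeMetrics": [{
      "metrics": [{
        "name": "http.server.request.count",
        "sum": {
          "aggregationTemporality": 2,
          "isMonotonic": true,
          "dataPoints": [{
            "attributes": [
              { "key": "http.route", "value": { "stringValue": "/api/flags" } }
            ],
            "startTimeUnixNano": "1700000000000000000",
            "timeUnixNano": "1700000060000000000",
            "asInt": "1024"
          }]
        }
      }]
    }]
  }]
}
```

Note how every data point carries its own attribute set; this per-point cardinality is what drives the storage and aggregation decisions discussed below.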
Building an Ingest Path

LaunchDarkly uses Apache Kafka to buffer data for bulk inserts into ClickHouse. While we use the OpenTelemetry Collector to receive, deserialize, and batch data, we export to our Golang API, which mutates the data before writing it to Apache Kafka. A set of workers (the Apache Kafka Connect ClickHouse exporter) reads the data and writes it to ClickHouse in large batches.
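As a sketch of that last hop, a ClickHouse Kafka Connect sink can be registered with a configuration along these lines (the topic names, host, and credentials below are placeholders, not our production values):

```json
{
  "name": "clickhouse-metrics-sink",
  "config": {
    "connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
    "tasks.max": "4",
    "topics": "otel-metrics-sum,otel-metrics-histogram,otel-metrics-summary",
    "hostname": "clickhouse.internal.example.com",
    "port": "8443",
    "ssl": "true",
    "database": "otel",
    "username": "ingest",
    "password": "<secret>"
  }
}
```

Because the sink consumes from Kafka in its own batches, insert size into ClickHouse is decoupled from how quickly individual applications emit metrics.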
OpenTelemetry Collector Setup

The OpenTelemetry Collector is a key component in an OTel pipeline, responsible for receiving, processing, and exporting telemetry data. For metric ingestion into ClickHouse, we configure the collector to receive OTel metrics via the OTLP receiver, process them using built-in processors (e.g., batch and transform), and export them to our API. Here's an example OpenTelemetry Collector configuration for exporting metrics to our LaunchDarkly API, which then batch-exports the data to ClickHouse:
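(A representative sketch; the exporter endpoint and auth header are placeholders rather than LaunchDarkly's actual ingest URL.)

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    # Batching upstream keeps the downstream API and Kafka writes efficient.
    send_batch_size: 8192
    timeout: 5s

exporters:
  otlphttp:
    # Placeholder endpoint; point this at your metrics ingest API.
    endpoint: https://metrics-ingest.example.com
    headers:
      authorization: ${env:INGEST_API_TOKEN}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```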
There is also a ClickHouse exporter for the collector that supports direct writes to the database. For our production use case, we route the data through our API for pre-processing and write buffering via Apache Kafka, but you may find success with the direct exporter even at large volumes.
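If you do write directly, the clickhouse exporter from the collector-contrib distribution is configured along these lines (a minimal sketch; the endpoint, database, and credentials are placeholders):

```yaml
exporters:
  clickhouse:
    endpoint: tcp://clickhouse.example.com:9000
    database: otel
    username: ingest
    password: ${env:CLICKHOUSE_PASSWORD}
    # Expire raw rows after three days.
    ttl: 72h
```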
Aggregating and Reducing Data Granularity
High-cardinality metrics can quickly balloon in storage size, making efficient aggregation crucial. ClickHouse provides materialized views and TTL-based rollups to downsample data while retaining aggregate insights. Our production data pipeline initially writes the metrics in their OTel-native format to one of three tables: metrics_sum, metrics_histogram, or metrics_summary.
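To illustrate the TTL-based rollup mechanism, ClickHouse can collapse aged rows in place with a TTL ... GROUP BY clause. A minimal sketch with hypothetical table and column names, not our production schema:

```sql
CREATE TABLE metrics_sum_demo
(
    ts          DateTime,
    metric_name LowCardinality(String),
    attr_hash   UInt64,   -- hash of the attribute map, used as a grouping key
    value       Float64
)
ENGINE = MergeTree
PARTITION BY toDate(ts)
ORDER BY (metric_name, attr_hash, toStartOfDay(ts), ts)
-- After 30 days, collapse per-second rows into one row per metric/attributes/day.
TTL ts + INTERVAL 30 DAY
    GROUP BY metric_name, attr_hash, toStartOfDay(ts)
    SET value = sum(value);
```

The GROUP BY keys must be a prefix of the table's sorting key, which is why the rollup granularity (toStartOfDay here) appears in ORDER BY.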
The sheer frequency of metric data points makes querying over wide time ranges a challenge. While the OpenTelemetry SDK emitting the metrics may aggregate data before export, the collector does not perform any additional aggregation.
A real-world example: imagine a 100-node Kubernetes cluster running your application. Each application instance receives many requests per second and emits a number of latency metrics for each API endpoint. Even if the OTel SDK is configured to aggregate metrics down to one-second resolution, each node still produces one row per second for each unique combination of metric and attributes. Any unique tags emitted on the metrics result in unique metric rows written to ClickHouse. On top of that, all 100 nodes send their respective data, which is not aggregated by the collector. The result: thousands of rows per second written to ClickHouse with fine timestamp granularity.
Another reason to transform the data is to merge the different OTel metrics formats into a cohesive one that is easier to query. We went with an approach that solves both problems, aggregating metric values to one-second resolution and merging data across the metrics formats.
Below you'll find a sketch of the schema we adopted for each OTel metric type, along with the materialized views that perform the aggregations:
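(What follows is a simplified sketch that conveys the shape of the approach; the table and column names are illustrative rather than our exact DDL.)

```sql
-- Unified table holding all metric types, aggregated to 1-second resolution.
CREATE TABLE metrics_unified
(
    ts           DateTime,
    metric_name  LowCardinality(String),
    attributes   Map(LowCardinality(String), String),
    value_sum    SimpleAggregateFunction(sum, Float64),
    sample_count SimpleAggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toDate(ts)
ORDER BY (metric_name, attributes, ts);

-- One materialized view per source table; this one rolls up metrics_sum.
-- Rows sharing (metric_name, attributes, ts) are merged in the background.
CREATE MATERIALIZED VIEW metrics_sum_to_unified
TO metrics_unified AS
SELECT
    toDateTime(time_unix) AS ts,   -- truncated to 1-second resolution
    metric_name,
    attributes,
    sum(value)            AS value_sum,
    count()               AS sample_count
FROM metrics_sum
GROUP BY ts, metric_name, attributes;
```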
The unified schema uses a shared Attributes column for keys that are similar across metrics, so queries can filter consistently regardless of the source format.
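With that shape, wide-range queries stay cheap. For example, again using the illustrative names from the sketch above (re-aggregating with sum() because background merges are eventual):

```sql
-- Average value per second for one metric over the last 24 hours.
SELECT
    ts,
    sum(value_sum) / sum(sample_count) AS avg_value
FROM metrics_unified
WHERE metric_name = 'http.server.duration'
  AND ts >= now() - INTERVAL 1 DAY
GROUP BY ts
ORDER BY ts;
```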