
How to Implement OpenTelemetry in Java in 2026


Introduction

In 2026, Java applications increasingly run as complex distributed architectures such as microservices in cloud-native environments. OpenTelemetry (OTel), the open standard born from the merger of OpenTracing and OpenCensus, has become essential for observability. Unlike proprietary tools, OTel unifies traces, metrics, and logs in a single vendor-neutral framework whose data can be exported to many backends (Jaeger, Prometheus, Grafana).

Why does it matter? Imagine a Java e-commerce app where a single customer request spans 10 services: without traces, pinpointing the source of latency is a nightmare. OTel captures these flows automatically, reducing MTTR (Mean Time To Resolution) by 50% according to the CNCF. This beginner tutorial focuses on the foundations: concepts, architecture, and practices, illustrated with concrete analogies to help you visualize implementation. By the end, you'll be able to structure Java observability like a pro.

Prerequisites

  • Basic Java knowledge (JDK 17+ recommended in 2026).
  • Understanding of observability: monitoring vs. tracing.
  • Familiarity with microservices or Spring Boot (no deep expertise needed).
  • Access to a backend like Jaeger or Zipkin for future tests.

Step 1: Understand OpenTelemetry's Core Pillars

OpenTelemetry rests on three main signals, like the three sources of information a pilot relies on: the view outside (traces), the instrument panel (metrics), and the logbook (logs).

  • Traces: Distributed request tracking. Analogy: a pizza delivery through multiple steps (kitchen, packaging, delivery). Each span (one step) records its duration and attributes (e.g., a 'db.query' span lasting 150 ms).
  • Metrics: Numeric aggregates (counters, histograms). Real-world example: HTTP requests per minute in a Java service, plus latency percentiles computed from histograms to spot spikes.
  • Logs: Contextual text events. Link to traces: a 'DB error' log carries the trace and span IDs of the active span, so it can be correlated with that span.

| Signal | Objective | Java Example |
|---|---|---|
| Traces | Distributed flow | Spring Boot request spanning Feign clients |
| Metrics | Trends | JVM heap usage over 24h |
| Logs | Details | Exception stack trace linked to a trace |

These signals integrate via context propagation (W3C TraceContext), ensuring continuity across Java and non-Java services.
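To make the pizza-delivery analogy concrete, here is a stdlib-only sketch of what a trace looks like as data: several spans sharing one trace ID, linked by parent span IDs, each with a duration and attributes. The SimpleSpan record and PizzaTrace class are hypothetical illustrations, not the OpenTelemetry SDK's own data classes, and the IDs and timings are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified span model (not the SDK's SpanData).
record SimpleSpan(String traceId, String spanId, String parentSpanId,
                  String name, long startMillis, long endMillis,
                  Map<String, String> attributes) {

    long durationMillis() {
        return endMillis - startMillis;
    }

    boolean isRoot() {
        return parentSpanId == null;
    }
}

final class PizzaTrace {
    // One trace: a root "order" span with three sequential child steps.
    static List<SimpleSpan> build() {
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
        String root = "a000000000000001";
        List<SimpleSpan> spans = new ArrayList<>();
        spans.add(new SimpleSpan(traceId, root, null,
                "order", 0, 1_800, Map.of("order.size", "large")));
        spans.add(new SimpleSpan(traceId, "a000000000000002", root,
                "kitchen", 0, 900, Map.of()));
        spans.add(new SimpleSpan(traceId, "a000000000000003", root,
                "packaging", 900, 1_000, Map.of()));
        spans.add(new SimpleSpan(traceId, "a000000000000004", root,
                "delivery", 1_000, 1_800, Map.of("delivery.mode", "bike")));
        return spans;
    }

    // The slowest non-root span is where the latency actually comes from.
    static SimpleSpan slowestChild(List<SimpleSpan> spans) {
        SimpleSpan slowest = null;
        for (SimpleSpan s : spans) {
            if (!s.isRoot() && (slowest == null || s.durationMillis() > slowest.durationMillis())) {
                slowest = s;
            }
        }
        return slowest;
    }
}
```

In a real application the SDK produces this structure for you; the point is that "find the slow step" becomes a simple query over spans of one trace.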

Step 2: Decode the OpenTelemetry Architecture

Think of OTel as a production line: API (abstract), SDK (implementation), Instrumentation (automatic), Exporter (output).

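  • API: Stable interfaces defined by the language-neutral OpenTelemetry specification. In Java, io.opentelemetry.api defines Tracer and Meter, plus a Logger in the Logs Bridge API that is mainly used by log appenders.
  • SDK: The Java implementation (io.opentelemetry:opentelemetry-sdk). It configures samplers, span processors, and exporters.
  • Instrumentation: Ready-made libraries and the Java agent, which instrument Spring, JDBC, Kafka, and more without code changes. E.g., annotating a method with @WithSpan (from opentelemetry-instrumentation-annotations) traces it without boilerplate when the agent or the Spring Boot starter is active.
  • Exporter: The bridge to backends. OTLP (over gRPC or HTTP) is the standard protocol in 2026, accepted by the Collector and natively by many backends such as Prometheus and Elastic.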
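The API/SDK split is easiest to see in miniature. The sketch below imitates the pattern with hypothetical MiniTracer types: instrumented code depends only on an interface, and until an implementation is installed, a no-op default silently discards everything. This mirrors how the real OpenTelemetry API behaves without an SDK, but the names are illustrative, not the io.opentelemetry.api classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// "API": what instrumented code compiles against (hypothetical, simplified).
interface MiniTracer {
    void recordSpan(String name, long durationMillis);
}

// Default when no "SDK" is installed: does nothing.
final class NoopMiniTracer implements MiniTracer {
    @Override
    public void recordSpan(String name, long durationMillis) {
        // intentionally empty
    }
}

// "SDK": an implementation that actually keeps the data (here, in memory).
final class RecordingMiniTracer implements MiniTracer {
    private final List<String> recorded = new ArrayList<>();

    @Override
    public void recordSpan(String name, long durationMillis) {
        recorded.add(name + ":" + durationMillis + "ms");
    }

    List<String> recorded() {
        return Collections.unmodifiableList(recorded);
    }
}

// Global holder, loosely analogous to a globally registered OpenTelemetry instance.
final class MiniTelemetry {
    private static volatile MiniTracer tracer = new NoopMiniTracer();

    static MiniTracer tracer() {
        return tracer;
    }

    static void install(MiniTracer sdk) {
        tracer = sdk;
    }
}

// Library code: identical whether or not an SDK is installed.
final class CheckoutService {
    void checkout() {
        MiniTelemetry.tracer().recordSpan("checkout", 42);
    }
}
```

This is why libraries can ship with OTel API calls at near-zero cost: nothing is recorded until an application installs an SDK.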
Conceptual diagram:

Java App → API → SDK (Sampler → Processors, e.g., Batch) → Exporter → Collector → Backend
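The diagram reads as a chain of stages. As an illustration only, the sketch below wires hypothetical sampler, processor, and exporter stages (plain java.util.function types plus a made-up SketchExporter interface, not the SDK's types) so span names flow left to right and are dropped early when the sampler rejects them. It simplifies one detail: in the real SDK the sampling decision is made when a span starts, not after it finishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical exporter stage: ships a batch to a Collector or backend.
interface SketchExporter {
    void export(List<String> batch);
}

final class SketchPipeline {
    private final Predicate<String> sampler;        // keep or drop a span
    private final UnaryOperator<String> processor;  // enrich or transform it
    private final SketchExporter exporter;          // send the survivors

    SketchPipeline(Predicate<String> sampler, UnaryOperator<String> processor,
                   SketchExporter exporter) {
        this.sampler = sampler;
        this.processor = processor;
        this.exporter = exporter;
    }

    // Runs each span through sampler -> processor, then exports survivors together.
    void handle(List<String> spanNames) {
        List<String> batch = new ArrayList<>();
        for (String span : spanNames) {
            if (sampler.test(span)) {
                batch.add(processor.apply(span));
            }
        }
        if (!batch.isEmpty()) {
            exporter.export(batch);
        }
    }
}
```

Dropping data as early as possible is the design point: work rejected by the sampler never costs processing or network time.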

The OTel Collector (standalone component) aggregates, filters, and routes data, avoiding network overload in Java Kubernetes clusters.

Step 3: Master Context Propagation

In Java microservices, trace context travels with each request like a passport. TraceContext (traceId, spanId, sampling flag) propagates in the W3C traceparent HTTP header, and Baggage (custom key-value data) in the baggage header.

Real-world example: Service A (Spring Boot) calls Service B via RestTemplate. With the Java agent, OTel injects the headers automatically; B extracts them and starts a child span in the same trace.
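The traceparent header is defined by the W3C Trace Context specification as version-traceid-parentid-flags, e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01. The agent's propagators handle it for you; the stdlib-only sketch below, with a hypothetical TraceParent record, just shows what Service A sends and how Service B derives a child span from it. It handles version 00 only and keeps only the sampled flag, so it is not a complete, spec-conformant parser.

```java
import java.security.SecureRandom;
import java.util.HexFormat;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the W3C traceparent header (version 00 only).
// spanId is the sender's span ID, i.e. the header's "parent-id" field.
record TraceParent(String traceId, String spanId, boolean sampled) {
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final String ZERO_SPAN_ID = "0000000000000000";

    // Returns null for malformed headers or all-zero IDs (both invalid per the spec).
    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        if (traceId.equals(ZERO_SPAN_ID + ZERO_SPAN_ID) || spanId.equals(ZERO_SPAN_ID)) {
            return null;
        }
        int flags = Integer.parseInt(m.group(3), 16);
        return new TraceParent(traceId, spanId, (flags & 0x01) != 0);
    }

    // Header value sent to the next service (only the sampled flag is kept).
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    // Service B: same trace ID and sampling decision, fresh random span ID.
    TraceParent childSpan() {
        byte[] bytes = new byte[8];
        String newSpanId;
        do {
            RANDOM.nextBytes(bytes);
            newSpanId = HexFormat.of().formatHex(bytes);
        } while (newSpanId.equals(ZERO_SPAN_ID));
        return new TraceParent(traceId, newSpanId, sampled);
    }
}
```

The key property: the trace ID never changes across hops, while each service contributes its own span ID, which is what lets a backend stitch the calls into one trace.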

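Span states:

  • Active (recording): the span has started but not yet ended; its wall-clock duration is still running.
  • Ended: the span is finalized with a status (UNSET, OK, or ERROR) and can no longer be modified.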
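The two states can be modeled as a tiny state machine. The hypothetical LifecycleSpan below (not the SDK's Span interface) accepts changes only while active; the first end() freezes the duration and status, and later calls are ignored, loosely mirroring the specification's rule that only the first End call on a span takes effect. The status values are the three codes OpenTelemetry defines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Status codes as named in the OpenTelemetry specification.
enum SketchStatus { UNSET, OK, ERROR }

// Hypothetical span with an active -> ended lifecycle (not the SDK's Span).
final class LifecycleSpan {
    private final String name;
    private final long startNanos;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private SketchStatus status = SketchStatus.UNSET;
    private long endNanos = -1;

    LifecycleSpan(String name, long startNanos) {
        this.name = name;
        this.startNanos = startNanos;
    }

    String name() {
        return name;
    }

    boolean isActive() {
        return endNanos < 0;
    }

    // Changes are accepted only while the span is still active.
    void setAttribute(String key, String value) {
        if (isActive()) {
            attributes.put(key, value);
        }
    }

    void setStatus(SketchStatus newStatus) {
        if (isActive()) {
            status = newStatus;
        }
    }

    // The first call finalizes the span; subsequent calls are ignored.
    void end(long nowNanos) {
        if (isActive()) {
            endNanos = nowNanos;
        }
    }

    // -1 while the span is still active.
    long durationNanos() {
        return isActive() ? -1 : endNanos - startNanos;
    }

    SketchStatus status() {
        return status;
    }

    Map<String, String> attributes() {
        return Map.copyOf(attributes);
    }
}
```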

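Samplers control volume: AlwaysOn (keep everything), TraceIdRatioBased (probabilistic: keep a fixed fraction, e.g., 1%), and ParentBased (follow the parent span's decision, typically wrapping a ratio sampler for root spans). For high-traffic Java services in production, a ratio of 1% or less is a common starting point to keep trace volume (easily 10GB+/day) under control.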
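Ratio sampling derives its decision from the trace ID itself, so every service that sees the same trace reaches the same verdict without coordination; ParentBased then simply reuses the parent's decision. The sketch below is a simplified, hypothetical version of that idea (it compares the low 64 bits of the trace ID against a threshold), not a copy of the Java SDK's implementation. With the real SDK you would configure something like Sampler.parentBased(Sampler.traceIdRatioBased(0.01)).

```java
// Hypothetical, simplified ratio + parent-based sampler (not the SDK's code).
final class RatioSamplerSketch {
    private final double ratio;
    private final long threshold;

    RatioSamplerSketch(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.ratio = ratio;
        // Portion of the non-negative long range that counts as "sampled".
        this.threshold = (long) (ratio * Long.MAX_VALUE);
    }

    // Root decision: a pure function of the trace ID (32 lowercase hex chars),
    // so every service seeing this trace ID computes the same answer.
    boolean shouldSampleRoot(String traceId) {
        if (ratio >= 1.0) {
            return true;
        }
        if (ratio <= 0.0) {
            return false;
        }
        long low64 = Long.parseUnsignedLong(traceId.substring(16), 16);
        return (low64 & Long.MAX_VALUE) < threshold;
    }

    // Parent-based: follow the parent's sampled flag; use the ratio only for roots.
    boolean shouldSample(String traceId, Boolean parentSampled) {
        return parentSampled != null ? parentSampled : shouldSampleRoot(traceId);
    }
}
```

Because children follow the root's decision, a 1% ratio keeps about 1% of whole traces rather than 1% of scattered individual spans.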

Step 4: Integrate Metrics and Logs

Metrics: four instrument types are commonly used in Java.

| Type | Description | Java Usage |
|---|---|---|
| Counter | Incremental (monotonic) | Requests processed |
| UpDownCounter | Increments or decrements (+ or -) | Connection pool size |
| Histogram | Distribution of values | API latency (P50/P95/P99) |
| Gauge | Point-in-time snapshot | Active threads |

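The four instrument types differ mainly in which values they accept and how they aggregate. The sketch below models each with plain Java; the classes are hypothetical, not the io.opentelemetry.api.metrics interfaces. A counter only ever grows (this sketch simply ignores negative increments), an up-down counter accepts both signs, a histogram counts values into explicit buckets whose upper bound is inclusive, and a gauge keeps only the last value. The bucket boundaries in the usage are arbitrary examples. With the real API, instruments come from a Meter, e.g. meter.counterBuilder("requests.processed").build().

```java
// Monotonic sum: this sketch ignores negative increments.
final class SketchCounter {
    private long value;

    void add(long increment) {
        if (increment < 0) {
            return; // counters never go down
        }
        value += increment;
    }

    long value() { return value; }
}

// Non-monotonic sum, e.g. connections currently checked out of a pool.
final class SketchUpDownCounter {
    private long value;

    void add(long delta) { value += delta; }

    long value() { return value; }
}

// Explicit-bucket histogram: bucket i holds values in (bounds[i-1], bounds[i]].
final class SketchHistogram {
    private final double[] bounds;  // ascending upper bounds
    private final long[] counts;    // one extra bucket for values above the last bound

    SketchHistogram(double... bounds) {
        this.bounds = bounds.clone();
        this.counts = new long[bounds.length + 1];
    }

    void record(double value) {
        int i = 0;
        while (i < bounds.length && value > bounds[i]) {
            i++;
        }
        counts[i]++;
    }

    long[] counts() { return counts.clone(); }
}

// Last-value gauge, e.g. number of active threads right now.
final class SketchGauge {
    private volatile double last = Double.NaN;

    void set(double value) { last = value; }

    double value() { return last; }
}
```

Percentiles such as P95 are then estimated from the bucket counts, which is why bucket boundaries should be chosen around the latencies you care about.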
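Logs: Structured logging with correlation. E.g., OpenTelemetry's Logback instrumentation can add trace_id and span_id to each log event (via the MDC), so JSON logs can be joined to their traces; the OpenTelemetry Logback appender can also ship the log records themselves through OTel.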

Benefit: in Grafana, you can pivot from a 'high latency' metric to the causal trace, then to its detailed logs.
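Correlation boils down to stamping each log record with the IDs of the span that was current when it was written. The sketch below imitates what an MDC-based integration does, using a hypothetical ThreadLocal holder (CurrentSpan) instead of SLF4J's MDC so it runs with the JDK alone. The trace_id and span_id field names follow the usual OpenTelemetry convention; the JSON layout and the JsonLog class are illustrative assumptions.

```java
// Hypothetical current-span holder, standing in for MDC / the OTel Context.
final class CurrentSpan {
    record Ids(String traceId, String spanId) {}

    private static final ThreadLocal<Ids> CURRENT = new ThreadLocal<>();

    static void set(Ids ids) { CURRENT.set(ids); }
    static void clear() { CURRENT.remove(); }
    static Ids get() { return CURRENT.get(); }
}

final class JsonLog {
    // Formats one JSON log line; adds trace_id/span_id when a span is current.
    static String line(String level, String message) {
        StringBuilder sb = new StringBuilder("{\"level\":\"").append(level)
                .append("\",\"message\":\"").append(escape(message)).append('"');
        CurrentSpan.Ids ids = CurrentSpan.get();
        if (ids != null) {
            sb.append(",\"trace_id\":\"").append(ids.traceId())
              .append("\",\"span_id\":\"").append(ids.spanId()).append('"');
        }
        return sb.append('}').toString();
    }

    // Minimal escaping for the sketch; use a real JSON encoder in practice.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Once every line carries trace_id, a log backend can filter on it, which is exactly the metric-to-trace-to-logs pivot described above.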

Step 5: Configure for Production

In 2026, configure OTel via system properties or environment variables, e.g., OTEL_SERVICE_NAME=my-java-app, OTEL_TRACES_EXPORTER=otlp, OTEL_METRICS_EXPORTER=prometheus (or the equivalent -Dotel.service.name=my-java-app).
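Each OTEL_* environment variable has a system-property twin (OTEL_SERVICE_NAME and -Dotel.service.name), and in the Java SDK's autoconfiguration the system property takes precedence when both are set. The hypothetical OtelSettingLookup helper below sketches that lookup only to show the naming rule; in practice the SDK's autoconfigure module resolves these settings for you.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of OTEL_* lookup; the SDK autoconfigure module does this for real.
final class OtelSettingLookup {
    // "otel.service.name" -> "OTEL_SERVICE_NAME"
    static String envName(String property) {
        return property.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
    }

    // System property first, then environment variable, then the given default.
    static String get(String property, Map<String, String> sysProps,
                      Map<String, String> env, String defaultValue) {
        String fromSys = sysProps.get(property);
        if (fromSys != null && !fromSys.isBlank()) {
            return fromSys;
        }
        String fromEnv = env.get(envName(property));
        if (fromEnv != null && !fromEnv.isBlank()) {
            return fromEnv;
        }
        return defaultValue;
    }
}
```

Usage: pass System.getenv() as env; the maps are parameters here only so the precedence rule is easy to see and test.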

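Key processors and limits:

  • Batch: groups spans before export (defaults: a 5 s schedule delay and up to 512 spans per export batch).
  • Attribute limits: the SDK keeps at most 128 attributes per span by default (OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT), which bounds span size; the attribute values themselves should also stay low-cardinality.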
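Batching trades a little latency for far fewer export calls. The hypothetical SpanBatcher below sketches the two flush triggers behind the batch settings: a full batch (512 spans) or an elapsed schedule delay (5 s). Time is passed in explicitly so the behavior is deterministic; the real BatchSpanProcessor instead uses a background worker thread and a bounded queue, and drops spans when that queue is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batcher: flush when the batch is full or the delay has elapsed.
final class SpanBatcher<T> {
    private final int maxBatchSize;
    private final long scheduleDelayMillis;
    private final Consumer<List<T>> exporter;
    private List<T> buffer = new ArrayList<>();
    private long lastFlushMillis;

    SpanBatcher(int maxBatchSize, long scheduleDelayMillis,
                Consumer<List<T>> exporter, long nowMillis) {
        this.maxBatchSize = maxBatchSize;
        this.scheduleDelayMillis = scheduleDelayMillis;
        this.exporter = exporter;
        this.lastFlushMillis = nowMillis;
    }

    // Trigger 1: the batch is full.
    void add(T span, long nowMillis) {
        buffer.add(span);
        if (buffer.size() >= maxBatchSize) {
            flush(nowMillis);
        }
    }

    // Trigger 2: called periodically (a timer thread in a real processor).
    void tick(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - lastFlushMillis >= scheduleDelayMillis) {
            flush(nowMillis);
        }
    }

    private void flush(long nowMillis) {
        exporter.accept(buffer);
        buffer = new ArrayList<>(); // the exporter keeps the old list
        lastFlushMillis = nowMillis;
    }
}
```

For example, 1,000 spans arriving at once produce one export of 512 immediately and one of 488 when the delay expires, instead of 1,000 separate network calls.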

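Java runtime: the Java agent collects JVM metrics (GC, memory, threads) out of the box. Applications that already use Micrometer (e.g., Spring Boot Actuator) can also route those meters into OTel through the Micrometer bridge.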
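To see what "JVM metrics" means concretely, the JDK's own java.lang.management API exposes the same kinds of values: heap usage, GC counts and time, live threads. The JvmSnapshot class below reads them with the standard library only; it illustrates the underlying data and is not the agent's implementation. Its map keys are illustrative labels, whereas OTel's semantic conventions use names such as jvm.memory.used and jvm.thread.count.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Reads a few JVM runtime values via the standard management API.
final class JvmSnapshot {
    static Map<String, Long> read() {
        Map<String, Long> values = new LinkedHashMap<>();
        values.put("heap.used.bytes",
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
        values.put("threads.live", (long) ManagementFactory.getThreadMXBean().getThreadCount());
        long gcCount = 0;
        long gcTimeMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // -1 means the value is undefined for this collector; count it as 0.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMillis += Math.max(0, gc.getCollectionTime());
        }
        values.put("gc.count", gcCount);
        values.put("gc.time.millis", gcTimeMillis);
        return values;
    }
}
```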

Best Practices

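  • Start with traces: they deliver the most immediate value for distributed debugging.
  • Limit cardinality: prefer low-cardinality attributes (HTTP method, route template, status code); drop high-cardinality values (userId, request.body) or hash them into a small, fixed number of buckets.
  • Use semantic conventions: e.g., http.request.method=POST (formerly http.method) and db.query.text (formerly db.statement), so every backend interprets your data the same way.
  • Sampling strategy: head-based sampling (e.g., a 1:1000 ratio) caps volume cheaply but decides before the outcome is known; to reliably keep error traces, add tail-based sampling in the Collector.
  • Central Collector: export from Java pods to a Collector rather than directly to backends, for scalability.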
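The cardinality advice is easiest to apply where attributes are built. The hypothetical LowCardinalityAttributes helper below uses semantic-convention keys (http.request.method, http.route, http.response.status_code), replaces numeric path segments with an {id} template, and, when a user dimension is truly needed, maps user IDs onto a fixed number of hash buckets so the number of possible series stays bounded. The bucket count of 16 is an arbitrary example, and frameworks such as Spring usually provide the real route template, so the regex is only a fallback sketch.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical attribute builder that keeps metric cardinality bounded.
final class LowCardinalityAttributes {
    private static final Pattern NUMERIC_SEGMENT = Pattern.compile("/\\d+(?=/|$)");
    private static final int USER_BUCKETS = 16; // arbitrary example value

    // "/users/42/orders/7" -> "/users/{id}/orders/{id}"
    static String templateRoute(String path) {
        return NUMERIC_SEGMENT.matcher(path).replaceAll("/{id}");
    }

    // Maps an unbounded set of user IDs onto at most USER_BUCKETS labels.
    static String userBucket(String userId) {
        return "bucket-" + Math.floorMod(userId.hashCode(), USER_BUCKETS);
    }

    // Attributes for one HTTP request, using semantic-convention keys.
    static Map<String, String> forRequest(String method, String path, int status) {
        Map<String, String> attrs = new TreeMap<>();
        attrs.put("http.request.method", method);
        attrs.put("http.route", templateRoute(path));
        attrs.put("http.response.status_code", Integer.toString(status));
        return attrs;
    }
}
```

With templating, a million distinct order URLs collapse into a single http.route value, so the metric keeps one series per method and status instead of one per request.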

Common Pitfalls to Avoid

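  • No-op by default: the OTel API does nothing until an SDK is installed (by the Java agent or explicit setup), and OTEL_SDK_DISABLED=true switches the SDK off; check both, or everything stays silent.
  • Explosive cardinality: putting unique values (timestamps, IDs) into span names or metric attributes creates an unbounded number of distinct series; use templates such as /users/{id} instead.
  • Lost context: forgetting to propagate context across async boundaries (CompletableFuture, custom executors) produces orphan traces.
  • Ignored overhead: measure the CPU impact (+5-10% is possible); reduce it with batching, sampling, and fewer, coarser spans.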
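The async pitfall comes from context living in a thread-local: a task running on another thread does not see it unless it is captured on submission and restored on the worker. The stdlib sketch below reproduces both the bug and the fix with a hypothetical TraceContextHolder. With the real API you would typically wrap the task or executor via io.opentelemetry.context.Context (for example Context.current().wrap(runnable) or Context.taskWrapping(executor)), and the Java agent already does this for many standard executors.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical thread-local context, standing in for the OTel Context.
final class TraceContextHolder {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static String current() { return TRACE_ID.get(); }
    static void set(String traceId) { TRACE_ID.set(traceId); }
    static void clear() { TRACE_ID.remove(); }

    // Captures the caller's context now and restores it on the worker thread.
    static <T> Supplier<T> wrap(Supplier<T> task) {
        String captured = current();
        return () -> {
            String previous = current();
            set(captured);
            try {
                return task.get();
            } finally {
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }
}

final class AsyncContextDemo {
    // Returns {trace ID seen without wrapping, trace ID seen with wrapping}.
    static String[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TraceContextHolder.set("4bf92f3577b34da6a3ce929d0e0e4736");
        try {
            String lost = CompletableFuture
                    .supplyAsync(TraceContextHolder::current, pool).join();
            String kept = CompletableFuture
                    .supplyAsync(TraceContextHolder.wrap(TraceContextHolder::current), pool).join();
            return new String[] {lost, kept};
        } finally {
            TraceContextHolder.clear();
            pool.shutdown();
        }
    }
}
```

The unwrapped task sees no trace ID, which is exactly how orphan spans appear; restoring the previous value in finally keeps pooled threads from leaking one request's context into the next.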

Next Steps