03 — OpenTelemetry

Technical Overview

OpenTelemetry (OTel) is a CNCF project providing a vendor-neutral, open-standard framework for generating, collecting, and exporting telemetry data (metrics, logs, and traces). It was formed in 2019 through the merger of OpenCensus (Google) and OpenTracing (CNCF), resolving the fragmentation problem where two competing instrumentation standards forced library authors to choose. OTel is now the industry standard: all major observability vendors — Datadog, Dynatrace, New Relic, Honeycomb, Grafana, Splunk — accept OTLP (OpenTelemetry Protocol), and major open-source projects (Jaeger, Zipkin, Prometheus) all support it.

The core value proposition: instrument your application once with OTel APIs, and ship telemetry to any backend without changing application code.

Prerequisites

Familiarity with distributed tracing concepts (trace, span, context propagation)
Basic understanding of metrics and structured logs
Experience with at least one programming language (Java, Python, Go, or JavaScript)
Familiarity with Docker/Kubernetes for the Collector deployment

Core Content

OTel Architecture Overview

OTel ARCHITECTURE

  Application                  OTel Collector               Backends
  ┌────────────────────┐       ┌────────────────────────┐
  │                    │       │                        │
  │  OTel API          │       │  Receivers             │   ┌──────────────┐
  │  (vendor-neutral   │       │  ┌──────────────────┐  │   │ Jaeger/Tempo │
  │   interfaces)      │       │  │ OTLP gRPC/HTTP   │  │──→│ (traces)     │
  │         │          │       │  │ Prometheus        │  │   └──────────────┘
  │         ▼          │       │  │ Zipkin            │  │
  │  OTel SDK          │ OTLP  │  │ Kafka, Fluentd    │  │   ┌──────────────┐
  │  (implementation:  │──────→│  └──────────────────┘  │──→│ Prometheus / │
  │   trace, metric,   │       │                        │   │ Mimir/Cortex │
  │   log providers)   │       │  Processors            │   │ (metrics)    │
  │                    │       │  ┌──────────────────┐  │   └──────────────┘
  │  Auto-Instrument.  │       │  │ batch            │  │
  │  (Java agent /     │       │  │ memory_limiter   │  │   ┌──────────────┐
  │   Python patching) │       │  │ tail_sampling    │  │──→│ Loki / ES    │
  └────────────────────┘       │  │ attributes       │  │   │ (logs)       │
                               │  │ resource detect  │  │   └──────────────┘
                               │  └──────────────────┘  │
                               │                        │   ┌──────────────┐
                               │  Exporters             │──→│ Datadog /    │
                               │  ┌──────────────────┐  │   │ Honeycomb /  │
                               │  │ OTLP, Jaeger     │  │   │ New Relic    │
                               │  │ Prometheus, Loki │  │   └──────────────┘
                               │  │ Datadog, Kafka   │  │
                               │  └──────────────────┘  │
                               └────────────────────────┘

The Three OTel Components

OTel API: Language-specific interfaces (no implementation). Application code imports the API. If no SDK is registered, API calls are no-ops with negligible overhead. This allows library authors to instrument their code without forcing a specific vendor on end users.

OTel SDK: The implementation behind the API. Configures exporters, samplers, processors. Typically configured by the application operator, not the library author. SDK handles batching, queuing, and retry of telemetry export.

OTel Collector: A standalone binary that receives telemetry from applications (or other collectors), processes it, and exports to backends. Decouples application instrumentation from backend configuration. You can change backends without touching application code by reconfiguring the Collector.
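
The API/SDK split described above can be modeled in a few lines. This is a toy sketch, not the real OpenTelemetry API: the names NoOpTracer, SdkTracer, set_tracer, and get_tracer are hypothetical stand-ins chosen for illustration. It shows only the idea that library code calls an API-level accessor, which returns a no-op until an operator registers an implementation.

```python
from contextlib import contextmanager


class NoOpSpan:
    """Span handed out when no SDK is registered: every call does nothing."""

    def set_attribute(self, key, value):
        pass


class RecordingSpan:
    """Span produced by the toy SDK: it actually stores attributes."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


class NoOpTracer:
    @contextmanager
    def start_span(self, name):
        yield NoOpSpan()


class SdkTracer:
    def __init__(self):
        self.spans = []  # spans created through this tracer

    @contextmanager
    def start_span(self, name):
        span = RecordingSpan(name)
        self.spans.append(span)
        yield span


_tracer = NoOpTracer()  # API default: nothing registered, nothing recorded


def set_tracer(tracer):
    """Called by the application operator, never by library code."""
    global _tracer
    _tracer = tracer


def get_tracer():
    return _tracer


def library_function():
    """Library code depends only on the API-level get_tracer()."""
    with get_tracer().start_span("library.work") as span:
        span.set_attribute("items", 3)
    return "done"
```

Calling library_function() before set_tracer() records nothing; after set_tracer(SdkTracer()) the same unchanged library code produces a recorded span.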

OTLP: The Wire Protocol

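OTLP (OpenTelemetry Protocol) is the native protocol. It supports gRPC (default port 4317) and HTTP (default port 4318, with protobuf or JSON payloads). The default transport varies by SDK and version: the specification recommends http/protobuf, while several SDKs have historically defaulted to gRPC, so set OTEL_EXPORTER_OTLP_PROTOCOL explicitly when the choice matters. The protobuf schema defines ResourceSpans, ResourceMetrics, and ResourceLogs as top-level messages, each carrying a Resource (attributes shared by the emitting entity, such as service name, host, and k8s pod) plus the signal data.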

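# Verify the Collector's OTLP gRPC endpoint is listening
# ("list" relies on gRPC server reflection; if the Collector does not expose it,
#  the call fails even though the port is open)
grpcurl -plaintext localhost:4317 list
# With reflection available, the output includes: opentelemetry.proto.collector.trace.v1.TraceService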

# Check Collector health
curl http://localhost:13133/  # default health_check extension port
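
To make the ResourceSpans layout concrete, the sketch below builds a single-span payload in the JSON encoding that OTLP/HTTP accepts. It is a minimal illustration, not a replacement for an SDK exporter: the function name build_otlp_trace_payload and the scope name "manual-example" are hypothetical, and only a few of the schema's fields are filled in.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name, span_name, duration_ms=5):
    """Build a one-span OTLP trace payload as a plain dict (JSON encoding)."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            # Resource: attributes shared by everything this entity emits
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are written as strings in the JSON encoding
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


payload = build_otlp_trace_payload("checkout-service", "demo-span")
body = json.dumps(payload)
```

Assuming a Collector with the OTLP HTTP receiver on its default port, body could be POSTed to http://localhost:4318/v1/traces with Content-Type: application/json.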

Auto-Instrumentation

Zero-code instrumentation means you do not modify application source to get traces, metrics, and logs.

Java Agent: A JAR attached as a JVM agent (-javaagent:opentelemetry-javaagent.jar). It uses bytecode manipulation (via ByteBuddy) to instrument frameworks (Spring, Tomcat, JDBC, gRPC, Kafka, etc.) at class load time. Instruments 150+ libraries automatically.

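# Launch Java service with OTel auto-instrumentation
# (agent 2.x defaults to http/protobuf on port 4318, so select gRPC explicitly for port 4317)
java -javaagent:opentelemetry-javaagent-2.x.jar \
     -Dotel.service.name=checkout-service \
     -Dotel.exporter.otlp.protocol=grpc \
     -Dotel.exporter.otlp.endpoint=http://otel-collector:4317 \
     -Dotel.metrics.exporter=otlp \
     -jar app.jar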

Python: Uses sitecustomize.py + monkey-patching of standard library modules and popular frameworks (Django, Flask, FastAPI, SQLAlchemy, requests, grpc).

# Install and run Python auto-instrumentation
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install  # installs instrumentation packages for detected frameworks
opentelemetry-instrument \
  --service_name=payment-service \
  --exporter_otlp_endpoint=http://otel-collector:4317 \
  python app.py
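
The monkey-patching idea behind Python auto-instrumentation can be sketched with the standard library alone. This is an illustration of the technique, not the code the OpenTelemetry instrumentation packages use: instrument, recorded_spans, and the fake_http stand-in library are all hypothetical names.

```python
import functools
import time
import types

recorded_spans = []  # stand-in for an exporter


def instrument(module, func_name, span_name):
    """Replace module.func_name with a wrapper that records timing and errors."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(*args, **kwargs)
        except Exception as exc:
            error = type(exc).__name__
            raise
        finally:
            # Runs on both success and failure, like ending a span
            recorded_spans.append({
                "name": span_name,
                "duration_s": time.perf_counter() - start,
                "error": error,
            })

    setattr(module, func_name, wrapper)  # the actual "monkey patch"
    return original


# Stand-in for a third-party library the application already calls
fake_http = types.SimpleNamespace(get=lambda url: f"200 OK from {url}")

instrument(fake_http, "get", "HTTP GET")
response = fake_http.get("http://example.com/")  # caller code is unchanged
```

The application keeps calling fake_http.get as before; the patch happens at startup, which is why no source changes are needed.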

Node.js: --require @opentelemetry/auto-instrumentations-node/register instruments Express, Fastify, gRPC, pg, mysql, redis, etc.

Kubernetes auto-instrumentation with OTel Operator: The OTel Operator for Kubernetes can inject auto-instrumentation by watching pod annotations:

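# On the pod template (spec.template.metadata), not the Deployment's own metadata:
annotations:
  instrumentation.opentelemetry.io/inject-java: "true"
  # The Operator adds an init container that copies the Java agent into the pod
  # and sets the required environment variables (for example JAVA_TOOL_OPTIONS).
  # An Instrumentation custom resource must exist in the cluster.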

Manual Instrumentation

For operations not covered by auto-instrumentation (business logic, batch jobs, custom protocols):

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

def process_payment(order_id: str, amount: float):
    with tracer.start_as_current_span("process_payment") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("payment.amount", amount)
        span.set_attribute("payment.currency", "USD")

        try:
            result = call_payment_gateway(order_id, amount)
            span.set_attribute("payment.gateway_txn_id", result.txn_id)
            span.set_status(Status(StatusCode.OK))
            return result
        except PaymentDeclinedException as e:
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.record_exception(e)
            raise

Span Events (structured log entries inside a span):

span.add_event("retry_attempt", {
    "retry.count": 2,
    "retry.reason": "connection_timeout"
})

OTel Collector Architecture

The Collector uses a pipeline model: receivers → processors → exporters. Multiple pipelines can exist for different signal types.

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'self-monitoring'
          static_configs:
            - targets: ['localhost:8888']

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500
    spike_limit_mib: 512
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
  attributes:
    actions:
      - key: http.user_agent   # redact PII
        action: delete

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    # Processor order matters: memory_limiter first, batch last
    traces:
      receivers: [otlp]
      processors: [memory_limiter, attributes, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, attributes, batch]
      exporters: [loki]
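
The receivers → processors → exporters model is essentially function composition followed by fan-out. The sketch below models that flow in plain Python to show why processor order matters; it is not how the Collector is implemented, and the helper names delete_attribute, upsert_attribute, and run_pipeline are hypothetical.

```python
def delete_attribute(key):
    """Processor factory: drop one attribute from every span."""
    def process(spans):
        return [
            {**s, "attributes": {k: v for k, v in s["attributes"].items() if k != key}}
            for s in spans
        ]
    return process


def upsert_attribute(key, value):
    """Processor factory: set an attribute on every span, overwriting if present."""
    def process(spans):
        return [{**s, "attributes": {**s["attributes"], key: value}} for s in spans]
    return process


def run_pipeline(received, processors, exporters):
    """Apply processors in the listed order, then hand the result to every exporter."""
    data = received
    for process in processors:
        data = process(data)
    for export in exporters:
        export(data)
    return data


backend_a, backend_b = [], []  # stand-ins for two export destinations
received = [{
    "name": "GET /checkout",
    "attributes": {"http.user_agent": "curl/8.0", "http.method": "GET"},
}]

result = run_pipeline(
    received,
    processors=[
        delete_attribute("http.user_agent"),
        upsert_attribute("deployment.environment", "production"),
    ],
    exporters=[backend_a.extend, backend_b.extend],
)
```

Both stand-in backends receive the same processed data, which mirrors how one pipeline can list several exporters.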

Tail Sampling in the OTel Collector

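Head sampling (a random percentage chosen when the trace starts) is simple but discards interesting traces along with routine ones. Tail sampling defers the decision until the spans of a trace have been collected. The tail_sampling processor in the Collector buffers spans in memory for decision_wait after the first span of a trace arrives, then applies the configured policies. Every span of a trace has to reach the same Collector instance for the decision to be correct. Example policies: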

processors:
  tail_sampling:
    decision_wait: 10s      # wait this long for a complete trace
    num_traces: 100000      # max traces held in memory
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces-policy
        type: latency
        latency: {threshold_ms: 500}
      - name: sample-10pct
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

With this config: all error traces are kept, all traces >500ms are kept, and 10% of the remaining traces are kept. This provides high-value samples without uniform 100% overhead.
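
The decision logic of that configuration can be sketched as a single function. This is a simplified model under stated assumptions, not the processor's implementation: keep_trace and the span dictionary shape (status, start_ms, end_ms) are hypothetical, and the probabilistic step here hashes the trace ID so the outcome is stable for a given trace.

```python
import hashlib


def keep_trace(trace_id, spans, threshold_ms=500, sampling_percentage=10):
    """Return the name of the first policy type that keeps the trace, or None."""
    # Policy 1: any span with an error status keeps the whole trace
    if any(s["status"] == "ERROR" for s in spans):
        return "status_code"

    # Policy 2: trace duration = earliest start to latest end
    duration_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    if duration_ms > threshold_ms:
        return "latency"

    # Policy 3: deterministic percentage based on a hash of the trace ID
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 100
    if bucket < sampling_percentage:
        return "probabilistic"

    return None  # trace is dropped


error_trace = [{"status": "ERROR", "start_ms": 0, "end_ms": 10}]
slow_trace = [
    {"status": "OK", "start_ms": 0, "end_ms": 200},
    {"status": "OK", "start_ms": 150, "end_ms": 900},
]
fast_trace = [{"status": "OK", "start_ms": 0, "end_ms": 20}]

decisions = {
    "error": keep_trace("trace-1", error_trace),
    "slow": keep_trace("trace-2", slow_trace),
    "fast": keep_trace("trace-3", fast_trace),
}
```

The error and slow traces are always kept; the fast trace is kept only if its hash bucket falls under the sampling percentage.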

Semantic Conventions

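OTel defines standardized attribute names to ensure consistent meaning across backends and vendors. The examples below use the older attribute names that much existing instrumentation still emits; the stable HTTP conventions rename several of them (for example, http.method became http.request.method, http.status_code became http.response.status_code, and http.url became url.full), so check which version your SDK emits. Key conventions: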

# HTTP spans
http.method = "GET"
http.url = "https://api.example.com/checkout"
http.status_code = 200
http.request_content_length = 1024
http.flavor = "1.1"

# Database spans
db.system = "postgresql"
db.name = "orders"
db.statement = "SELECT * FROM orders WHERE id = $1"
db.operation = "SELECT"

# Messaging
messaging.system = "kafka"
messaging.destination = "orders.created"
messaging.operation = "publish"

# Service identity (Resource attributes)
service.name = "checkout-service"
service.version = "1.4.2"
deployment.environment = "production"
k8s.pod.name = "checkout-59d8b6-xk7qm"
k8s.namespace.name = "payments"
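
Because attribute names changed when the HTTP conventions were stabilized, a small rename table is a common way to compare old and new telemetry. The sketch below assumes a handful of HTTP renames from the semantic conventions migration; the mapping is deliberately partial, and the names LEGACY_TO_STABLE and migrate_attributes are hypothetical.

```python
# Partial mapping of legacy HTTP attribute names to their stable replacements.
# Verify against the semantic conventions version your SDK targets.
LEGACY_TO_STABLE = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
    "http.url": "url.full",
    "http.user_agent": "user_agent.original",
}


def migrate_attributes(attributes):
    """Return (attributes with stable names, list of (old, new) renames applied)."""
    migrated = {}
    renamed = []
    for key, value in attributes.items():
        new_key = LEGACY_TO_STABLE.get(key, key)
        if new_key != key:
            renamed.append((key, new_key))
        migrated[new_key] = value
    return migrated, renamed


legacy_span = {
    "http.method": "GET",
    "http.url": "https://api.example.com/checkout",
    "http.status_code": 200,
    "db.system": "postgresql",  # not in the table, passed through unchanged
}
migrated, renamed = migrate_attributes(legacy_span)
```

The renamed list doubles as an audit report of which legacy names a service still emits.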

OTel for Metrics

OTel provides its own metrics API and SDK, intended to complement or eventually replace Prometheus client libraries. OTel metrics support the same types (Counter, Gauge, Histogram) plus UpDownCounter (can go negative, like Prometheus Gauge). The SDK exports to Prometheus format (via exposition endpoint), OTLP, or other exporters.

The key advantage is consistency: with OTel, traces and metrics share the same SDK and the same Resource attributes, so both signals carry identical service identity. Metric-to-trace correlation still relies on exemplars, but the SDK can attach the active trace and span IDs to them without extra wiring.

Historical Context

Before OTel, the distributed tracing ecosystem was fragmented:
- OpenTracing (CNCF, 2016): defined a tracing API abstraction. Libraries would instrument against OpenTracing, and users would plug in Jaeger, Zipkin, or LightStep. But OpenTracing only covered traces, not metrics or logs.
- OpenCensus (Google, 2017): covered both traces and metrics from the same API. Supported by Google and Microsoft. But it had a different API design and no standard protocol.

Two competing standards caused library maintainers to either choose one, support both, or instrument neither. In 2019, the two communities merged, forming OpenTelemetry under CNCF. The design incorporated lessons from both projects: OpenTracing's extensibility and OpenCensus's multi-signal scope. OTLP emerged as the wire protocol, inspired by the need for a protocol that could carry all three signals in one connection.

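As of 2024, traces and metrics are stable and production-ready in the major SDKs. The logs specification is also stable, but SDK maturity varies by language: some SDKs (for example Java and .NET) ship stable logs support, while others still mark it as beta or experimental.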

Production Examples

# Check OTel Collector internal metrics (self-telemetry)
curl http://localhost:8888/metrics | grep otelcol_receiver_accepted_spans

# Diagnose dropped spans
curl http://localhost:8888/metrics | grep otelcol_exporter_send_failed_spans

# View the Collector's configured pipelines at runtime (zpages extension)
curl http://localhost:55679/debug/pipelinez

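# Validate OTel Collector config (mount the local file into the container first)
# Components from contrib (e.g. tail_sampling) need the otel/opentelemetry-collector-contrib image
docker run --rm -v "$(pwd)/otel-collector-config.yaml:/etc/otel/config.yaml" \
  otel/opentelemetry-collector:latest validate --config=/etc/otel/config.yaml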

Real production Collector deployment with resource limits for Kubernetes:

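# Deployment recommendation: Collector as DaemonSet (one per node) for collection
# Tail sampling needs all spans of a trace on one instance, so run it on a separate
# gateway tier that receives spans routed by trace ID (e.g. via the loadbalancing exporter)
# Rough capacity: ~10k spans/sec per core
# Memory: ~1GB for tail sampling with a 100k trace buffer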
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi

Debugging Notes

Spans not appearing in Jaeger/Tempo:
1. Check Collector logs: kubectl logs deploy/otel-collector | grep -i error
2. Check the Collector's accepted-spans metric: otelcol_receiver_accepted_spans_total
3. Verify the OTLP endpoint is reachable: grpcurl -plaintext otel-collector:4317 opentelemetry.proto.collector.trace.v1.TraceService/Export (grpcurl needs server reflection or the OTLP .proto files to make this call)
4. Check application SDK config: OTEL_LOG_LEVEL=debug enables verbose logging in SDKs that support it

Context propagation broken (traces not joined):
- Ensure all services use compatible propagators. Default is tracecontext,baggage (W3C). If one service uses B3 headers (Zipkin legacy), cross-service traces will not link.
- Set explicitly: OTEL_PROPAGATORS=tracecontext,baggage
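
When debugging broken propagation it helps to know exactly what a valid W3C traceparent header looks like. The sketch below parses and re-emits version 00 headers using only the standard library; parse_traceparent and child_traceparent are hypothetical helper names, and the sample header is the one used in the W3C Trace Context specification. A real propagator also handles tracestate and future header versions.

```python
import re

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex), all lowercase
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Return the parsed fields of a traceparent header, or None if invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff is forbidden; all-zero IDs are invalid
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(parent, new_span_id):
    """Build the header a service sends downstream: same trace ID, new span ID."""
    flags = "01" if parent["sampled"] else "00"
    return f"00-{parent['trace_id']}-{new_span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
parsed = parse_traceparent(incoming)
outgoing = child_traceparent(parsed, "b7ad6b7169203331")
```

A service that only understands B3 headers never sees this header, so it starts a fresh trace ID, which is what produces disconnected traces.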

High memory in tail sampling Collector: Tail sampling holds full traces in memory. At 100k traces * 10 spans/trace * 1KB/span, that is roughly 1GB. Tune num_traces and decision_wait to your trace completion time; a rule of thumb for decision_wait is the 95th-percentile request latency plus generous headroom (on the order of 2x).
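
The sizing arithmetic above is easy to wrap in two helper functions. This is a back-of-the-envelope sketch: the function names and the headroom factor of 1.5 are assumptions for illustration, 1KB is taken as 1,000 bytes, and real memory use also includes per-trace bookkeeping overhead.

```python
def tail_sampling_memory_bytes(num_traces, spans_per_trace, bytes_per_span):
    """Approximate span payload held in memory by the tail sampling buffer."""
    return num_traces * spans_per_trace * bytes_per_span


def required_num_traces(new_traces_per_second, decision_wait_s, headroom=1.5):
    """Traces that can be in flight during decision_wait, with assumed headroom."""
    return int(new_traces_per_second * decision_wait_s * headroom)


# The figures from the text: 100k traces, 10 spans each, 1KB per span
estimate = tail_sampling_memory_bytes(100_000, 10, 1_000)
estimate_gb = estimate / 1_000_000_000

# Hypothetical load: 5,000 new traces per second with decision_wait of 10s
buffer_size = required_num_traces(5_000, 10)
```

If the computed buffer size exceeds num_traces, the processor has to evict traces before a decision is made, so either raise num_traces (and memory) or shorten decision_wait.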

Security Implications

OTLP endpoints should require mTLS in production. Unauthenticated OTLP allows any process to inject arbitrary spans into your tracing backend, polluting traces and potentially overloading storage.
The attributes processor in the Collector can be used to redact PII from spans before export. This is the right place to strip http.user_agent, user IDs from URL parameters, and authorization headers.
Collector's zpages extension (/debug/tracez, /debug/pipelinez) exposes internal state. Disable or restrict access in production.
Be aware that trace span attributes may contain secrets (API keys in headers, passwords in SQL queries). Auto-instrumentation that captures full SQL statements or HTTP headers can inadvertently capture credentials.

Performance Implications

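Auto-instrumentation overhead in Java is often quoted at roughly 5-15% (bytecode rewriting plus span creation); the actual figure depends heavily on the workload and on which instrumentations are enabled, so measure it. Python overhead tends to be higher because the wrappers run in the interpreter.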
Manual instrumentation is cheaper: a span creation in Go is ~1-3 microseconds. Attribute setting is ~100ns per attribute.
OTLP gRPC with batching (SDK batch span processor defaults: up to 512 spans per export batch, 5s schedule delay) is efficient. Raw throughput: a single Collector on 4 vCPUs can handle ~50k spans/sec.
Tail sampling buffers full traces in memory. Baseline: 100k trace buffer = ~1-2GB RAM depending on average trace size.
Use sampling aggressively: 1-10% head sampling before tail sampling for extremely high-volume services (>100k req/s).
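
The batching behavior mentioned above (flush when the batch is full or when a delay has elapsed) can be modeled with a small class. This is a single-threaded sketch with an injected clock, not the SDK's batch span processor, which also has a bounded queue, a background thread, and export timeouts; BatchBuffer and its method names are hypothetical.

```python
class BatchBuffer:
    """Collect spans and export them in batches by size or elapsed time."""

    def __init__(self, export, max_batch_size=512, schedule_delay_s=5.0):
        self.export = export
        self.max_batch_size = max_batch_size
        self.schedule_delay_s = schedule_delay_s
        self.pending = []
        self.last_flush = 0.0

    def add(self, span, now):
        self.pending.append(span)
        if len(self.pending) >= self.max_batch_size:
            self.flush(now)  # size-based flush

    def tick(self, now):
        """Called periodically; flushes if the schedule delay has elapsed."""
        if self.pending and now - self.last_flush >= self.schedule_delay_s:
            self.flush(now)

    def flush(self, now):
        batch, self.pending = self.pending, []
        self.last_flush = now
        if batch:
            self.export(batch)


exported = []  # each element is one export call's batch
buf = BatchBuffer(exported.append, max_batch_size=3, schedule_delay_s=5.0)
for i in range(3):
    buf.add(f"span-{i}", now=0.0)  # the third add triggers a size-based flush
buf.add("span-3", now=1.0)
buf.tick(now=2.0)  # only 2s since the last flush: nothing is exported
buf.tick(now=6.0)  # delay elapsed: the remaining span is exported
```

Larger batches amortize per-request cost, while the delay bounds how long a span can wait before it is sent.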

Failure Modes and Real Incidents

Collector OOM during traffic spike: Tail sampling Collector held 1M traces in memory during a holiday traffic spike. The memory_limiter processor was not configured. Fix: always configure memory_limiter as the first processor in all pipelines; set limit_mib to 80% of available memory.
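
A check like the one below can catch this misconfiguration in CI. It is a minimal sketch that assumes the Collector YAML has already been parsed into a dictionary (for example with a YAML library, not shown here); the function name pipelines_missing_memory_limiter is hypothetical.

```python
def pipelines_missing_memory_limiter(config):
    """Return names of pipelines whose first processor is not memory_limiter."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for name, pipeline in pipelines.items():
        processors = pipeline.get("processors") or []
        # Named instances such as "memory_limiter/strict" also count
        if not processors or processors[0].split("/")[0] != "memory_limiter":
            problems.append(name)
    return problems


parsed_config = {
    "service": {
        "pipelines": {
            "traces": {"processors": ["memory_limiter", "attributes", "batch"]},
            "metrics": {"processors": ["batch", "memory_limiter"]},
            "logs": {"processors": []},
        }
    }
}
offending = pipelines_missing_memory_limiter(parsed_config)
```

Here the metrics pipeline is flagged because memory_limiter is not first, and the logs pipeline because it has no processors at all.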

Propagation header incompatibility after Zipkin migration: A team migrating from Zipkin to OTel left some services still using B3 headers. Traces appeared in Jaeger as disconnected single-service traces. Fixed by configuring all services with OTEL_PROPAGATORS=b3multi,tracecontext during the transition period.

Auto-instrumentation leaking PII through SQL statements: Java auto-instrumentation captured full SQL statements, including user-submitted data, in db.statement attributes. Sensitive PII appeared in the Jaeger trace UI. Fix: configure otel.instrumentation.common.db-statement-sanitizer.enabled=true (enabled by default in recent versions).

Modern Usage

OTel Operator for Kubernetes (stable 2023): manages OTel Collector deployments and pod auto-instrumentation via CRDs. Standard deployment pattern in production K8s.
OTel Arrow (2024): a new columnar batch format for OTLP transport, with reported compression improvements of up to 10x over standard OTLP protobuf for large batches, depending on the data.
Profiles signal: OTel is adding continuous profiling as a fourth signal, with a standard profile data model. Integration with Parca and Pyroscope is in progress.
OpenTelemetry Weaver (2024): schema-first approach to semantic conventions, generating SDK code from convention definitions.

Future Directions

Unified storage: As OTel becomes the standard ingestion protocol, backends (Grafana stack, Datadog) are converging on native OTLP storage, eliminating the need for signal-specific protocols.
Client-side telemetry: OTel for browsers and mobile apps (RUM — Real User Monitoring) to extend traces from backend to the end user's device.
AI/LLM observability: OTel semantic conventions for GenAI systems (LLM token counts, model name, prompt/completion attributes) are being standardized to enable observability for AI inference pipelines.

Exercises

End-to-end OTel setup: Deploy an OTel Collector locally using Docker. Instrument a simple Python Flask app with opentelemetry-instrument. Send traces to Jaeger and metrics to Prometheus. Verify traces appear in Jaeger and metrics in Prometheus with correct service.name labels.
Tail sampling policy: Configure a tail sampling processor that retains: 100% of traces with status ERROR, 100% of traces with latency >1s, and 5% of all other traces. Generate load with a mixture of fast/slow/error requests and verify the sampling policy behavior in Jaeger.
Context propagation chain: Build two microservices (any language). Service A receives an HTTP request and calls Service B. Verify that the trace ID is propagated via the traceparent header. Use curl -v to inspect the header. Remove the propagator from Service B and observe that traces break.
Semantic conventions compliance: Review a production service's OTel instrumentation. Identify any attributes that don't follow semantic conventions (e.g., using url instead of http.url, status instead of http.status_code). Fix them and verify improved trace readability in your backend.
Collector pipeline configuration: Set up a Collector that receives OTLP, adds a deployment.environment=staging resource attribute, drops all spans with http.target = "/health", and exports to two backends simultaneously (Jaeger and a file exporter). Verify both backends receive the same traces minus health-check spans.

References

OpenTelemetry Documentation: https://opentelemetry.io/docs/
OTLP Specification: https://opentelemetry.io/docs/specs/otlp/
Semantic Conventions: https://opentelemetry.io/docs/specs/semconv/
OTel Collector Contrib GitHub: https://github.com/open-telemetry/opentelemetry-collector-contrib
Sigelman, Ben et al. "OpenTracing: Distributed Tracing's Big Tent." 2016.
"Merging OpenTracing and OpenCensus." CNCF Blog, 2019.
Sridharan, Cindy. Distributed Systems Observability. O'Reilly, 2018.
Grafana OTel Getting Started: https://grafana.com/docs/opentelemetry/