03 — OpenTelemetry
Technical Overview
OpenTelemetry (OTel) is a CNCF project providing a vendor-neutral, open-standard framework for generating, collecting, and exporting telemetry data (metrics, logs, and traces). It was formed in 2019 through the merger of OpenCensus (Google) and OpenTracing (CNCF), resolving the fragmentation problem where two competing instrumentation standards forced library authors to choose. OTel is now the de facto industry standard: major observability vendors (Datadog, Dynatrace, New Relic, Honeycomb, Grafana, Splunk) accept OTLP (OpenTelemetry Protocol), and major open-source projects interoperate with it. Jaeger and Prometheus can ingest OTLP directly, while Zipkin is typically connected through the Collector's Zipkin receiver and exporter.
The core value proposition: instrument your application once with OTel APIs, and ship telemetry to any backend without changing application code.
Prerequisites
- Familiarity with distributed tracing concepts (trace, span, context propagation)
- Basic understanding of metrics and structured logs
- Experience with at least one programming language (Java, Python, Go, or JavaScript)
- Familiarity with Docker/Kubernetes for the Collector deployment
Core Content
OTel Architecture Overview
                              OTel ARCHITECTURE

     Application                  OTel Collector                Backends
┌────────────────────┐       ┌────────────────────────┐
│                    │       │                        │
│  OTel API          │       │  Receivers             │   ┌──────────────┐
│  (vendor-neutral   │       │  ┌──────────────────┐  │   │ Jaeger/Tempo │
│   interfaces)      │       │  │ OTLP gRPC/HTTP   │  │──→│ (traces)     │
│       │            │       │  │ Prometheus       │  │   └──────────────┘
│       ▼            │       │  │ Zipkin           │  │
│  OTel SDK          │ OTLP  │  │ Kafka, Fluentd   │  │   ┌──────────────┐
│  (implementation:  │──────→│  └──────────────────┘  │──→│ Prometheus / │
│   trace, metric,   │       │                        │   │ Mimir/Cortex │
│   log providers)   │       │  Processors            │   │ (metrics)    │
│                    │       │  ┌──────────────────┐  │   └──────────────┘
│  Auto-Instrument.  │       │  │ batch            │  │
│  (Java agent /     │       │  │ memory_limiter   │  │   ┌──────────────┐
│   Python bytecode) │       │  │ tail_sampling    │  │──→│ Loki / ES    │
└────────────────────┘       │  │ attributes       │  │   │ (logs)       │
                             │  │ resource detect  │  │   └──────────────┘
                             │  └──────────────────┘  │
                             │                        │   ┌──────────────┐
                             │  Exporters             │──→│ Datadog /    │
                             │  ┌──────────────────┐  │   │ Honeycomb /  │
                             │  │ OTLP, Jaeger     │  │   │ New Relic    │
                             │  │ Prometheus, Loki │  │   └──────────────┘
                             │  │ Datadog, Kafka   │  │
                             │  └──────────────────┘  │
                             └────────────────────────┘
The Three OTel Components
OTel API: Language-specific interfaces (no implementation). Application code imports the API. If no SDK is registered, API calls are no-ops with negligible overhead. This allows library authors to instrument their code without forcing a specific vendor on end users.
OTel SDK: The implementation behind the API. Configures exporters, samplers, processors. Typically configured by the application operator, not the library author. SDK handles batching, queuing, and retry of telemetry export.
OTel Collector: A standalone binary that receives telemetry from applications (or other collectors), processes it, and exports to backends. Decouples application instrumentation from backend configuration. You can change backends without touching application code by reconfiguring the Collector.
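The API/SDK split can be illustrated with a small stdlib-only sketch. This is not the real opentelemetry-api package: NoOpTracer, RecordingTracer, set_tracer, and get_tracer are hypothetical names that only mirror the idea that library code calls an API which does nothing until the operator registers an implementation.

```python
from contextlib import contextmanager

class NoOpTracer:
    """Default 'API only' behaviour: spans are accepted and discarded."""
    @contextmanager
    def start_span(self, name):
        yield None

class RecordingTracer:
    """Stand-in for an SDK: remembers the names of finished spans."""
    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, name):
        try:
            yield name
        finally:
            self.finished.append(name)

_tracer = NoOpTracer()          # what library code sees when no SDK is registered

def set_tracer(tracer):
    """Called once by the application operator (hypothetical helper)."""
    global _tracer
    _tracer = tracer

def get_tracer():
    return _tracer

def library_function():
    # Library code depends only on the API surface, never on a vendor.
    with get_tracer().start_span("library.work"):
        return 42

before = library_function()     # no SDK registered: nothing is recorded
sdk = RecordingTracer()
set_tracer(sdk)
after = library_function()      # same library code, now recorded
```

The library function is identical in both calls; only the registered implementation changes, which is the property that lets library authors instrument without choosing a backend.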
OTLP: The Wire Protocol
OTLP (OpenTelemetry Protocol) is the native protocol. It supports gRPC (default port 4317) and HTTP/protobuf (default port 4318). The default transport differs between SDKs and versions (some default to gRPC, others to HTTP/protobuf), so set the protocol explicitly and make sure the endpoint port matches it. The protobuf schema defines ResourceSpans, ResourceMetrics, and ResourceLogs as top-level messages, each carrying a Resource (common attributes for the emitting entity, such as service name, host, and k8s pod) plus the signal data.
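# Verify the Collector's OTLP gRPC port answers. Listing services needs gRPC server
# reflection; if the Collector does not expose it, grpcurl reports that instead,
# which still shows the port is reachable.
grpcurl -plaintext localhost:4317 list
# With reflection available, the list includes: opentelemetry.proto.collector.trace.v1.TraceService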
# Check Collector health
curl http://localhost:13133/ # default health_check extension port
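The nesting of Resource and signal data can be sketched with plain dictionaries. The example below builds a ResourceSpans-shaped payload using the field names of OTLP's JSON encoding as I understand them; treat it as an illustration to check against the OTLP specification rather than a reference encoder. The scope name "example.scope" and the span name are made up for the example.

```python
import json
import secrets
import time

def build_resource_spans(service_name, span_name):
    """Build one ResourceSpans-shaped payload as plain dicts."""
    now_ns = time.time_ns()
    span = {
        "traceId": secrets.token_hex(16),   # 16 bytes -> 32 hex chars
        "spanId": secrets.token_hex(8),     # 8 bytes  -> 16 hex chars
        "name": span_name,
        "kind": 1,                          # 1 = SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(now_ns),   # 64-bit ints travel as strings in JSON
        "endTimeUnixNano": str(now_ns + 5_000_000),  # 5 ms later
    }
    return {
        "resourceSpans": [{
            # Resource: attributes shared by everything this process emits.
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            # Signal data: spans grouped by instrumentation scope.
            "scopeSpans": [{"scope": {"name": "example.scope"}, "spans": [span]}],
        }]
    }

payload = build_resource_spans("checkout-service", "process_payment")
body = json.dumps(payload)
```

A body like this could be sent with an HTTP POST to the Collector's OTLP/HTTP traces path (/v1/traces on port 4318) with Content-Type application/json, assuming the receiver has the HTTP protocol enabled.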
Auto-Instrumentation
Zero-code instrumentation means you do not modify application source to get traces, metrics, and logs.
Java Agent: A JAR attached as a JVM agent (-javaagent:opentelemetry-javaagent.jar). It uses bytecode manipulation (via ByteBuddy) to instrument frameworks (Spring, Tomcat, JDBC, gRPC, Kafka, etc.) at class load time. Instruments 150+ libraries automatically.
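# Launch Java service with OTel auto-instrumentation
# (agent 2.x defaults to http/protobuf, so select gRPC explicitly when targeting port 4317)
java -javaagent:opentelemetry-javaagent-2.x.jar \
  -Dotel.service.name=checkout-service \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.exporter.otlp.endpoint=http://otel-collector:4317 \
  -Dotel.metrics.exporter=otlp \
  -jar app.jar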
Python: Uses sitecustomize.py + monkey-patching of standard library modules and popular frameworks (Django, Flask, FastAPI, SQLAlchemy, requests, grpc).
# Install and run Python auto-instrumentation
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install # installs instrumentation packages for detected frameworks
opentelemetry-instrument \
  --service_name=payment-service \
  --exporter_otlp_endpoint=http://otel-collector:4317 \
  python app.py
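The monkey-patching idea behind Python auto-instrumentation can be shown in a few lines. This is a simplified sketch, not the code opentelemetry-instrument runs: fake_http_lib stands in for a third-party library, and instrument and recorded_spans are hypothetical names.

```python
import functools
import time
import types

# Hypothetical third-party library the application already uses.
fake_http_lib = types.SimpleNamespace(get=lambda url: f"response from {url}")

recorded_spans = []   # stand-in for an exporter

def instrument(module, func_name):
    """Replace module.func_name with a wrapper that records a span."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Recorded whether the call returned or raised.
            recorded_spans.append({
                "name": func_name,
                "duration_s": time.perf_counter() - start,
            })

    setattr(module, func_name, wrapper)

instrument(fake_http_lib, "get")

# Application code is unchanged; it still calls the library as before.
result = fake_http_lib.get("https://api.example.com/checkout")
```

Because the patch is applied to the library object at startup, the application source never has to import or mention the instrumentation.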
Node.js: --require @opentelemetry/auto-instrumentations-node/register instruments Express, Fastify, gRPC, pg, mysql, redis, etc.
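Kubernetes auto-instrumentation with OTel Operator: The OTel Operator for Kubernetes can inject auto-instrumentation by watching pod annotations. An Instrumentation custom resource must be available to the namespace for the injection to take effect:
# On the pod template (spec.template.metadata.annotations), not the Deployment's own metadata:
annotations:
  instrumentation.opentelemetry.io/inject-java: "true"
# Operator adds an init container that copies the Java agent into the pod and sets the
# required environment variables (such as JAVA_TOOL_OPTIONS) automatically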
Manual Instrumentation
For operations not covered by auto-instrumentation (business logic, batch jobs, custom protocols):
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

# call_payment_gateway and PaymentDeclinedException are application code defined elsewhere.
def process_payment(order_id: str, amount: float):
    with tracer.start_as_current_span("process_payment") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("payment.amount", amount)
        span.set_attribute("payment.currency", "USD")
        try:
            result = call_payment_gateway(order_id, amount)
            span.set_attribute("payment.gateway_txn_id", result.txn_id)
            span.set_status(Status(StatusCode.OK))
            return result
        except PaymentDeclinedException as e:
            # Mark the span as failed and attach the exception as a span event.
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.record_exception(e)
            raise
Span Events (structured log entries inside a span):
span.add_event("retry_attempt", {
    "retry.count": 2,
    "retry.reason": "connection_timeout"
})
OTel Collector Architecture
The Collector uses a pipeline model: receivers → processors → exporters. Multiple pipelines can exist for different signal types.
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'self-monitoring'
          static_configs:
            - targets: ['localhost:8888']
processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500
    spike_limit_mib: 512
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
  attributes:
    actions:
      - key: http.user_agent # redact PII
        action: delete
exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
service:
  pipelines:
    # Processor order matters: memory_limiter first, batch last.
    traces:
      receivers: [otlp]
      processors: [memory_limiter, attributes, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, attributes, batch]
      exporters: [loki]
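The receivers → processors → exporters model is essentially function composition with fan-out at the end. The sketch below mimics the attributes (delete) and resource (upsert) steps from the config above using plain Python; delete_attribute, upsert_attribute, and run_pipeline are hypothetical helpers, and the real Collector is considerably more involved (queues, retries, batching).

```python
from typing import Callable, Dict, List

Span = Dict[str, object]
Processor = Callable[[List[Span]], List[Span]]

def delete_attribute(key: str) -> Processor:
    """Processor factory: drop one attribute from every span."""
    def process(spans: List[Span]) -> List[Span]:
        return [{k: v for k, v in s.items() if k != key} for s in spans]
    return process

def upsert_attribute(key: str, value: object) -> Processor:
    """Processor factory: set an attribute, overwriting any existing value."""
    def process(spans: List[Span]) -> List[Span]:
        return [{**s, key: value} for s in spans]
    return process

def run_pipeline(received, processors, exporters):
    """Pass data through each processor in order, then fan out to every exporter."""
    data = received
    for proc in processors:
        data = proc(data)
    for export in exporters:
        export(data)

backend_a: List[Span] = []
backend_b: List[Span] = []

run_pipeline(
    received=[{"name": "GET /checkout", "http.user_agent": "curl/8.0"}],
    processors=[
        delete_attribute("http.user_agent"),
        upsert_attribute("deployment.environment", "production"),
    ],
    exporters=[backend_a.extend, backend_b.extend],
)
```

Swapping or adding an entry in the exporters list changes where data goes without touching the code that produced the spans, which is the decoupling the Collector provides.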
Tail Sampling in the OTel Collector
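Head sampling (a random percentage chosen when the trace starts) is simple but discards interesting traces, because the decision is made before errors or latency are known. Tail sampling defers the sampling decision until the trace has been assembled. The tail_sampling processor in the Collector holds spans in memory for the decision_wait period, then applies policies. All spans of a trace must reach the same Collector instance for the decision to see the whole trace: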
processors:
  tail_sampling:
    decision_wait: 10s # wait this long for a complete trace
    num_traces: 100000 # max traces held in memory
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces-policy
        type: latency
        latency: {threshold_ms: 500}
      - name: sample-10pct
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
With this config, a trace is kept if any policy matches: all error traces are kept, all traces slower than 500 ms are kept, and 10% of traces are kept regardless of outcome. This retains the high-value traces without the cost of storing every trace.
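The "keep if any policy matches" logic can be sketched as a single function. This is a simplified model, not the Collector's implementation: the span dictionaries, the keep_trace name, and the hash-based 10% bucket are assumptions made for illustration.

```python
import hashlib

def keep_trace(spans, latency_threshold_ms=500, sampling_percentage=10):
    """Return True if any policy votes to keep the fully assembled trace."""
    if any(s["status"] == "ERROR" for s in spans):
        return True                                   # errors-policy
    duration_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    if duration_ms > latency_threshold_ms:
        return True                                   # slow-traces-policy
    # sample-10pct: hash the trace ID so the decision is stable for a given trace.
    digest = hashlib.sha256(spans[0]["trace_id"].encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < sampling_percentage

error_trace = [{"trace_id": "a1", "status": "ERROR", "start_ms": 0, "end_ms": 20}]
slow_trace = [{"trace_id": "b2", "status": "OK", "start_ms": 0, "end_ms": 900}]
fast_traces = [
    [{"trace_id": f"t{i}", "status": "OK", "start_ms": 0, "end_ms": 20}]
    for i in range(10_000)
]
kept_fast = sum(keep_trace(t) for t in fast_traces)   # roughly 10% of 10,000
```

The function needs every span of the trace before it can answer, which is why tail sampling has to buffer spans and why its memory use grows with decision_wait.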
Semantic Conventions
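OTel defines standardized attribute names to ensure consistent meaning across backends and vendors. Key conventions are listed below. These are the long-standing names that many instrumentations still emit; newer semantic-convention releases rename several of them (for example http.method to http.request.method and http.status_code to http.response.status_code), so check which version your SDK targets: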
# HTTP spans
http.method = "GET"
http.url = "https://api.example.com/checkout"
http.status_code = 200
http.request_content_length = 1024
http.flavor = "1.1"
# Database spans
db.system = "postgresql"
db.name = "orders"
db.statement = "SELECT * FROM orders WHERE id = $1"
db.operation = "SELECT"
# Messaging
messaging.system = "kafka"
messaging.destination = "orders.created"
messaging.operation = "publish"
# Service identity (Resource attributes)
service.name = "checkout-service"
service.version = "1.4.2"
deployment.environment = "production"
k8s.pod.name = "checkout-59d8b6-xk7qm"
k8s.namespace.name = "payments"
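A small check like the following can flag ad hoc attribute names and map them to the conventional names listed above. The RENAMES table is illustrative only (it is not an official list), and normalize_attributes is a hypothetical helper.

```python
# Ad hoc attribute names mapped to the conventional names listed above.
# The mapping itself is illustrative, not an official list.
RENAMES = {
    "url": "http.url",
    "status": "http.status_code",
    "method": "http.method",
    "db": "db.name",
}

def normalize_attributes(attributes):
    """Return (normalized_attributes, list_of_renames_applied)."""
    normalized, applied = {}, []
    for key, value in attributes.items():
        new_key = RENAMES.get(key, key)
        if new_key != key:
            applied.append((key, new_key))
        normalized[new_key] = value
    return normalized, applied

attrs, changes = normalize_attributes(
    {"url": "https://api.example.com/checkout", "status": 200, "db.system": "postgresql"}
)
```

Running a check of this kind in CI or in a Collector-side transform is one way to keep dashboards and queries portable across services.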
OTel for Metrics
OTel provides its own metrics API and SDK, intended to complement or eventually replace Prometheus client libraries. OTel metrics support the familiar instrument types (Counter, Gauge, Histogram) plus UpDownCounter, a sum that can decrease as well as increase (used where Prometheus would use a Gauge for values such as queue depth). The SDK can expose metrics in Prometheus format (via an exposition endpoint), push them over OTLP, or use other exporters.
The key advantage is consistency: with OTel, traces and metrics share the same SDK, same Resource attributes, and same propagation context. This lets the SDK attach exemplars (trace IDs) to metric points, so metric-to-trace correlation does not need separately wired instrumentation.
Historical Context
Before OTel, the distributed tracing ecosystem was fragmented:
- OpenTracing (CNCF, 2016): defined a tracing API abstraction. Libraries would instrument against OpenTracing, and users would plug in Jaeger, Zipkin, or LightStep. But OpenTracing only covered traces, not metrics or logs.
- OpenCensus (Google, 2017): covered both traces and metrics from the same API. Supported by Google, Microsoft. But had a different API design and no standard protocol.
Two competing standards caused library maintainers to either choose one, support both, or instrument neither. In 2019, the two communities merged, forming OpenTelemetry under CNCF. The design incorporated lessons from both projects: OpenTracing's extensibility and OpenCensus's multi-signal scope. OTLP emerged as the wire protocol, inspired by the need for a protocol that could carry all three signals in one connection.
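As of 2024, OpenTelemetry is stable for traces and metrics in the major language SDKs. Logs are stable in the specification and in some SDKs (for example Java and .NET) but still experimental or beta in others, so check the status page for your language.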
Production Examples
# Check OTel Collector internal metrics (self-telemetry)
curl http://localhost:8888/metrics | grep otelcol_receiver_accepted_spans
# Diagnose dropped spans
curl http://localhost:8888/metrics | grep otelcol_exporter_send_failed_spans
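# View Collector pipeline configuration at runtime
curl http://localhost:55679/debug/pipelinez # zpages extension
# Validate OTel Collector config (mount the file into the container; use the
# opentelemetry-collector-contrib image if the config uses contrib components such as tail_sampling or loki)
docker run --rm -v "$(pwd)/config.yaml:/etc/otel/config.yaml" otel/opentelemetry-collector:latest validate --config=/etc/otel/config.yaml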
Real production Collector deployment with resource limits for Kubernetes:
# Deployment recommendation: Collector as DaemonSet (one per node)
# Each Collector handles ~10k spans/sec per core
# Memory: 1GB for tail sampling with 100k trace buffer
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2000m
    memory: 2Gi
Debugging Notes
Spans not appearing in Jaeger/Tempo:
1. Check Collector logs: kubectl logs deploy/otel-collector | grep -i error
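2. Check Collector receiver accepted spans metric: otelcol_receiver_accepted_spans (exposed as otelcol_receiver_accepted_spans_total in some Collector versions)
3. Verify OTLP endpoint is reachable from the application's network: grpcurl -plaintext otel-collector:4317 list (a connection error means the endpoint is unreachable; a reply that reflection is unsupported still proves the port answers). Calling opentelemetry.proto.collector.trace.v1.TraceService/Export directly additionally requires the OTLP .proto files and a request body.
4. Check application SDK config: OTEL_LOG_LEVEL=debug enables verbose SDK logging in SDKs that support it; the Java agent uses -Dotel.javaagent.debug=true instead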
Context propagation broken (traces not joined):
- Ensure all services use compatible propagators. Default is tracecontext,baggage (W3C). If one service uses B3 headers (Zipkin legacy), cross-service traces will not link.
- Set explicitly: OTEL_PROPAGATORS=tracecontext,baggage
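What the W3C tracecontext propagator carries can be seen by parsing the traceparent header by hand. The sketch below handles only the version-00 layout (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags); parse_traceparent and child_traceparent are hypothetical helpers, not SDK functions.

```python
import re
import secrets

TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled) or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None   # values the W3C spec marks as invalid
    return trace_id, span_id, bool(int(flags, 16) & 0x01)

def child_traceparent(incoming):
    """Build the header for an outgoing call: same trace ID, new span ID."""
    parsed = parse_traceparent(incoming)
    if parsed is None:
        # No usable context: a new trace starts here (this is how traces end up split).
        trace_id, sampled = secrets.token_hex(16), True
    else:
        trace_id, _, sampled = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{'01' if sampled else '00'}"

incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
broken = child_traceparent("not-a-w3c-header")
```

When a service cannot read the incoming header format (for example it only understands B3), it behaves like the broken case: it starts a fresh trace ID and the backend shows two unrelated traces.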
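High memory in tail sampling Collector: Tail sampling holds full traces in memory. At 100k traces × 10 spans/trace × 1 KB/span, the buffer is about 1 GB. Tune num_traces and decision_wait based on your trace completion time: decision_wait should cover the 95th-percentile request latency plus generous headroom (on the order of 2x), and num_traces should cover the number of traces that arrive during that window.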
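The sizing arithmetic is easy to script when tuning. The figures for spans per trace and bytes per span come from the estimate above; the arrival rate of 5,000 traces per second is a made-up input, and both function names are hypothetical.

```python
def tail_sampling_buffer_bytes(num_traces, spans_per_trace, bytes_per_span):
    """Upper-bound estimate of span data held while waiting for decisions."""
    return num_traces * spans_per_trace * bytes_per_span

def traces_in_flight(traces_per_second, decision_wait_seconds):
    """Traces buffered at steady state: arrival rate times hold time."""
    return traces_per_second * decision_wait_seconds

# 100k traces x 10 spans x 1 KB (taken as 1,000 bytes) = 1,000,000,000 bytes, about 1 GB
estimate = tail_sampling_buffer_bytes(100_000, 10, 1_000)

# At an assumed 5,000 traces/s and decision_wait of 10s, 50,000 traces are held at once.
in_flight = traces_in_flight(traces_per_second=5_000, decision_wait_seconds=10)
```

If in_flight exceeds num_traces, older traces are evicted before a decision can be made, so the two settings need to be tuned together.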
Security Implications
- OTLP endpoints should require mTLS in production. Unauthenticated OTLP allows any process to inject arbitrary spans into your tracing backend, polluting traces and potentially overloading storage.
- The attributes processor in the Collector can be used to redact PII from spans before export. This is the right place to strip http.user_agent, user IDs from URL parameters, and authorization headers.
- Collector's zpages extension (/debug/tracez, /debug/pipelinez) exposes internal state. Disable or restrict access in production.
- Be aware that trace span attributes may contain secrets (API keys in headers, passwords in SQL queries). Auto-instrumentation that captures full SQL statements or HTTP headers can inadvertently capture credentials.
Performance Implications
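- Auto-instrumentation adds measurable overhead: figures of roughly 5-15% are commonly quoted for the Java agent (bytecode rewriting), and Python is often higher because the wrappers run in the interpreter. Actual cost depends heavily on the workload, so benchmark your own services.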
- Manual instrumentation is cheaper: a span creation in Go is ~1-3 microseconds. Attribute setting is ~100ns per attribute.
- OTLP gRPC with batching (default: 512 spans per batch, 5s timeout) is efficient. Raw throughput: a single Collector on 4 vCPUs can handle ~50k spans/sec.
- Tail sampling buffers full traces in memory. Baseline: 100k trace buffer = ~1-2GB RAM depending on average trace size.
- Use sampling aggressively: 1-10% head sampling before tail sampling for extremely high-volume services (>100k req/s).
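Ratio-based head sampling can be sketched as a comparison on the trace ID, so every service that sees the same ID reaches the same decision. This mirrors the general idea of trace-ID ratio sampling rather than any specific SDK's exact algorithm; should_sample is a hypothetical name and the trace IDs are randomly generated test data.

```python
import random

def should_sample(trace_id_hex, ratio):
    """Keep a trace if the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low_64_bits = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    return low_64_bits < int(ratio * 2**64)

rng = random.Random(42)                       # fixed seed keeps the example repeatable
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(20_000)]
kept = sum(should_sample(t, 0.05) for t in trace_ids)   # close to 5% of 20,000
```

Because the decision depends only on the trace ID, it costs a single integer comparison per trace, which is why head sampling is the cheap first stage in front of tail sampling.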
Failure Modes and Real Incidents
Collector OOM during traffic spike: Tail sampling Collector held 1M traces in memory during a holiday traffic spike. The memory_limiter processor was not configured. Fix: always configure memory_limiter as the first processor in all pipelines; set limit_mib to 80% of available memory.
Propagation header incompatibility after Zipkin migration: A team migrating from Zipkin to OTel left some services still using B3 headers. Traces appeared in Jaeger as disconnected single-service traces. Fixed by configuring all services with OTEL_PROPAGATORS=b3multi,tracecontext during the transition period.
Auto-instrumentation leaking PII through SQL statements: Java auto-instrumentation captured full SQL statements including user-submitted data in db.statement attributes. Sensitive PII appeared in Jaeger trace UI. Fix: configure otel.instrumentation.common.db-statement-sanitizer.enabled=true (enabled by default in recent versions).
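The effect of statement sanitization can be approximated with two regular expressions that replace literals with placeholders. This is a deliberately naive sketch, not the Java agent's sanitizer: sanitize_statement is a hypothetical helper and it does not handle comments, quoted identifiers, or dialect-specific syntax.

```python
import re

# Single-quoted string literals, allowing '' as an escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone numbers; the lookbehind skips bind parameters like $1 and names like orders2.
_NUMBER_LITERAL = re.compile(r"(?<![\w$])\d+(?:\.\d+)?\b")

def sanitize_statement(sql):
    """Replace string and numeric literals with '?' placeholders."""
    sql = _STRING_LITERAL.sub("?", sql)
    return _NUMBER_LITERAL.sub("?", sql)

raw = "SELECT * FROM users WHERE email = 'alice@example.com' AND age > 30"
clean = sanitize_statement(raw)

# Parameterized statements carry no user data and pass through unchanged.
parameterized = sanitize_statement("SELECT * FROM orders WHERE id = $1")
```

The sanitized text still identifies the query shape for debugging, while the user-supplied values never reach the tracing backend.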
Modern Usage
- OTel Operator for Kubernetes (stable 2023): manages OTel Collector deployments and pod auto-instrumentation via CRDs. Standard deployment pattern in production K8s.
- OTel Arrow (2024): a new columnar batch format for OTLP transport, achieving 10x compression improvement over protobuf for large batches.
- Profiles signal: OTel is adding continuous profiling as a fourth signal, with a standard profile data model. Integration with Parca and Pyroscope is in progress.
- OpenTelemetry Weaver (2024): schema-first approach to semantic conventions, generating SDK code from convention definitions.
Future Directions
- Unified storage: As OTel becomes the standard ingestion protocol, backends (Grafana stack, Datadog) are converging on native OTLP storage, eliminating the need for signal-specific protocols.
- Client-side telemetry: OTel for browsers and mobile apps (RUM — Real User Monitoring) to extend traces from backend to the end user's device.
- AI/LLM observability: OTel semantic conventions for GenAI systems (LLM token counts, model name, prompt/completion attributes) are being standardized to enable observability for AI inference pipelines.
Exercises
- End-to-end OTel setup: Deploy an OTel Collector locally using Docker. Instrument a simple Python Flask app with opentelemetry-instrument. Send traces to Jaeger and metrics to Prometheus. Verify traces appear in Jaeger and metrics in Prometheus with correct service.name labels.
- Tail sampling policy: Configure a tail sampling processor that retains: 100% of traces with status ERROR, 100% of traces with latency >1s, and 5% of all other traces. Generate load with a mixture of fast/slow/error requests and verify the sampling policy behavior in Jaeger.
- Context propagation chain: Build two microservices (any language). Service A receives an HTTP request and calls Service B. Verify that the trace ID is propagated via the traceparent header. Use curl -v to inspect the header. Remove the propagator from Service B and observe that traces break.
- Semantic conventions compliance: Review a production service's OTel instrumentation. Identify any attributes that don't follow semantic conventions (e.g., using url instead of http.url, status instead of http.status_code). Fix them and verify improved trace readability in your backend.
- Collector pipeline configuration: Set up a Collector that receives OTLP, adds a deployment.environment=staging resource attribute, drops all spans with http.target = "/health", and exports to two backends simultaneously (Jaeger and a file exporter). Verify both backends receive the same traces minus health-check spans.
References
- OpenTelemetry Documentation: https://opentelemetry.io/docs/
- OTLP Specification: https://opentelemetry.io/docs/specs/otlp/
- Semantic Conventions: https://opentelemetry.io/docs/specs/semconv/
- OTel Collector Contrib GitHub: https://github.com/open-telemetry/opentelemetry-collector-contrib
- Sigelman, Ben et al. "OpenTracing: Distributed Tracing's Big Tent." 2016.
- "Merging OpenTracing and OpenCensus." CNCF Blog, 2019.
- Sridharan, Cindy. Distributed Systems Observability. O'Reilly, 2018.
- Grafana OTel Getting Started: https://grafana.com/docs/opentelemetry/