The Kubernetes Operator Pattern
Overview
An Operator is a method of packaging, deploying, and managing a Kubernetes application by encoding the operational knowledge of a human expert into software. The core insight is that managing complex stateful applications (databases, message queues, certificate authorities) requires more than just running containers — it requires understanding the application's lifecycle: how to safely scale, upgrade, backup, restore, and fail over. Operators codify this knowledge as a controller that watches Custom Resources and continuously reconciles the cluster state toward the desired configuration.
The Operator pattern underpins much of the modern Kubernetes ecosystem. Prometheus, Apache Kafka, and PostgreSQL, among many others, are commonly run in production under Operators, which makes self-service infrastructure practical at scale.
Prerequisites
- Kubernetes controller model and the API machinery (informers, listers, workqueues)
- Custom Resource Definitions (CRDs) and API extension concepts
- Go programming language basics (most operators are written in Go)
- Understanding of Kubernetes reconciliation philosophy
- Familiarity with specific domains (e.g., database clustering) to understand operator logic
Historical Context
The term "Operator" was introduced by CoreOS in a November 2016 blog post by Brandon Philips, "Introducing Operators: Putting Operational Knowledge into Software." The first publicly released operators were the etcd Operator and the Prometheus Operator, both from CoreOS.
Before Operators, the common approach was Helm charts: templated YAML that could be rendered and applied. Helm handles installation and upgrades well but has no runtime intelligence — it cannot react to failures, perform rolling restarts safely, or validate application-specific constraints.
Key milestones:
- 2016: "Operator" term coined; etcd Operator and Prometheus Operator released
- 2018: CoreOS acquired by Red Hat; Operator Framework (Operator SDK) open-sourced
- 2019: Operator Hub (operatorhub.io) launched; community operators published
- 2020: Kubebuilder v2 released with improved scaffolding; controller-runtime becomes the standard library
- 2021: OperatorLifecycleManager (OLM) matures; enables operator version management in cluster
- 2022-2024: Thousands of operators in production across every major software category
The shift from "bash scripts + cron jobs for DB ops" to Operators represents a fundamental change: infrastructure automation becomes testable, versioned, and expressed in Kubernetes-native declarative APIs.
Custom Resource Definitions (CRDs)
CRDs extend the Kubernetes API with new resource types. Once a CRD is applied, the API server serves CRUD operations on the new resource just like built-in resources.
Standard Kubernetes API types:
/api/v1/pods
/apis/apps/v1/deployments
After applying a CRD for "databases.example.com":
/apis/example.com/v1alpha1/databases
/apis/example.com/v1alpha1/namespaces/production/databases/my-postgres
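The path layout above follows a fixed rule, which the following Go sketch spells out. resourcePath is a hypothetical helper written for illustration, not a client-go function; it assumes only that the core group is served under /api and every other group, including CRD groups, under /apis/<group>.

```go
package main

import "fmt"

// resourcePath builds the REST path the API server serves for a resource.
// An empty group means the core group; an empty namespace or name is omitted.
func resourcePath(group, version, namespace, plural, name string) string {
	path := "/apis/" + group + "/" + version
	if group == "" {
		path = "/api/" + version
	}
	if namespace != "" {
		path += "/namespaces/" + namespace
	}
	path += "/" + plural
	if name != "" {
		path += "/" + name
	}
	return path
}

func main() {
	fmt.Println(resourcePath("", "v1", "", "pods", ""))
	fmt.Println(resourcePath("apps", "v1", "", "deployments", ""))
	fmt.Println(resourcePath("example.com", "v1alpha1", "production", "databases", "my-postgres"))
}
```

The three calls reproduce the paths listed above, which is why kubectl and client libraries can address a custom resource without any resource-specific code.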
CRD structure:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:            # validation schema
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 5
                image:
                  type: string
                storageSize:
                  type: string
            status:
              type: object
              properties:
                phase:
                  type: string
                readyReplicas:
                  type: integer
      subresources:
        status: {}                  # enables /status subresource
        scale:                      # enables /scale subresource for HPA
          # Both paths are required; an empty scale block is rejected.
          specReplicasPath: .spec.replicas
          statusReplicasPath: .status.readyReplicas
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: ["db"]
Once the CRD exists, users create instances (Custom Resources):
apiVersion: example.com/v1alpha1
kind: Database
metadata:
  name: prod-postgres
  namespace: production
spec:
  replicas: 3
  image: postgres:15.3
  storageSize: 100Gi
This instance is stored in etcd and immediately watchable by any controller.
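Before an instance is stored, the API server checks it against the CRD's schema. The Go sketch below mirrors only the replicas bounds from the example schema (minimum 1, maximum 5). validateReplicas is a hypothetical stand-in for that server-side check, and its error text is illustrative rather than the API server's actual message.

```go
package main

import "fmt"

// validateReplicas mirrors the schema bounds on spec.replicas.
// A nil pointer models an omitted field, which the example schema allows
// because it declares no required properties.
func validateReplicas(replicas *int) error {
	if replicas == nil {
		return nil
	}
	if *replicas < 1 || *replicas > 5 {
		return fmt.Errorf("spec.replicas: %d is outside [1, 5]", *replicas)
	}
	return nil
}

func main() {
	for _, n := range []int{3, 0, 9} {
		n := n // take the address of a per-iteration copy
		fmt.Printf("replicas=%d -> %v\n", n, validateReplicas(&n))
	}
}
```

Because the check runs in the API server, an invalid object never reaches etcd and the controller never has to reconcile it.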
The Controller Pattern and Reconcile Loop
Every Kubernetes controller follows the same pattern: watch resources, compare desired state to actual state, take actions to close the gap.
Operator Reconcile Loop
+-----------+      Watch/List       +------------------+
|  etcd /   | ====================> |     Informer     |
|   API     |  (via long-polling    |  (local cache)   |
|  Server   |    watch stream)      +------------------+
+-----------+                                |
                                  Add/Update/Delete event
                                             |
                                             v
                                       +------------+
                                       | Workqueue  |
                                       | (rate-     |
                                       |  limited,  |
                                       |  dedup)    |
                                       +------------+
                                             |
                               Dequeue item (namespace/name)
                                             |
                                             v
                               +--------------------------+
                               |  Reconcile(Request)      |
                               |                          |
                               | 1. Fetch CR from         |
                               |    cache (Get)           |
                               |                          |
                               | 2. Check if exists       |
                               |    (handle deletion)     |
                               |                          |
                               | 3. Fetch owned           |
                               |    resources             |
                               |    (Deployments,         |
                               |    Services, etc.)       |
                               |                          |
                               | 4. Compare desired       |
                               |    vs actual             |
                               |                          |
                               | 5. Create / Update /     |
                               |    Delete resources      |
                               |                          |
                               | 6. Update CR status      |
                               |                          |
                               | Return: Result           |
                               |  - {}: success           |
                               |  - Requeue: retry        |
                               |  - Error: backoff+retry  |
                               +--------------------------+
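The "dedup" property of the workqueue in the diagram can be illustrated with a small, single-threaded Go sketch. Queue is a hypothetical type written for this example; the real client-go workqueue is concurrency-safe and also tracks keys that are currently being processed, which this model leaves out.

```go
package main

import "fmt"

// Queue is a FIFO of object keys that holds each pending key at most once.
type Queue struct {
	order   []string
	pending map[string]bool
}

func NewQueue() *Queue { return &Queue{pending: map[string]bool{}} }

// Add enqueues key unless it is already waiting to be processed.
func (q *Queue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *Queue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key, q.order = q.order[0], q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len reports how many distinct keys are waiting.
func (q *Queue) Len() int { return len(q.order) }

func main() {
	q := NewQueue()
	for i := 0; i < 5; i++ {
		q.Add("production/prod-postgres") // five events for the same object
	}
	q.Add("production/other-db")
	fmt.Println(q.Len()) // 2
	key, _ := q.Get()
	fmt.Println(key) // production/prod-postgres
}
```

The queue carries only the namespace/name key, never the event payload, so the reconciler is forced to read the current state rather than act on a stale event.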
Critical properties of reconcile:
- Idempotent: Running reconcile twice with the same state must produce the same result. No side effects from running extra times.
- Level-triggered, not edge-triggered: The reconcile loop does not care about the sequence of events; it only compares the current desired state with the current actual state. If five updates to the same object arrive within 100ms, the workqueue typically collapses them into a single reconcile call, because a key that is already queued is not added again.
- Optimistic concurrency: Use resourceVersion checks to detect and retry conflicts.
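The optimistic-concurrency property can be sketched in Go with an in-memory stand-in for the API server. Store, Object, and setReplicas are hypothetical names for this illustration, and the integer version is a simplification: real resourceVersion values are opaque strings. In a real controller the retry loop is usually delegated to retry.RetryOnConflict from k8s.io/client-go/util/retry.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is returned when an update is based on a stale copy.
var ErrConflict = errors.New("conflict: object was modified")

type Object struct {
	ResourceVersion int
	Replicas        int
}

// Store is a stand-in for the API server holding a single object.
type Store struct{ obj Object }

func (s *Store) Get() Object { return s.obj }

// Update succeeds only if the caller's copy is based on the current version.
func (s *Store) Update(o Object) error {
	if o.ResourceVersion != s.obj.ResourceVersion {
		return ErrConflict
	}
	o.ResourceVersion++
	s.obj = o
	return nil
}

// setReplicas re-reads the object and retries on conflict instead of
// overwriting another writer's change.
func setReplicas(s *Store, replicas, maxAttempts int) error {
	for i := 0; i < maxAttempts; i++ {
		o := s.Get()
		o.Replicas = replicas
		err := s.Update(o)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrConflict) {
			return err
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, ErrConflict)
}

func main() {
	s := &Store{obj: Object{ResourceVersion: 1, Replicas: 1}}
	stale := s.Get()                // copy taken at version 1
	_ = setReplicas(s, 3, 3)        // another writer moves the object to version 2
	stale.Replicas = 5
	fmt.Println(s.Update(stale))    // conflict: object was modified
	fmt.Println(setReplicas(s, 5, 3), s.Get().Replicas)
}
```

Returning the conflict error from Reconcile is often enough in practice, since the requeued call re-reads the object from scratch.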
Operator SDK and Kubebuilder
Two main frameworks scaffold Operator code:
Kubebuilder (a Kubernetes SIG project, maintained under SIG API Machinery):
- Scaffolds Go project with controller-runtime
- Generates CRD manifests from Go struct annotations
- Provides webhook scaffolding (mutating, validating, and conversion webhooks)
- Integrates envtest for controller testing
Operator SDK (Red Hat, wraps Kubebuilder + adds):
- Helm-based operators (wrap existing Helm charts with a controller)
- Ansible-based operators (reconcile via Ansible playbooks)
- Integration with OLM (OperatorLifecycleManager) for distribution
Both use controller-runtime under the hood:
import ctrl "sigs.k8s.io/controller-runtime"
Key controller-runtime types:
- Manager: orchestrates controllers, shared caches, leader election
- Controller: registers reconciler + watches
- Reconciler: interface with single Reconcile(ctx, req) method
- Client: typed API client (Get, List, Create, Update, Delete, Patch)
- Scheme: maps Go types to GVK (GroupVersionKind)
Minimal reconciler in Go:
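import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	examplev1 "example.com/database-operator/api/v1alpha1" // hypothetical module path
)

// DatabaseReconciler is assumed to embed client.Client and hold a Scheme,
// as in the Kubebuilder scaffold. buildStatefulSet and computePhase are
// application-specific helpers that are not shown here.
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx) // avoid shadowing the log package

	// 1. Fetch the CR. NotFound means it was deleted, so there is nothing to do.
	var db examplev1.Database
	if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Create or update the StatefulSet, owned by the CR.
	sts := buildStatefulSet(&db)
	if err := ctrl.SetControllerReference(&db, sts, r.Scheme); err != nil {
		return ctrl.Result{}, err
	}
	var existing appsv1.StatefulSet
	err := r.Get(ctx, types.NamespacedName{Name: sts.Name, Namespace: sts.Namespace}, &existing)
	if apierrors.IsNotFound(err) {
		return ctrl.Result{}, r.Create(ctx, sts)
	}
	if err != nil {
		return ctrl.Result{}, err
	}
	// Copy only mutable fields; most other StatefulSet spec fields are immutable.
	existing.Spec.Replicas = sts.Spec.Replicas
	existing.Spec.Template = sts.Spec.Template
	if err := r.Update(ctx, &existing); err != nil {
		return ctrl.Result{}, err
	}

	// 3. Update status through the /status subresource.
	db.Status.ReadyReplicas = existing.Status.ReadyReplicas
	db.Status.Phase = computePhase(&existing)
	if err := r.Status().Update(ctx, &db); err != nil {
		return ctrl.Result{}, err
	}

	logger.Info("Reconciled Database", "name", db.Name, "phase", db.Status.Phase)
	return ctrl.Result{}, nil
}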
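The compare-and-act core of a reconciler, and the reason it must be idempotent, can be shown without any Kubernetes dependencies. In this hedged Go sketch, Cluster is a hypothetical in-memory stand-in for the API server and reconcile reports which action it took; none of these names come from controller-runtime.

```go
package main

import "fmt"

// StatefulSetSpec holds the fields this toy reconciler manages.
type StatefulSetSpec struct {
	Replicas int
	Image    string
}

// Cluster is an in-memory stand-in for the actual state in the API server.
type Cluster struct{ sts map[string]StatefulSetSpec }

// reconcile drives the stored spec for name toward desired and reports
// what it did. Calling it again with the same inputs changes nothing.
func reconcile(c *Cluster, name string, desired StatefulSetSpec) string {
	actual, found := c.sts[name]
	switch {
	case !found:
		c.sts[name] = desired
		return "created"
	case actual != desired:
		c.sts[name] = desired
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	c := &Cluster{sts: map[string]StatefulSetSpec{}}
	want := StatefulSetSpec{Replicas: 3, Image: "postgres:15.3"}
	fmt.Println(reconcile(c, "prod-postgres", want)) // created
	fmt.Println(reconcile(c, "prod-postgres", want)) // unchanged
	want.Replicas = 5
	fmt.Println(reconcile(c, "prod-postgres", want)) // updated
}
```

The second call is the important one: an extra reconcile with no change in desired state performs no write, which is what makes duplicate or spurious events harmless.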
Finalizers
Finalizers prevent deletion of a resource until cleanup is complete:
Without finalizer:
kubectl delete database prod-postgres
→ API server deletes object from etcd immediately
→ Persistent volumes, cloud resources, DNS entries: orphaned
With finalizer:
1. CR has metadata.finalizers: ["databases.example.com/cleanup"]
2. kubectl delete database prod-postgres
3. API server sets deletionTimestamp (soft delete) — does NOT delete yet
4. Reconcile loop detects deletionTimestamp != nil:
a. Run cleanup: delete cloud resources, volumes, backups
b. Remove finalizer: patch metadata.finalizers = []
5. API server sees no more finalizers → deletes object from etcd
Code:
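const finalizerName = "databases.example.com/cleanup"

// controllerutil is sigs.k8s.io/controller-runtime/pkg/controller/controllerutil.
// cleanupExternalResources is an application-specific helper, not shown here.
if db.DeletionTimestamp.IsZero() {
	// Not being deleted: ensure the finalizer is present.
	if !controllerutil.ContainsFinalizer(&db, finalizerName) {
		controllerutil.AddFinalizer(&db, finalizerName)
		return ctrl.Result{}, r.Update(ctx, &db)
	}
} else {
	// Being deleted: run cleanup, then remove the finalizer.
	if controllerutil.ContainsFinalizer(&db, finalizerName) {
		if err := r.cleanupExternalResources(ctx, &db); err != nil {
			// The finalizer stays in place, so the object survives and cleanup is retried.
			return ctrl.Result{}, err
		}
		controllerutil.RemoveFinalizer(&db, finalizerName)
		if err := r.Update(ctx, &db); err != nil {
			return ctrl.Result{}, err
		}
	}
	// Nothing else to reconcile for an object that is going away.
	return ctrl.Result{}, nil
}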
Owner References and Cascade Deletion
When an Operator creates child resources (Deployments, Services, ConfigMaps) on behalf of a CR, it sets owner references so that deleting the CR cascades to all children:
Database CR (owner)
├── StatefulSet (owned)
│ └── Pods (owned by StatefulSet)
├── Service (owned)
├── ConfigMap (owned)
└── Secret (owned)
metadata.ownerReferences:
  - apiVersion: example.com/v1alpha1
    kind: Database
    name: prod-postgres
    uid: abc-123
    controller: true
    blockOwnerDeletion: true
When Database is deleted:
  → garbage collector deletes StatefulSet, Service, ConfigMap, Secret
  → StatefulSet deletion cascades to Pods
blockOwnerDeletion: true means that, under foreground cascading deletion, the owner is not removed until this child is gone (used with finalizers to control deletion order). Owner references only work within one namespace, or from a namespaced child to a cluster-scoped owner.
Real-World Operators
Prometheus Operator (prometheus-operator/prometheus-operator, commonly installed through the prometheus-community/kube-prometheus-stack Helm chart):
CRDs managed:
- Prometheus: deploys Prometheus instances with specified config
- Alertmanager: deploys Alertmanager with routing config
- ServiceMonitor: declares which services Prometheus scrapes
- PodMonitor: scrapes pods directly
- PrometheusRule: defines alerting/recording rules
Workflow:
1. Team creates ServiceMonitor pointing at their service
2. Prometheus Operator detects new ServiceMonitor
3. Operator updates Prometheus configuration (secret-based config)
4. Prometheus reloads config via /-/reload endpoint
→ No manual Prometheus config editing needed
cert-manager:
CRDs:
- Issuer / ClusterIssuer: certificate authority config
(Let's Encrypt ACME, Vault PKI, self-signed, etc.)
- Certificate: desired TLS certificate (domains, duration, secretRef)
- CertificateRequest: pending CSR
- Challenge / Order: ACME protocol flow objects
Workflow:
1. Create Certificate requesting *.example.com from Let's Encrypt
2. cert-manager creates ACME Order
3. Completes the DNS-01 Challenge (wildcard names require DNS-01; HTTP-01 works only for non-wildcard names)
4. Receives signed certificate
5. Stores in Kubernetes Secret
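6. Renews automatically before expiry (by default once two thirds of the certificate lifetime has elapsed, which is 30 days before expiry for a 90-day Let's Encrypt certificate)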
→ No manual cert rotation
Strimzi Kafka Operator:
CRDs:
- Kafka: full Kafka cluster (brokers + ZooKeeper/KRaft)
- KafkaTopic: topic creation with partition/replica config
- KafkaUser: user + ACL management
- KafkaConnect: Kafka Connect cluster
- KafkaMirrorMaker2: cross-cluster replication
Operator handles:
- Rolling restarts that maintain ISR (In-Sync Replicas)
- ZooKeeper → KRaft migration
- Certificate rotation for mTLS
- Topic rebalancing via Cruise Control integration
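The idea behind an ISR-preserving rolling restart can be sketched as a pre-restart safety check. This Go example is an illustration of the concept only and is not taken from Strimzi's implementation; Partition, MinISR, and canRestart are hypothetical names, with MinISR standing in for the topic's min.insync.replicas setting.

```go
package main

import "fmt"

// Partition records which brokers are currently in sync for one partition.
type Partition struct {
	Name   string
	ISR    []int // broker IDs in the in-sync replica set
	MinISR int   // minimum in-sync replicas required for writes
}

// canRestart reports whether taking the broker offline still leaves every
// partition with at least MinISR in-sync replicas.
func canRestart(broker int, partitions []Partition) bool {
	for _, p := range partitions {
		remaining := 0
		for _, id := range p.ISR {
			if id != broker {
				remaining++
			}
		}
		if remaining < p.MinISR {
			return false
		}
	}
	return true
}

func main() {
	parts := []Partition{
		{Name: "orders-0", ISR: []int{0, 1, 2}, MinISR: 2},
		{Name: "orders-1", ISR: []int{1, 2}, MinISR: 2},
	}
	fmt.Println(canRestart(0, parts)) // true: both partitions keep 2 in-sync replicas
	fmt.Println(canRestart(1, parts)) // false: orders-1 would drop to 1
}
```

An operator would typically re-evaluate a check like this before each broker restart and wait when it fails, which is the kind of application-specific judgment a plain Deployment rollout cannot make.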
ArgoCD (GitOps operator):
CRDs:
- Application: git repo + path + target cluster + namespace
- AppProject: RBAC boundary for Applications
Reconcile loop:
1. Watch Application CRs
2. Fetch manifests from git (compare to last known commit)
3. Compare cluster state to git state (using resource hashing)
4. If out-of-sync: apply diff to cluster (or alert, if manual sync)
5. Update Application status (Synced/OutOfSync, Healthy/Degraded)
→ Git as the source of truth for cluster state
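Hash-based drift detection, as in step 3, can be sketched in a few lines of Go. This is a generic illustration, not Argo CD's actual diffing logic: digest and syncStatus are hypothetical helpers, and manifests are reduced to flat maps for brevity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// digest returns a stable hash of a manifest. encoding/json writes map keys
// in sorted order, so maps with equal content produce equal digests.
func digest(manifest map[string]any) (string, error) {
	b, err := json.Marshal(manifest)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// syncStatus compares the desired (git) manifest with the live one.
func syncStatus(desired, live map[string]any) (string, error) {
	d, err := digest(desired)
	if err != nil {
		return "", err
	}
	l, err := digest(live)
	if err != nil {
		return "", err
	}
	if d == l {
		return "Synced", nil
	}
	return "OutOfSync", nil
}

func main() {
	desired := map[string]any{"kind": "Deployment", "replicas": 3}
	live := map[string]any{"replicas": 3, "kind": "Deployment"}
	status, _ := syncStatus(desired, live)
	fmt.Println(status) // Synced
	live["replicas"] = 2
	status, _ = syncStatus(desired, live)
	fmt.Println(status) // OutOfSync
}
```

A real comparison also has to ignore fields the cluster fills in on its own (defaults, status, managed metadata), otherwise every resource would appear permanently out of sync.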
Operator Maturity Levels
The Operator Framework defines five capability levels:
Level 1: Basic Install
- Automates installation and initial configuration
- Example: deploy pods, create services, apply config
Level 2: Seamless Upgrades
- Manages patch/minor version upgrades safely
- Rolling upgrades that maintain availability
- Example: etcd operator rolling upgrade
Level 3: Full Lifecycle
- Backup, restore, failure recovery
- Example: DB operator with automated backup to S3 + point-in-time restore
Level 4: Deep Insights
- Exposes metrics, alerting rules, dashboards
- Self-healing based on observed metrics
- Example: automatically increases connection pool when p99 rises
Level 5: Auto Pilot
- Autonomous horizontal/vertical scaling
- Anomaly detection and automatic remediation
- Example: database operator that automatically reshards on hot partition
Debugging Notes
# Watch reconcile loop logs
kubectl logs -n <operator-namespace> <operator-pod> -f | grep -i "reconcile\|error\|requeue"
# Check CRD status
kubectl get crd databases.example.com -o yaml
# Look for: status.conditions — should show "NamesAccepted" and "Established"
# Inspect CR status set by operator
kubectl describe database prod-postgres -n production
# Look at Status section — operator should update phase/conditions
# Check if finalizer is blocking deletion
kubectl get database prod-postgres -o jsonpath='{.metadata.finalizers}'
# If stuck: check operator logs for cleanup errors
# Check controller-runtime metrics (if operator exposes them)
kubectl port-forward -n <operator-namespace> svc/operator-metrics 8080:8080   # service name and port depend on the deployment
curl localhost:8080/metrics | grep controller_runtime_reconcile
# Useful metrics:
# controller_runtime_reconcile_total{controller,result="error"}
# controller_runtime_reconcile_time_seconds
# controller_runtime_active_workers
Security Implications
- Operators typically run with broad RBAC permissions, often a ClusterRole granting full access to their custom resources and to every child resource type they manage. Audit these carefully: an operator that is allowed to create Pods or workloads can, if compromised, run arbitrary containers in the cluster.
- Leader election uses a Lease object; multiple operator replicas compete for the lease. Only the leader runs reconciles. Ensure RBAC allows writing Leases.
- CRDs can expose sensitive spec fields (passwords, connection strings). Use Kubernetes Secrets and reference them from the CR spec; never store plaintext credentials in CR fields.
- Operators that interact with external APIs (cloud providers) need credentials — use Workload Identity or IRSA rather than static keys.
- Validating webhooks registered by operators can become single points of failure: if the webhook pod is down and failurePolicy: Fail is set, no resources of that type can be created.
Performance Implications
- Operators that reconcile on every event (including status updates they write themselves) can enter tight reconcile loops. Use predicate.GenerationChangedPredicate to filter events.
- Large numbers of CRs (thousands) can stress the informer cache. Use server-side filtering (listOptions.LabelSelector) to limit what the operator caches.
- Concurrent reconciliations (via MaxConcurrentReconciles) improve throughput but require the reconcile function to be concurrency-safe. The workqueue does not hand the same key to two workers at once, so the concern is state shared across different objects.
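The effect of a generation-based event filter can be shown with a small Go model. It assumes, as is the case for custom resources with the status subresource enabled, that metadata.generation changes on spec edits but not on status writes. Meta and shouldReconcile are hypothetical stand-ins; in controller-runtime the corresponding check is provided by predicate.GenerationChangedPredicate.

```go
package main

import "fmt"

// Meta holds the two metadata fields relevant to event filtering.
type Meta struct {
	Generation      int64  // bumped when the spec changes
	ResourceVersion string // changes on every write, including status writes
}

// shouldReconcile reports whether an update event warrants a reconcile:
// only when the generation differs between the old and new object.
func shouldReconcile(oldObj, newObj Meta) bool {
	return oldObj.Generation != newObj.Generation
}

func main() {
	before := Meta{Generation: 1, ResourceVersion: "100"}
	statusWrite := Meta{Generation: 1, ResourceVersion: "101"}
	specEdit := Meta{Generation: 2, ResourceVersion: "102"}
	fmt.Println(shouldReconcile(before, statusWrite))   // false
	fmt.Println(shouldReconcile(statusWrite, specEdit)) // true
}
```

The trade-off is that label, annotation, and other metadata-only changes are filtered out as well, so a controller that reacts to those needs a less aggressive filter.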
Failure Modes
| Symptom | Cause | Diagnosis |
|---|---|---|
| CR stuck in "Pending" | Operator not running / crash loop | Check operator pod logs |
| Deletion stuck | Finalizer present, cleanup failing | Check operator logs for cleanup errors; remove the finalizer by hand only as a last resort, since that skips cleanup |
| Resources not created | RBAC missing for operator SA | Check RBAC, kubectl auth can-i as operator SA |
| Reconcile loop thrashing | The operator's own status writes trigger new reconcile events | Write status through the /status subresource; filter events with predicates |
| Webhook timeout blocking creation | Validating webhook pod down | Check webhook pod; consider failurePolicy: Ignore |
Modern Usage
- Crossplane: Takes the Operator pattern beyond Kubernetes-native resources to provision cloud infrastructure (AWS RDS, GCP GKE) via CRDs and controllers. Treats all infrastructure as a Kubernetes custom resource.
- Gateway API: The replacement for Ingress is built entirely on CRDs and relies on Gateway controllers (operators) from Istio, Cilium, Envoy Gateway.
- ACK (AWS Controllers for Kubernetes): AWS-maintained controllers, one per service, covering a growing set of AWS services.
- KubeVirt: Operator that adds VMs as a Kubernetes resource type, managing libvirt under the hood.
Future Directions
- Operator best practices standardization: The Operator Working Group under CNCF TAG App Delivery is working on a maturity model and test framework to certify operator quality.
- Declarative validation with CEL: Operators can reduce reliance on validating webhooks by using CRD-level CEL validation rules (x-kubernetes-validations; beta and enabled by default since Kubernetes 1.25, stable in 1.29), improving reliability.
- Finite State Machine libraries: New controller-runtime primitives for expressing operator logic as explicit state machines rather than ad-hoc if/else logic.
- Multi-cluster operators: Managing resources across cluster boundaries using fleet APIs (Karmada, Cluster API) as the next frontier.
Exercises
- Install Kubebuilder and scaffold a minimal operator for a WebApp CRD (spec: image, replicas). Implement a reconciler that creates a Deployment and Service. Test with envtest.
- Add a finalizer to the WebApp operator that logs "cleanup started" and waits 5 seconds before removing itself. Delete a WebApp instance and observe the deletion sequence in the logs.
- Install the Prometheus Operator in a local cluster. Create a ServiceMonitor for a simple HTTP server. Verify that Prometheus picks up the new scrape target.
- Examine the Strimzi Kafka Operator source code. Trace the code path from a KafkaTopic CR creation to the actual Kafka topic being created via the Admin API.
- Build an operator that manages a simple Counter CRD. The reconciler should increment status.count every 30 seconds (using ctrl.Result{RequeueAfter: 30 * time.Second}). Deploy it and watch the status field update.
References
- "Introducing Operators" — CoreOS Blog, November 2016 (original announcement)
- Operator Framework documentation: operatorframework.io
- Kubebuilder book: book.kubebuilder.io (comprehensive guide)
- controller-runtime source: github.com/kubernetes-sigs/controller-runtime
- "Programming Kubernetes" — Michael Hausenblas & Stefan Schimanski, O'Reilly 2019
- Strimzi Kafka Operator: strimzi.io
- cert-manager: cert-manager.io
- Prometheus Operator: prometheus-operator.dev
- Operator Hub: operatorhub.io (browse 300+ community operators)
- CNCF TAG App Delivery Operator Working Group: github.com/cncf/tag-app-delivery/tree/main/operator-wg