The Kubernetes Operator Pattern

Overview

An Operator is a method of packaging, deploying, and managing a Kubernetes application by encoding the operational knowledge of a human expert into software. The core insight is that managing complex stateful applications (databases, message queues, certificate authorities) requires more than just running containers — it requires understanding the application's lifecycle: how to safely scale, upgrade, backup, restore, and fail over. Operators codify this knowledge as a controller that watches Custom Resources and continuously reconciles the cluster state toward the desired configuration.

The Operator pattern underpins much of the modern Kubernetes ecosystem. Software ranging from Prometheus to Apache Kafka to PostgreSQL is commonly run under Operators in production, which enables self-service infrastructure at scale.

Prerequisites

- Kubernetes controller model and the API machinery (informers, listers, workqueues)
- Custom Resource Definitions (CRDs) and API extension concepts
- Go programming language basics (most operators are written in Go)
- Understanding of Kubernetes reconciliation philosophy
- Familiarity with specific domains (e.g., database clustering) to understand operator logic

Historical Context

The term "Operator" was introduced by CoreOS in a November 2016 blog post by Brandon Philips, "Introducing Operators: Putting Operational Knowledge into Software." The first publicly released Operators were the etcd Operator and the Prometheus Operator, both from CoreOS.

Before Operators, the common approach was Helm charts: templated YAML that could be rendered and applied. Helm handles installation and upgrades well but has no runtime intelligence — it cannot react to failures, perform rolling restarts safely, or validate application-specific constraints.

Key milestones:
- 2016: "Operator" term coined; etcd Operator and Prometheus Operator released
- 2018: CoreOS acquired by Red Hat; Operator Framework (including Operator SDK) open-sourced
- 2019: OperatorHub.io launched; community operators published
- 2020: Kubebuilder v2 scaffolding widely adopted; controller-runtime becomes the standard library
- 2021: Operator Lifecycle Manager (OLM) matures; enables operator version management in cluster
- 2022-2024: Thousands of operators in production across every major software category

The shift from "bash scripts + cron jobs for DB ops" to Operators represents a fundamental change: infrastructure automation becomes testable, versioned, and expressed in Kubernetes-native declarative APIs.

Custom Resource Definitions (CRDs)

CRDs extend the Kubernetes API with new resource types. Once a CRD is applied, the API server serves CRUD operations on the new resource just like built-in resources.

  Standard Kubernetes API types:
    /api/v1/pods
    /apis/apps/v1/deployments

  After applying a CRD for "databases.example.com":
    /apis/example.com/v1alpha1/databases
    /apis/example.com/v1alpha1/namespaces/production/databases/my-postgres
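
The path layout above follows a fixed rule, which the following minimal Go sketch spells out. The function name resourcePath is an assumption made for illustration; it is not part of any Kubernetes library.

```go
package main

import "fmt"

// resourcePath builds the REST path the API server serves for a resource.
// The core group ("") lives under /api; every other group lives under /apis.
// An empty namespace yields the cluster-wide collection path; an empty name
// yields the collection rather than a single object.
func resourcePath(group, version, namespace, plural, name string) string {
	prefix := "/apis/" + group
	if group == "" {
		prefix = "/api"
	}
	path := prefix + "/" + version
	if namespace != "" {
		path += "/namespaces/" + namespace
	}
	path += "/" + plural
	if name != "" {
		path += "/" + name
	}
	return path
}

func main() {
	fmt.Println(resourcePath("", "v1", "", "pods", ""))
	fmt.Println(resourcePath("apps", "v1", "", "deployments", ""))
	fmt.Println(resourcePath("example.com", "v1alpha1", "production", "databases", "my-postgres"))
}
```

The three calls in main reproduce the built-in and custom paths listed above, which is the sense in which a custom resource is served "just like" a built-in one.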

  CRD structure:

  apiVersion: apiextensions.k8s.io/v1
  kind: CustomResourceDefinition
  metadata:
    name: databases.example.com
  spec:
    group: example.com
    versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:       # validation schema
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 5
                image:
                  type: string
                storageSize:
                  type: string
            status:
              type: object
              properties:
                phase:
                  type: string
                readyReplicas:
                  type: integer
      subresources:
        status: {}             # enables /status subresource
        scale:                 # enables /scale subresource for HPA; both paths are required
          specReplicasPath: .spec.replicas
          statusReplicasPath: .status.readyReplicas
    scope: Namespaced
    names:
      plural: databases
      singular: database
      kind: Database
      shortNames: ["db"]

Once the CRD exists, users create instances (Custom Resources):

apiVersion: example.com/v1alpha1
kind: Database
metadata:
  name: prod-postgres
  namespace: production
spec:
  replicas: 3
  image: postgres:15.3
  storageSize: 100Gi

This instance is stored in etcd and can immediately be watched by any controller that has permission to read it.

The Controller Pattern and Reconcile Loop

Every Kubernetes controller follows the same pattern: watch resources, compare desired state to actual state, take actions to close the gap.

  Operator Reconcile Loop

  +-----------+      Watch/List        +------------------+
  |  etcd /   |  ===================>  |   Informer       |
  |  API      |  (via HTTP streaming   |   (local cache)  |
  |  Server   |   watch stream)        +------------------+
  +-----------+                                |
                                      Add/Update/Delete event
                                               |
                                               v
                                        +------------+
                                        |  Workqueue |
                                        |  (rate-    |
                                        |  limited,  |
                                        |  dedup)    |
                                        +------------+
                                               |
                                   Dequeue item (namespace/name)
                                               |
                                               v
                                  +------------------------+
                                  |  Reconcile(Request)    |
                                  |                        |
                                  |  1. Fetch CR from      |
                                  |     cache (Get)        |
                                  |                        |
                                  |  2. Check if exists    |
                                  |     (handle deletion)  |
                                  |                        |
                                  |  3. Fetch owned        |
                                  |     resources          |
                                  |     (Deployments,      |
                                  |      Services, etc.)   |
                                  |                        |
                                  |  4. Compare desired    |
                                  |     vs actual          |
                                  |                        |
                                  |  5. Create / Update /  |
                                  |     Delete resources   |
                                  |                        |
                                  |  6. Update CR status   |
                                  |                        |
                                  |  Return: Result        |
                                  |  - {}: success         |
                                  |  - Requeue: retry      |
                                  |  - Error: backoff+retry|
                                  +------------------------+
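
The "dedup" property of the workqueue box can be illustrated with a toy queue. This is a standard-library sketch under stated assumptions: dedupQueue is a hypothetical type, and the real client-go workqueue additionally tracks items that are being processed and applies rate limiting.

```go
package main

import "fmt"

// dedupQueue is a toy FIFO of keys that ignores a key that is already waiting.
type dedupQueue struct {
	order   []string
	pending map[string]bool
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: make(map[string]bool)}
}

// Add enqueues key unless an identical key is already waiting.
func (q *dedupQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len reports how many distinct keys are waiting.
func (q *dedupQueue) Len() int { return len(q.order) }

func main() {
	q := newDedupQueue()
	for i := 0; i < 5; i++ {
		q.Add("production/prod-postgres") // five rapid update events for one object
	}
	q.Add("production/other-db")
	fmt.Println(q.Len()) // 2: one entry per distinct key
}
```

Queueing keys rather than events is what makes this work: the reconciler re-reads the object when the key is dequeued, so nothing is lost by dropping the duplicates.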

Critical properties of reconcile:
- Idempotent: Running reconcile twice against the same state must produce the same result, with no side effects from the extra run.
- Level-triggered, not edge-triggered: The reconcile loop does not care about the sequence of events; it only cares about the current desired state versus the current actual state. If five updates to one object arrive within 100ms, the workqueue coalesces them by key, typically into a single reconcile call.
- Optimistic concurrency: Writes carry the resourceVersion that was read; the API server rejects stale writes with a conflict, and the controller retries from fresh state.
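
The idempotent, level-triggered behavior can be sketched without any Kubernetes dependency. In the example below, maps from resource name to a version string stand in for desired and actual cluster state; the function names reconcile and apply are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile compares desired and actual state and returns the actions needed
// to converge. It looks only at current state, never at the events that
// produced it.
func reconcile(desired, actual map[string]string) []string {
	var actions []string
	for name, want := range desired {
		got, exists := actual[name]
		switch {
		case !exists:
			actions = append(actions, "create "+name)
		case got != want:
			actions = append(actions, "update "+name)
		}
	}
	for name := range actual {
		if _, wanted := desired[name]; !wanted {
			actions = append(actions, "delete "+name)
		}
	}
	sort.Strings(actions) // map iteration order is random; sort for stable output
	return actions
}

// apply mutates actual so that it matches desired (a stand-in for API calls).
func apply(desired, actual map[string]string) {
	for name := range actual {
		if _, wanted := desired[name]; !wanted {
			delete(actual, name)
		}
	}
	for name, want := range desired {
		actual[name] = want
	}
}

func main() {
	desired := map[string]string{"statefulset": "v2", "service": "v1"}
	actual := map[string]string{"statefulset": "v1", "configmap-old": "v1"}

	fmt.Println(reconcile(desired, actual)) // three actions on the first pass
	apply(desired, actual)
	fmt.Println(len(reconcile(desired, actual))) // 0: a second pass is a no-op
}
```

Running the comparison again after convergence yields no actions, which is the property that makes extra or coalesced reconcile calls harmless.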

Operator SDK and Kubebuilder

Two main frameworks scaffold Operator code:

  Kubebuilder (Kubernetes SIG project, maintained under sig-api-machinery):
    - Scaffolds Go project with controller-runtime
    - Generates CRD manifests from Go struct annotations
    - Provides webhook scaffolding (defaulting, validation, conversion)
    - Integrates envtest for controller testing

  Operator SDK (Operator Framework, originated at Red Hat; wraps Kubebuilder + adds):
    - Helm-based operators (wrap existing Helm charts with a controller)
    - Ansible-based operators (reconcile via Ansible playbooks)
    - Integration with OLM (Operator Lifecycle Manager) for distribution

  Both use controller-runtime under the hood:

  import ctrl "sigs.k8s.io/controller-runtime"

  Key controller-runtime types:
  - Manager:    orchestrates controllers, shared caches, leader election
  - Controller: registers reconciler + watches
  - Reconciler: interface with single Reconcile(ctx, req) method
  - Client:     typed API client (Get, List, Create, Update, Delete, Patch)
  - Scheme:     maps Go types to GVK (GroupVersionKind)

Minimal reconciler in Go:

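import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    examplev1 "example.com/database-operator/api/v1alpha1" // module path is a placeholder
)

// Reconcile drives one Database toward its desired state.
// DatabaseReconciler is assumed to embed client.Client and carry a Scheme,
// as in Kubebuilder scaffolding; buildStatefulSet and computePhase are
// helpers defined elsewhere.
func (r *DatabaseReconciler) Reconcile(
    ctx context.Context, req ctrl.Request) (ctrl.Result, error) {

    logger := log.FromContext(ctx)

    // 1. Fetch the CR; NotFound means it was deleted, so there is nothing to do
    var db examplev1.Database
    if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Create or update StatefulSet
    sts := buildStatefulSet(&db)
    if err := ctrl.SetControllerReference(&db, sts, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }

    var existing appsv1.StatefulSet
    err := r.Get(ctx, client.ObjectKeyFromObject(sts), &existing)
    if apierrors.IsNotFound(err) {
        // Events from the new StatefulSet trigger the next reconcile,
        // provided the controller watches the StatefulSets it owns.
        return ctrl.Result{}, r.Create(ctx, sts)
    }
    if err != nil {
        return ctrl.Result{}, err
    }

    // Overwrite the live spec with the desired one. This writes on every
    // reconcile; production code would compare first or use server-side apply.
    existing.Spec = sts.Spec
    if err := r.Update(ctx, &existing); err != nil {
        return ctrl.Result{}, err
    }

    // 3. Update status through the /status subresource
    db.Status.ReadyReplicas = existing.Status.ReadyReplicas
    db.Status.Phase = computePhase(&existing)
    if err := r.Status().Update(ctx, &db); err != nil {
        return ctrl.Result{}, err
    }

    logger.Info("Reconciled Database", "name", db.Name, "phase", db.Status.Phase)
    return ctrl.Result{}, nil
}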

Finalizers

Finalizers prevent deletion of a resource until cleanup is complete:

  Without finalizer:
    kubectl delete database prod-postgres
    → API server deletes object from etcd immediately
    → Persistent volumes, cloud resources, DNS entries: orphaned

  With finalizer:
    1. CR has metadata.finalizers: ["databases.example.com/cleanup"]
    2. kubectl delete database prod-postgres
    3. API server sets deletionTimestamp (soft delete) — does NOT delete yet
    4. Reconcile loop detects deletionTimestamp != nil:
       a. Run cleanup: delete cloud resources, volumes, backups
       b. Remove finalizer: delete only this operator's entry from metadata.finalizers
    5. API server sees no more finalizers → deletes object from etcd

  Code:
    const finalizerName = "databases.example.com/cleanup"

    if db.ObjectMeta.DeletionTimestamp.IsZero() {
        // Not being deleted: ensure finalizer is present
        if !containsString(db.Finalizers, finalizerName) {
            db.Finalizers = append(db.Finalizers, finalizerName)
            // The resulting update event triggers the next reconcile
            return ctrl.Result{}, r.Update(ctx, &db)
        }
    } else {
        // Being deleted: run cleanup, then remove finalizer
        if containsString(db.Finalizers, finalizerName) {
            // Cleanup must be idempotent because it is retried on error
            if err := r.cleanupExternalResources(ctx, &db); err != nil {
                return ctrl.Result{}, err
            }
            db.Finalizers = removeString(db.Finalizers, finalizerName)
            if err := r.Update(ctx, &db); err != nil {
                return ctrl.Result{}, err
            }
        }
        // Stop here: do not create or update children of an object being deleted
        return ctrl.Result{}, nil
    }
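
The snippet above relies on two small slice helpers that it does not define. A minimal standard-library sketch of them follows; the second finalizer name in main is made up for the demonstration. controller-runtime also ships equivalents in its controllerutil package (ContainsFinalizer, AddFinalizer, RemoveFinalizer), which are usually preferable in real operators.

```go
package main

import "fmt"

// containsString reports whether s is present in list.
func containsString(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// removeString returns a copy of list without any occurrence of s,
// leaving finalizers owned by other controllers untouched.
func removeString(list []string, s string) []string {
	out := make([]string, 0, len(list))
	for _, item := range list {
		if item != s {
			out = append(out, item)
		}
	}
	return out
}

func main() {
	// "example.org/other" is a hypothetical finalizer owned by another controller.
	finalizers := []string{"databases.example.com/cleanup", "example.org/other"}

	fmt.Println(containsString(finalizers, "databases.example.com/cleanup")) // true
	finalizers = removeString(finalizers, "databases.example.com/cleanup")
	fmt.Println(finalizers) // [example.org/other]
}
```

Removing only the operator's own entry matters: the API server deletes the object once the list is empty, so clearing the whole list would skip cleanup that other controllers still owe.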

Owner References and Cascade Deletion

When an Operator creates child resources (Deployments, Services, ConfigMaps) on behalf of a CR, it sets owner references so that deleting the CR cascades to all children:

  Database CR (owner)
  ├── StatefulSet (owned)
  │   └── Pods (owned by StatefulSet)
  ├── Service (owned)
  ├── ConfigMap (owned)
  └── Secret (owned)

  metadata.ownerReferences:
  - apiVersion: example.com/v1alpha1
    kind: Database
    name: prod-postgres
    uid: abc-123
    controller: true
    blockOwnerDeletion: true

  When Database is deleted:
    → garbage collector deletes StatefulSet, Service, ConfigMap, Secret
    → StatefulSet deletion cascades to Pods

  blockOwnerDeletion: true means that, under foreground cascading deletion,
  the owner is not removed until this dependent is gone
  (used with finalizers to control deletion order)
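
The cascade can be pictured as a walk over owner UIDs. The sketch below is a simplified standard-library model, not the garbage collector's actual implementation: the object type and dependentsOf function are assumptions, each object has at most one owner, and the ownership graph is assumed to be a tree.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a minimal stand-in: a name, a UID, and the UID of its owner
// ("" if it has none).
type object struct {
	Name     string
	UID      string
	OwnerUID string
}

// dependentsOf returns the names of all objects that transitively depend on
// ownerUID, i.e. what a cascading delete of that owner would remove.
func dependentsOf(ownerUID string, all []object) []string {
	var names []string
	queue := []string{ownerUID}
	for len(queue) > 0 {
		uid := queue[0]
		queue = queue[1:]
		for _, o := range all {
			if o.OwnerUID == uid {
				names = append(names, o.Name)
				queue = append(queue, o.UID) // visit this object's own dependents next
			}
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	cluster := []object{
		{Name: "Database", UID: "abc-123"},
		{Name: "StatefulSet", UID: "sts-1", OwnerUID: "abc-123"},
		{Name: "Pod-0", UID: "pod-0", OwnerUID: "sts-1"},
		{Name: "Service", UID: "svc-1", OwnerUID: "abc-123"},
		{Name: "ConfigMap", UID: "cm-1", OwnerUID: "abc-123"},
		{Name: "Secret", UID: "sec-1", OwnerUID: "abc-123"},
		{Name: "Unrelated", UID: "x-1"},
	}
	fmt.Println(dependentsOf("abc-123", cluster))
	// [ConfigMap Pod-0 Secret Service StatefulSet]
}
```

Objects without an owner reference to the Database, such as "Unrelated" here, are never reached. This is why external resources (cloud volumes, DNS entries) need a finalizer instead.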

Real-World Operators

Prometheus Operator (prometheus-operator/prometheus-operator, often installed via the prometheus-community/kube-prometheus-stack Helm chart):

  CRDs managed:
  - Prometheus: deploys Prometheus instances with specified config
  - Alertmanager: deploys Alertmanager with routing config
  - ServiceMonitor: declares which services Prometheus scrapes
  - PodMonitor: scrapes pods directly
  - PrometheusRule: defines alerting/recording rules

  Workflow:
  1. Team creates ServiceMonitor pointing at their service
  2. Prometheus Operator detects new ServiceMonitor
  3. Operator regenerates the Prometheus configuration (stored in a Secret)
  4. A config-reloader sidecar triggers a reload via the /-/reload endpoint
  → No manual Prometheus config editing needed

cert-manager:

  CRDs:
  - Issuer / ClusterIssuer: certificate authority config
    (Let's Encrypt ACME, Vault PKI, self-signed, etc.)
  - Certificate: desired TLS certificate (domains, duration, secretName)
  - CertificateRequest: pending CSR
  - Challenge / Order: ACME protocol flow objects

  Workflow:
  1. Create Certificate requesting *.example.com from Let's Encrypt
  2. cert-manager creates ACME Order
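  3. Completes the DNS-01 Challenge (wildcard names cannot be validated with HTTP-01)
  4. Receives signed certificate
  5. Stores in Kubernetes Secret
  6. Renews automatically before expiry (by default with one third of the
     lifetime left, i.e. 30 days for a 90-day Let's Encrypt certificate)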
  → No manual cert rotation

Strimzi Kafka Operator:

  CRDs:
  - Kafka: full Kafka cluster (brokers + ZooKeeper/KRaft)
  - KafkaTopic: topic creation with partition/replica config
  - KafkaUser: user + ACL management
  - KafkaConnect: Kafka Connect cluster
  - KafkaMirrorMaker2: cross-cluster replication

  Operator handles:
  - Rolling restarts that maintain ISR (In-Sync Replicas)
  - ZooKeeper → KRaft migration
  - Certificate rotation for mTLS
  - Topic rebalancing via Cruise Control integration

Argo CD (GitOps operator):

  CRDs:
  - Application: git repo + path + target cluster + namespace
  - AppProject: RBAC boundary for Applications

  Reconcile loop:
  1. Watch Application CRs
  2. Fetch manifests from git (compare to last known commit)
  3. Compare cluster state to git state (using resource hashing)
  4. If out-of-sync: apply diff to cluster (or alert, if manual sync)
  5. Update Application status (Synced/OutOfSync, Healthy/Degraded)
  → Git as the source of truth for cluster state
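
Step 3 of the loop above, comparing cluster state to git state by hashing, can be sketched as follows. This is a deliberately naive standard-library model and not Argo CD's implementation: hashManifest and syncStatus are hypothetical names, manifests are treated as opaque strings, and a real comparison has to normalize server-defaulted fields first.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashManifest returns a hex SHA-256 digest of a rendered manifest.
func hashManifest(manifest string) string {
	sum := sha256.Sum256([]byte(manifest))
	return hex.EncodeToString(sum[:])
}

// syncStatus compares manifests keyed by resource name. Any missing, extra,
// or differing resource makes the application OutOfSync.
func syncStatus(git, live map[string]string) string {
	if len(git) != len(live) {
		return "OutOfSync"
	}
	for name, want := range git {
		got, ok := live[name]
		if !ok || hashManifest(got) != hashManifest(want) {
			return "OutOfSync"
		}
	}
	return "Synced"
}

func main() {
	git := map[string]string{"deployment/web": "replicas: 3", "service/web": "port: 80"}
	live := map[string]string{"deployment/web": "replicas: 2", "service/web": "port: 80"}

	fmt.Println(syncStatus(git, live)) // OutOfSync
	live["deployment/web"] = "replicas: 3"
	fmt.Println(syncStatus(git, live)) // Synced
}
```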

Operator Maturity Levels

The Operator Framework defines five capability levels:

  Level 1: Basic Install
    - Automates installation and initial configuration
    - Example: deploy pods, create services, apply config

  Level 2: Seamless Upgrades
    - Manages patch/minor version upgrades safely
    - Rolling upgrades that maintain availability
    - Example: etcd operator rolling upgrade

  Level 3: Full Lifecycle
    - Backup, restore, failure recovery
    - Example: DB operator with automated backup to S3 + point-in-time restore

  Level 4: Deep Insights
    - Exposes metrics, alerting rules, dashboards
    - Log processing and workload analysis
    - Example: operator ships alerts and dashboards for replication lag

  Level 5: Auto Pilot
    - Autonomous horizontal/vertical scaling
    - Automatic configuration tuning
    - Anomaly detection and automatic remediation
    - Example: automatically increases the connection pool when p99 latency rises,
      or reshards a database on a hot partition

Debugging Notes

# Watch reconcile loop logs
kubectl logs -n <operator-namespace> <operator-pod> -f | grep -i "reconcile\|error\|requeue"

# Check CRD status
kubectl get crd databases.example.com -o yaml
# Look for: status.conditions — should show "NamesAccepted" and "Established"

# Inspect CR status set by operator
kubectl describe database prod-postgres -n production
# Look at Status section — operator should update phase/conditions

# Check if finalizer is blocking deletion
kubectl get database prod-postgres -n production -o jsonpath='{.metadata.finalizers}'
# If stuck: check operator logs for cleanup errors

# Check controller-runtime metrics (if operator exposes them; service name and port vary)
kubectl port-forward -n <operator-namespace> svc/operator-metrics 8080:8080
curl localhost:8080/metrics | grep controller_runtime_reconcile

# Useful metrics:
# controller_runtime_reconcile_total{controller,result="error"}
# controller_runtime_reconcile_time_seconds
# controller_runtime_active_workers

Security Implications

- Operators typically run with broad RBAC permissions (often a ClusterRole granting full access to their custom resources and to the child resource types they manage). Audit these carefully: a compromised operator that is allowed to create workloads can create arbitrary pods.
- Leader election uses a Lease object; multiple operator replicas compete for the lease, and only the leader runs reconciles. Ensure RBAC allows the operator to create and update Leases in its namespace.
- Custom Resources can expose sensitive spec fields (passwords, connection strings) to anyone who can read them. Store credentials in Kubernetes Secrets and reference them from the CR spec; never put plaintext credentials in CR fields.
- Operators that interact with external APIs (cloud providers) need credentials. Prefer workload identity mechanisms (GKE Workload Identity, IRSA on EKS) over static keys.
- Validating webhooks registered by operators can become single points of failure: if the webhook pod is down and failurePolicy is Fail, no resources of that type can be created or updated.

Performance Implications

- Operators that reconcile on every event (including status updates they write themselves) can enter tight reconcile loops. Use predicate.GenerationChangedPredicate to skip update events that leave metadata.generation unchanged.
- Large numbers of CRs (thousands) can stress the informer cache. Restrict what the operator caches with label or field selectors (or a namespace restriction) in the manager's cache options.
- Concurrent reconciliations (via MaxConcurrentReconciles) improve throughput but require the reconcile function to be concurrency-safe. The workqueue does not hand the same key to two workers at once, so the state to protect is whatever is shared across different objects.
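
The generation-based filter from the first item can be modeled in a few lines. This is a standard-library sketch of the idea, not the controller-runtime predicate itself; the meta and updateEvent types and the generationChanged function are assumptions for illustration. It relies on the API server bumping metadata.generation on spec changes but not on writes to the status subresource.

```go
package main

import "fmt"

// meta holds the two metadata fields relevant here.
type meta struct {
	Generation      int64
	ResourceVersion string
}

// updateEvent pairs the object metadata before and after an update.
type updateEvent struct{ Old, New meta }

// generationChanged keeps an update event only when the spec changed.
func generationChanged(e updateEvent) bool {
	return e.Old.Generation != e.New.Generation
}

func main() {
	// The operator wrote status: resourceVersion moved, generation did not.
	statusWrite := updateEvent{
		Old: meta{Generation: 4, ResourceVersion: "1001"},
		New: meta{Generation: 4, ResourceVersion: "1002"},
	}
	// A user edited spec.replicas: both fields moved.
	specEdit := updateEvent{
		Old: meta{Generation: 4, ResourceVersion: "1002"},
		New: meta{Generation: 5, ResourceVersion: "1003"},
	}

	fmt.Println(generationChanged(statusWrite)) // false: event is dropped
	fmt.Println(generationChanged(specEdit))    // true: event is reconciled
}
```

The trade-off is that label and annotation changes are filtered out as well, so a controller that reacts to metadata needs a broader predicate.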

Failure Modes

  Symptom                            Cause                               Diagnosis / Fix
  CR stuck in "Pending"              Operator not running / crash loop   Check operator pod status and logs
  Deletion stuck                     Finalizer present, cleanup failing  Check cleanup logs; remove the finalizer manually only as a last resort
  Resources not created              RBAC missing for operator SA        Check RBAC; `kubectl auth can-i` as operator SA (--as=system:serviceaccount:<namespace>:<name>)
  Reconcile loop thrashing           Own status writes re-trigger watch  Use Status subresource; apply predicates
  Webhook timeout blocking creation  Validating webhook pod down         Check webhook pod; consider failurePolicy: Ignore

Modern Usage

- Crossplane: Takes the Operator pattern beyond Kubernetes-native resources to provision cloud infrastructure (AWS RDS, Google GKE) via CRDs and controllers, treating infrastructure as Kubernetes custom resources.
- Gateway API: The successor to Ingress is built entirely on CRDs and relies on Gateway controllers from projects such as Istio, Cilium, and Envoy Gateway.
- ACK (AWS Controllers for Kubernetes): AWS-maintained controllers, one per service, covering many AWS services.
- KubeVirt: Operator that adds VMs as a Kubernetes resource type, managing libvirt under the hood.

Future Directions

- Operator best practices standardization: The Operator Working Group in CNCF TAG App Delivery is working on a maturity model and test framework to certify operator quality.
- Declarative validation with CEL: Operators can reduce reliance on validating webhooks by using CRD-level CEL validation rules (x-kubernetes-validations; beta since Kubernetes 1.25, GA in 1.29), improving reliability.
- Finite state machine libraries: Libraries layered on controller-runtime for expressing operator logic as explicit state machines rather than ad-hoc if/else logic.
- Multi-cluster operators: Managing resources across cluster boundaries using fleet APIs (Karmada, Cluster API) as the next frontier.

Exercises

1. Install Kubebuilder and scaffold a minimal operator for a WebApp CRD (spec: image, replicas). Implement a reconciler that creates a Deployment and Service. Test with envtest.
2. Add a finalizer to the WebApp operator that logs "cleanup started" and waits 5 seconds before removing itself. Delete a WebApp instance and observe the deletion sequence in the logs.
3. Install the Prometheus Operator in a local cluster. Create a ServiceMonitor for a simple HTTP server. Verify that Prometheus picks up the new scrape target.
4. Examine the Strimzi Kafka Operator source code. Trace the code path from a KafkaTopic CR creation to the actual Kafka topic being created via the Admin API.
5. Build an operator that manages a simple Counter CRD. The reconciler should increment status.count every 30 seconds (using ctrl.Result{RequeueAfter: 30 * time.Second}). Deploy it and watch the status field update.

References

- "Introducing Operators: Putting Operational Knowledge into Software", CoreOS Blog, November 2016 (original blog post)
- Operator Framework documentation: operatorframework.io
- Kubebuilder book: book.kubebuilder.io (comprehensive guide)
- controller-runtime source: github.com/kubernetes-sigs/controller-runtime
- "Programming Kubernetes", Michael Hausenblas & Stefan Schimanski, O'Reilly 2019
- Strimzi Kafka Operator: strimzi.io
- cert-manager: cert-manager.io
- Prometheus Operator: prometheus-operator.dev
- Operator Hub: operatorhub.io (browse 300+ community operators)
- CNCF TAG App Delivery Operator Working Group: github.com/cncf/tag-app-delivery/tree/main/operator-wg