Zero-Downtime Deployments on Kubernetes
Downtime during deployments is a solved problem. Yet we still see teams losing minutes (or hours) of availability every time they push a release. The root cause is almost always the same: missing or misconfigured health checks. Kubernetes has everything you need for zero-downtime deployments built in — you just have to configure it correctly.
Rolling Update Strategy
The default Kubernetes deployment strategy is RollingUpdate, which gradually replaces old pods with new ones. The key parameters are maxUnavailable and maxSurge. For zero-downtime, set maxUnavailable: 0 so Kubernetes never terminates an old pod until a new one is confirmed healthy.
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
namespace: production
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
selector:
matchLabels:
app: api-server
template:
metadata:
labels:
app: api-server
spec:
containers:
- name: api
image: registry.example.com/api-server:v2.4.1
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 15
periodSeconds: 20
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
Readiness vs. Liveness
These two probes serve fundamentally different purposes, and confusing them is the number one cause of deployment-related outages we see in client clusters.
-
Readiness probes tell Kubernetes whether a pod is ready to receive traffic. A pod that fails its readiness probe is removed from the Service’s endpoint list — no requests are routed to it. This is what enables zero-downtime: new pods do not receive traffic until they are fully initialized.
-
Liveness probes tell Kubernetes whether a pod is still functioning. A pod that fails its liveness probe is killed and restarted. Setting this too aggressively (low
failureThreshold, shortperiodSeconds) causes restart loops during startup.
Graceful Shutdown
When Kubernetes decides to terminate a pod, it sends a SIGTERM signal and waits for terminationGracePeriodSeconds (default 30s) before force-killing with SIGKILL. Your application must handle SIGTERM by stopping acceptance of new connections, draining in-flight requests, and then exiting cleanly.
Without graceful shutdown handling, active requests get dropped during pod termination — even if your rolling update configuration is perfect.
Pre-Stop Hooks
There is a subtle race condition in Kubernetes networking: the kubelet sends SIGTERM and simultaneously updates the Endpoints object, but the kube-proxy and ingress controllers may not process the endpoint removal before the pod starts shutting down. A preStop hook with a short sleep (3-5 seconds) gives the network layer time to catch up.
Putting It Together
Zero-downtime deployments require all four pieces working in concert: rolling update strategy with maxUnavailable: 0, properly configured readiness probes, graceful shutdown handling in your application, and pre-stop hooks to handle the networking race condition. Miss any one of these, and you will drop requests during deploys.