Kubernetes HPA, VPA, and Cluster Autoscaler: Using Them Together Correctly
Kubernetes autoscaling is often discussed as a single feature, but production behavior is actually a three-layer system: HPA scales pod count, VPA adjusts pod resource requests, and Cluster Autoscaler scales node capacity. If these layers are not designed together, systems become either fragile under load or unnecessarily expensive.
What scales what
- HPA: changes replica count
- VPA: changes CPU/memory requests (limits, where set, are scaled proportionally by default)
- Cluster Autoscaler: adds/removes nodes
Traffic spike
-> HPA increases replicas
-> new pods stay Pending when existing nodes have no room for their requests
-> Cluster Autoscaler adds nodes for the unschedulable pods
VPA works best for baseline sizing and long-term optimization, not for instant traffic bursts.
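The first step of that chain can be made concrete with the replica formula from the Kubernetes HPA documentation: desired = ceil(current * currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below is a simplification, not the controller's implementation: it ignores min/max replica bounds, unready pods, and missing metrics, and the function name is made up for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation (hypothetical helper name)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 replicas at 90% average CPU against a 60% target.
print(desired_replicas(4, 90, 60))
# 63% against a 60% target is inside the default tolerance, so no change.
print(desired_replicas(4, 63, 60))
```

Because utilization is measured against requests, the same traffic yields a different ratio if requests change, which is one reason VPA and HPA should not both act on the same resource.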
Practical guidance
- Use HPA for burst response.
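- Use VPA in recommendation-only mode (updateMode "Off") or in a controlled update mode such as "Initial", depending on workload type.
- Keep request values realistic: the scheduler and Cluster Autoscaler place pods by their requests, not their actual usage, and HPA utilization targets are a percentage of those requests.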
- Tune scale-up/scale-down windows to avoid oscillation.
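To see why the scale-down window dampens oscillation, here is a minimal sketch of the stabilization behavior described in the HPA documentation: when scaling down, the controller acts on the highest recommendation seen within the window (300 seconds by default). The class name and the single-window simplification are assumptions for illustration. Scale-up passes straight through here, which matches a scale-up window of 0.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hypothetical model of an HPA scale-down stabilization window."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommendation) pairs

    def stabilize(self, now: float, current: int, recommendation: int) -> int:
        self._history.append((now, recommendation))
        # Forget recommendations that are older than the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        if recommendation >= current:
            return recommendation  # scale-up (or no change) is not delayed
        # Scale-down: only drop to the highest recent recommendation.
        return min(current, max(rec for _, rec in self._history))


stabilizer = ScaleDownStabilizer(window_seconds=300)
replicas = 10
# A dip in load, a brief rebound, then sustained low load.
for t, rec in [(0, 10), (60, 4), (120, 9), (180, 4), (360, 4), (480, 4)]:
    replicas = stabilizer.stabilize(t, replicas, rec)
    print(f"t={t:>3}s recommendation={rec:>2} applied={replicas}")
```

A longer window trades cost for stability: replicas linger after a spike, but a short rebound in traffic does not trigger a scale-down followed immediately by a scale-up.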
Conflict to avoid
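Running HPA and VPA in fully automatic mode on the same deployment, driven by the same resource metric (CPU or memory), can cause unstable feedback loops: VPA changes the requests that HPA's utilization percentage is computed against, so each controller keeps reacting to the other. Use clear ownership: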
- HPA for horizontal elasticity
- VPA for baseline recommendation and periodic rightsizing
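The ownership rule above can be checked mechanically. The sketch below represents an HPA and a VPA manifest as Python dicts and reports any resource that both would act on. The manifests are hypothetical (a Deployment named "web"), the helper names are invented, and the defaults assumed for a VPA with no explicit policy (updateMode "Auto", controlling cpu and memory) should be verified against the VPA version you run.

```python
def hpa_resources(hpa: dict) -> set:
    """Resource names (cpu/memory) the HPA scales on."""
    names = set()
    for metric in hpa.get("spec", {}).get("metrics", []):
        if metric.get("type") == "Resource":
            names.add(metric["resource"]["name"])
        elif metric.get("type") == "ContainerResource":
            names.add(metric["containerResource"]["name"])
    return names


def vpa_resources(vpa: dict) -> set:
    """Resource names the VPA may actively change."""
    spec = vpa.get("spec", {})
    # Assumption: "Auto" when unset; "Off" only produces recommendations.
    if spec.get("updatePolicy", {}).get("updateMode", "Auto") == "Off":
        return set()
    policies = spec.get("resourcePolicy", {}).get("containerPolicies", [])
    if not policies:
        return {"cpu", "memory"}
    controlled = set()
    for policy in policies:
        controlled |= set(policy.get("controlledResources", ["cpu", "memory"]))
    return controlled


def overlapping_resources(hpa: dict, vpa: dict) -> set:
    return hpa_resources(hpa) & vpa_resources(vpa)


target = {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"}
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "spec": {
        "scaleTargetRef": target, "minReplicas": 2, "maxReplicas": 20,
        "metrics": [{"type": "Resource", "resource": {
            "name": "cpu",
            "target": {"type": "Utilization", "averageUtilization": 60}}}],
    },
}
vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "spec": {
        "targetRef": target,
        "updatePolicy": {"updateMode": "Auto"},
        "resourcePolicy": {"containerPolicies": [
            {"containerName": "*", "controlledResources": ["memory"]}]},
    },
}

# HPA owns cpu, VPA owns memory, so nothing overlaps.
print(sorted(overlapping_resources(hpa, vpa)))  # prints []
```

Other ways to get an empty overlap are to keep the VPA in "Off" mode and apply its recommendations during planned rightsizing, or to drive the HPA from a custom or external metric instead of CPU or memory.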
Metrics that matter
- pending pod count
- node utilization
- p95 latency during scale events
- HPA/VPA action frequency
- cost per request trend
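Two of these metrics reduce to simple arithmetic once the raw data is exported. The sketch below assumes you already have latency samples collected during scale events and a node-hour count from billing. The function names, the nearest-rank percentile method, and the sample figures are illustrative choices, not something a specific monitoring tool prescribes.

```python
def p95(samples: list) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), which avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def cost_per_request(node_hours: float, hourly_node_cost: float,
                     requests: int) -> float:
    """Infrastructure cost divided by requests served in the same period."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return node_hours * hourly_node_cost / requests


# Made-up numbers: latencies in ms during a scale-up, and one day of billing.
latencies_ms = [120, 135, 110, 480, 150, 140, 900, 130, 125, 145]
print(p95(latencies_ms))
print(cost_per_request(node_hours=240, hourly_node_cost=0.20, requests=1_200_000))
```

Tracking both together shows the trade-off directly: more aggressive scale-down lowers cost per request but tends to raise p95 latency during the next scale-up.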
Conclusion
HPA, VPA, and Cluster Autoscaler are strongest as a coordinated system, not isolated features. With clear ownership and tuned policies, you can keep services responsive during peaks while controlling infrastructure cost and avoiding scaling instability.