Kubernetes HPA, VPA, and Cluster Autoscaler: Using Them Together Correctly
Kubernetes autoscaling is often discussed as a single feature, but in production it behaves as a three-layer system: HPA scales pod count, VPA adjusts pod resource requests, and Cluster Autoscaler scales node capacity. If these layers are not designed together, systems tend to become either fragile under load or unnecessarily expensive.
What scales what
- HPA: changes replica count
- VPA: changes CPU/memory requests and limits
- Cluster Autoscaler: adds/removes nodes
Traffic spike
-> HPA increases replicas
-> new pods stay Pending if no node has enough unrequested capacity
-> Cluster Autoscaler adds nodes
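The "HPA increases replicas" step follows a simple ratio rule documented for the Kubernetes HPA: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). The sketch below models only that rule for a single metric; the function name and the min_replicas/max_replicas parameters (standing in for minReplicas/maxReplicas) are illustrative, and the real controller also handles multiple metrics, missing metrics, and pod readiness.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Simplified, single-metric sketch of the HPA replica calculation (hypothetical helper)."""
    ratio = current_metric / target_metric
    # HPA skips scaling when the ratio is within the tolerance band around 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Core rule: ceil(currentReplicas * currentMetric / targetMetric).
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU against a 60% target.
print(desired_replicas(4, 90, 60))  # prints 6
```

Those two extra replicas are exactly the pods that may sit in Pending until Cluster Autoscaler adds a node.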
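VPA works best for baseline sizing and long-term optimization, not for instant traffic bursts: its recommendations are built from usage history, and applying new requests typically means evicting and recreating pods.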
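To make "baseline sizing" concrete, here is a deliberately simplified stand-in for a VPA-style recommendation: take a high percentile of observed usage and add a safety margin. The real VPA recommender uses decaying histograms over time; the 0.9 percentile and 15% margin below are loosely modeled on its defaults but should be treated as assumptions, and the function names and sample data are hypothetical.

```python
import math


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 1]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def recommend_request(usage_samples: list[float], q: float = 0.9,
                      safety_margin: float = 0.15) -> float:
    """Simplified stand-in for a VPA recommendation, not its actual algorithm."""
    return percentile(usage_samples, q) * (1 + safety_margin)


# Hypothetical CPU usage samples for one container, in millicores.
samples = [200, 220, 250, 240, 300, 310, 280, 260, 270, 500]
print(recommend_request(samples))  # 90th percentile is 310m, so roughly 356m
```

Note that the single 500m spike barely moves the result. That is the point: a percentile over history produces a stable baseline, while short bursts are left to HPA.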
Practical guidance
- Use HPA for burst response.
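- Use VPA in recommendation-only mode (updateMode: "Off") or a controlled mode such as "Initial", depending on workload type.
- Keep request values realistic: the scheduler and Cluster Autoscaler act on requests, not actual usage, so inaccurate requests make placement and scale-up unpredictable.
- Tune HPA scale-up and scale-down stabilization windows (the HPA behavior field) to avoid oscillation.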
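The stabilization window is what keeps HPA from flapping. For scale-down, Kubernetes documents that the controller looks at all recommendations within the window (300 seconds by default) and applies the highest one, so replicas only drop after lower recommendations have persisted for the whole window. The class below is a hypothetical sketch of that scale-down rule only; it assumes monotonically increasing timestamps and ignores scale-up policies.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hypothetical sketch: apply the highest replica recommendation seen in the window."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self.history: deque[tuple[float, int]] = deque()

    def stabilize(self, now: float, recommendation: int) -> int:
        self.history.append((now, recommendation))
        # Forget recommendations older than the window (timestamps assumed increasing).
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        # Scale-in only happens once every recommendation in the window agrees.
        return max(r for _, r in self.history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
for t, rec in [(0, 10), (60, 4), (200, 4), (301, 4)]:
    print(t, stabilizer.stabilize(t, rec))
```

Here the recommendation falls from 10 to 4 at t=60, but the applied count stays at 10 until t=301, when the old peak leaves the window. A longer window trades slower cost recovery for fewer oscillations.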
Conflict to avoid
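Running fully autonomous HPA and VPA on the same deployment can cause unstable feedback loops, especially when both act on the same CPU or memory signal: VPA raising requests lowers measured utilization, which HPA may read as a reason to scale in. Use clear ownership: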
- HPA for horizontal elasticity
- VPA for baseline recommendation and periodic rightsizing
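One way to enforce that ownership is a pre-deploy check that flags workloads where HPA scales on a resource metric that an active VPA also controls. The data model below is hypothetical (these are not Kubernetes API types); it treats any VPA update mode other than "Off" as active, and represents an HPA that scales only on custom or external metrics with an empty resource set, which is the combination commonly suggested for running both.

```python
from dataclasses import dataclass


# Hypothetical data model; these are not Kubernetes API types.
@dataclass
class HpaSpec:
    target: str                 # workload name, e.g. a Deployment
    resource_metrics: set[str]  # resource metrics HPA scales on, e.g. {"cpu"}


@dataclass
class VpaSpec:
    target: str
    update_mode: str            # "Off", "Initial", "Recreate", or "Auto"
    controlled_resources: set[str]


def ownership_conflicts(hpas: list[HpaSpec],
                        vpas: list[VpaSpec]) -> list[tuple[str, list[str]]]:
    conflicts = []
    for hpa in hpas:
        for vpa in vpas:
            # A recommendation-only VPA never changes requests, so it cannot fight HPA.
            if hpa.target != vpa.target or vpa.update_mode == "Off":
                continue
            shared = hpa.resource_metrics & vpa.controlled_resources
            if shared:
                conflicts.append((hpa.target, sorted(shared)))
    return conflicts


hpas = [HpaSpec("checkout", {"cpu"}), HpaSpec("search", set())]  # search scales on a custom metric
vpas = [VpaSpec("checkout", "Auto", {"cpu", "memory"}),
        VpaSpec("search", "Auto", {"cpu", "memory"})]
print(ownership_conflicts(hpas, vpas))  # [('checkout', ['cpu'])]
```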
Metrics that matter
- pending pod count
- node utilization
- p95 latency during scale events
- HPA/VPA action frequency
- cost per request trend
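"Node utilization" is worth measuring the way Cluster Autoscaler sees it: as the sum of pod requests divided by node allocatable, not as actual usage. Cluster Autoscaler considers a node for removal when that ratio falls below --scale-down-utilization-threshold (0.5 by default); it also looks at memory, requires the condition to hold for --scale-down-unneeded-time, and checks that pods can move elsewhere, none of which this CPU-only sketch models. The helper names and node data are hypothetical.

```python
def request_utilization(pod_requests_m: list[int], allocatable_m: int) -> float:
    """Share of a node's allocatable CPU claimed by pod requests (not actual usage)."""
    return sum(pod_requests_m) / allocatable_m


def scale_down_candidates(nodes: dict[str, tuple[list[int], int]],
                          threshold: float = 0.5) -> list[str]:
    """Nodes whose request-based CPU utilization is below the threshold."""
    return [name for name, (requests, allocatable) in nodes.items()
            if request_utilization(requests, allocatable) < threshold]


def cost_per_request(node_hours: float, price_per_node_hour: float,
                     requests_served: int) -> float:
    """Infrastructure cost divided by traffic served over the same period."""
    return node_hours * price_per_node_hour / requests_served


# Hypothetical nodes: pod CPU requests and allocatable CPU, in millicores.
nodes = {
    "node-a": ([500, 250], 4000),         # 18.75% requested
    "node-b": ([1500, 1000, 500], 4000),  # 75% requested
}
print(scale_down_candidates(nodes))  # ['node-a']
```

If a node is busy in practice but its pods request little CPU, it can still look like a scale-down candidate, which is another reason to keep requests realistic.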
Conclusion
HPA, VPA, and Cluster Autoscaler are strongest as a coordinated system, not isolated features. With clear ownership and tuned policies, you can keep services responsive during peaks while controlling infrastructure cost and avoiding scaling instability.