Mert Tosun
Kubernetes HPA, VPA, and Cluster Autoscaler: Using Them Together Correctly

Mert Tosun · DevOps

Kubernetes autoscaling is often discussed as a single feature, but production behavior is actually a three-layer system: HPA scales pod count, VPA adjusts pod resource requests, and Cluster Autoscaler scales node capacity. If these layers are not designed together, systems become either fragile under load or unnecessarily expensive.

What scales what

  • HPA: changes replica count
  • VPA: changes CPU/memory requests and limits
  • Cluster Autoscaler: adds/removes nodes

A traffic spike typically propagates through the layers like this:

Traffic spike
   -> HPA increases replicas
      -> new pods stay Pending when existing nodes lack capacity
         -> Cluster Autoscaler adds nodes

VPA works best for baseline sizing and long-term optimization, not for instant traffic bursts.
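
The first two steps of that chain can be sketched as arithmetic. The replica calculation follows the formula in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), with a tolerance band around the target. The node estimate is only a rough assumption for illustration: the real Cluster Autoscaler simulates scheduling of pending pods against node groups rather than dividing CPU. The function names and sample numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    # Inside the tolerance band, the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


def nodes_needed(replicas: int, pod_cpu_request_m: int,
                 node_allocatable_cpu_m: int) -> int:
    """Rough CPU-only capacity estimate (assumption: identical nodes,
    no other workloads, memory and affinity rules ignored)."""
    pods_per_node = node_allocatable_cpu_m // pod_cpu_request_m
    if pods_per_node == 0:
        raise ValueError("pod request exceeds node allocatable CPU")
    return math.ceil(replicas / pods_per_node)


if __name__ == "__main__":
    # 4 replicas at 90% average CPU utilization against a 60% target.
    replicas = desired_replicas(4, current_metric=90, target_metric=60)
    print(replicas)
    # 500m CPU requests on nodes with 2000m allocatable CPU.
    print(nodes_needed(replicas, pod_cpu_request_m=500,
                       node_allocatable_cpu_m=2000))
```

Because the node estimate depends directly on the pod request, inflated or understated requests distort how much capacity the cluster appears to need.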

Practical guidance

  1. Use HPA for burst response.
  2. Use VPA in recommendation-only mode (updateMode: "Off") or a controlled update mode such as "Initial", depending on workload type.
  3. Keep request values realistic so scheduling can work predictably.
  4. Tune HPA scale-up and scale-down stabilization windows to avoid oscillation (flapping).
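
As one possible starting point for these guidelines, the sketch below builds an HPA manifest with explicit stabilization windows and a VPA manifest in recommendation-only mode, then prints them as JSON, which kubectl accepts alongside YAML. The field names follow the autoscaling/v2 HorizontalPodAutoscaler and autoscaling.k8s.io/v1 VerticalPodAutoscaler APIs. The Deployment name "web", the replica bounds, the 60% CPU target, and the window values are illustrative assumptions, not recommendations from this post.

```python
import json


def hpa_manifest(name: str, min_replicas: int, max_replicas: int,
                 cpu_target_percent: int) -> dict:
    """HPA that reacts quickly on scale-up and waits before scaling down."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1",
                               "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_target_percent},
                },
            }],
            "behavior": {
                # Illustrative values: scale up immediately,
                # scale down only after 5 minutes of lower demand.
                "scaleUp": {"stabilizationWindowSeconds": 0},
                "scaleDown": {"stabilizationWindowSeconds": 300},
            },
        },
    }


def vpa_manifest(name: str) -> dict:
    """VPA that only publishes recommendations and never evicts pods."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1",
                          "kind": "Deployment", "name": name},
            "updatePolicy": {"updateMode": "Off"},
        },
    }


if __name__ == "__main__":
    # "web" is a hypothetical Deployment name.
    for manifest in (hpa_manifest("web", 2, 10, 60), vpa_manifest("web")):
        print(json.dumps(manifest, indent=2))
```

The VPA resource requires the Vertical Pod Autoscaler components to be installed in the cluster, since they are not part of core Kubernetes.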

Conflict to avoid

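Running fully autonomous HPA and VPA on the same deployment can cause unstable feedback loops, especially when both act on the same resource metric (CPU or memory). HPA measures utilization as a percentage of the pod's request, so every time VPA changes the request, the utilization that HPA sees shifts even if real load is constant, and each controller keeps reacting to the other. Use clear ownership: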

  • HPA for horizontal elasticity
  • VPA for baseline recommendation and periodic rightsizing
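
The interaction can be illustrated with a small calculation. This is a simplified sketch: it uses the HPA ratio formula without the tolerance band or replica bounds, and the usage and request numbers are hypothetical.

```python
import math


def cpu_utilization_percent(usage_m: float, request_m: float) -> float:
    """Utilization as HPA sees it: usage relative to the pod's request."""
    return 100.0 * usage_m / request_m


def hpa_desired(current_replicas: int, utilization: float,
                target: float) -> int:
    """Simplified HPA formula (no tolerance band, no min/max clamp)."""
    return math.ceil(current_replicas * utilization / target)


# Constant real load: each of 4 pods uses 300m CPU; the HPA target is 60%.
# With a 500m request, utilization is exactly on target.
before = hpa_desired(4, cpu_utilization_percent(300, 500), 60)

# VPA lowers the request to 350m. Load is unchanged, but utilization
# jumps to about 86%, so HPA now wants more replicas.
after = hpa_desired(4, cpu_utilization_percent(300, 350), 60)

print(before, after)
```

After HPA scales out, per-pod usage drops, which can lead VPA to recommend an even smaller request, and the cycle can repeat. Keeping VPA in recommendation-only mode, or having HPA scale on a metric that VPA does not manage, breaks the loop.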

Metrics that matter

  • pending pod count
  • node utilization
  • p95 latency during scale events
  • HPA/VPA action frequency
  • cost per request trend
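
Two of these metrics are easy to derive from data most teams already collect. The sketch below is one hypothetical way to do it: it counts direction reversals in a replica-count history as a proxy for oscillation, and computes cost per request from node count and hourly price. The function names, the reversal heuristic, and the sample values are assumptions for illustration.

```python
def direction_reversals(replica_history: list[int]) -> int:
    """Count how often scaling flips direction (up->down or down->up).

    A high count over a short period suggests oscillation.
    """
    reversals = 0
    last_sign = 0
    for prev, cur in zip(replica_history, replica_history[1:]):
        delta = cur - prev
        if delta == 0:
            continue  # no scaling action at this sample
        sign = 1 if delta > 0 else -1
        if last_sign != 0 and sign != last_sign:
            reversals += 1
        last_sign = sign
    return reversals


def cost_per_request(node_hourly_cost: float, node_count: int,
                     requests_per_hour: int) -> float:
    """Infrastructure cost attributed to one request in a given hour."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return node_hourly_cost * node_count / requests_per_hour


if __name__ == "__main__":
    # Hypothetical replica counts sampled once per minute.
    print(direction_reversals([4, 6, 4, 6, 5]))
    # Hypothetical: 5 nodes at 0.20 per hour serving 100,000 requests.
    print(cost_per_request(0.20, 5, 100_000))
```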

Conclusion

HPA, VPA, and Cluster Autoscaler are strongest as a coordinated system, not isolated features. With clear ownership and tuned policies, you can keep services responsive during peaks while controlling infrastructure cost and avoiding scaling instability.