Understanding the Kubernetes Scale subresource

How it interacts with HPA and PDB & its importance for Custom Resources

Understanding the scale subresource in Kubernetes

Resources like Deployments and StatefulSets in Kubernetes have a scale subresource that captures three things:

  1. spec.replicas: The desired number of replicas.
  2. status.replicas: The actual, current number of replicas.
  3. status.selector: A label selector identifying the pods managed by the resource.

Here’s an example of what a typical query response looks like when you explore the scale subresource:

# Sample output from querying a Kubernetes deployment's scale settings.
➜ curl -s localhost:8001/apis/apps/v1/namespaces/kube-system/deployments/coredns/scale | jq .
{
  "kind": "Scale",
  "apiVersion": "autoscaling/v1",
  "metadata": {
    "name": "coredns",
    "namespace": "kube-system",
    "uid": "0f39b1dd-8cb4-4374-a95d-11d96c0b9d6a",
    "resourceVersion": "3260769",
    "creationTimestamp": "2023-08-10T15:55:24Z"
  },
  "spec": {
    "replicas": 2
  },
  "status": {
    "replicas": 2,
    "selector": "k8s-app=kube-dns"
  }
}
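
The subresource can be written as well as read. Here’s a sketch (assuming kubectl proxy is still serving on localhost:8001, as in the example above) that changes the desired replica count with a JSON merge patch:

# Update the desired replica count through the scale subresource.
➜ curl -s -X PATCH \
    -H 'Content-Type: application/merge-patch+json' \
    -d '{"spec": {"replicas": 3}}' \
    localhost:8001/apis/apps/v1/namespaces/kube-system/deployments/coredns/scale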

Why is the scale subresource necessary?

When I first learned about the scale subresource, I wondered why it’s needed when the replica information is already available in the spec. As with many things in Kubernetes, it’s needed to support extensibility and flexibility.

Kubernetes doesn’t enforce a uniform schema for representing desired and current replica counts in workload resources. The scale subresource provides a unified interface, enabling different workload resources, both built-in and custom, to consistently expose their scaling settings. This uniform interface allows for seamless integration with the rest of the Kubernetes ecosystem.

This is better understood by looking at how the scale subresource is used.

Uses of the scale subresource

The scale subresource is used in the following ways:

  1. By the Horizontal Pod Autoscaler (HPA): HPA uses the scale subresource to dynamically adjust the desired replica count based on observed metrics such as CPU usage or queries per second (QPS).
  2. By the Pod Disruption Budget (PDB) controller: When .spec.maxUnavailable or .spec.minAvailable in a PDB is specified as a percentage, the PDB controller queries the scale subresource of the resource managing the pods to get the desired number of replicas.
  3. Scaling pods through the kubectl scale command: This command allows manually changing the replica count of any resource that has the scale subresource enabled, as shown below.
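
For example, the same command works uniformly across resource types (the Deployment is the one queried earlier; the StatefulSet name is hypothetical):

# Manually scale any resource that exposes the scale subresource.
➜ kubectl scale deployment/coredns --replicas=3 -n kube-system
➜ kubectl scale statefulset/web --replicas=5  # hypothetical StatefulSet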

Scale subresource and Custom Resources

If you’re working with custom resources that manage pods and want them to integrate seamlessly with HPA and PDB, or to work with the kubectl scale command, you need to enable the scale subresource in the Custom Resource Definition (CRD).

If you are using kubebuilder to write controllers, you can enable the scale subresource for your custom resource with the +kubebuilder:subresource:scale marker, as described in the kubebuilder documentation. The scale subresource can also be enabled directly in the CRD manifest via the subresources.scale stanza, which maps the replica counts and label selector to fields in your custom resource.
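
As a sketch, the relevant part of such a CRD manifest looks like this (the group, kind, and field paths are illustrative assumptions; only the subresources stanza is the point):

# Illustrative CRD excerpt enabling the scale subresource.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com            # hypothetical CRD
spec:
  group: example.com
  names:
    kind: MyApp
    plural: myapps
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
      subresources:
        scale:
          specReplicasPath: .spec.replicas      # where the desired count lives
          statusReplicasPath: .status.replicas  # where the current count lives
          labelSelectorPath: .status.selector   # selector for the managed pods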

PDB, Custom Resources, and the scale subresource

For pods managed by a custom resource, a PDB can be used without restrictions only if the custom resource supports the scale subresource. When a percentage is specified for .spec.maxUnavailable or .spec.minAvailable, the PDB controller needs the total desired replica count to calculate how many replicas must remain available during a disruption, and it obtains that count via the scale subresource of the resource owning the pods.

// if maxUnavailable is set as a percentage (rounding omitted for clarity)
desiredAvailableReplicas := desiredTotalReplicas - (desiredTotalReplicas * maxUnavailable / 100)
// if minAvailable is set as a percentage (rounding omitted for clarity)
desiredAvailableReplicas := desiredTotalReplicas * minAvailable / 100
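
For example, with 10 desired replicas and maxUnavailable set to 20%, the controller requires 10 - (10 * 20 / 100) = 8 replicas to remain available during a disruption.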

Call paths for reference:

getExpectedPodCount → getExpectedScale → getScaleController

If the scale subresource isn’t enabled for your custom resource, you can still use a PDB, albeit with a limitation: you can only set .spec.minAvailable to an integer value, not a percentage, as mentioned in the Kubernetes documentation.
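
As a minimal sketch (the name and labels below are hypothetical), such a PDB looks like this:

# PDB for pods of a custom resource without the scale subresource.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb                # hypothetical name
spec:
  minAvailable: 2                # must be an integer here; percentages need the scale subresource
  selector:
    matchLabels:
      app: myapp                 # hypothetical label on the custom resource's pods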

