Infrastructure

Keeping GPU Workloads NUMA-Local in Kubernetes

GPU workloads often need hardware-aware placement to avoid silent latency regressions. This post explains what NUMA locality means for Kubernetes GPU nodes, how CPU Manager, Topology Manager, and Memory Manager work together, and the operational gotchas to watch for.

Understanding the Kubernetes Scale subresource

In this post, I discuss the scale subresource in Kubernetes and its role in managing resources like Deployments and StatefulSets. I cover why it matters as a unified interface for scaling settings, and how it is used by the Horizontal Pod Autoscaler (HPA), Pod Disruption Budgets (PDB), and manual scaling with the kubectl scale command. I also explain why the scale subresource matters for custom resources and how it lets them integrate with the rest of the Kubernetes components.