Kubernetes v1.36 'Haru' Released: 70 Enhancements Every Backend Dev Must Know

Published 2 months ago

DevOps and Infrastructure

If you've managed Kubernetes clusters for more than a year, you already know the pattern: a new release drops, you skim the changelog, and you mentally separate the "ship this now" features from the "let's wait and see" ones. Kubernetes v1.36, codenamed Haru (a name evoking spring, clear skies, and far-off horizons), makes that separation unusually easy. This is a release about maturity, not novelty.

Released on April 22, 2026, v1.36 is the first major Kubernetes release of the year. Led by Release Lead Ryota Sawada across a 15-week cycle, it was shaped by contributions from 491 individuals across 106 companies. The result: 70 enhancements — 18 graduating to Stable, 25 entering Beta, and 25 in Alpha — that collectively make Kubernetes safer, more observable, and better suited for the AI infrastructure workloads that now dominate production clusters.

This isn't a release full of flashy new primitives. It's a release where long-planted seeds finally bloom. And for backend engineers running Kubernetes in production, that's exactly what you want.

What Graduated to Stable — The Features You Can Ship Now

Stable means production-ready. No feature gates to enable. No experimental warnings. These 18 features are committed to long-term support and backward compatibility. Here are the ones that matter most.

User Namespaces for Pods

KEP-127 · SIG Node

This is arguably the most significant security improvement in Kubernetes in years — and it took four years to get here. First introduced in v1.25 as an Alpha feature in 2022, User Namespaces for pods reaches General Availability in v1.36.

The concept is simple but powerful: the user identity inside a container is mapped to an unprivileged user on the host. Even if a process escapes the container entirely, it has no administrative power over the node. Root inside the container is not root on the host.

Before v1.36, achieving truly rootless containers in Kubernetes required third-party runtimes like gVisor or Kata Containers, or accepting weaker isolation guarantees. Now it's native, stable, and enabled with a single field in your pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo          # placeholder name
spec:
  hostUsers: false           # Root in container ≠ root on host
  containers:
    - name: app
      image: my-secure-app:latest
```

For teams running multi-tenant clusters, workloads from untrusted sources, or any environment with strict compliance requirements, this is not a nice-to-have. It's something you should be enabling now.

Mutating Admission Policies

KEP-3962 · SIG API Machinery

If you've ever maintained a mutating admission webhook, you know the overhead: a TLS-secured HTTP server, certificate management, availability requirements that can block your entire API server if something goes wrong, and failure modes that are frustratingly hard to debug.

Mutating Admission Policies bring CEL-based mutation logic directly into Kubernetes objects — no external server needed. Define your mutation rules as native Kubernetes resources, version-control them alongside the rest of your configuration in Git, and deploy them through your normal GitOps workflow. This is the same model that made Validating Admission Policies popular. Now it covers mutations too.

For platform teams running lean, this removes an entire category of operational overhead and eliminates a whole class of webhook-related incidents.
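
As a rough illustration of what can replace a label-injecting webhook, the sketch below defines a policy and a binding that add a default label to newly created Deployments. The apiVersion is an assumption based on the GA graduation described above (earlier releases served the same kinds under v1alpha1 and v1beta1), and the policy name, label key, and label value are hypothetical.

```yaml
# Sketch only: apiVersion assumes the GA group version; names and labels are hypothetical.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingAdmissionPolicy
metadata:
  name: default-team-label.example.com
spec:
  failurePolicy: Fail
  reinvocationPolicy: IfNeeded
  matchConstraints:
    resourceRules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["deployments"]
  matchConditions:
    # Only mutate objects that do not already carry the label.
    - name: missing-team-label
      expression: '!has(object.metadata.labels) || !("team" in object.metadata.labels)'
  mutations:
    - patchType: ApplyConfiguration
      applyConfiguration:
        # CEL expression that returns a partial object to merge into the request.
        expression: >
          Object{
            metadata: Object.metadata{
              labels: {"team": "unassigned"}
            }
          }
---
# A policy has no effect until a binding references it.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingAdmissionPolicyBinding
metadata:
  name: default-team-label-binding.example.com
spec:
  policyName: default-team-label.example.com
```

Both objects are ordinary manifests, so they can live in the same Git repository and review flow as the workloads they mutate. Check the served API version on your cluster before applying.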

Fine-Grained Kubelet API Authorization

SIG Auth & SIG Node

Previously, anything allowed to reach the kubelet API got broad access. In v1.36, that authorization can now be scoped to specific kubelet endpoints — /metrics, /healthz, /pods, individual node log endpoints — using standard Kubernetes RBAC.

This means you can grant your monitoring stack access to /metrics without also granting it access to node logs or exec endpoints. You can give your logging agent access to log endpoints without giving it the ability to inspect running pod processes. This is exactly the kind of least-privilege improvement that regulated environments and compliance teams have been waiting for, and it's now stable and ready to enforce.
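
A minimal sketch of that least-privilege setup for a metrics scraper follows. The kubelet authorizes each request against a subresource of the node object, so /metrics maps to nodes/metrics; the role name, service account, and namespace are hypothetical.

```yaml
# Grants read access to kubelet /metrics only, with no logs, exec, or pod listing.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-metrics-reader        # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["nodes/metrics"]      # kubelet /metrics endpoints
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring-kubelet-metrics    # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubelet-metrics-reader
subjects:
  - kind: ServiceAccount
    name: metrics-scraper             # hypothetical service account
    namespace: monitoring
```

The thing to look for in existing clusters is any agent bound to nodes/proxy, which covers every kubelet endpoint at once; those bindings are the candidates for narrowing.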

External ServiceAccount Token Signing

KEP-740 · SIG Auth

Clusters can now delegate ServiceAccount token signing to external systems — cloud key management services or hardware security modules — instead of relying only on internally managed keys. For organizations with centralized signing infrastructure or strict key management requirements, this simplifies integration considerably and is now production-ready.

SELinux Mount Optimization

KEP-1710 · SIG Storage

On SELinux-enforcing nodes, Kubernetes previously applied volume labels by recursively relabeling every file in a volume at mount time. For large volumes — think persistent data directories with thousands of files — this caused significant pod startup delays that were difficult to diagnose and impossible to avoid.

v1.36 replaces this with mount-level context labeling using mount -o context=XYZ. The label is applied at mount, not file by file. Faster pod starts, more consistent behavior, and one less reason for operations teams to pull their hair out during deployments.

AI/ML Infrastructure — DRA Reaches Production Maturity

Kubernetes has become the default substrate for AI workloads, and the platform is evolving to meet that responsibility. Dynamic Resource Allocation (DRA) — the framework designed to manage GPUs, TPUs, FPGAs, and other specialized hardware — makes its biggest leap in v1.36.

DRA Core Framework Graduates to Stable

Four separate DRA-related KEPs reach General Availability in v1.36. The new hardware allocation model — designed to replace the limitations of the older device plugin API — is now a committed, permanent part of Kubernetes.

What this means in practice: platform operators can now access per-pod metrics for DRA-managed resources. This is essential for accurate billing and chargeback, performance tuning of GPU workloads, and debugging failures in AI training and inference pipelines. The GA guarantee includes a 99.9% success rate for Get and List requests over a rolling five-minute window and P99 response times under 100ms — production-grade SLOs baked into the spec.
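
For readers who have not used DRA yet, here is a minimal sketch of how a pod requests a device through a claim rather than through an extended resource such as a vendor GPU count. It assumes the resource.k8s.io/v1 API and a DRA driver that has published a DeviceClass; the class name gpu.example.com, the image, and all object names are hypothetical.

```yaml
# Template from which a per-pod ResourceClaim is generated.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu                    # hypothetical name
spec:
  spec:
    devices:
      requests:
        - name: gpu
          exactly:
            deviceClassName: gpu.example.com   # assumed to be installed by a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: single-gpu
  containers:
    - name: trainer
      image: registry.example.com/trainer:1.0  # hypothetical image
      resources:
        claims:
          - name: gpu                 # refers to the pod-level claim above
```

The scheduler places the pod only on a node where the claim can be satisfied, and the claim object itself becomes something you can inspect when allocation fails.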

Device Status Visibility in kubectl

Users can now run kubectl describe pod to determine immediately whether a container's crash loop is caused by an Unhealthy or Unknown device status — regardless of whether the hardware was provisioned via traditional device plugins or the newer DRA framework. This enhanced visibility allows administrators and automated controllers to quickly identify faulty hardware and streamline recovery of high-performance workloads without digging through logs.

OCI VolumeSource — Model Weights as First-Class Citizens

Getting non-code artifacts into a container has always been awkward for AI/ML teams. Bloat the main image with model weights. Use an init container to pull things in at startup. Fight ConfigMap size limits. Build custom distribution pipelines. None of these feel right.

OCI VolumeSource changes that. It lets you reference any OCI image as a volume — Kubernetes pulls the image and mounts its contents into the pod just like a regular volume. Your model weights live in the registry alongside your container images, versioned, pulled on demand, and mounted cleanly. This is especially significant for large language model deployments where weights can be tens of gigabytes and need to be managed independently of application code.
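
A short sketch of the pattern, assuming the model weights have been pushed to a registry as an OCI artifact; the registry paths and names are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
spec:
  containers:
    - name: server
      image: registry.example.com/inference-server:2.1   # hypothetical application image
      volumeMounts:
        - name: model-weights
          mountPath: /models
          readOnly: true              # image volumes are mounted read-only
  volumes:
    - name: model-weights
      image:
        reference: registry.example.com/models/llm-weights:v1   # hypothetical artifact
        pullPolicy: IfNotPresent
```

Rolling out new weights then becomes a change to the reference tag or digest, independent of the application image.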

Workload Aware Scheduling — Distributed Jobs as Single Entities

For years, the Kubernetes scheduler has treated pods as independent units. For simple web services, that's fine. For distributed AI training runs, HPC workloads, or any computation that requires a specific number of coordinated pods to be useful at all, it's a serious problem.

If you're running an eight-GPU training job and the scheduler only places six of the eight pods because the other two nodes are occupied, you've wasted six GPUs. The six pods sit idle waiting for peers that never arrive, or worse, they run and fail after burning hours of compute.

Kubernetes v1.36 introduces a comprehensive Workload Aware Scheduling (WAS) suite in Alpha. It natively integrates the Job controller with a revised Workload API and a new decoupled PodGroup API that treats related pods as a single logical schedulable entity.

The practical result is gang scheduling — either all pods in a group land on nodes simultaneously, or none of them do. No partial allocation. No wasted resources. This is the kind of scheduling behavior that AI infrastructure teams have been building custom controllers to achieve for years. It's now moving into the Kubernetes core.

Performance and Scalability

CRI List Streaming

KEP-5825 · SIG Node · Alpha

On large-scale nodes with hundreds of containers, the traditional way the kubelet retrieved container and image data — a single, monolithic List request to the container runtime — created memory pressure spikes and latency problems that were hard to avoid and hard to debug.

CRI List Streaming replaces those monolithic requests with a server-side streaming RPC. The kubelet now processes results incrementally as they arrive, significantly reducing peak memory footprint and improving responsiveness on high-density nodes. For teams running Kubernetes at serious scale, this is an important infrastructure improvement even in Alpha.

In-Place Pod-Level Resources Vertical Scaling

SIG Node · Beta · cgroupv2 only

In-place resizing of CPU and memory without a pod restart graduates to Beta in v1.36 at the pod level, building on the existing per-container in-place resize capability. For multi-container pods, this means you can adjust the total resource footprint dynamically without disrupting running services, which meaningfully reduces operational overhead for teams managing stateful workloads.
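
To make "pod level" concrete, the sketch below declares a shared CPU and memory budget on the pod rather than on each container. It assumes the pod-level resources field (spec.resources) is enabled on your cluster; the names and images are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-sidecar
spec:
  # Budget shared by all containers in the pod.
  resources:
    requests:
      cpu: "1"
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 2Gi
  containers:
    - name: api
      image: registry.example.com/api:1.4         # hypothetical image
    - name: log-shipper
      image: registry.example.com/log-shipper:0.9 # hypothetical image
```

An in-place resize would then target these pod-level values through the pod's resize subresource instead of recreating the pod. The exact request shape is not shown here because it may differ between versions.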

HPA External Metrics Fallback

KEP-5679 · SIG Autoscaling · Alpha

The Horizontal Pod Autoscaler was built on one assumption: external metric APIs are always available. When Datadog goes down, or a cloud provider's metrics API returns errors, HPAs could behave erratically — scaling to zero or refusing to scale at all. The new fallback behavior lets HPAs handle external metric retrieval failures gracefully, maintaining their last-known scaling state rather than making a potentially destructive decision based on unavailable data.

It's also worth noting that scalability tests in v1.36 have been expanded to cover 1.5GB of resources, up from 800MB — a quiet acknowledgment of how large production Kubernetes clusters have grown.

Observability and Debugging

Kubernetes v1.36 continues the multi-release trend of improving built-in observability — reducing the number of situations where debugging requires SSH access, log diving, or custom tooling.

Memory QoS (Alpha update) — uses the cgroup v2 memory controller to give the Linux kernel better guidance on how to treat container memory, reducing unpredictable OOM kills and improving memory pressure handling on dense nodes.

PSI metrics on cgroupv2 (Beta) — exports CPU, memory, and IO Pressure Stall Information at the container level. This gives you early warning signals before a node saturates, rather than finding out via a crashed pod.

New Alpha metrics — informer latencies, work queue depths, and terminated containers by exit code are now exposed. These metrics close gaps that previously required custom exporters or log parsing to observe.

Node log queries via kubectl — no more SSH required to query journal logs or system logs on Linux or Windows nodes. kubectl can now retrieve them directly, which is particularly valuable for Windows nodes where SSH access has historically been painful to set up.

PVC last-used timestamp (Alpha) — pvc.Status now reports when a PersistentVolumeClaim was last used. This closes a long-standing gap for teams trying to automate storage cleanup or capacity planning for unused volumes.

Networking — Gateway API Takes Over from Ingress NGINX

On March 24, 2026, Kubernetes SIG Network and the Security Response Committee officially retired the Ingress NGINX project. No further releases. No bug fixes. No security patches. Existing deployments continue to function, and installation artifacts remain available — but you are now running unsupported software.

If your cluster still uses Ingress NGINX, this is the signal to start your migration planning. The recommended path is Gateway API v1.5, released February 27, 2026 as the biggest Gateway API release yet.

Gateway API offers a more expressive and extensible model than the legacy Ingress resource:

Structured routing — HTTPRoute, GRPCRoute, and TCPRoute resources with richer matching semantics than Ingress annotations, without the annotation sprawl
Cross-namespace references — gateway policies can be shared safely across team namespaces without requiring cluster-admin
Built-in traffic management — header manipulation, traffic splitting, retries, and timeouts are first-class API concepts, not implementation-specific annotations
Ingress2Gateway 1.0 — the migration tooling hit 1.0 in March 2026 and automates conversion of existing Ingress manifests to equivalent Gateway API resources
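
As an illustration of the points above, here is a minimal HTTPRoute sketch that attaches to a Gateway in another namespace, adds a request header, sets a timeout, and splits traffic 90/10 between two backends. All names, namespaces, and hostnames are hypothetical, and the Gateway must be configured to allow routes from the team namespace.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: team-a
spec:
  parentRefs:
    - name: shared-gateway            # Gateway owned by the platform team
      namespace: infra
  hostnames:
    - api.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: X-Route-Source
                value: gateway-api
      timeouts:
        request: 10s
      backendRefs:
        - name: api-stable
          port: 8080
          weight: 90                  # 90% of matching traffic
        - name: api-canary
          port: 8080
          weight: 10                  # 10% canary
```

With Ingress, the equivalent header, timeout, and canary behavior would typically live in controller-specific annotations.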

Also graduating to Beta in this release: IP/CIDR validation improvements (KEP-4858), which tighten how Service and Endpoint CIDRs are validated — preventing a class of subtle misconfigurations that can be difficult to diagnose in large multi-tenant clusters.

Deprecations and Removals — What You Need to Migrate Before Upgrading

Feature / API	Status in v1.36	Required Action
gitRepo volume plugin	Permanently removed	Migrate to init containers or Git-syncing sidecar patterns. Deprecated since v1.11 and disabled by default since v1.33; it can no longer be re-enabled.
IPVS mode in kube-proxy	Removed (deprecated v1.35)	Switch to nftables or iptables mode, or to an eBPF-based proxy alternative, before upgrading.
Service.spec.externalIPs	Deprecated (warnings active)	Removal planned for v1.43. Audit your Services now; this field is the attack vector described in CVE-2020-8554.
Ingress NGINX Controller	Retired (project-level)	Migrate to Gateway API v1.5. Use Ingress2Gateway 1.0 tooling to assist the conversion.
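
For the gitRepo row, the init-container replacement can be sketched as follows: an init container clones the repository into an emptyDir, and the application mounts that directory read-only. The alpine/git image (whose entrypoint is assumed to be git), the repository URL, and the names are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-cloned-source
spec:
  initContainers:
    - name: clone
      image: alpine/git:latest        # assumption: image with git as its entrypoint
      args:
        - clone
        - --depth=1
        - https://example.com/org/repo.git   # hypothetical repository
        - /workdir
      volumeMounts:
        - name: source
          mountPath: /workdir
  containers:
    - name: app
      image: my-app:latest            # hypothetical image
      volumeMounts:
        - name: source
          mountPath: /src
          readOnly: true
  volumes:
    - name: source
      emptyDir: {}                    # replaces the former gitRepo volume
```

If the content must stay in sync after startup, a Git-syncing sidecar writing to the same emptyDir is the usual alternative.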

One additional heads-up for teams running SELinux-enforcing Linux nodes: the SELinuxMount feature gate is anticipated to turn on by default in v1.37. The change makes volume setup faster for most workloads, but it can break certain volume configurations. Test on staging now — v1.36 is your window to find issues before they become mandatory.
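
If staging tests turn up a workload that depends on the old relabeling behavior, one option is to opt that pod out explicitly. The sketch below assumes the pod-level seLinuxChangePolicy field is available on your cluster; the SELinux level, claim name, and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-selinux-workload
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"           # hypothetical MCS level
    seLinuxChangePolicy: Recursive    # keep file-by-file relabeling for this pod
  containers:
    - name: app
      image: my-app:latest            # hypothetical image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: shared-data        # hypothetical claim
```

Pods that leave the field unset, or set it to MountOption, get the faster mount-time labeling once the feature is active for their volumes.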

The Business Case: What This Release Means for Engineering Teams

Kubernetes v1.36 is less about new capabilities and more about reducing the operational tax of running Kubernetes in production. The patterns are clear across the release:

Security moves into the platform. User Namespaces GA, fine-grained kubelet authorization, and Mutating Admission Policies all eliminate the need for third-party tools or complex workarounds to achieve baseline security postures. What used to require additional tooling is now native and committed.

AI infrastructure gets first-class support. DRA reaching GA, OCI volumes, Workload Aware Scheduling, and Pod-Level Resource Managers are not incremental improvements — they represent Kubernetes making a deliberate architectural commitment to being the right platform for GPU workloads at scale. If you run AI training or inference on Kubernetes, start evaluating DRA now.

Observability gaps close. PSI metrics, improved kubelet visibility, node log queries via kubectl, and PVC usage timestamps collectively reduce the number of situations where you need SSH access or custom exporters to understand what's happening in your cluster.

Platform teams get their time back. Mutating Admission Policies alone eliminate an entire category of webhook infrastructure. HPA fallback behavior reduces a class of autoscaling incidents. CRI list streaming improves node stability at high density. These aren't exciting in demos — but they matter enormously on Monday morning when something goes wrong.

Upgrade Checklist Before Moving to v1.36

This is not an exhaustive list, but it covers the things most likely to cause problems.

Audit gitRepo volume usage: the plugin is permanently removed. Find replacements before upgrading any cluster to v1.36.
Migrate off IPVS kube-proxy mode: verify your CNI configuration and proxy mode before upgrading.
Plan your Ingress NGINX migration: evaluate Gateway API v1.5 and use Ingress2Gateway 1.0 to automate the conversion of existing manifests.
Audit Service.spec.externalIPs: deprecation warnings are now active. Removal is planned for v1.43, but start the audit now.
Test SELinuxMount on staging: it is expected to be on by default in v1.37, so find any volume configuration issues while you still have a release cycle to fix them.
Enable User Namespaces gradually: test with hostUsers: false in non-critical pods first to validate runtime compatibility before rolling out broadly.
Evaluate DRA for GPU workloads: if you run GPU-accelerated workloads, begin planning the migration from device plugins to DRA now that the framework is stable.

Kubernetes v1.36 has an end-of-life date of June 2027. You have time to upgrade carefully. But the deprecation work — particularly around Ingress NGINX and gitRepo volumes — should start now regardless of when you plan to upgrade.

Final Thoughts

Kubernetes v1.36 Haru is a spring release in the truest sense — not a dramatic new direction, but the careful, deliberate bloom of years of accumulated work reaching maturity. User Namespaces took four years. Mutating Admission Policies built on Validating Admission Policies. DRA has been in development across multiple release cycles. Workload Aware Scheduling builds on Job controller work that predates the AI boom.

That's how infrastructure improves. Not in dramatic leaps, but in the patient accumulation of stability, until one day you realize the things you used to need third-party tools for are now just there — native, stable, and committed.

"Spring returns, life begins again, and true craftsmanship shows through stability." — Kubernetes v1.36 release theme

If your team is still managing GPU workloads through device plugins, running Ingress NGINX in production, or maintaining a webhook server just to inject default labels — v1.36 is the release that gives you the native alternatives to finally move on. The cluster is waiting to do a lot more work.

Written by

Subhash Tiwari, DevOps Engineer
