April 25, 2026 · 6 min read

Kubernetes In-Place Pod Resize (1.35): Why Commercial Rightsizing Tools Are Losing Their Moat

Kubernetes 1.35 made in-place pod resize GA. ScaleOps, StormForge, and Cast AI sell dashboards over a feature the platform now ships natively. Build-vs-buy analysis for UAE and GCC platform teams, with data residency and NESA-compliance angles.

Kubernetes 1.35 shipped in December 2025 with a feature that quietly changes the build-vs-buy calculation for an entire category of commercial tooling. In-place pod vertical scaling is now GA - you can change a running pod’s CPU and memory allocation without a restart, reschedule, or IP change.

This is the underlying mechanism ScaleOps, StormForge, and Cast AI use for their marketed “no downtime rightsizing” capability. It is not proprietary. It is a standard Kubernetes feature any controller - commercial, open-source, or in-house - can call. The gap that justified paying enterprise money for rightsizing SaaS has narrowed substantially, and AI-assisted development has narrowed it further.

For UAE and GCC platform teams, this is a good moment to rerun the build-vs-buy math. The answer may have shifted since the last time you evaluated.


What Changed in Kubernetes 1.35

Before Kubernetes 1.27, the resources.requests and resources.limits fields on a Pod were immutable after admission. Any change required delete-and-recreate. This was not a Linux limitation - cgroups have supported live resize for over a decade - it was a Kubernetes API-level constraint.

Kubernetes 1.27 introduced in-place resize as alpha. 1.32 added a dedicated /resize subresource on the Pod object, allowing controllers to mutate resources on running pods without touching the rest of the spec. 1.33 moved the feature to beta, on by default. 1.35 (December 2025) graduated it to GA with stable API guarantees. The flow:

  1. Controller writes new resource values via the Pod /resize subresource
  2. API server accepts and reports progress through Pod conditions - PodResizeInProgress while applying, or PodResizePending with reason Deferred or Infeasible
  3. Kubelet on the node writes new values directly to cgroup files (cpu.max, cpu.weight, memory.max)
  4. The Linux kernel enforces the new limits on the already-running process - no signal, no restart, no new PID

The cgroup files involved are standard cgroup v2 primitives available on every modern Linux kernel. The kubelet has been able to write them since Kubernetes launched - the API-level machinery is what was missing.
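The flow above can be exercised by hand with recent kubectl, which exposes the resize subresource through the --subresource flag. A sketch, with placeholder pod and container names:

```shell
# Propose new CPU values for a running pod via the resize subresource.
# "web-7d4b9" and "app" are placeholder names.
kubectl patch pod web-7d4b9 --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"500m"},"limits":{"cpu":"1"}}}]}}'

# Inspect the resize-related conditions the API server reports.
kubectl get pod web-7d4b9 -o jsonpath='{.status.conditions}'
```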


What ScaleOps, StormForge, and Cast AI Actually Do

Strip away the dashboards, and commercial Kubernetes rightsizing tools are control loops with policy engines. The core logic:

  1. Scrape metrics - CPU and memory usage per container from Prometheus or metrics-server
  2. Compute new values - P95 or P99 of recent usage, plus a headroom factor (typically 15-30%)
  3. Classify workload type - detect JVM, stateful, batch, fixed-worker-count apps
  4. Call the /resize subresource - patch Pod resources with new values
  5. Monitor the resize status conditions - back off on Deferred, fall back to rolling replace on Infeasible
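Step 2 is small enough to sketch in full. A minimal recommender, assuming usage samples in millicores and nearest-rank percentiles (RecommendMillicores is an illustrative name, not any vendor's API):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// RecommendMillicores returns a CPU request recommendation: the p-th
// percentile of recent usage samples plus a headroom factor.
// samples are CPU usage in millicores; headroom 0.25 means +25%.
func RecommendMillicores(samples []int64, percentile, headroom float64) int64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]int64(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// Nearest-rank percentile: ceil(p*n) gives the 1-based rank.
	rank := int(math.Ceil(percentile * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	p := sorted[rank-1]
	return int64(math.Ceil(float64(p) * (1 + headroom)))
}

func main() {
	// 100 samples of 1..100 millicores: P95 is 95m, +25% headroom -> 119m.
	samples := make([]int64, 100)
	for i := range samples {
		samples[i] = int64(i + 1)
	}
	fmt.Println(RecommendMillicores(samples, 0.95, 0.25))
}
```

The same shape works for memory; the commercial tools differ mainly in how the percentile window and headroom are tuned per workload class.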

Everything else is productization:

  • Multi-cluster UI with central dashboards and team-level RBAC
  • Workload type library with hundreds of heuristics for detecting JVM, Python Gunicorn, Redis, PostgreSQL, etc.
  • Bin-packing optimization coordinating with Karpenter or Cluster Autoscaler to consolidate freed capacity
  • Safety layer detecting SLI degradation and rolling back automatically
  • Audit trail for SOC 2, ISO 27001, and regulated-industry compliance

The core loop is no longer a moat. The surrounding product is.


The UAE and GCC Economics

For a typical multi-cluster UAE enterprise running 200 nodes across dev, staging, and production on AWS me-central-1 or Azure UAE North, the costs stack up:

Category | Annual Cost
Commercial rightsizing SaaS ($15-50/node/month) | $60k - $120k
Engineering time to evaluate and integrate | $20k - $40k (one-time)
Over-provisioning cost avoided | $200k - $800k

If your over-provisioning exceeds $500k/year, commercial tool ROI is obvious and governance features justify the spend. Below that threshold, the math gets thinner - a home-built controller starts to look rational.

The shifted 2026 reality: building used to take a quarter; now it takes a sprint. With AI-assisted development, a single platform engineer can produce a working rightsizing controller on top of in-place resize in under two weeks. The controller itself is ~500 lines of Go using client-go. The heuristics for workload classification, safety checks, and multi-cluster coordination take longer - but you do not need all of them to capture the bulk of the over-provisioning savings.


Data Residency: Why GCC Banks Lean Self-Hosted

UAE Personal Data Protection Law (PDPL), CBUAE Article 13 outsourcing guidance, and DESC ISR v3 all place constraints on where cluster metadata and operational data can be processed. Commercial rightsizing SaaS typically operates a central control plane that receives metrics and returns resize decisions. For regulated financial workloads, this triggers several compliance questions:

  • Where is the vendor control plane hosted?
  • What telemetry is sent outside the cluster?
  • How long is data retained, and where?
  • Does the vendor offer a UAE-resident or self-hosted option?

Self-hosted alternatives keep all data in-cluster by default. Options:

  • Vertical Pod Autoscaler (VPA) - CNCF, open-source, runs entirely in-cluster. VPA 1.2+ supports InPlaceOrRecreate update mode using in-place resize where possible.
  • Goldilocks (Fairwinds, open-source) - VPA recommendation dashboard, entirely in-cluster.
  • Custom controller - ~500 lines of Go using client-go, deployed as a Deployment with namespace-scoped RBAC, no external control plane.
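A minimal VPA manifest for the in-cluster path might look like this, assuming a Deployment named api (the target name is a placeholder and the update mode requires a VPA release that ships it):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    # Resize in place where the node and resizePolicy allow it,
    # fall back to pod recreation otherwise.
    updateMode: "InPlaceOrRecreate"
```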

For UAE banks subject to CBUAE outsourcing approval or DESC ISR audits, the self-hosted path avoids the data-residency attestation step entirely.


Decision Framework for 2026

A practical framework for UAE and GCC platform teams in 2026:

Build if:

  • Single cluster, one or two teams
  • Homogeneous workloads (mostly stateless Go, Node, Python services)
  • You already run a platform team with client-go experience
  • Current rightsizing SaaS bill under $30k/year
  • Data residency requirements make self-hosted preferable

Buy if:

  • Multi-cluster, multi-team, heterogeneous workload mix
  • Stateful workloads (databases, JVM-heavy apps, Kafka) requiring sophisticated handling
  • Enterprise governance, audit trails, RBAC across teams
  • Current over-provisioning exceeds $500k/year - commercial tool ROI is clear
  • No platform engineering capacity for in-cluster controllers

Hybrid (common 2026 pattern):

  • Kubecost or OpenCost for cost visibility (keeps auditors happy)
  • In-house controller or VPA for the bulk of workloads
  • Commercial tool for JVM-heavy tiers where workload-type detection matters

What UAE Platform Teams Should Do Next

A concrete 90-day plan:

Week 1-2: Audit. Check Kubernetes versions across clusters. Anything below 1.27 needs upgrade planning before in-place resize is usable. Inventory over-provisioning with Kubecost or OpenCost for 30 days to quantify the cost.

Week 3-6: Pilot. Pick one namespace of stateless workloads. Deploy VPA in InPlaceOrRecreate mode with conservative policy (P95 + 25% headroom). Measure actual cost reduction vs manual baseline.

Week 7-10: Decide. Based on pilot savings, over-provisioning total, and multi-cluster complexity, decide: expand VPA-based self-hosted, build custom controller, or evaluate commercial tools. If you evaluated commercial tools in 2024-2025, the rerun is worth the two weeks - the underlying economics have shifted.

Week 11-12: Roll out. Gradual namespace-by-namespace rollout with monitoring. Document the resizePolicy per workload type (RestartContainer for JVM and Postgres, NotRequired for generic stateless).
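The per-workload resizePolicy lands on the container spec. A sketch with placeholder names and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # placeholder image
      resizePolicy:
        # CPU can change live; the process never notices.
        - resourceName: cpu
          restartPolicy: NotRequired
        # Memory changes restart the container - needed for JVM-style
        # workloads that size their heaps once at startup.
        - resourceName: memory
          restartPolicy: RestartContainer
      resources:
        requests:
          cpu: 250m
          memory: 512Mi
```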


Broader Signal: AI and the SaaS Tooling Market

In-place pod resize going GA is a specific example of a broader 2026 trend: commercial tooling built as thin wrappers over underlying platform features is losing its moat as those features stabilize and AI-assisted development shrinks the build side of the calculation.

A pile of Series A and Series B DevOps and FinOps startups are, in honest terms, control loops with dashboards over features that Kubernetes, Prometheus, or cloud providers already ship. The productization (multi-cluster, governance, audit) still takes real engineering effort. The core feature does not.

For UAE and GCC enterprise architects evaluating DevOps tooling spend in 2026: the question is no longer “can we build it?” - AI-assisted development turns most of these into weekend projects - but “is the governance layer worth the per-node cost?” The answer varies, but the default assumption has changed.




Running a Kubernetes rightsizing evaluation in UAE or GCC? We help platform teams design build-vs-buy analyses, run commercial tool PoCs, and deploy VPA-InPlace rollouts for regulated workloads. Get in touch.

Frequently Asked Questions

What is Kubernetes in-place pod resize?

In-place pod resize is a Kubernetes feature that lets the kubelet change a running pod's CPU and memory allocation without restarting the container. It went alpha in Kubernetes 1.27 (May 2023), beta in 1.33, and GA in Kubernetes 1.35 (December 2025). The kubelet writes new values directly to Linux cgroup files (cpu.max, memory.max), and the kernel enforces them on the already-running process. No restart, no reschedule, no IP change. This is the underlying mechanism ScaleOps, StormForge, and Cast AI use for their no-restart rightsizing claims.

Does ScaleOps still add value now that in-place resize is native?

For large multi-cluster UAE and GCC enterprises, yes - the policy engine, workload-type detection, multi-cluster governance, bin-packing optimization, and automated rollback are meaningful. For single-cluster teams running mostly stateless workloads, a home-built controller covers roughly 70% of the value. The shifted 2026 reality: the build side of build-vs-buy is now a sprint rather than a quarter, particularly with AI-assisted development. Many teams that bought rightsizing SaaS in 2024-2025 will reconsider at renewal.

Can UAE regulated banks use ScaleOps or StormForge?

Depends on the deployment model. ScaleOps and StormForge operate SaaS control planes that receive cluster metrics and issue resize commands. For banks and financial institutions subject to CBUAE Article 13 or data residency requirements under UAE Personal Data Protection Law (PDPL), the control plane typically needs to be hosted in-region or self-hosted. Verify vendor region support explicitly before commercial agreements. Self-hosted alternatives (VPA + custom controller using in-place resize) keep all data in-cluster.

What Kubernetes version do I need for in-place pod resize?

Kubernetes 1.35 (December 2025) for GA / production use with no feature gate required. Kubernetes 1.33-1.34 have the feature in beta with the feature gate on by default. Kubernetes 1.27-1.32 have it in alpha behind the InPlacePodVerticalScaling feature gate - not recommended for production. Kubernetes below 1.27 cannot do in-place resize at all; resource changes require delete-recreate. Major managed platforms (EKS, AKS, GKE) support 1.35 in 2026.

What is the real cost of commercial rightsizing tools?

Commercial Kubernetes rightsizing tools typically price between $15 and $50 per node per month, with enterprise multi-cluster agreements in the $100k-$500k annual range. For a 200-node multi-cluster UAE enterprise, expect $60k-$120k annually on rightsizing alone. This is on top of observability, security, and cost-visibility tooling. The 2026 build-vs-buy math: if your over-provisioning is less than $500k/year, a home-built controller has strong ROI. Above that threshold, commercial governance features often justify the licence cost.

Which workloads cannot be rightsized in-place?

JVM applications with fixed -Xmx ignore new cgroup memory ceilings because the heap is set at startup. Python Gunicorn and Ruby Unicorn with fixed worker counts cannot use new CPU beyond their concurrency cap. PostgreSQL shared_buffers, Redis maxmemory, and similar config-file-driven limits are read once at boot. Memory decreases are risky - if the current resident set size exceeds the new limit, the kernel OOM-kills immediately. For these workloads, Kubernetes resizePolicy RestartContainer forces a restart, or a rolling replace happens instead of in-place resize.
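The memory-decrease risk can be guarded in a few lines: never propose a limit below current resident usage plus a safety margin (guardedMemoryLimit is an illustrative name, not a Kubernetes API):

```go
package main

import "fmt"

// guardedMemoryLimit clamps a proposed memory limit so it never drops
// below current resident usage plus a safety margin, avoiding an
// immediate OOM kill when the kernel enforces the new ceiling.
func guardedMemoryLimit(currentUsageBytes, proposedBytes, marginPct int64) int64 {
	floor := currentUsageBytes + currentUsageBytes*marginPct/100
	if proposedBytes < floor {
		return floor
	}
	return proposedBytes
}

func main() {
	// Usage 800MiB, proposal 512MiB, 10% margin -> clamp to 880MiB.
	fmt.Println(guardedMemoryLimit(800<<20, 512<<20, 10))
}
```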

What should UAE platform teams do about rightsizing in 2026?

Three-step playbook: First, audit Kubernetes version across clusters - if below 1.27, upgrade is the prerequisite. Second, measure actual over-provisioning with Kubecost or OpenCost for 30 days to quantify the cost. Third, pick the approach - open-source VPA + Goldilocks for small clusters, commercial tool (ScaleOps, StormForge, Cast AI) for multi-cluster enterprises, or a home-built controller for teams with platform engineering capacity. Data-residency-sensitive banks should favour self-hosted options. The decision needs a rerun in 2026 even if you evaluated commercial tools in 2024-2025 - the underlying primitive has changed.

Get Started for Free

We would be happy to speak with you and arrange a free consultation with our Kubernetes Expert in Dubai, UAE. 30-minute call, actionable results in days.

Talk to an Expert