Create and Manage Scaling Policies
This guide describes how to create and manage ScalingPolicy resources using YAML manifests and kubectl. A Scaling Policy defines autoscaling behavior for a Compute Pool by configuring CPU and GPU utilization thresholds, scaling durations, resource bounds, and cooldown periods.
When a Compute Pool references a Scaling Policy, PaletteAI continuously monitors resource utilization and automatically adds or removes nodes to match workload demand.
Prerequisites
Before you create a Scaling Policy, confirm that you have the following resources available.
- Access to the hub cluster with permissions to create `ScalingPolicy` resources in the target namespace
- A Compute Pool in `Running` status to enable autoscaling on
- Prometheus running and accessible from the hub cluster. The `ScalingPolicy` controller queries Prometheus for CPU and GPU utilization metrics. A `PrometheusAvailable` condition is set on the `ScalingPolicy` status when the connection is confirmed. Refer to Configure Prometheus Agent Monitoring to configure `global.metrics`, spoke-side Prometheus agents, and GPU metric collection prerequisites.
Create a Scaling Policy
A ScalingPolicy is a namespaced resource in the spectrocloud.com/v1alpha1 API group. You must specify at least one of cpu or gpu scaling configurations.
CPU Scaling Policy
Use this configuration when your Compute Pool uses CPU-based workloads.
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-cpu-policy
  namespace: my-project-namespace
spec:
  cpu:
    scaleUpThreshold: 80
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 10m
  cpuResourceBounds:
    minCPUCount: 4
    maxCPUCount: 64
  cooldownDuration: 15m
  abortDuration: 30m
Apply the manifest to the hub cluster:
kubectl apply --filename cpu-scaling-policy.yaml
GPU Scaling Policy
Use this configuration when your Compute Pool uses GPU-based workloads. You must configure one gpu entry per GPU family. Each GPU family in gpuResourceBounds must have a corresponding entry in gpu.
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-gpu-policy
  namespace: my-project-namespace
spec:
  gpu:
    - family: 'NVIDIA-H100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
    - family: 'NVIDIA-A100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
  gpuResourceBounds:
    - family: 'NVIDIA-H100'
      minGPUCount: 1
      maxGPUCount: 16
    - family: 'NVIDIA-A100'
      minGPUCount: 1
      maxGPUCount: 8
  cooldownDuration: 15m
  abortDuration: 30m
Apply the manifest to the hub cluster:
kubectl apply --filename gpu-scaling-policy.yaml
Combined CPU and GPU Scaling Policy
You can configure both CPU and GPU scaling in a single policy. This is useful when worker pools in the same Compute Pool contain both CPU-only and GPU nodes.
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-combined-policy
  namespace: my-project-namespace
spec:
  cpu:
    scaleUpThreshold: 80
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 10m
  gpu:
    - family: 'NVIDIA-H100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
  cpuResourceBounds:
    minCPUCount: 4
    maxCPUCount: 64
  gpuResourceBounds:
    - family: 'NVIDIA-H100'
      minGPUCount: 1
      maxGPUCount: 16
  cooldownDuration: 15m
  abortDuration: 30m
Apply the manifest to the hub cluster:
kubectl apply --filename combined-scaling-policy.yaml
Scaling Policy Fields
The following tables describe all fields in the ScalingPolicy spec.
spec
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `cpu` | CPUScaling | — | ❌ | CPU scaling configuration. Required if `gpu` is not specified. |
| `gpu` | []GPUScaling | — | ❌ | GPU scaling configuration per GPU family. Required if `cpu` is not specified. |
| `cpuResourceBounds` | CPUResourceBounds | — | ❌ | Aggregate CPU count bounds across all nodes in the pool. |
| `gpuResourceBounds` | []GPUResourceBounds | — | ❌ | Aggregate GPU count bounds per GPU family. Each entry must correspond to a family in `gpu`. |
| `cooldownDuration` | duration | 15m | ❌ | Waiting period after a successful scaling action before the next scaling decision. |
| `abortDuration` | duration | 30m | ❌ | Timeout for an ongoing scale-up operation. When exceeded, pending nodes that have not reached `Healthy` status are removed; nodes that provisioned successfully are retained. Scale-down operations are never aborted. |
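The timing semantics of `cooldownDuration` and `abortDuration` can be sketched as follows. This is a minimal illustration of the behavior described in the table above, not the controller's actual implementation; the function names are illustrative only.

```python
from datetime import datetime, timedelta
from typing import Optional

COOLDOWN = timedelta(minutes=15)  # cooldownDuration default
ABORT = timedelta(minutes=30)     # abortDuration default

def in_cooldown(last_success: Optional[datetime], now: datetime) -> bool:
    """After a successful scaling action, no new scaling decision is
    made until cooldownDuration has elapsed."""
    return last_success is not None and now - last_success < COOLDOWN

def scale_up_timed_out(started: datetime, now: datetime) -> bool:
    """A scale-up pending longer than abortDuration is aborted: nodes
    that never reached Healthy are removed, successfully provisioned
    nodes are kept. Scale-down operations are never aborted."""
    return now - started > ABORT
```

With the defaults shown, a scaling action at 12:00 blocks further decisions until 12:15, and a scale-up still pending at 12:30 is aborted.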
spec.cpu and spec.gpu[*]
Both cpu and each gpu entry use the same set of scaling parameters.
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `scaleUpThreshold` | integer | 80 | ❌ | Utilization percentage above which scale-up is triggered. Must be between 1 and 100. Must be greater than `scaleDownThreshold`. |
| `scaleDownThreshold` | integer | 20 | ❌ | Utilization percentage below which scale-down is triggered. Must be between 0 and 99. Must be less than `scaleUpThreshold`. |
| `scaleUpDuration` | duration | 5m | ❌ | Duration for which utilization must remain consistently above `scaleUpThreshold` before scale-up is triggered. |
| `scaleDownDuration` | duration | 10m | ❌ | Duration for which utilization must remain consistently below `scaleDownThreshold` before scale-down is triggered. |
Each gpu entry also requires:
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `family` | string | — | ✅ | GPU family identifier (for example, `NVIDIA-H100`). The controller matches this value against GPU families present on allocated hosts in the worker pool. Entries for GPU families not present in the pool are silently ignored. |
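The threshold-plus-duration semantics above amount to a hysteresis rule: scale-up fires only when utilization stays above `scaleUpThreshold` for the entire `scaleUpDuration` window, and scale-down only when it stays below `scaleDownThreshold` for the entire `scaleDownDuration` window. A minimal sketch of that rule, assuming utilization arrives as a window of percentage samples (the function name is hypothetical, not part of the PaletteAI API):

```python
def decide(samples, scale_up=80, scale_down=20):
    """Return 'scale-up', 'scale-down', or 'hold' for a window of
    utilization samples (percent) spanning the configured duration.

    Utilization must be consistently above or below the threshold for
    the whole window; a single sample inside the band means 'hold'.
    """
    if samples and all(s > scale_up for s in samples):
        return "scale-up"
    if samples and all(s < scale_down for s in samples):
        return "scale-down"
    return "hold"
```

For example, one dip to 50% inside an otherwise hot window prevents scale-up, which is what keeps short utilization spikes from churning nodes.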
spec.cpuResourceBounds
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `minCPUCount` | integer | — | ❌ | Minimum aggregate CPU count across all nodes in the pool. Must be at least 1. |
| `maxCPUCount` | integer | — | ❌ | Maximum aggregate CPU count across all nodes in the pool. Must be at least 1. Must be greater than or equal to `minCPUCount`. |
spec.gpuResourceBounds[*]
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `family` | string | — | ✅ | GPU family identifier. Must match a family defined in `spec.gpu`. |
| `minGPUCount` | integer | — | ❌ | Minimum aggregate GPU count across all nodes for this family. Must be at least 1. |
| `maxGPUCount` | integer | — | ❌ | Maximum aggregate GPU count across all nodes for this family. Must be at least 1. Must be greater than or equal to `minGPUCount`. |
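Two rules from the tables above are easy to check mechanically: every family in `gpuResourceBounds` must match a family in `spec.gpu`, and any proposed aggregate GPU count is clamped to its family's bounds. A minimal sketch of both, assuming the policy fields are plain dictionaries (the helper names are illustrative, not part of the PaletteAI API):

```python
def validate_families(gpu, gpu_resource_bounds):
    """Return the families in gpuResourceBounds that have no matching
    entry in spec.gpu; an empty list means the policy is consistent."""
    declared = {g["family"] for g in gpu}
    return [b["family"] for b in gpu_resource_bounds
            if b["family"] not in declared]

def clamp_gpu_target(target, family, gpu_resource_bounds):
    """Clamp a proposed aggregate GPU count for one family to its
    gpuResourceBounds entry; families without an entry are unbounded."""
    for b in gpu_resource_bounds:
        if b["family"] == family:
            hi = b.get("maxGPUCount")
            lo = b.get("minGPUCount")
            if hi is not None:
                target = min(target, hi)
            if lo is not None:
                target = max(target, lo)
            return target
    return target
```

So with `minGPUCount: 1` and `maxGPUCount: 16` for `NVIDIA-H100`, a proposed target of 20 GPUs is clamped to 16 and a proposed target of 0 is raised to 1.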
Reference a Scaling Policy from a Compute Pool
To enable autoscaling on a Compute Pool, add a scalingPolicyRef to its clusterVariant configuration. Set scalingPolicyRef.namespace to the namespace where the ScalingPolicy is deployed. The Compute Pool and the ScalingPolicy can be in different namespaces.
Dedicated Compute Pool:
spec:
  clusterVariant:
    dedicated:
      scalingPolicyRef:
        name: my-cpu-policy
        namespace: my-project-namespace
Shared Compute Pool:
spec:
  clusterVariant:
    shared:
      scalingPolicyRef:
        name: my-cpu-policy
        namespace: my-project-namespace
Apply the updated ComputePool manifest or patch the existing resource:
kubectl patch computepool my-compute-pool \
  --namespace my-project-namespace \
  --type merge \
  --patch '{"spec":{"clusterVariant":{"dedicated":{"scalingPolicyRef":{"name":"my-cpu-policy","namespace":"my-project-namespace"}}}}}'
After the reference is applied, PaletteAI begins evaluating metrics and the Scaling Policy status reflects the associated Compute Pool.
Validate
- Confirm the `ScalingPolicy` resource exists and that Prometheus is available:

  kubectl get scalingpolicy my-cpu-policy --namespace my-project-namespace

  Example output:

  NAME            PROMETHEUS_AVAILABLE   PROCESSED_COMPUTEPOOLS   AGE
  my-cpu-policy   True                   1                        5m

  The `PROMETHEUS_AVAILABLE` column indicates whether the controller can reach Prometheus to query metrics. If it shows `False`, verify that Prometheus is running and accessible from the hub cluster.

- Inspect the full status to confirm the associated Compute Pool is listed:

  kubectl describe scalingpolicy my-cpu-policy --namespace my-project-namespace

  Under `Status`, confirm that the processed pool count is greater than zero and that the name of your Compute Pool appears in the `computePools` list.

- Confirm the `ComputePoolEvaluation` resource is created for the associated Compute Pool:

  kubectl get computepoolevaluation --namespace my-project-namespace

  A `ComputePoolEvaluation` resource is created for each Compute Pool that references the Scaling Policy. It records the current scaling decision and target resource counts.
Pre-Defined Scaling Profiles
PaletteAI ships three pre-defined Scaling Policies that you can reference directly or use as a starting point for your own policies.
| Name | Scale-Up Threshold | Scale-Down Threshold | Scale-Up Duration | Scale-Down Duration | Cooldown |
|---|---|---|---|---|---|
| `aggressive` | 50% | 15% | 2m | 3m | 10m |
| `balanced` | 75% | 20% | 5m | 8m | 20m |
| `conservative` | 85% | 10% | 10m | 15m | 30m |
These profiles are installed into the system namespace of the hub cluster during PaletteAI installation and are re-applied during upgrades. Like all resources in the system namespace, they are managed exclusively by the platform and cannot be modified by users. Reference them in a Compute Pool the same way you would reference a custom Scaling Policy. To customize thresholds or bounds, clone a profile into your project namespace and edit the clone, then reference the clone from your Compute Pool.
Use aggressive for latency-sensitive workloads that need fast scale-up response, balanced for general-purpose workloads, and conservative for stable workloads where over-provisioning is costly.
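As an illustration of the clone-and-edit workflow, a clone of the balanced profile with a shorter cooldown might look like the following. The threshold and duration values mirror the `balanced` row in the table above; the policy name and namespace are placeholders you would replace with your own.

```yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-balanced-tuned           # placeholder name for the clone
  namespace: my-project-namespace   # your project namespace
spec:
  cpu:
    scaleUpThreshold: 75    # values copied from the balanced profile
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 8m
  cooldownDuration: 10m     # customized: the balanced profile uses 20m
```

Apply the clone to your project namespace with kubectl, then reference it from your Compute Pool instead of the pre-defined profile.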
Update a Scaling Policy
You can update a Scaling Policy you created or cloned in your project namespace at any time. Changes take effect on the next reconciliation cycle. Pre-defined policies (aggressive, balanced, conservative) in the system namespace cannot be modified by users. To change their behavior, clone one to your project namespace and update the clone.
Update the manifest and re-apply it:
kubectl apply --filename cpu-scaling-policy.yaml
Or patch a specific field directly:
kubectl patch scalingpolicy my-cpu-policy \
  --namespace my-project-namespace \
  --type merge \
  --patch '{"spec":{"cooldownDuration":"20m"}}'
Updating a Scaling Policy does not interrupt active scaling operations. The new configuration applies after the current scaling action completes.
Delete a Scaling Policy
Before you delete a Scaling Policy, remove the scalingPolicyRef from all Compute Pools that reference it. The webhook prevents deletion of a Scaling Policy that is still referenced by active Compute Pools.
- Remove the `scalingPolicyRef` from each referencing Compute Pool:

  kubectl patch computepool my-compute-pool \
    --namespace my-project-namespace \
    --type merge \
    --patch '{"spec":{"clusterVariant":{"dedicated":{"scalingPolicyRef":null}}}}'

- Delete the Scaling Policy:

  kubectl delete scalingpolicy my-cpu-policy --namespace my-project-namespace

  If any active Compute Pool still references the policy, the deletion is rejected with an error listing the referencing resources. Remove the remaining references and retry.
Next Steps
- Learn about autoscaling behavior in Compute Pool concepts.
- View the full `ComputePool` configuration reference in Compute Pool Configuration.