Create and Manage Scaling Policies
This guide describes how to create and manage ScalingPolicy resources using YAML manifests and kubectl. A Scaling Policy defines autoscaling behavior for a Compute Pool by configuring CPU and GPU utilization thresholds, scaling durations, resource bounds, and cooldown periods.
When a Compute Pool references a Scaling Policy, PaletteAI continuously monitors resource utilization and automatically adds or removes nodes to match workload demand.
Prerequisites
Before you create a Scaling Policy, confirm that you have the following resources available.
- Access to the hub cluster with permissions to create `ScalingPolicy` resources in the target namespace
- A Compute Pool in `Running` status to enable autoscaling on
- Prometheus running and accessible from the hub cluster. The `ScalingPolicy` controller queries Prometheus for CPU and GPU utilization metrics. A `PrometheusAvailable` condition is set on the `ScalingPolicy` status when the connection is confirmed. Refer to Configure Prometheus Agent Monitoring to configure `global.metrics`, spoke-side Prometheus agents, and GPU metric collection prerequisites.
Create a Scaling Policy
A ScalingPolicy is a namespaced resource in the spectrocloud.com/v1alpha1 API group. You must specify at least one of cpu or gpu scaling configurations.
CPU Scaling Policy
Use this configuration when your Compute Pool uses CPU-based workloads.
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-cpu-policy
  namespace: my-project-namespace
spec:
  cpu:
    scaleUpThreshold: 80
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 10m
  cpuResourceBounds:
    minCPUCount: 4
    maxCPUCount: 64
  cooldownDuration: 15m
  abortDuration: 30m
Apply the manifest to the hub cluster:
kubectl apply --filename cpu-scaling-policy.yaml
GPU Scaling Policy
Use this configuration when your Compute Pool uses GPU-based workloads. You must configure one gpu entry per GPU family. Each GPU family in gpuResourceBounds must have a corresponding entry in gpu.
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-gpu-policy
  namespace: my-project-namespace
spec:
  gpu:
    - family: 'NVIDIA-H100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
    - family: 'NVIDIA-A100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
  gpuResourceBounds:
    - family: 'NVIDIA-H100'
      minGPUCount: 1
      maxGPUCount: 16
    - family: 'NVIDIA-A100'
      minGPUCount: 1
      maxGPUCount: 8
  cooldownDuration: 15m
  abortDuration: 30m
Apply the manifest to the hub cluster:
kubectl apply --filename gpu-scaling-policy.yaml
Combined CPU and GPU Scaling Policy
You can configure both CPU and GPU scaling in a single policy. This is useful when worker pools in the same Compute Pool contain both CPU-only and GPU nodes.
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-combined-policy
  namespace: my-project-namespace
spec:
  cpu:
    scaleUpThreshold: 80
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 10m
  gpu:
    - family: 'NVIDIA-H100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
  cpuResourceBounds:
    minCPUCount: 4
    maxCPUCount: 64
  gpuResourceBounds:
    - family: 'NVIDIA-H100'
      minGPUCount: 1
      maxGPUCount: 16
  cooldownDuration: 15m
  abortDuration: 30m
Apply the manifest to the hub cluster:
kubectl apply --filename combined-scaling-policy.yaml
Scaling Policy Fields
The following tables describe all fields in the ScalingPolicy spec.
spec
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `cpu` | CPUScaling | — | ❌ | CPU scaling configuration. Required if `gpu` is not specified. |
| `gpu` | []GPUScaling | — | ❌ | GPU scaling configuration per GPU family. Required if `cpu` is not specified. |
| `cpuResourceBounds` | CPUResourceBounds | — | ❌ | Aggregate CPU count bounds across all nodes in the pool. |
| `gpuResourceBounds` | []GPUResourceBounds | — | ❌ | Aggregate GPU count bounds per GPU family. Each entry must correspond to a family in `gpu`. |
| `cooldownDuration` | duration | 15m | ❌ | Waiting period after a successful scaling action before the next scaling decision. |
| `abortDuration` | duration | 30m | ❌ | Timeout for an ongoing scale-up operation. When exceeded, pending nodes that have not reached `Healthy` status are removed; nodes that provisioned successfully are retained. Scale-down operations are never aborted. |
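The timing semantics of `cooldownDuration` and `abortDuration` can be sketched as follows. This is a minimal illustration of the behavior described in the table above, not the controller's actual implementation; the function names are illustrative only.

```python
from datetime import datetime, timedelta
from typing import Optional

COOLDOWN = timedelta(minutes=15)  # cooldownDuration default
ABORT = timedelta(minutes=30)     # abortDuration default

def in_cooldown(last_success: Optional[datetime], now: datetime) -> bool:
    """After a successful scaling action, no new scaling decision is
    made until cooldownDuration has elapsed."""
    return last_success is not None and now - last_success < COOLDOWN

def scale_up_timed_out(started: datetime, now: datetime) -> bool:
    """A scale-up pending longer than abortDuration is aborted: nodes
    that never reached Healthy are removed, successfully provisioned
    nodes are kept. Scale-down operations are never aborted."""
    return now - started > ABORT
```

With the defaults shown, a scaling action at 12:00 blocks further decisions until 12:15, and a scale-up still pending at 12:30 is aborted.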
spec.cpu and spec.gpu[*]
Both cpu and each gpu entry use the same set of scaling parameters.
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `scaleUpThreshold` | integer | 80 | ❌ | Utilization percentage above which scale-up is triggered. Must be between 1 and 100. Must be greater than `scaleDownThreshold`. |
| `scaleDownThreshold` | integer | 20 | ❌ | Utilization percentage below which scale-down is triggered. Must be between 0 and 99. Must be less than `scaleUpThreshold`. |
| `scaleUpDuration` | duration | 5m | ❌ | Duration for which utilization must remain consistently above `scaleUpThreshold` before scale-up is triggered. |
| `scaleDownDuration` | duration | 10m | ❌ | Duration for which utilization must remain consistently below `scaleDownThreshold` before scale-down is triggered. |
Each gpu entry also requires:
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `family` | string | — | ✅ | GPU family identifier (for example, `NVIDIA-H100`). The controller matches this value against GPU families present on allocated hosts in the worker pool. Entries for GPU families not present in the pool are silently ignored. |
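The threshold-plus-duration semantics above amount to a hysteresis rule: scale-up fires only when utilization stays above `scaleUpThreshold` for the entire `scaleUpDuration` window, and scale-down only when it stays below `scaleDownThreshold` for the entire `scaleDownDuration` window. A minimal sketch of that rule, assuming utilization arrives as a window of percentage samples (the function name is hypothetical, not part of the PaletteAI API):

```python
def decide(samples, scale_up=80, scale_down=20):
    """Return 'scale-up', 'scale-down', or 'hold' for a window of
    utilization samples (percent) spanning the configured duration.

    Utilization must be consistently above or below the threshold for
    the whole window; a single sample inside the band means 'hold'.
    """
    if samples and all(s > scale_up for s in samples):
        return "scale-up"
    if samples and all(s < scale_down for s in samples):
        return "scale-down"
    return "hold"
```

For example, one dip to 50% inside an otherwise hot window prevents scale-up, which is what keeps short utilization spikes from churning nodes.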
spec.cpuResourceBounds
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `minCPUCount` | integer | — | ❌ | Minimum aggregate CPU count across all nodes in the pool. Must be at least 1. |
| `maxCPUCount` | integer | — | ❌ | Maximum aggregate CPU count across all nodes in the pool. Must be at least 1. Must be greater than or equal to `minCPUCount`. |
spec.gpuResourceBounds[*]
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| `family` | string | — | ✅ | GPU family identifier. Must match a family defined in `spec.gpu`. |
| `minGPUCount` | integer | — | ❌ | Minimum aggregate GPU count across all nodes for this family. Must be at least 1. |
| `maxGPUCount` | integer | — | ❌ | Maximum aggregate GPU count across all nodes for this family. Must be at least 1. Must be greater than or equal to `minGPUCount`. |
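Two rules from the tables above are easy to check mechanically: every family in `gpuResourceBounds` must match a family in `spec.gpu`, and any proposed aggregate GPU count is clamped to its family's bounds. A minimal sketch of both, assuming the policy fields are plain dictionaries (the helper names are illustrative, not part of the PaletteAI API):

```python
def validate_families(gpu, gpu_resource_bounds):
    """Return the families in gpuResourceBounds that have no matching
    entry in spec.gpu; an empty list means the policy is consistent."""
    declared = {g["family"] for g in gpu}
    return [b["family"] for b in gpu_resource_bounds
            if b["family"] not in declared]

def clamp_gpu_target(target, family, gpu_resource_bounds):
    """Clamp a proposed aggregate GPU count for one family to its
    gpuResourceBounds entry; families without an entry are unbounded."""
    for b in gpu_resource_bounds:
        if b["family"] == family:
            hi = b.get("maxGPUCount")
            lo = b.get("minGPUCount")
            if hi is not None:
                target = min(target, hi)
            if lo is not None:
                target = max(target, lo)
            return target
    return target
```

So with `minGPUCount: 1` and `maxGPUCount: 16` for `NVIDIA-H100`, a proposed target of 20 GPUs is clamped to 16 and a proposed target of 0 is raised to 1.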
Reference a Scaling Policy from a Compute Pool
To enable autoscaling on a Compute Pool, add a scalingPolicyRef to its clusterVariant configuration. Set scalingPolicyRef.namespace to the namespace where the ScalingPolicy is deployed. The Compute Pool and the ScalingPolicy can be in different namespaces.
Dedicated Compute Pool:
spec:
  clusterVariant:
    dedicated:
      scalingPolicyRef:
        name: my-cpu-policy
        namespace: my-project-namespace
Shared Compute Pool:
spec:
  clusterVariant:
    shared:
      scalingPolicyRef:
        name: my-cpu-policy
        namespace: my-project-namespace
Apply the updated ComputePool manifest or patch the existing resource:
kubectl patch computepool my-compute-pool \
  --namespace my-project-namespace \
  --type merge \
  --patch '{"spec":{"clusterVariant":{"dedicated":{"scalingPolicyRef":{"name":"my-cpu-policy","namespace":"my-project-namespace"}}}}}'
After the reference is applied, PaletteAI begins evaluating metrics and the Scaling Policy status reflects the associated Compute Pool.
Validate
- Confirm the `ScalingPolicy` resource exists and that Prometheus is available:

  kubectl get scalingpolicy my-cpu-policy --namespace my-project-namespace

  Example output:

  NAME            PROMETHEUS_AVAILABLE   PROCESSED_COMPUTEPOOLS   AGE
  my-cpu-policy   True                   1                        5m

  The `PROMETHEUS_AVAILABLE` column indicates whether the controller can reach Prometheus to query metrics. If it shows `False`, verify that Prometheus is running and accessible from the hub cluster.

- Inspect the full status to confirm the associated Compute Pool is listed:

  kubectl describe scalingpolicy my-cpu-policy --namespace my-project-namespace

  Under `Status`, confirm that the processed pool count is greater than zero and that the name of your Compute Pool appears in the `computePools` list.

- Confirm the `ComputePoolEvaluation` resource is created for the associated Compute Pool:

  kubectl get computepoolevaluation --namespace my-project-namespace

  A `ComputePoolEvaluation` resource is created for each Compute Pool that references the Scaling Policy. It records the current scaling decision and target resource counts.
Pre-Defined Scaling Profiles
PaletteAI ships three pre-defined Scaling Policies that you can reference directly or use as a starting point for your own policies.
| Name | Scale-Up Threshold | Scale-Down Threshold | Scale-Up Duration | Scale-Down Duration | Cooldown |
|---|---|---|---|---|---|
| `aggressive` | 50% | 15% | 2m | 3m | 10m |
| `balanced` | 75% | 20% | 5m | 8m | 20m |
| `conservative` | 85% | 10% | 10m | 15m | 30m |
These profiles are installed into the system namespace of the hub cluster during PaletteAI installation and are re-applied during upgrades. Like all resources in the system namespace, they are managed exclusively by the platform and cannot be modified by users. Reference them in a Compute Pool the same way you would reference a custom Scaling Policy. To customize thresholds or bounds, clone a profile into your project namespace and edit the clone, then reference the clone from your Compute Pool.
Use aggressive for latency-sensitive workloads that need fast scale-up response, balanced for general-purpose workloads, and conservative for stable workloads where over-provisioning is costly.
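As an illustration of the clone-and-edit workflow, a clone of the balanced profile with a shorter cooldown might look like the following. The threshold and duration values mirror the `balanced` row in the table above; the policy name and namespace are placeholders you would replace with your own.

```yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-balanced-tuned           # placeholder name for the clone
  namespace: my-project-namespace   # your project namespace
spec:
  cpu:
    scaleUpThreshold: 75    # values copied from the balanced profile
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 8m
  cooldownDuration: 10m     # customized: the balanced profile uses 20m
```

Apply the clone to your project namespace with kubectl, then reference it from your Compute Pool instead of the pre-defined profile.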
Update a Scaling Policy
You can update a Scaling Policy you created or cloned in your project namespace at any time. Changes take effect on the next reconciliation cycle. Pre-defined policies (aggressive, balanced, conservative) in the system namespace cannot be modified by users. To change their behavior, clone one to your project namespace and update the clone.
Update the manifest and re-apply it:
kubectl apply --filename cpu-scaling-policy.yaml
Or patch a specific field directly:
kubectl patch scalingpolicy my-cpu-policy \
  --namespace my-project-namespace \
  --type merge \
  --patch '{"spec":{"cooldownDuration":"20m"}}'
Updating a Scaling Policy does not interrupt active scaling operations. The new configuration applies after the current scaling action completes.
Delete a Scaling Policy
Before you delete a Scaling Policy, remove the scalingPolicyRef from all Compute Pools that reference it. The webhook prevents deletion of a Scaling Policy that is still referenced by active Compute Pools.
- Remove the `scalingPolicyRef` from each referencing Compute Pool:

  kubectl patch computepool my-compute-pool \
    --namespace my-project-namespace \
    --type merge \
    --patch '{"spec":{"clusterVariant":{"dedicated":{"scalingPolicyRef":null}}}}'

- Delete the Scaling Policy:

  kubectl delete scalingpolicy my-cpu-policy --namespace my-project-namespace

  If any active Compute Pool still references the policy, the deletion is rejected with an error listing the referencing resources. Remove the remaining references and retry.
Next Steps
- Learn about autoscaling behavior in Compute Pool concepts.
- View the full `ComputePool` configuration reference in Compute Pool Configuration.