Version: v1.0.x

Create and Manage Scaling Policies

This guide describes how to create and manage ScalingPolicy resources using YAML manifests and kubectl. A Scaling Policy defines autoscaling behavior for a Compute Pool by configuring CPU and GPU utilization thresholds, scaling durations, resource bounds, and cooldown periods.

When a Compute Pool references a Scaling Policy, PaletteAI continuously monitors resource utilization and automatically adds or removes nodes to match workload demand.

Prerequisites

Before you create a Scaling Policy, confirm that you have the following resources available.

  • Access to the hub cluster with permissions to create ScalingPolicy resources in the target namespace

  • A Compute Pool in Running status on which you want to enable autoscaling

  • Prometheus running and accessible from the hub cluster. The ScalingPolicy controller queries Prometheus for CPU and GPU utilization metrics. A PrometheusAvailable condition is set on the ScalingPolicy status when the connection is confirmed.

    Refer to Configure Prometheus Agent Monitoring to configure global.metrics, spoke-side Prometheus agents, and GPU metric collection prerequisites.

Create a Scaling Policy

A ScalingPolicy is a namespaced resource in the spectrocloud.com/v1alpha1 API group. You must specify at least one of cpu or gpu scaling configurations.

CPU Scaling Policy

Use this configuration when your Compute Pool uses CPU-based workloads.

cpu-scaling-policy.yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-cpu-policy
  namespace: my-project-namespace
spec:
  cpu:
    scaleUpThreshold: 80
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 10m
  cpuResourceBounds:
    minCPUCount: 4
    maxCPUCount: 64
  cooldownDuration: 15m
  abortDuration: 30m

Apply the manifest to the hub cluster:

kubectl apply --filename cpu-scaling-policy.yaml

GPU Scaling Policy

Use this configuration when your Compute Pool uses GPU-based workloads. You must configure one gpu entry per GPU family. Each GPU family in gpuResourceBounds must have a corresponding entry in gpu.

gpu-scaling-policy.yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-gpu-policy
  namespace: my-project-namespace
spec:
  gpu:
    - family: 'NVIDIA-H100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
    - family: 'NVIDIA-A100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
  gpuResourceBounds:
    - family: 'NVIDIA-H100'
      minGPUCount: 1
      maxGPUCount: 16
    - family: 'NVIDIA-A100'
      minGPUCount: 1
      maxGPUCount: 8
  cooldownDuration: 15m
  abortDuration: 30m

Apply the manifest to the hub cluster:

kubectl apply --filename gpu-scaling-policy.yaml
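The per-family correspondence rule above can be sketched as a small validation check. This is an illustrative Python sketch of the documented rule, not the controller's actual code; the `check_families` helper and the dict shapes are assumptions:

```python
# Sketch: every family listed under gpuResourceBounds must have a
# matching entry under gpu, per the rule documented above.
def check_families(gpu_entries, gpu_bounds):
    scaling_families = {entry["family"] for entry in gpu_entries}
    missing = [b["family"] for b in gpu_bounds
               if b["family"] not in scaling_families]
    if missing:
        raise ValueError(
            f"gpuResourceBounds families without a gpu entry: {missing}")

# Matches the gpu-scaling-policy.yaml example: both families appear in
# gpu and in gpuResourceBounds, so validation passes.
check_families(
    gpu_entries=[{"family": "NVIDIA-H100"}, {"family": "NVIDIA-A100"}],
    gpu_bounds=[{"family": "NVIDIA-H100"}, {"family": "NVIDIA-A100"}],
)
```

A bounds entry for a family that has no scaling entry would fail this check; the reverse (a gpu entry whose family never appears on allocated hosts) is silently ignored by the controller, as noted in the field reference below.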

Combined CPU and GPU Scaling Policy

You can configure both CPU and GPU scaling in a single policy. This is useful when worker pools in the same Compute Pool contain both CPU-only and GPU nodes.

combined-scaling-policy.yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-combined-policy
  namespace: my-project-namespace
spec:
  cpu:
    scaleUpThreshold: 80
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 10m
  gpu:
    - family: 'NVIDIA-H100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
  cpuResourceBounds:
    minCPUCount: 4
    maxCPUCount: 64
  gpuResourceBounds:
    - family: 'NVIDIA-H100'
      minGPUCount: 1
      maxGPUCount: 16
  cooldownDuration: 15m
  abortDuration: 30m

Apply the manifest to the hub cluster:

kubectl apply --filename combined-scaling-policy.yaml

Scaling Policy Fields

The following tables describe all fields in the ScalingPolicy spec.

spec

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| cpu | CPUScaling | - | If gpu is unset | CPU scaling configuration. Required if gpu is not specified. |
| gpu | []GPUScaling | - | If cpu is unset | GPU scaling configuration per GPU family. Required if cpu is not specified. |
| cpuResourceBounds | CPUResourceBounds | - | No | Aggregate CPU count bounds across all nodes in the pool. |
| gpuResourceBounds | []GPUResourceBounds | - | No | Aggregate GPU count bounds per GPU family. Each entry must correspond to a family in gpu. |
| cooldownDuration | duration | 15m | No | Waiting period after a successful scaling action before the next scaling decision. |
| abortDuration | duration | 30m | No | Timeout for an ongoing scale-up operation. When exceeded, pending nodes that have not reached Healthy status are removed; nodes that provisioned successfully are retained. Scale-down operations are never aborted. |
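The interplay of cooldownDuration and abortDuration can be illustrated with a short sketch. This is hypothetical Python mirroring the documented semantics, not controller code; the helper names and timestamps are assumptions:

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=15)  # spec.cooldownDuration default
ABORT = timedelta(minutes=30)     # spec.abortDuration default

def in_cooldown(last_action_time, now):
    """No new scaling decision until cooldownDuration has elapsed
    after the last successful scaling action."""
    return now - last_action_time < COOLDOWN

def should_abort_scale_up(scale_up_started, now):
    """A scale-up that runs longer than abortDuration is aborted:
    pending (non-Healthy) nodes are removed, provisioned nodes kept.
    Scale-down operations are never aborted."""
    return now - scale_up_started > ABORT

now = datetime(2025, 1, 1, 12, 0)
print(in_cooldown(now - timedelta(minutes=10), now))            # True: still cooling down
print(should_abort_scale_up(now - timedelta(minutes=45), now))  # True: past the timeout
```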

spec.cpu and spec.gpu[*]

Both cpu and each gpu entry use the same set of scaling parameters.

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| scaleUpThreshold | integer | 80 | No | Utilization percentage above which scale-up is triggered. Must be between 1 and 100. Must be greater than scaleDownThreshold. |
| scaleDownThreshold | integer | 20 | No | Utilization percentage below which scale-down is triggered. Must be between 0 and 99. Must be less than scaleUpThreshold. |
| scaleUpDuration | duration | 5m | No | Duration for which utilization must remain consistently above scaleUpThreshold before scale-up is triggered. |
| scaleDownDuration | duration | 10m | No | Duration for which utilization must remain consistently below scaleDownThreshold before scale-down is triggered. |
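These parameters combine as a sustained-condition rule: a decision fires only when every utilization sample across the configured duration is beyond the threshold, so a single dip resets the window. A minimal Python sketch of that rule (the `decide` helper and the fixed one-minute sampling interval are assumptions, not the controller's implementation):

```python
from datetime import timedelta

def decide(samples, scale_up_threshold=80, scale_down_threshold=20,
           scale_up_duration=timedelta(minutes=5),
           scale_down_duration=timedelta(minutes=10),
           interval=timedelta(minutes=1)):
    """samples: utilization percentages, oldest first, one per interval."""
    up_n = int(scale_up_duration / interval)
    down_n = int(scale_down_duration / interval)
    # Scale up only if utilization stayed above the threshold for the
    # whole scaleUpDuration window.
    if len(samples) >= up_n and all(s > scale_up_threshold for s in samples[-up_n:]):
        return "scale-up"
    # Scale down only if it stayed below the threshold for the whole
    # scaleDownDuration window.
    if len(samples) >= down_n and all(s < scale_down_threshold for s in samples[-down_n:]):
        return "scale-down"
    return "no-op"

print(decide([85, 90, 88, 91, 86]))  # scale-up: five straight samples above 80
print(decide([85, 90, 40, 91, 86]))  # no-op: the dip to 40 breaks the window
```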

Each gpu entry also requires:

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| family | string | - | Yes | GPU family identifier (for example, "NVIDIA-H100"). The controller matches this value against GPU families present on allocated hosts in the worker pool. Entries for GPU families not present in the pool are silently ignored. |

spec.cpuResourceBounds

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| minCPUCount | integer | - | Yes | Minimum aggregate CPU count across all nodes in the pool. Must be at least 1. |
| maxCPUCount | integer | - | Yes | Maximum aggregate CPU count across all nodes in the pool. Must be at least 1. Must be greater than or equal to minCPUCount. |

spec.gpuResourceBounds[*]

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| family | string | - | Yes | GPU family identifier. Must match a family defined in spec.gpu. |
| minGPUCount | integer | - | Yes | Minimum aggregate GPU count across all nodes for this family. Must be at least 1. |
| maxGPUCount | integer | - | Yes | Maximum aggregate GPU count across all nodes for this family. Must be at least 1. Must be greater than or equal to minGPUCount. |
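The bounds act as a clamp on whatever target the utilization thresholds produce. A minimal sketch using the H100/A100 bounds from the GPU example manifest above (the `clamp_target` helper is illustrative, not controller code):

```python
# Per-family (minGPUCount, maxGPUCount) from gpu-scaling-policy.yaml.
bounds = {
    "NVIDIA-H100": (1, 16),
    "NVIDIA-A100": (1, 8),
}

def clamp_target(family, desired):
    """Clamp a desired aggregate GPU count into the family's bounds."""
    lo, hi = bounds[family]
    return max(lo, min(hi, desired))

print(clamp_target("NVIDIA-H100", 20))  # 16: capped at maxGPUCount
print(clamp_target("NVIDIA-A100", 0))   # 1: raised to minGPUCount
```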

Reference a Scaling Policy from a Compute Pool

To enable autoscaling on a Compute Pool, add a scalingPolicyRef to its clusterVariant configuration. Set scalingPolicyRef.namespace to the namespace where the ScalingPolicy is deployed. The Compute Pool and the ScalingPolicy can be in different namespaces.

Dedicated Compute Pool:

spec:
  clusterVariant:
    dedicated:
      scalingPolicyRef:
        name: my-cpu-policy
        namespace: my-project-namespace

Shared Compute Pool:

spec:
  clusterVariant:
    shared:
      scalingPolicyRef:
        name: my-cpu-policy
        namespace: my-project-namespace

Apply the updated ComputePool manifest or patch the existing resource:

kubectl patch computepool my-compute-pool \
--namespace my-project-namespace \
--type merge \
--patch '{"spec":{"clusterVariant":{"dedicated":{"scalingPolicyRef":{"name":"my-cpu-policy","namespace":"my-project-namespace"}}}}}'

After the reference is applied, PaletteAI begins evaluating metrics and the Scaling Policy status reflects the associated Compute Pool.

Validate

  1. Confirm the ScalingPolicy resource exists and that Prometheus is available:

    kubectl get scalingpolicy my-cpu-policy --namespace my-project-namespace
    Example Output
    NAME            PROMETHEUS_AVAILABLE   PROCESSED_COMPUTEPOOLS   AGE
    my-cpu-policy   True                   1                        5m

    The PROMETHEUS_AVAILABLE column indicates whether the controller can reach Prometheus to query metrics. If it shows False, verify that Prometheus is running and accessible from the hub cluster.

  2. Inspect the full status to confirm the associated Compute Pool is listed:

    kubectl describe scalingpolicy my-cpu-policy --namespace my-project-namespace

    Under Status, confirm that the processed pool count is greater than zero and that the name of your Compute Pool appears in the computePools list.

  3. Confirm the ComputePoolEvaluation resource is created for the associated Compute Pool:

    kubectl get computepoolevaluation --namespace my-project-namespace

    A ComputePoolEvaluation resource is created for each Compute Pool that references the Scaling Policy. It records the current scaling decision and target resource counts.

Pre-Defined Scaling Profiles

PaletteAI ships three pre-defined Scaling Policies that you can reference directly or use as a starting point for your own policies.

| Name | Scale-Up Threshold | Scale-Down Threshold | Scale-Up Duration | Scale-Down Duration | Cooldown |
| --- | --- | --- | --- | --- | --- |
| aggressive | 50% | 15% | 2m | 3m | 10m |
| balanced | 75% | 20% | 5m | 8m | 20m |
| conservative | 85% | 10% | 10m | 15m | 30m |

These profiles are installed into the system namespace of the hub cluster during PaletteAI installation and are re-applied during upgrades. Like all resources in the system namespace, they are managed exclusively by the platform and cannot be modified by users. Reference them from a Compute Pool the same way you would reference a custom Scaling Policy. To customize thresholds or bounds, clone a profile into your project namespace, edit the clone, and reference the clone from your Compute Pool.
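For example, a clone of the balanced profile might look like the following sketch. The thresholds and durations come from the table above; the policy name is a placeholder, and only the CPU section is shown (add gpu and resource bounds as your pool requires):

```yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: balanced-tuned            # placeholder name for your clone
  namespace: my-project-namespace # your project namespace
spec:
  cpu:
    scaleUpThreshold: 75          # balanced profile values; edit as needed
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 8m
  cooldownDuration: 20m
```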

tip

Use aggressive for latency-sensitive workloads that need fast scale-up response, balanced for general-purpose workloads, and conservative for stable workloads where over-provisioning is costly.

Update a Scaling Policy

You can update a Scaling Policy you created or cloned in your project namespace at any time. Changes take effect on the next reconciliation cycle. Pre-defined policies (aggressive, balanced, conservative) in the system namespace cannot be modified by users. To change their behavior, clone one to your project namespace and update the clone.

Update the manifest and re-apply it:

kubectl apply --filename cpu-scaling-policy.yaml

Or patch a specific field directly:

kubectl patch scalingpolicy my-cpu-policy \
--namespace my-project-namespace \
--type merge \
--patch '{"spec":{"cooldownDuration":"20m"}}'
info

Updating a Scaling Policy does not interrupt active scaling operations. The new configuration applies after the current scaling action completes.

Delete a Scaling Policy

Before you delete a Scaling Policy, remove the scalingPolicyRef from all Compute Pools that reference it. The webhook prevents deletion of a Scaling Policy that is still referenced by active Compute Pools.

  1. Remove the scalingPolicyRef from each referencing Compute Pool:

    kubectl patch computepool my-compute-pool \
    --namespace my-project-namespace \
    --type merge \
    --patch '{"spec":{"clusterVariant":{"dedicated":{"scalingPolicyRef":null}}}}'
  2. Delete the Scaling Policy:

    kubectl delete scalingpolicy my-cpu-policy --namespace my-project-namespace

    If any active Compute Pool still references the policy, the deletion is rejected with an error listing the referencing resources. Remove the remaining references and retry.

Next Steps