Version: v1.0.x

Create and Manage Scaling Policies

This guide describes how to create and manage ScalingPolicy resources using YAML manifests and kubectl. A Scaling Policy defines autoscaling behavior for a Compute Pool by configuring CPU and GPU utilization thresholds, scaling durations, resource bounds, and cooldown periods.

When a Compute Pool references a Scaling Policy, PaletteAI continuously monitors resource utilization and automatically adds or removes nodes to match workload demand.

Prerequisites

Before you create a Scaling Policy, confirm that you have the following resources available.

  • Access to the hub cluster with permissions to create ScalingPolicy resources in the target namespace

  • A Compute Pool in Running status on which you want to enable autoscaling

  • Prometheus running and accessible from the hub cluster. The ScalingPolicy controller queries Prometheus for CPU and GPU utilization metrics. A PrometheusAvailable condition is set on the ScalingPolicy status when the connection is confirmed.

    Refer to Configure Prometheus Agent Monitoring to configure global.metrics, spoke-side Prometheus agents, and GPU metric collection prerequisites.

Create a Scaling Policy

A ScalingPolicy is a namespaced resource in the spectrocloud.com/v1alpha1 API group. You must specify at least one of cpu or gpu scaling configurations.

CPU Scaling Policy

Use this configuration when your Compute Pool uses CPU-based workloads.

cpu-scaling-policy.yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-cpu-policy
  namespace: my-project-namespace
spec:
  cpu:
    scaleUpThreshold: 80
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 10m
  cpuResourceBounds:
    minCPUCount: 4
    maxCPUCount: 64
  cooldownDuration: 15m
  abortDuration: 30m

Apply the manifest to the hub cluster:

kubectl apply --filename cpu-scaling-policy.yaml

GPU Scaling Policy

Use this configuration when your Compute Pool uses GPU-based workloads. You must configure one gpu entry per GPU family. Each GPU family in gpuResourceBounds must have a corresponding entry in gpu.

gpu-scaling-policy.yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-gpu-policy
  namespace: my-project-namespace
spec:
  gpu:
    - family: 'NVIDIA-H100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
    - family: 'NVIDIA-A100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
  gpuResourceBounds:
    - family: 'NVIDIA-H100'
      minGPUCount: 1
      maxGPUCount: 16
    - family: 'NVIDIA-A100'
      minGPUCount: 1
      maxGPUCount: 8
  cooldownDuration: 15m
  abortDuration: 30m

Apply the manifest to the hub cluster:

kubectl apply --filename gpu-scaling-policy.yaml
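The per-family correspondence rule above can be sketched as a small validation check. This is an illustrative Python sketch of the documented rule, not the controller's actual code; the `check_families` helper and the dict shapes are assumptions:

```python
# Sketch: every family listed under gpuResourceBounds must have a
# matching entry under gpu, per the rule documented above.
def check_families(gpu_entries, gpu_bounds):
    scaling_families = {entry["family"] for entry in gpu_entries}
    missing = [b["family"] for b in gpu_bounds
               if b["family"] not in scaling_families]
    if missing:
        raise ValueError(
            f"gpuResourceBounds families without a gpu entry: {missing}")

# Matches the gpu-scaling-policy.yaml example: both families appear in
# gpu and in gpuResourceBounds, so validation passes.
check_families(
    gpu_entries=[{"family": "NVIDIA-H100"}, {"family": "NVIDIA-A100"}],
    gpu_bounds=[{"family": "NVIDIA-H100"}, {"family": "NVIDIA-A100"}],
)
```

A bounds entry for a family that has no scaling entry would fail this check; the reverse (a gpu entry whose family never appears on allocated hosts) is silently ignored by the controller, as noted in the field reference below.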

Combined CPU and GPU Scaling Policy

You can configure both CPU and GPU scaling in a single policy. This is useful when worker pools in the same Compute Pool contain both CPU-only and GPU nodes.

combined-scaling-policy.yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: my-combined-policy
  namespace: my-project-namespace
spec:
  cpu:
    scaleUpThreshold: 80
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 10m
  gpu:
    - family: 'NVIDIA-H100'
      scaleUpThreshold: 85
      scaleDownThreshold: 15
      scaleUpDuration: 3m
      scaleDownDuration: 8m
  cpuResourceBounds:
    minCPUCount: 4
    maxCPUCount: 64
  gpuResourceBounds:
    - family: 'NVIDIA-H100'
      minGPUCount: 1
      maxGPUCount: 16
  cooldownDuration: 15m
  abortDuration: 30m

Apply the manifest to the hub cluster:

kubectl apply --filename combined-scaling-policy.yaml

Scaling Policy Fields

The following tables describe all fields in the ScalingPolicy spec.

spec

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| cpu | CPUScaling | - | If gpu is unset | CPU scaling configuration. Required if gpu is not specified. |
| gpu | []GPUScaling | - | If cpu is unset | GPU scaling configuration per GPU family. Required if cpu is not specified. |
| cpuResourceBounds | CPUResourceBounds | - | No | Aggregate CPU count bounds across all nodes in the pool. |
| gpuResourceBounds | []GPUResourceBounds | - | No | Aggregate GPU count bounds per GPU family. Each entry must correspond to a family in gpu. |
| cooldownDuration | duration | 15m | No | Waiting period after a successful scaling action before the next scaling decision. |
| abortDuration | duration | 30m | No | Timeout for an ongoing scale-up operation. When exceeded, pending nodes that have not reached Healthy status are removed; nodes that provisioned successfully are retained. Scale-down operations are never aborted. |
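The interplay of cooldownDuration and abortDuration can be illustrated with a short sketch. This is hypothetical Python mirroring the documented semantics, not controller code; the helper names and timestamps are assumptions:

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=15)  # spec.cooldownDuration default
ABORT = timedelta(minutes=30)     # spec.abortDuration default

def in_cooldown(last_action_time, now):
    """No new scaling decision until cooldownDuration has elapsed
    after the last successful scaling action."""
    return now - last_action_time < COOLDOWN

def should_abort_scale_up(scale_up_started, now):
    """A scale-up that runs longer than abortDuration is aborted:
    pending (non-Healthy) nodes are removed, provisioned nodes kept.
    Scale-down operations are never aborted."""
    return now - scale_up_started > ABORT

now = datetime(2025, 1, 1, 12, 0)
print(in_cooldown(now - timedelta(minutes=10), now))            # True: still cooling down
print(should_abort_scale_up(now - timedelta(minutes=45), now))  # True: past the timeout
```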

spec.cpu and spec.gpu[*]

Both cpu and each gpu entry use the same set of scaling parameters.

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| scaleUpThreshold | integer | 80 | No | Utilization percentage above which scale-up is triggered. Must be between 1 and 100. Must be greater than scaleDownThreshold. |
| scaleDownThreshold | integer | 20 | No | Utilization percentage below which scale-down is triggered. Must be between 0 and 99. Must be less than scaleUpThreshold. |
| scaleUpDuration | duration | 5m | No | Duration for which utilization must remain consistently above scaleUpThreshold before scale-up is triggered. |
| scaleDownDuration | duration | 10m | No | Duration for which utilization must remain consistently below scaleDownThreshold before scale-down is triggered. |
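These parameters combine as a sustained-condition rule: a decision fires only when every utilization sample across the configured duration is beyond the threshold, so a single dip resets the window. A minimal Python sketch of that rule (the `decide` helper and the fixed one-minute sampling interval are assumptions, not the controller's implementation):

```python
from datetime import timedelta

def decide(samples, scale_up_threshold=80, scale_down_threshold=20,
           scale_up_duration=timedelta(minutes=5),
           scale_down_duration=timedelta(minutes=10),
           interval=timedelta(minutes=1)):
    """samples: utilization percentages, oldest first, one per interval."""
    up_n = int(scale_up_duration / interval)
    down_n = int(scale_down_duration / interval)
    # Scale up only if utilization stayed above the threshold for the
    # whole scaleUpDuration window.
    if len(samples) >= up_n and all(s > scale_up_threshold for s in samples[-up_n:]):
        return "scale-up"
    # Scale down only if it stayed below the threshold for the whole
    # scaleDownDuration window.
    if len(samples) >= down_n and all(s < scale_down_threshold for s in samples[-down_n:]):
        return "scale-down"
    return "no-op"

print(decide([85, 90, 88, 91, 86]))  # scale-up: five straight samples above 80
print(decide([85, 90, 40, 91, 86]))  # no-op: the dip to 40 breaks the window
```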

Each gpu entry also requires:

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| family | string | - | Yes | GPU family identifier (for example, "NVIDIA-H100"). The controller matches this value against GPU families present on allocated hosts in the worker pool. Entries for GPU families not present in the pool are silently ignored. |

spec.cpuResourceBounds

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| minCPUCount | integer | - | Yes | Minimum aggregate CPU count across all nodes in the pool. Must be at least 1. |
| maxCPUCount | integer | - | Yes | Maximum aggregate CPU count across all nodes in the pool. Must be at least 1. Must be greater than or equal to minCPUCount. |

spec.gpuResourceBounds[*]

| Field | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| family | string | - | Yes | GPU family identifier. Must match a family defined in spec.gpu. |
| minGPUCount | integer | - | Yes | Minimum aggregate GPU count across all nodes for this family. Must be at least 1. |
| maxGPUCount | integer | - | Yes | Maximum aggregate GPU count across all nodes for this family. Must be at least 1. Must be greater than or equal to minGPUCount. |
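The bounds act as a clamp on whatever target the utilization thresholds produce. A minimal sketch using the H100/A100 bounds from the GPU example manifest above (the `clamp_target` helper is illustrative, not controller code):

```python
# Per-family (minGPUCount, maxGPUCount) from gpu-scaling-policy.yaml.
bounds = {
    "NVIDIA-H100": (1, 16),
    "NVIDIA-A100": (1, 8),
}

def clamp_target(family, desired):
    """Clamp a desired aggregate GPU count into the family's bounds."""
    lo, hi = bounds[family]
    return max(lo, min(hi, desired))

print(clamp_target("NVIDIA-H100", 20))  # 16: capped at maxGPUCount
print(clamp_target("NVIDIA-A100", 0))   # 1: raised to minGPUCount
```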

Reference a Scaling Policy from a Compute Pool

To enable autoscaling on a Compute Pool, add a scalingPolicyRef to its clusterVariant configuration. Set scalingPolicyRef.namespace to the namespace where the ScalingPolicy is deployed. The Compute Pool and the ScalingPolicy can be in different namespaces.

Dedicated Compute Pool:

spec:
  clusterVariant:
    dedicated:
      scalingPolicyRef:
        name: my-cpu-policy
        namespace: my-project-namespace

Shared Compute Pool:

spec:
  clusterVariant:
    shared:
      scalingPolicyRef:
        name: my-cpu-policy
        namespace: my-project-namespace

Apply the updated ComputePool manifest or patch the existing resource:

kubectl patch computepool my-compute-pool \
--namespace my-project-namespace \
--type merge \
--patch '{"spec":{"clusterVariant":{"dedicated":{"scalingPolicyRef":{"name":"my-cpu-policy","namespace":"my-project-namespace"}}}}}'

After the reference is applied, PaletteAI begins evaluating metrics and the Scaling Policy status reflects the associated Compute Pool.

Validate

  1. Confirm the ScalingPolicy resource exists and that Prometheus is available:

    kubectl get scalingpolicy my-cpu-policy --namespace my-project-namespace
    Example Output
    NAME            PROMETHEUS_AVAILABLE   PROCESSED_COMPUTEPOOLS   AGE
    my-cpu-policy   True                   1                        5m

    The PROMETHEUS_AVAILABLE column indicates whether the controller can reach Prometheus to query metrics. If it shows False, verify that Prometheus is running and accessible from the hub cluster.

  2. Inspect the full status to confirm the associated Compute Pool is listed:

    kubectl describe scalingpolicy my-cpu-policy --namespace my-project-namespace

    Under Status, confirm that the processed pool count is greater than zero and that the name of your Compute Pool appears in the computePools list.

  3. Confirm the ComputePoolEvaluation resource is created for the associated Compute Pool:

    kubectl get computepoolevaluation --namespace my-project-namespace

    A ComputePoolEvaluation resource is created for each Compute Pool that references the Scaling Policy. It records the current scaling decision and target resource counts.

Pre-Defined Scaling Profiles

PaletteAI ships three pre-defined Scaling Policies that you can reference directly or use as a starting point for your own policies.

| Name | Scale-Up Threshold | Scale-Down Threshold | Scale-Up Duration | Scale-Down Duration | Cooldown |
| --- | --- | --- | --- | --- | --- |
| aggressive | 50% | 15% | 2m | 3m | 10m |
| balanced | 75% | 20% | 5m | 8m | 20m |
| conservative | 85% | 10% | 10m | 15m | 30m |

These profiles are installed into the system namespace of the hub cluster during PaletteAI installation and are re-applied during upgrades. Like all resources in the system namespace, they are managed exclusively by the platform and cannot be modified by users. Reference them from a Compute Pool the same way you would reference a custom Scaling Policy. To customize thresholds or bounds, clone a profile into your project namespace, edit the clone, and reference the clone from your Compute Pool.
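For example, a clone of the balanced profile might look like the following sketch. The thresholds and durations come from the table above; the policy name is a placeholder, and only the CPU section is shown (add gpu and resource bounds as your pool requires):

```yaml
apiVersion: spectrocloud.com/v1alpha1
kind: ScalingPolicy
metadata:
  name: balanced-tuned            # placeholder name for your clone
  namespace: my-project-namespace # your project namespace
spec:
  cpu:
    scaleUpThreshold: 75          # balanced profile values; edit as needed
    scaleDownThreshold: 20
    scaleUpDuration: 5m
    scaleDownDuration: 8m
  cooldownDuration: 20m
```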

tip

Use aggressive for latency-sensitive workloads that need fast scale-up response, balanced for general-purpose workloads, and conservative for stable workloads where over-provisioning is costly.

Update a Scaling Policy

You can update a Scaling Policy you created or cloned in your project namespace at any time. Changes take effect on the next reconciliation cycle. Pre-defined policies (aggressive, balanced, conservative) in the system namespace cannot be modified by users. To change their behavior, clone one to your project namespace and update the clone.

Update the manifest and re-apply it:

kubectl apply --filename cpu-scaling-policy.yaml

Or patch a specific field directly:

kubectl patch scalingpolicy my-cpu-policy \
--namespace my-project-namespace \
--type merge \
--patch '{"spec":{"cooldownDuration":"20m"}}'
info

Updating a Scaling Policy does not interrupt active scaling operations. The new configuration applies after the current scaling action completes.

Delete a Scaling Policy

Before you delete a Scaling Policy, remove the scalingPolicyRef from all Compute Pools that reference it. The webhook prevents deletion of a Scaling Policy that is still referenced by active Compute Pools.

  1. Remove the scalingPolicyRef from each referencing Compute Pool:

    kubectl patch computepool my-compute-pool \
    --namespace my-project-namespace \
    --type merge \
    --patch '{"spec":{"clusterVariant":{"dedicated":{"scalingPolicyRef":null}}}}'
  2. Delete the Scaling Policy:

    kubectl delete scalingpolicy my-cpu-policy --namespace my-project-namespace

    If any active Compute Pool still references the policy, the deletion is rejected with an error listing the referencing resources. Remove the remaining references and retry.

Next Steps