# Compute Pools
A Compute Pool is a group of shared Compute resources used to create Kubernetes clusters where your AI/ML applications run. In the hub-spoke architecture, each Compute Pool becomes a spoke cluster on which applications and models are deployed.
## Types of Compute Pools
There are two types of Compute Pools, allowing you to deploy your AI/ML applications and models on the infrastructure that best suits your needs.
| Mode | Description | Palette Required | Use Case |
|---|---|---|---|
| Dedicated | A single cluster used to host a single App Deployment | Yes | - Isolated workloads - Specific hardware needs - Stringent compliance or security requirements |
| Shared | One or multiple clusters that share resources and are used to host multiple App Deployments | Yes | - Resource efficiency for workloads with similar needs - Maximize hardware utilization to reduce costs - Development or staging environments |
## Compute Pool Provisioning
You can create shared or dedicated Compute Pools before deploying AI/ML applications. Provisioning Kubernetes clusters in advance reduces the number of steps involved when deploying applications and models by allowing data scientists to select an existing Compute Pool for their workload rather than create a new one.
The following table shows which types of Compute Pools you can create in each workflow.
| Workflow | Dedicated | Shared |
|---|---|---|
| App Deployment | ✅ | ❌ |
| Compute Pool | ✅ | ✅ |
| Model Deployment | ✅ | ❌ |
When you create a dedicated or shared Compute Pool, PaletteAI provisions the underlying Kubernetes infrastructure:
1. PaletteAI validates the configuration against available Compute resources.
2. PaletteAI requests cluster provisioning from Palette using the Profile Bundle's Cluster Profile.
3. Palette provisions the Kubernetes cluster.
4. PaletteAI retrieves the cluster's kubeconfig from Palette.
5. PaletteAI registers the cluster as an OCM spoke with the hub.
6. PaletteAI installs the required controllers (Flux, OCM work agent) on the spoke.
Once complete, the Compute Pool is ready to receive applications using Application or Fullstack Profile Bundles.
To learn how applications are deployed to Compute Pools, refer to our App Deployments guide. For an in-depth look at the complete deployment flow, from Compute Pools to workload provisioning, refer to our Hub and Spoke Model guide.
## Worker Pool Names
Each worker pool in a Compute Pool's `nodePoolRequirements.workerPools` must include a `name` field. If you do not provide one, PaletteAI uses a mutating webhook to automatically generate a unique identifier (UUID) during validation. Worker pool names must not exceed 63 characters to comply with Kubernetes label constraints.
Worker pool names provide a stable reference for matching allocated machine pools back to their requirements. This enables accurate pool evaluation when requirements change, such as during scaling operations.
```yaml
workerPools:
  - name: gpu-workers # Automatically generated by the mutating webhook if not provided
    gpu:
      family: 'NVIDIA A100'
      gpuCount: 8
    minWorkerNodes: 2
```
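The defaulting and validation behavior described above can be sketched in a few lines. This is an illustrative Python sketch, not PaletteAI source code; the function name and input shape are assumptions for the example.

```python
import uuid

# Kubernetes label values are limited to 63 characters.
MAX_NAME_LENGTH = 63

def default_worker_pool_name(worker_pool: dict) -> dict:
    """Illustrative sketch of the mutating webhook's defaulting logic:
    assign a UUID-based name when none is provided, and reject names
    that exceed the Kubernetes label length limit."""
    name = worker_pool.get("name")
    if not name:
        # Generated UUID strings are 36 characters, well under the limit.
        worker_pool["name"] = str(uuid.uuid4())
    elif len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"worker pool name {name!r} exceeds {MAX_NAME_LENGTH} characters")
    return worker_pool

pool = default_worker_pool_name({"gpu": {"family": "NVIDIA A100", "gpuCount": 8}})
assert len(pool["name"]) <= MAX_NAME_LENGTH
```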
## Hardware Capacity and Allocation
PaletteAI tracks hardware resources and machine pool allocations in the Compute Pool CRD's `status` field. This information helps PaletteAI determine whether a Compute Pool can accept additional workloads and is used for GPU quota enforcement at the Project level.
- `status.hardwareCapacity` - Total resources available across all control plane and worker nodes in the Compute Pool's clusters. This includes CPU count, architecture, memory, and GPU family and count.
- `status.hardwareAllocation` - Resources currently allocated to App Deployments running on the Compute Pool. Allocation is summed across all applications deployed to the pool.
- `status.status` - Overall health of the Compute Pool. Possible values include `Running`, `Failed`, `Provisioning`, `Unhealthy`, `Updating`, and `Deleting`.
- `status.cloudConfigUUID` - The UUID of the cloud configuration used to provision the Compute Pool. This is used to update the machine pools of the Compute Pool.
- `status.allocatedMachinePools` - Tracks which Edge Hosts are allocated to each machine pool within the Compute Pool. Each entry includes:
  - `name` - The machine pool name (for example, `control-plane-pool` or `worker-pool-nvidia-amd64-0`).
  - `nodeType` - The type of nodes in this machine pool. Possible values:
    - `ControlPlaneOnly` - Only control plane nodes.
    - `WorkerOnly` - Only worker nodes.
    - `ControlPlaneAndWorker` - Nodes serve as both control plane and worker.
  - `hosts` - Map of Edge Host UIDs to their details (architecture, CPU count, memory, GPU count and memory by family, status, and status timestamp). Each host includes:
    - `status` - Provisioning and health status of the host. Possible values: `Initial`, `Provisioning`, `Healthy`, `Unhealthy`, `Failed`, `Deleting`, `Unknown`.
    - `statusUpdatedAt` - Timestamp of the last status change for the host.
  - `labels` - Kubernetes labels applied to nodes in this pool (for example, `["control-plane"]` or `["worker", "gpu-family-nvidia"]`).
  - `workerPoolRequirementsName` - (Worker pools only) The name of the WorkerPool requirement that this machine pool was created from. Used to match allocated pools back to their requirements for scaling operations.
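The capacity check these fields enable can be illustrated with a short sketch. The function below is hypothetical, not the actual PaletteAI implementation; it shows how free GPU capacity per family could be derived by subtracting `hardwareAllocation` from `hardwareCapacity`.

```python
def free_gpus_by_family(capacity: list, allocation: list) -> dict:
    """Hypothetical sketch: subtract allocated GPU counts from total
    capacity, per GPU family, to decide whether a pool can accept
    an additional workload."""
    free = {}
    # Sum total GPUs across all capacity entries.
    for entry in capacity:
        for gpu in entry.get("gpu", []):
            free[gpu["family"]] = free.get(gpu["family"], 0) + gpu["gpuCount"]
    # Subtract GPUs already allocated to App Deployments.
    for entry in allocation:
        for gpu in entry.get("gpu", []):
            free[gpu["family"]] = free.get(gpu["family"], 0) - gpu["gpuCount"]
    return free

capacity = [{"architecture": "AMD64", "gpu": [{"family": "NVIDIA-A100", "gpuCount": 8}]}]
allocation = [{"architecture": "AMD64", "gpu": [{"family": "NVIDIA-A100", "gpuCount": 4}]}]
print(free_gpus_by_family(capacity, allocation))  # {'NVIDIA-A100': 4}
```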
## Machine Pool Lifecycle
PaletteAI automatically reconciles machine pools to match the Compute Pool's requirements. When you update pool requirements, PaletteAI classifies the changes and performs the appropriate operations:
- Create - Adds a machine pool for a requirement that does not have an allocated pool.
- Delete - Removes a machine pool that no longer matches any requirement. PaletteAI deletes individual machines first, marking their hosts as `Deleting`. Once all machines are confirmed removed, the pool itself is deleted.
- Scale - Adds or removes hosts from a pool to match updated requirements. When removing hosts, PaletteAI waits for at least one replacement host to reach `Healthy` status before removing the old hosts, preventing downtime.
- Replace - Rebuilds a pool when all hosts are invalid (for example, due to hardware requirement changes). PaletteAI adds replacement hosts while keeping one "bridge" host from the old set. Once a replacement host reaches `Healthy` status, the bridge host is removed.
For single-node clusters (where `singleNodeCluster` is set to `true` in the control plane configuration), PaletteAI only manages the control plane machine pool. Worker pool requirements defined in the specification are not used for machine pool operations.
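The first step of this reconciliation, deciding which pools to create, delete, or re-evaluate, can be sketched as a set comparison between requirement names and allocated pool names. This is a hypothetical illustration, not PaletteAI code; the function name and return shape are assumptions.

```python
def classify_pool_operations(requirements: set, allocated: set) -> dict:
    """Hypothetical sketch of how requirement changes map to machine
    pool operations, keyed by worker pool requirement name."""
    return {
        # A requirement with no allocated pool -> create a pool.
        "create": sorted(requirements - allocated),
        # An allocated pool whose requirement was removed -> delete the pool.
        "delete": sorted(allocated - requirements),
        # Matching pools are re-evaluated for scale or replace operations.
        "evaluate": sorted(requirements & allocated),
    }

ops = classify_pool_operations({"gpu-workers", "cpu-workers"}, {"gpu-workers", "old-pool"})
print(ops)  # {'create': ['cpu-workers'], 'delete': ['old-pool'], 'evaluate': ['gpu-workers']}
```

The `workerPoolRequirementsName` field in the status is what makes the matching between requirements and allocated pools possible.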
View hardware and machine pool allocation information associated with your Compute Pool with the following command.
```shell
kubectl get computepool <pool-name> --namespace <project-namespace> --output yaml
```
```yaml
status:
  status: Running
  hardwareCapacity:
    - architecture: AMD64
      totalCPU: 32
      totalMemory: '128Gi'
      gpu:
        - family: 'NVIDIA-A100'
          gpuCount: 8
  hardwareAllocation:
    - architecture: AMD64
      gpu:
        - family: 'NVIDIA-A100'
          gpuCount: 4
  cloudConfigUUID: cloud-config-uuid-1
  allocatedMachinePools:
    - name: control-plane-pool
      nodeType: ControlPlaneAndWorker
      hosts:
        host-uid-1:
          architecture: AMD64
          cpuCount: 16
          memoryGB: 64
        host-uid-2:
          architecture: AMD64
          cpuCount: 16
          memoryGB: 64
      labels:
        - control-plane
    - name: worker-pool-nvidia-amd64-0
      nodeType: WorkerOnly
      workerPoolRequirementsName: gpu-workers
      hosts:
        host-uid-3:
          architecture: AMD64
          cpuCount: 32
          memoryGB: 128
          gpuCountByFamily:
            NVIDIA-A100: 4
          gpuMemoryGBByFamily:
            NVIDIA-A100: 160
          status: Healthy
          statusUpdatedAt: '2024-01-15T10:30:00Z'
        host-uid-4:
          architecture: AMD64
          cpuCount: 32
          memoryGB: 128
          gpuCountByFamily:
            NVIDIA-A100: 4
          gpuMemoryGBByFamily:
            NVIDIA-A100: 160
          status: Healthy
          statusUpdatedAt: '2024-01-15T10:32:00Z'
      labels:
        - worker
        - gpu-family-nvidia
  aiWorkloadRefs:
    - name: training-job-1
      namespace: project-a
```
## Autoscaling
Compute Pools support automatic scaling based on CPU and GPU utilization metrics. Autoscaling adjusts the number of machines in response to workload demand, minimizing idle resources and handling demand spikes.
To enable autoscaling, reference a ScalingPolicy resource in the Compute Pool's clusterVariant configuration. Scaling policies define utilization thresholds, scaling durations, resource bounds, and cooldown periods.
```yaml
spec:
  clusterVariant:
    dedicated:
      scalingPolicyRef:
        name: my-scaling-policy
        namespace: default
```
### Scaling Behavior
When autoscaling is enabled, PaletteAI continuously monitors resource utilization and makes scaling decisions based on sustained metric values:
- Scale Up - Triggered when the minimum average utilization over the scale-up duration exceeds the scale-up threshold. PaletteAI adds machines to the pool and waits for them to reach `Healthy` status.
- Scale Down - Triggered when the maximum average utilization over the scale-down duration falls below the scale-down threshold. PaletteAI removes machines from the pool.
- Cooldown - After a successful scaling operation, the pool enters a cooldown period to allow metrics to stabilize before the next scaling decision.
Scaling is only triggered when utilization strictly crosses the configured thresholds (not when equal to the threshold). This prevents unnecessary scaling operations when utilization is at the boundary.
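The strict-inequality rule can be demonstrated with a minimal sketch. The function below is hypothetical, not the actual controller logic; it only shows that utilization exactly at a threshold triggers no scaling.

```python
def scaling_decision(min_avg_up: float, max_avg_down: float,
                     scale_up_threshold: float, scale_down_threshold: float) -> str:
    """Hypothetical sketch of the scaling decision: thresholds must be
    strictly crossed, so equality results in no operation."""
    if min_avg_up > scale_up_threshold:
        return "scale-up"
    if max_avg_down < scale_down_threshold:
        return "scale-down"
    return "no-op"

print(scaling_decision(0.85, 0.60, 0.80, 0.30))  # scale-up: 0.85 strictly exceeds 0.80
print(scaling_decision(0.80, 0.60, 0.80, 0.30))  # no-op: utilization equal to the threshold
```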
PaletteAI tracks the status of each scaling action in the ComputePoolEvaluation resource. Host status transitions through `Provisioning`, `Healthy`, `Unhealthy`, `Failed`, or `Deleting` states during scaling operations.
If a scale-up operation does not complete within the configured abort duration, PaletteAI aborts the operation by removing pending nodes that have not reached `Healthy` status. Successfully provisioned nodes are retained, and the pool transitions to a cooldown period. Scale-down operations are not aborted and continue until all node removals complete.
Scaling policies apply to both dedicated and shared Compute Pools. For more information about configuring scaling policies, refer to the ScalingPolicy CRD documentation.
## Resource Groups
Resource groups let you restrict which machines a Compute Pool can use. This is useful when you need workloads to run on specific hardware, such as machines in a particular network zone or with high-performance storage.
To use resource groups, tag your machines in Palette with labels that begin with `palette.ai.rg/`. For example, `palette.ai.rg/network-pool: '1'` or `palette.ai.rg/storage-tier: 'high-performance'`. Then specify the same labels in the `controlPlaneResourceGroups` or `workerResourceGroups` fields in your ComputePool resource.
```yaml
spec:
  clusterVariant:
    controlPlaneResourceGroups:
      network-pool: '1'
    workerResourceGroups:
      storage-tier: 'high-performance'
```
The `palette.ai.rg/<key>: "<value>"` pair assigned to the machine must match the key-value pair defined in the Compute Pool for the machine to be added to the Compute Pool.
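The matching rule can be sketched as follows. This is an illustrative Python sketch, not PaletteAI code; the function name and input shape are assumptions for the example.

```python
RG_PREFIX = "palette.ai.rg/"

def host_matches_resource_groups(host_labels: dict, resource_groups: dict) -> bool:
    """Hypothetical sketch: a host qualifies for a pool only if every
    resource group key-value pair is present among the host's
    palette.ai.rg/ labels with the same value."""
    # Strip the prefix from the host's resource group labels.
    host_groups = {
        key[len(RG_PREFIX):]: value
        for key, value in host_labels.items()
        if key.startswith(RG_PREFIX)
    }
    return all(host_groups.get(key) == value for key, value in resource_groups.items())

labels = {"palette.ai.rg/storage-tier": "high-performance", "other-label": "x"}
print(host_matches_resource_groups(labels, {"storage-tier": "high-performance"}))  # True
print(host_matches_resource_groups(labels, {"network-pool": "1"}))  # False
```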
For single-node clusters (where `singleNodeCluster` is set to `true` in the control plane configuration), worker pool host selection uses `controlPlaneResourceGroups` instead of `workerResourceGroups`. This ensures that the single node selected for both control plane and worker roles matches the control plane resource group constraints.
## Permissions
Compute Pool operations are controlled by role-based access control (RBAC) permissions. Depending on your role, a read-only view may be displayed or certain actions may be disabled.
### Update Permissions
The `spectrocloud.com/computepools:update` permission controls your ability to:
- Edit Compute Pool settings (name, description, annotations, labels, auto-scaling policy)
- Modify the profile bundle associated with the Compute Pool
Users without this permission can view Compute Pool details and configuration but cannot modify settings or profile bundles. When viewing a Compute Pool without update permissions, a read-only view will be displayed. The Save and Discard buttons are not displayed when viewing profile bundles without the required permission.
### Delete Permissions
The `spectrocloud.com/computepools:delete` permission controls your ability to delete Compute Pools. The Delete Compute Pool option is not displayed in the Settings menu for users without this permission.
If an action or button described in the documentation is not displayed, your role does not include the required permission. Contact your administrator to request access.
## Resources
Refer to the following articles to learn more about the role Compute Pools play in PaletteAI:
- Compute - Discover hardware used to create Compute Pools
- Compute Config - Configure default cluster settings
- App Deployments - Deploy applications to Compute Pools
- Hub-Spoke Model - Learn how Compute Pools fit into PaletteAI's architecture