Version: v1.1.x

Compute

A Compute resource provides a real-time inventory of the machines available for deploying AI/ML applications and models. It connects to Palette using the Palette integration configured in Settings, discovers machines that have been tagged for PaletteAI, and reports which ones are healthy and eligible for cluster deployment. A Compute resource can reference a Settings resource in the same namespace; if the reference is omitted, it falls back to the Project's configured Settings.

When you create an App Deployment, PaletteAI checks the Compute resource to determine whether sufficient machines are available to fulfill the requested resources (GPUs, CPUs, memory) before provisioning a cluster.

Tip

To prepare your physical or virtual machines as Palette-compatible edge nodes, use either the EdgeForge Workflow (Appliance Mode) or Agent Mode, and then register the nodes with Palette.

Compute Status

The Compute resource reports discovered machines in two categories:

  • Control plane candidates - Machines eligible to run Kubernetes control plane components (typically CPU-only machines).
  • Worker candidates - Machines eligible to run AI/ML applications (typically GPU-equipped machines).

The Compute controller reconciles automatically every 30 seconds and updates its status with the available compute resources. Use the following command to check the available machines in a Project.

kubectl get compute <compute-name> --namespace <namespace> --output yaml

Machines are grouped by hardware profile, with per-host network details under edgeHostDetails. The following example shows:

  • 2 control plane candidates with 4 CPUs each (1 available, 1 in-use)
  • 1 worker candidate with 8 NVIDIA H100 GPUs (available)
  • 1 worker candidate with 8 NVIDIA A100 GPUs (unhealthy), assigned to specific resource groups

Example Compute status

status:
  controlPlaneCompute:
    - architecture: AMD64
      cpuCount: 4
      instances: 2
      machines:
        edge-5db0384219cfa0fa4ef97d53bf291b2e: available
        edge-228638428bf0078309b65730b24101ee: in-use
  workerCompute:
    - architecture: AMD64
      family: NVIDIA H100
      gpuCount: 8
      instances: 1
      machines:
        edge-e078384256765be6e92fc1118aa9f283: available
    - architecture: AMD64
      family: NVIDIA A100
      gpuCount: 8
      instances: 1
      resourceGroups:
        network-pool: '3'
        storage-tier: 'high-performance'
      machines:
        edge-9baf38425dacad857c70ccdbabb48028: unhealthy
  edgeHostDetails:
    edge-5db0384219cfa0fa4ef97d53bf291b2e:
      nics:
        - name: ens160
          ip: 10.10.142.73
          subnet: 255.255.192.0
          gateway: 10.10.128.1
          dns: [10.10.128.8]
          macAddr: 00:50:56:b8:58:61
          isDefault: true
      resolvedNIC: ens160
      hasUsableNetworkValues: true
    edge-228638428bf0078309b65730b24101ee:
      nics:
        - name: ens160
          ip: 10.10.133.125
          subnet: 255.255.192.0
          gateway: 10.10.128.1
          dns: [10.10.128.8]
          macAddr: 00:50:56:b8:0f:fd
          isDefault: true
      resolvedNIC: ens160
      hasUsableNetworkValues: true
    edge-e078384256765be6e92fc1118aa9f283:
      nics:
        - name: ens160
          ip: 10.10.138.209
          subnet: 255.255.192.0
          gateway: 10.10.128.1
          dns: [10.10.128.8]
          macAddr: 00:50:56:b8:71:93
          isDefault: true
      resolvedNIC: ens160
      hasUsableNetworkValues: true
    edge-9baf38425dacad857c70ccdbabb48028:
      nics:
        - name: ens160
          ip: 10.10.144.12
          subnet: 255.255.192.0
          gateway: 10.10.128.1
          dns: [10.10.128.8]
          macAddr: 00:50:56:b8:9a:c2
          isDefault: true
      resolvedNIC: ens160
      hasUsableNetworkValues: true
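
If you load the Compute status into a script, for example by parsing the YAML output of the kubectl command, a small helper can summarize which machines are ready to use. The following is a minimal sketch, not part of PaletteAI. The function names are hypothetical, the status is abbreviated into a plain Python dictionary, and the GPU total assumes that gpuCount is a per-machine value, as the example summary suggests.

```python
def machines_in_state(status, state="available"):
    """Return (pool, machine_id) pairs whose reported state equals `state`."""
    found = []
    for pool in ("controlPlaneCompute", "workerCompute"):
        for profile in status.get(pool, []):
            for machine_id, machine_state in profile.get("machines", {}).items():
                if machine_state == state:
                    found.append((pool, machine_id))
    return found


def available_gpu_total(status):
    """Sum the GPUs on available worker candidates.

    ASSUMPTION: gpuCount describes each machine in the profile, not the
    profile as a whole.
    """
    total = 0
    for profile in status.get("workerCompute", []):
        states = profile.get("machines", {}).values()
        available = sum(1 for s in states if s == "available")
        total += profile.get("gpuCount", 0) * available
    return total


# Abbreviated copy of the example status (edgeHostDetails omitted).
status = {
    "controlPlaneCompute": [
        {
            "cpuCount": 4,
            "machines": {
                "edge-5db0384219cfa0fa4ef97d53bf291b2e": "available",
                "edge-228638428bf0078309b65730b24101ee": "in-use",
            },
        }
    ],
    "workerCompute": [
        {
            "family": "NVIDIA H100",
            "gpuCount": 8,
            "machines": {"edge-e078384256765be6e92fc1118aa9f283": "available"},
        },
        {
            "family": "NVIDIA A100",
            "gpuCount": 8,
            "machines": {"edge-9baf38425dacad857c70ccdbabb48028": "unhealthy"},
        },
    ],
}

print(machines_in_state(status))
print(available_gpu_total(status))
```

With the example data, only the H100 machine counts toward the GPU total because the A100 machine is unhealthy.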

Machine Discovery

For PaletteAI to discover your Edge hosts, you must add the tag palette.ai: true to each host.

You can do so by adding stylus.site.tags.palette.ai: true to your Edge host's user-data file or by adding palette.ai: true to each host through the Palette UI via Edge Host Grid View.

Palette's Edge agent (Stylus) automatically detects and reports the GPU metadata needed by PaletteAI. If the agent cannot detect a value, you can supply it with the tags below. The gpu-family and cpus tags take precedence over agent-reported values when both are present; the gpus and gpu-memory tags are used only when the agent does not report those values.

| Tag | Description | Example |
| --- | --- | --- |
| gpus: <count> | Number of GPUs | gpus: 8 |
| cpus: <count> | Number of CPUs | cpus: 6 |
| gpu-memory: <size> | GPU memory (M, MB, MiB, G, GB, GiB) | gpu-memory: 80G |
| gpu-family: <family> | GPU model family | gpu-family: nvidia-a100 |

Refer to Edge Host Attributes for a full list of Edge host data automatically returned by Palette.
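
The precedence between tags and agent-reported values can be easier to follow as code. The sketch below is illustrative only: the function name is hypothetical, and it assumes that agent-reported values are available under the same keys as the tags, which is a simplification rather than the actual Palette data model.

```python
# Tags that win over agent-reported values when both are present.
OVERRIDE_TAGS = ("gpu-family", "cpus")
# Tags that are used only when the agent reports nothing.
FALLBACK_TAGS = ("gpus", "gpu-memory")


def resolve_host_metadata(agent_reported, tags):
    """Merge agent-reported values and tags using the documented precedence.

    ASSUMPTION: both inputs are dictionaries keyed by the tag names.
    """
    resolved = {}
    for key in OVERRIDE_TAGS:
        # The tag takes precedence; fall back to the agent value.
        value = tags.get(key, agent_reported.get(key))
        if value is not None:
            resolved[key] = value
    for key in FALLBACK_TAGS:
        # The agent value takes precedence; fall back to the tag.
        value = agent_reported.get(key)
        if value is None:
            value = tags.get(key)
        if value is not None:
            resolved[key] = value
    return resolved


agent = {"gpu-family": "nvidia-h100", "cpus": "4", "gpus": "8"}
tags = {"gpu-family": "nvidia-a100", "gpus": "4", "gpu-memory": "80G"}
print(resolve_host_metadata(agent, tags))
```

In this example, the gpu-family tag replaces the agent's value, the gpus tag is ignored because the agent already reported a count, and the gpu-memory tag fills a value the agent did not report.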

Role Eligibility Tags

By default, PaletteAI uses simple rules to determine machine eligibility:

  • Machines with GPUs - Worker candidates only
  • Machines without GPUs - Control plane candidates only

You can override these defaults by adding the following tags.

| Tag | Effect |
| --- | --- |
| palette.ai/control-plane: true | Allows a GPU machine to serve as a control plane node |
| palette.ai/worker: true | Allows a non-GPU machine to serve as a worker node |

Warning

Do not apply both tags to the same machine. If you do, PaletteAI treats it as a worker only.
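
The eligibility rules above can be summarized as a small decision function. This is a sketch under stated assumptions, not PaletteAI's implementation: the function name is hypothetical, tag values are treated as the string "true", and it assumes that an override tag replaces the default role instead of adding a second role, which the text does not state explicitly.

```python
CONTROL_PLANE_TAG = "palette.ai/control-plane"
WORKER_TAG = "palette.ai/worker"


def candidate_role(gpu_count, tags):
    """Return "worker" or "control-plane" for a discovered machine."""
    wants_worker = tags.get(WORKER_TAG) == "true"
    wants_control_plane = tags.get(CONTROL_PLANE_TAG) == "true"

    # The worker tag wins, which also covers the case where both tags
    # are applied: the machine is treated as a worker only.
    if wants_worker:
        return "worker"
    # ASSUMPTION: the control plane tag replaces the default role.
    if wants_control_plane:
        return "control-plane"
    # Default rule: GPU machines are workers, all others are control plane.
    return "worker" if gpu_count > 0 else "control-plane"


print(candidate_role(8, {}))
print(candidate_role(0, {WORKER_TAG: "true"}))
print(candidate_role(8, {CONTROL_PLANE_TAG: "true", WORKER_TAG: "true"}))
```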

Resource Groups

Resource groups let you organize machines for targeted workload placement. Resource groups appear in the Compute status and can be used by Compute Pools to select specific subsets of machines. Refer to our Compute Pool guide for more information.

GPU Optimization for Minimum Worker Requirements

When a Compute Config specifies minWorkerNodes, PaletteAI may need to provision more nodes than the GPU request requires. To avoid wasting GPU resources on filler nodes, PaletteAI uses the following selection order:

  1. Allocate GPU nodes to satisfy the GPU requirement.
  2. Fill remaining slots with machines tagged palette.ai/worker: true (non-GPU workers).
  3. If no non-GPU workers are available, select GPU machines with the lowest GPU count.

For example, suppose you request 8 GPUs with minWorkerNodes: 3. One 8-GPU machine satisfies the GPU requirement. For the remaining two nodes, PaletteAI prefers machines tagged palette.ai/worker: true to avoid allocating additional GPUs unnecessarily. The following table shows the resulting selection.

| Node | GPUs | Tags | Role |
| --- | --- | --- | --- |
| gpu-node-1 | 8 | N/A | GPU workload |
| cpu-node-1 | 0 | palette.ai/worker: true | General worker |
| cpu-node-2 | 0 | palette.ai/worker: true | General worker |
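
The selection order can be sketched as follows. This is an illustration of the three steps, not PaletteAI's scheduler. The function name and the machine dictionaries are hypothetical, and step 1 uses a greedy largest-first choice, which is an assumption because the text does not say how GPU nodes are picked.

```python
def select_workers(machines, gpus_requested, min_worker_nodes):
    """Pick worker nodes: GPU nodes first, then non-GPU fillers, then small GPU nodes."""
    gpu_nodes = sorted(
        (m for m in machines if m["gpus"] > 0),
        key=lambda m: m["gpus"],
        reverse=True,
    )

    # Step 1: allocate GPU nodes until the GPU request is satisfied.
    # ASSUMPTION: greedy, largest GPU count first.
    selected = []
    remaining = gpus_requested
    for m in gpu_nodes:
        if remaining <= 0:
            break
        selected.append(m)
        remaining -= m["gpus"]
    if remaining > 0:
        raise ValueError("not enough GPUs available")

    chosen = {m["name"] for m in selected}
    # Step 2: non-GPU machines tagged palette.ai/worker: true.
    fillers = [
        m for m in machines
        if m["gpus"] == 0 and m["tags"].get("palette.ai/worker") == "true"
    ]
    # Step 3: unused GPU machines, lowest GPU count first.
    spare_gpu = sorted(
        (m for m in gpu_nodes if m["name"] not in chosen),
        key=lambda m: m["gpus"],
    )
    for m in fillers + spare_gpu:
        if len(selected) >= min_worker_nodes:
            break
        selected.append(m)
    if len(selected) < min_worker_nodes:
        raise ValueError("not enough machines for minWorkerNodes")
    return [m["name"] for m in selected]


machines = [
    {"name": "gpu-node-1", "gpus": 8, "tags": {}},
    {"name": "gpu-node-2", "gpus": 4, "tags": {}},
    {"name": "cpu-node-1", "gpus": 0, "tags": {"palette.ai/worker": "true"}},
    {"name": "cpu-node-2", "gpus": 0, "tags": {"palette.ai/worker": "true"}},
]
print(select_workers(machines, gpus_requested=8, min_worker_nodes=3))
```

With these inputs, the sketch selects gpu-node-1, cpu-node-1, and cpu-node-2, matching the table. The hypothetical gpu-node-2 is left free for other workloads.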

Resources

Refer to the following articles to learn more about the role Compute plays in PaletteAI:

  • Settings - Provide Palette credentials for machine discovery
  • Compute Config - Define cluster deployment defaults
  • Compute Pool - Group discovered machines into logical cluster pools for App Deployments