Version: v1.1.x

Compute

A Compute resource provides a real-time inventory of the machines available for deploying AI/ML applications and models. It connects to Palette using the Palette integration configured in Settings, discovers machines that have been tagged for PaletteAI, and reports which ones are healthy and eligible for cluster deployment. A Compute resource can reference a Settings resource in the current namespace. If the reference is omitted, it falls back to the project's configured Settings.

When you create an App Deployment, PaletteAI checks the Compute resource to determine whether sufficient machines are available to fulfill the requested resources (GPUs, CPUs, memory) before provisioning a cluster.

tip

To register your physical or virtual machines as Palette-compatible edge nodes, follow either the EdgeForge Workflow (Appliance Mode) or Agent Mode, and then register the nodes with Palette.

Compute Status

The Compute resource reports discovered machines in two categories:

Control plane candidates - Machines eligible to run Kubernetes control plane components (typically CPU-only machines).
Worker candidates - Machines eligible to run AI/ML applications (typically GPU-equipped machines).

The Compute controller reconciles automatically every 30 seconds and updates its status with the available compute resources. Use the following command to check the available machines in a Project.

kubectl get compute <compute-name> --namespace <namespace> --output yaml

Machines are grouped by hardware profile, with per-host network details under edgeHostDetails. The following example shows:

2 control plane candidates with 4 CPUs each (1 available, 1 in-use)
1 worker candidate with 8 NVIDIA H100 GPUs (available)
1 worker candidate with 8 NVIDIA A100 GPUs (unhealthy), assigned to specific resource groups

Example Compute status
status:
  controlPlaneCompute:
    - architecture: AMD64
      cpuCount: 4
      instances: 2
      machines:
        edge-5db0384219cfa0fa4ef97d53bf291b2e: available
        edge-228638428bf0078309b65730b24101ee: in-use
  workerCompute:
    - architecture: AMD64
      family: NVIDIA H100
      gpuCount: 8
      instances: 1
      machines:
        edge-e078384256765be6e92fc1118aa9f283: available
    - architecture: AMD64
      family: NVIDIA A100
      gpuCount: 8
      instances: 1
      resourceGroups:
        network-pool: '3'
        storage-tier: 'high-performance'
      machines:
        edge-9baf38425dacad857c70ccdbabb48028: unhealthy
  edgeHostDetails:
    edge-5db0384219cfa0fa4ef97d53bf291b2e:
      nics:
        - name: ens160
          ip: 10.10.142.73
          subnet: 255.255.192.0
          gateway: 10.10.128.1
          dns: [10.10.128.8]
          macAddr: 00:50:56:b8:58:61
          isDefault: true
      resolvedNIC: ens160
      hasUsableNetworkValues: true
    edge-228638428bf0078309b65730b24101ee:
      nics:
        - name: ens160
          ip: 10.10.133.125
          subnet: 255.255.192.0
          gateway: 10.10.128.1
          dns: [10.10.128.8]
          macAddr: 00:50:56:b8:0f:fd
          isDefault: true
      resolvedNIC: ens160
      hasUsableNetworkValues: true
    edge-e078384256765be6e92fc1118aa9f283:
      nics:
        - name: ens160
          ip: 10.10.138.209
          subnet: 255.255.192.0
          gateway: 10.10.128.1
          dns: [10.10.128.8]
          macAddr: 00:50:56:b8:71:93
          isDefault: true
      resolvedNIC: ens160
      hasUsableNetworkValues: true
    edge-9baf38425dacad857c70ccdbabb48028:
      nics:
        - name: ens160
          ip: 10.10.144.12
          subnet: 255.255.192.0
          gateway: 10.10.128.1
          dns: [10.10.128.8]
          macAddr: 00:50:56:b8:9a:c2
          isDefault: true
      resolvedNIC: ens160
      hasUsableNetworkValues: true
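
To make the status layout concrete, the following minimal Python sketch counts machines per state in each candidate category and totals the GPUs on available workers. It assumes the status has already been loaded into a dictionary (for example, from `kubectl get compute ... --output json`) and that `gpuCount` is a per-machine value, which the example above suggests but does not state. The function names are hypothetical and are not part of PaletteAI.

```python
from collections import Counter


def summarize_candidates(status: dict) -> dict:
    """Count machines per state for each candidate category."""
    summary = {}
    for category in ("controlPlaneCompute", "workerCompute"):
        counts = Counter()
        for profile in status.get(category, []):
            # "machines" maps each edge host ID to its state.
            counts.update(profile.get("machines", {}).values())
        summary[category] = dict(counts)
    return summary


def available_gpus(status: dict) -> int:
    """Total GPUs on available workers (assumes gpuCount is per machine)."""
    total = 0
    for profile in status.get("workerCompute", []):
        states = profile.get("machines", {}).values()
        available = sum(1 for state in states if state == "available")
        total += profile.get("gpuCount", 0) * available
    return total


# Trimmed copy of the example status above (edgeHostDetails omitted).
status = {
    "controlPlaneCompute": [
        {
            "architecture": "AMD64",
            "cpuCount": 4,
            "instances": 2,
            "machines": {
                "edge-5db0384219cfa0fa4ef97d53bf291b2e": "available",
                "edge-228638428bf0078309b65730b24101ee": "in-use",
            },
        },
    ],
    "workerCompute": [
        {
            "architecture": "AMD64",
            "family": "NVIDIA H100",
            "gpuCount": 8,
            "instances": 1,
            "machines": {"edge-e078384256765be6e92fc1118aa9f283": "available"},
        },
        {
            "architecture": "AMD64",
            "family": "NVIDIA A100",
            "gpuCount": 8,
            "instances": 1,
            "machines": {"edge-9baf38425dacad857c70ccdbabb48028": "unhealthy"},
        },
    ],
}

summary = summarize_candidates(status)
gpu_total = available_gpus(status)
print(summary)
print(gpu_total)
```

With the example status, only the H100 machine contributes to the GPU total because the A100 machine is reported as unhealthy.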

Machine Discovery

For PaletteAI to discover your Edge hosts, you must add the label palette.ai: true to each node.

You can do so by adding stylus.site.tags.palette.ai: true to your Edge host's user-data file or by adding palette.ai: true to each host through the Palette UI via Edge Host Grid View.

Palette's Edge agent (Stylus) automatically detects and reports the GPU metadata needed by PaletteAI. If the agent cannot detect a value, you can supply it with the tags below. Tags for gpu-family and cpus take precedence over agent-reported values when both are present; tags for gpus and gpu-memory are used only when the agent does not report them.

Tag	Description	Example
`gpus: <count>`	Number of GPUs	`gpus: 8`
`cpus: <count>`	Number of CPUs	`cpus: 6`
`gpu-memory: <size>`	GPU memory (M, MB, MiB, G, GB, GiB)	`gpu-memory: 80G`
`gpu-family: <family>`	GPU model family	`gpu-family: nvidia-a100`

Refer to Edge Host Attributes for a full list of Edge host data automatically returned by Palette.
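
The precedence rules for the hardware tags can be sketched as follows. This is an illustration only: it assumes agent-reported values and tags are both available as dictionaries keyed by the tag names in the table, which is a simplification, and `resolve_hardware` is a hypothetical helper rather than a PaletteAI function.

```python
def first_present(*values):
    """Return the first value that is not None, or None if all are missing."""
    for value in values:
        if value is not None:
            return value
    return None


def resolve_hardware(agent: dict, tags: dict) -> dict:
    """Merge agent-reported values with tags using the documented precedence."""
    resolved = {}
    # gpu-family and cpus: the tag wins when both sources report a value.
    for key in ("gpu-family", "cpus"):
        resolved[key] = first_present(tags.get(key), agent.get(key))
    # gpus and gpu-memory: the tag is only a fallback for a missing agent value.
    for key in ("gpus", "gpu-memory"):
        resolved[key] = first_present(agent.get(key), tags.get(key))
    return resolved


# Hypothetical host: the agent reports everything except GPU memory.
agent_values = {"gpu-family": "nvidia-h100", "cpus": 8, "gpus": 8}
host_tags = {"gpu-family": "nvidia-a100", "cpus": 6, "gpus": 4, "gpu-memory": "80G"}

resolved = resolve_hardware(agent_values, host_tags)
print(resolved)
```

In this hypothetical case, the tags override the GPU family and CPU count, the agent's GPU count is kept, and the GPU memory comes from the tag because the agent did not report it.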

Role Eligibility Tags

By default, PaletteAI uses simple rules to determine machine eligibility:

Machines with GPUs - Worker candidates only
Machines without GPUs - Control plane candidates only

You can override these defaults by adding the following tags.

Tag	Effect
`palette.ai/control-plane: true`	Allows a GPU machine to serve as a control plane node
`palette.ai/worker: true`	Allows a non-GPU machine to serve as a worker node

warning

Do not apply both tags to the same machine. If you do, PaletteAI treats it as a worker only.
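
The default rules and overrides can be summarized in a short Python sketch. It rests on two assumptions that the text above does not confirm: that tag values arrive as the string "true", and that a single override tag places the machine in that role exclusively rather than adding a second role. The function name is hypothetical.

```python
CONTROL_PLANE_TAG = "palette.ai/control-plane"
WORKER_TAG = "palette.ai/worker"


def candidate_role(gpu_count: int, tags: dict) -> str:
    """Return "worker" or "control-plane" for a discovered machine."""
    wants_control_plane = tags.get(CONTROL_PLANE_TAG) == "true"
    wants_worker = tags.get(WORKER_TAG) == "true"

    # The worker tag is checked first, so a machine carrying both tags
    # is treated as a worker only, as the warning above describes.
    if wants_worker:
        return "worker"
    if wants_control_plane:
        return "control-plane"

    # Default rule: GPU machines are workers, all others are control plane.
    return "worker" if gpu_count > 0 else "control-plane"


print(candidate_role(8, {}))
print(candidate_role(0, {}))
print(candidate_role(8, {CONTROL_PLANE_TAG: "true"}))
print(candidate_role(0, {WORKER_TAG: "true"}))
print(candidate_role(8, {CONTROL_PLANE_TAG: "true", WORKER_TAG: "true"}))
```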

Resource Groups

Resource groups let you organize machines for targeted workload placement. Resource groups appear in the Compute status and can be used by Compute Pools to select specific subsets of machines. Refer to our Compute Pool guide for more information.

GPU Optimization for Minimum Worker Requirements

When a Compute Config specifies minWorkerNodes, PaletteAI may need to provision more nodes than the GPU request requires. To avoid wasting GPU resources on filler nodes, PaletteAI uses the following selection order:

Allocate GPU nodes to satisfy the GPU requirement.
Fill remaining slots with machines tagged palette.ai/worker: true (non-GPU workers).
If no non-GPU workers are available, select GPU machines with the lowest GPU count.

For example, suppose you request 8 GPUs with minWorkerNodes: 3. One 8-GPU machine satisfies the GPU requirement. For the remaining two nodes, PaletteAI prefers machines tagged palette.ai/worker: true to avoid allocating additional GPUs unnecessarily.

Node	GPUs	Tags	Role
`gpu-node-1`	8	N/A	GPU workload
`cpu-node-1`	0	`palette.ai/worker: true`	General worker
`cpu-node-2`	0	`palette.ai/worker: true`	General worker
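
The selection order above can be approximated with the following Python sketch. It is not PaletteAI's implementation: the `Machine` type and `select_workers` function are hypothetical, and the largest-first strategy used to satisfy the GPU requirement in the first step is an assumption, since the text does not say how GPU nodes are chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    gpus: int
    worker_tag: bool = False  # True when tagged palette.ai/worker: true


def select_workers(machines, gpus_needed, min_worker_nodes):
    """Pick worker nodes following the three-step selection order."""
    gpu_machines = sorted(
        (m for m in machines if m.gpus > 0), key=lambda m: m.gpus, reverse=True
    )
    selected = []
    remaining = gpus_needed

    # Step 1: allocate GPU nodes until the GPU requirement is met
    # (largest-first is an assumption made for this sketch).
    for machine in gpu_machines:
        if remaining <= 0:
            break
        selected.append(machine)
        remaining -= machine.gpus
    if remaining > 0:
        raise ValueError("not enough GPUs to satisfy the request")

    # Step 2: fill remaining slots with tagged non-GPU workers.
    for machine in machines:
        if len(selected) >= min_worker_nodes:
            break
        if machine.gpus == 0 and machine.worker_tag:
            selected.append(machine)

    # Step 3: fall back to unused GPU machines, lowest GPU count first.
    leftovers = sorted(
        (m for m in gpu_machines if m not in selected), key=lambda m: m.gpus
    )
    for machine in leftovers:
        if len(selected) >= min_worker_nodes:
            break
        selected.append(machine)

    if len(selected) < min_worker_nodes:
        raise ValueError("not enough machines to satisfy minWorkerNodes")
    return selected


# Inventory matching the table above, plus a hypothetical spare GPU node.
inventory = [
    Machine("gpu-node-1", gpus=8),
    Machine("gpu-node-2", gpus=4),
    Machine("cpu-node-1", gpus=0, worker_tag=True),
    Machine("cpu-node-2", gpus=0, worker_tag=True),
]
chosen = [m.name for m in select_workers(inventory, gpus_needed=8, min_worker_nodes=3)]
print(chosen)
```

With this inventory, the spare `gpu-node-2` is left unallocated because the two tagged CPU machines fill the remaining slots. If no tagged CPU machines existed, the sketch would fall back to the unused GPU machine with the fewest GPUs.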

Resources

Refer to the following articles to learn more about the role Compute plays in PaletteAI:

Settings - Provide Palette credentials for machine discovery
Compute Config - Define cluster deployment defaults
Compute Pool - Group discovered machines into logical cluster pools for App Deployments
