Hub and Spoke Model
PaletteAI uses a hub-spoke architecture to separate the control plane from the data plane. The hub cluster is where you manage and configure applications. Spoke clusters are where your AI/ML applications actually run. This separation allows a single control plane to orchestrate workloads across many environments.
The hub-spoke model solves several challenges for AI/ML applications:
- Centralized management - Platform teams configure Profile Bundles, Tenants, and Projects in one place. Data scientists deploy App Deployments through a single UI or API. All configuration lives on the hub.
- Distributed execution - AI/ML applications run where the hardware is located. Each workload executes on a spoke cluster, either dedicated to that workload or shared with others.
- Resource isolation - Different teams or workloads run on separate spoke clusters, preventing resource contention and providing security boundaries.
- Flexible scaling - Add spoke clusters as your needs grow without changing your management workflow.
Hub Cluster
The hub cluster runs PaletteAI's control plane. This is where you interact with PaletteAI, whether through the UI, kubectl, or GitOps workflows.
The hub cluster is responsible for running the following components:
- PaletteAI controllers
- PaletteAI UI
- Open Cluster Management (OCM) control plane for multi-cluster orchestration
- All PaletteAI CRDs (Tenants, Projects, Settings, Profile Bundles, AI Workloads)
The hub cluster can be deployed on any Kubernetes cluster, including in the same cluster as self-hosted Palette. Each PaletteAI installation consists of one hub cluster only.
For development or small deployments, the hub cluster can also register itself as a spoke. This allows applications to run on the same cluster that manages them.
Spoke Clusters
Spoke clusters are where AI/ML applications run. In PaletteAI, spoke clusters are also known as Compute Pools; each Compute Pool you create becomes a spoke when it is registered with the hub cluster. Typically, you do not interact with spoke clusters directly: you manage your applications through PaletteAI on the hub, and the spokes fetch updated configurations from it.
Spoke clusters are responsible for running the following components:
- AI/ML applications (deployed as App Deployments)
- Flux controllers for managing the lifecycles of AI/ML applications
- OCM work agents for spoke-hub communication (klusterlets)
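Each spoke's registration surfaces on the hub as an OCM ManagedCluster resource. The following is a minimal sketch using the standard OCM schema; the cluster name and label are hypothetical, and PaletteAI creates this resource for you when a Compute Pool registers.

```yaml
# Hypothetical ManagedCluster as it might appear on the hub once a
# Compute Pool named "gpu-pool-a" registers. PaletteAI creates this
# automatically; you would normally only read it, never author it.
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: gpu-pool-a            # hypothetical Compute Pool name
  labels:
    accelerator: nvidia-gpu   # hypothetical label, usable later for placement
spec:
  hubAcceptsClient: true      # the hub has accepted the spoke's registration
```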
While there is only one hub cluster per PaletteAI installation, there is no limit to the number of spoke clusters.
How Workloads Flow from Hub to Spoke
The following steps illustrate how an AI/ML deployment initiated through PaletteAI (the hub cluster) is instantiated on a spoke cluster. For illustration, the example deploys an AI application on an existing Compute Pool (spoke cluster) in a shared environment.
- AIWorkload created - You select a Profile Bundle and Compute Pool on the hub.
- WorkloadDeployment generated - PaletteAI combines the Workload Profile (within the Profile Bundle) with an Environment (placement policy).
- Placement resolved - The Environment determines which spokes receive the workload.
- Workload distributed - OCM sends the Workload to target spokes via ManifestWork resources.
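To make the last step concrete, the following is a minimal sketch of a standard OCM ManifestWork. All names and the wrapped manifest are hypothetical; in practice, PaletteAI generates one ManifestWork per target spoke from the rendered Workload Profile.

```yaml
# Minimal ManifestWork sketch (standard OCM schema). The hub places
# one of these in the namespace named after each target ManagedCluster;
# the klusterlet on that spoke pulls and applies the wrapped manifests.
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: aiworkload-demo   # hypothetical name
  namespace: gpu-pool-a   # namespace matches the target ManagedCluster
spec:
  workload:
    manifests:
      # Resources to apply on the spoke; a ConfigMap stands in here for
      # the Flux resources PaletteAI actually distributes.
      - apiVersion: v1
        kind: ConfigMap
        metadata:
          name: demo-config
          namespace: default
        data:
          greeting: hello
```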
Application Lifecycle with Flux
Once workloads reach spoke clusters, Flux handles their lifecycle.
- Resources rendered - The workload controller renders Workload Profiles into Kubernetes manifests and uploads them to an OCI registry.
- App deployed - Flux pulls the manifests from the OCI registry and applies them to the cluster.
- State monitored - Flux continuously monitors the deployed resources and corrects any drift from the desired state.
- Updates performed - When a Profile Bundle or Workload Profile changes, PaletteAI re-renders the manifests. Flux detects the change and updates the deployed resources.
This GitOps approach ensures workloads stay in sync with their definitions and provides automatic recovery from configuration drift. To learn more about how Flux operates in PaletteAI, refer to our OCI Registries guide.
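As a rough illustration of the pull-and-apply steps, the following sketch uses the standard Flux APIs: an OCIRepository that polls the registry holding the rendered manifests, and a Kustomization that applies them and corrects drift. The registry URL, names, and intervals are hypothetical; PaletteAI manages the equivalent resources on each spoke.

```yaml
# Hypothetical Flux source: polls an OCI artifact containing the
# rendered Workload Profile manifests.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: demo-workload
  namespace: flux-system
spec:
  interval: 1m   # how often to check the registry for a new digest
  url: oci://registry.example.com/workloads/demo   # hypothetical registry path
  ref:
    tag: latest
---
# Hypothetical Flux Kustomization: applies the pulled manifests and
# continuously reconciles them; prune removes resources deleted upstream.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: demo-workload
  namespace: flux-system
spec:
  interval: 1m
  prune: true
  sourceRef:
    kind: OCIRepository
    name: demo-workload
  path: ./
```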
Deployment status then flows from spoke to hub, allowing you to monitor your workloads from the PaletteAI UI without connecting directly to spoke clusters.
For a high-level look at provisioning infrastructure and deploying workloads, refer to the following guides:
- Compute Pool Provisioning - How clusters are created
- App Deployment - How applications are deployed
Multi-Cluster Orchestration with OCM
PaletteAI uses Open Cluster Management (OCM) for multi-cluster orchestration. OCM provides the machinery for distributing workloads from hub to spokes.
Key OCM Concepts
| Concept | What It Does | PaletteAI Usage |
|---|---|---|
| ManagedCluster | Represents a registered spoke | Created automatically when a Compute Pool is provisioned |
| ManifestWork | Contains resources to deploy on a spoke | Created by PaletteAI to distribute Workloads |
| Placement | Selects which clusters receive workloads | Configured via Environments |
| Klusterlet | Agent running on each spoke | Installed automatically on Compute Pools |
You do not interact with OCM resources directly; PaletteAI manages them based on your App Deployment and Environment configurations.
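For reference only, a standard OCM Placement looks like the sketch below. The name, namespace, and label selector are hypothetical; PaletteAI generates the equivalent resource from your Environment configuration.

```yaml
# Hypothetical OCM Placement selecting spokes by label (standard OCM
# schema). PaletteAI creates Placements for you via Environments.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: gpu-pools               # hypothetical name
  namespace: paletteai-system   # hypothetical namespace
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            accelerator: nvidia-gpu   # matches the ManagedCluster label above
```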
Environments and Placement
Environments control how workloads are distributed to spoke clusters. Each Environment contains a topology policy that determines:
- Which spoke clusters receive the workload
- How rollouts are performed
PaletteAI's default topology (topology-ocm) creates OCM Placement and ManifestWorkReplicaSet resources that handle cluster selection and workload distribution.
For most use cases, you specify the Compute Pool during the App Deployment workflow, and PaletteAI handles the Environment configuration automatically. Advanced users can create custom Environments for complex placement scenarios.
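To show how these pieces connect, here is a minimal sketch of a standard OCM ManifestWorkReplicaSet, which stamps out one ManifestWork per cluster selected by the referenced Placement. Names are hypothetical; PaletteAI's topology-ocm creates the equivalent resources for you.

```yaml
# Hypothetical ManifestWorkReplicaSet (standard OCM schema): fans a
# ManifestWork template out to every cluster the Placement selects.
apiVersion: work.open-cluster-management.io/v1alpha1
kind: ManifestWorkReplicaSet
metadata:
  name: demo-workload
  namespace: paletteai-system   # hypothetical; must match the Placement's namespace
spec:
  placementRefs:
    - name: gpu-pools           # the Placement sketched earlier
  manifestWorkTemplate:
    workload:
      manifests:
        # Same shape as the ManifestWork sketched earlier; contents
        # are illustrative.
        - apiVersion: v1
          kind: ConfigMap
          metadata:
            name: demo-config
            namespace: default
          data:
            greeting: hello
```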
Communication and Security
Hub-spoke communication uses OCM with mutual TLS.
- Hub-to-spoke - Workload definitions sent via ManifestWork resources
- Spoke-to-hub - Status updates sent via OCM agent
Spoke clusters pull workload definitions from the hub cluster. The hub never pushes directly into spoke clusters. This pull-based model works well with firewalls and NAT, as spokes only need outbound connectivity to the hub.
Workload data (model weights, training data, inference requests) never passes through the hub cluster. The hub only manages control plane operations; your data stays on the spoke clusters where workloads run.
Refer to our Security page for more information on how security is handled in PaletteAI.