Hub and Spoke Model

PaletteAI uses a hub-spoke architecture to separate the control plane from the data plane. The hub cluster is where you manage and configure applications. Spoke clusters are where your AI/ML applications actually run. This separation allows a single control plane to orchestrate workloads across many environments.

The hub-spoke model solves several challenges for AI/ML applications:

  • Centralized management - Platform teams configure Profile Bundles, Tenants, and Projects in one place. Data scientists deploy App Deployments through a single UI or API. All configuration lives on the hub.
  • Distributed execution - AI/ML applications run where the hardware is located. Each workload is executed on a spoke cluster. Workloads can have a dedicated cluster or share the cluster with other workloads.
  • Resource isolation - Different teams or workloads run on separate spoke clusters, preventing resource contention and providing security boundaries.
  • Flexible scaling - Add spoke clusters as your needs grow without changing your management workflow.

Hub Cluster

The hub cluster runs PaletteAI's control plane. This is where you interact with PaletteAI, whether through the UI, kubectl, or GitOps workflows.

The hub cluster is responsible for running the following components:

The hub cluster can be deployed on any Kubernetes cluster, including in the same cluster as self-hosted Palette. Each PaletteAI installation consists of one hub cluster only.

Spoke Clusters

Spoke clusters are where AI/ML applications run. In PaletteAI, spoke clusters are also known as Compute Pools. Each Compute Pool you create becomes a spoke when it is registered with the hub cluster. Typically, you do not interact with spoke clusters; you manage your applications through PaletteAI (hub cluster), and the spokes fetch the updated configurations from the hub cluster.

Spoke clusters are responsible for running the following components:

  • AI/ML applications and models
  • Flux controllers for managing the lifecycles of AI/ML applications
  • OCM work agents for spoke-hub communication (klusterlets)

While there is only one hub cluster per PaletteAI installation, there is no limit to the number of spoke clusters.

Deployment Patterns

PaletteAI supports two deployment patterns: hub-as-spoke and dedicated spokes. By default, PaletteAI is installed using the hub-as-spoke model.

The following table summarizes the key differences between the two patterns.

| Consideration | Hub-as-Spoke | Dedicated Spokes |
| --- | --- | --- |
| Setup | Easier (single cluster) | More complex (requires spoke RBAC setup and kubeconfig secrets) |
| Control plane isolation | Workloads share resources with PaletteAI controllers | Full isolation (control plane unaffected by workload behavior) |
| Workload isolation | Control plane failures can affect workloads | Workloads isolated from hub issues |
| Scaling | Limited by single cluster capacity | Add spoke clusters independently |
| Security boundaries | Single security domain | Workloads can run in separate accounts, VPCs, or networks |
| Resources | Workloads compete with PaletteAI for CPU, memory, and GPUs | Dedicated resources per spoke |
| Cost | Lower (single cluster) | Higher (additional clusters for spokes) |

For production deployments, we recommend using dedicated spoke clusters. You can start with hub-as-spoke for initial setup and add dedicated spokes later without reinstalling PaletteAI.

For platform-specific instructions on configuring dedicated spokes, refer to the EKS Environment Setup or GKE Spoke Setup guide, as appropriate for your platform.

Hub-as-Spoke

In the hub-as-spoke pattern, a single cluster acts as both the control plane (hub) and a workload cluster (spoke). This is the default installation method and is configured with inCluster: true and forceInternalEndpointLookup: true in the fleetConfig section of your Helm values.

warning

fleetConfig.spokes[i].name must be set to hub-as-spoke in order to install PaletteAI using the hub-as-spoke pattern.

```yaml
fleetConfig:
  spokes:
    - name: hub-as-spoke
      kubeconfig:
        inCluster: true
      klusterlet:
        forceInternalEndpointLookup: true
```

Dedicated Spokes

In the dedicated spokes pattern, the hub cluster runs only the PaletteAI control plane, while separate spoke clusters run your AI/ML applications and models. Each spoke cluster requires a kubeconfig secret on the hub and appropriate RBAC permissions for the FleetConfig controller to bootstrap the Open Cluster Management (OCM) agent. Refer to the Multi-Cluster Orchestration with OCM section for details.

```yaml
fleetConfig:
  spokes:
    - name: spoke-cluster-1
      kubeconfig:
        inCluster: false
        secretReference:
          name: spoke-1-kubeconfig
          kubeconfigKey: kubeconfig
    - name: spoke-cluster-2
      kubeconfig:
        inCluster: false
        secretReference:
          name: spoke-2-kubeconfig
          kubeconfigKey: kubeconfig
```
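Each secretReference entry assumes a kubeconfig Secret already exists on the hub cluster. A minimal sketch of such a Secret follows; the namespace shown is an assumption, so create the Secret wherever your PaletteAI installation expects it:

```yaml
# Hypothetical kubeconfig Secret for spoke-cluster-1.
# The namespace below is an assumption, not a documented default.
apiVersion: v1
kind: Secret
metadata:
  name: spoke-1-kubeconfig
  namespace: paletteai-system   # assumed namespace
type: Opaque
stringData:
  kubeconfig: |
    # paste the spoke cluster's kubeconfig contents here
```

The key under stringData must match the kubeconfigKey value referenced in fleetConfig.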

How Workloads Flow from Hub to Spoke

The following steps illustrate how an AI/ML deployment initiated through PaletteAI (hub cluster) is instantiated on a spoke cluster. For learning purposes, this example deploys an AI application on an existing Compute Pool (spoke cluster) in a shared environment.

  1. AIWorkload created - You select a Profile Bundle and Compute Pool on the hub.
  2. WorkloadDeployment generated - PaletteAI combines the Workload Profile (within the Profile Bundle) with an Environment (placement policy).
  3. Placement resolved - The Environment determines which spokes receive the workload.
  4. Workload distributed - OCM sends the Workload to target spokes via ManifestWork resources.
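Step 4 relies on OCM's ManifestWork API: the hub creates a ManifestWork in the namespace that corresponds to the target spoke, and the spoke's work agent applies the embedded resources locally. A rough sketch follows; the names and the embedded Deployment are illustrative, not PaletteAI's actual rendered output:

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  name: example-aiworkload     # illustrative name
  namespace: spoke-cluster-1   # hub-side namespace for the target spoke
spec:
  workload:
    manifests:
      # Any Kubernetes resources to apply on the spoke; a Deployment
      # is shown here purely as an example.
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: example-model-server
          namespace: default
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: example-model-server
          template:
            metadata:
              labels:
                app: example-model-server
            spec:
              containers:
                - name: server
                  image: registry.example.com/model-server:latest
```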

Application Lifecycle with Flux

Once workloads reach spoke clusters, Flux handles their lifecycle.

  1. Resources rendered - The workload controller renders Workload Profiles into Kubernetes manifests and uploads them to an OCI registry.
  2. App deployed - Flux pulls the manifests from the OCI registry and applies them to the cluster.
  3. State monitored - Flux continuously monitors the deployed resources and corrects any drift from the desired state.
  4. Updates performed - When a Profile Bundle or Workload Profile changes, PaletteAI re-renders the manifests. Flux detects the change and updates the deployed resources.
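Steps 2 and 3 above map onto standard Flux resources: an OCIRepository source that polls the registry and a Kustomization that applies and reconciles the manifests. A hedged sketch, with placeholder names and a placeholder registry URL rather than PaletteAI's actual resource names:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: example-app-manifests   # placeholder name
  namespace: flux-system
spec:
  interval: 1m                  # how often Flux checks the registry
  url: oci://registry.example.com/rendered-manifests/example-app
  ref:
    tag: latest
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-app             # placeholder name
  namespace: flux-system
spec:
  interval: 5m                  # reconciliation loop that corrects drift
  sourceRef:
    kind: OCIRepository
    name: example-app-manifests
  path: ./
  prune: true                   # delete resources removed from source
```

The interval-driven reconciliation and prune setting are what provide the drift correction described in step 3.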

This GitOps approach ensures workloads stay in sync with their definitions and provides automatic recovery from configuration drift. To learn more about how Flux operates in PaletteAI, refer to our OCI Registries guide.

The status then flows from spoke to hub, allowing you to monitor your workloads from the PaletteAI UI without connecting directly to spoke clusters.

For a high-level look at provisioning infrastructure and deploying workloads, refer to the following guides:

Multi-Cluster Orchestration with OCM

PaletteAI uses Open Cluster Management (OCM) for multi-cluster orchestration. OCM provides the machinery for distributing workloads from hub to spokes.

Key OCM Concepts

| Concept | What It Does | PaletteAI Usage |
| --- | --- | --- |
| ManagedCluster | Represents a registered spoke | Created automatically when a Compute Pool is provisioned |
| ManifestWork | Contains resources to deploy on a spoke | Created by PaletteAI to distribute Workloads |
| Placement | Selects which clusters receive workloads | Configured via Environments |
| Klusterlet | Agent running on a spoke | Installed automatically on Compute Pools |

You do not interact with OCM resources directly; PaletteAI manages them based on your App Deployment and Environment configurations.

Environments and Placement

Environments control how workloads are distributed to spoke clusters. Each environment contains a topology policy that determines which spoke clusters receive the workload and how rollouts are performed.

PaletteAI's default topology (topology-ocm) creates OCM Placement and ManifestWorkReplicaSet resources that handle cluster selection and workload distribution.

For most use cases, you specify the Compute Pool during the App Deployment workflow, and PaletteAI handles the Environment configuration automatically. Advanced users can create custom Environments for complex placement scenarios, or when bringing their own clusters outside of Palette.

ManagedClusterSets and ManagedClusterSetBindings

Open Cluster Management organizes clusters into logical groups called ManagedClusterSets. ManagedClusterSets are bound to namespaces using ManagedClusterSetBindings, enabling fine-grained control over which clusters can be selected by Placements in those namespaces.
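For illustration, a ManagedClusterSetBinding that makes the spokes cluster set selectable from a namespace might look like the following; the namespace is a placeholder, and note that the binding's name must match the cluster set it binds:

```yaml
apiVersion: cluster.open-cluster-management.io/v1beta2
kind: ManagedClusterSetBinding
metadata:
  name: spokes            # must match spec.clusterSet
  namespace: my-project   # placeholder namespace
spec:
  clusterSet: spokes
```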

Cluster Selection

The cluster selection process works as follows:

  1. A namespaced Placement resource is created. It can only select clusters from ManagedClusterSets bound to that namespace.

  2. If no ManagedClusterSetBinding exists in the Placement's namespace, no clusters can be selected.

  3. If multiple ManagedClusterSetBindings exist in the Placement's namespace, clusters may be selected from the union of all clusters across all bound cluster sets.

  4. The Placement resource can target a specific cluster set, or it can select clusters across all available cluster sets using predicates (label selectors, taints and tolerations, and prioritizers).
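Point 4 can be sketched with an OCM Placement that targets one cluster set and filters clusters by label; the resource names and the label below are illustrative:

```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: gpu-spokes        # illustrative name
  namespace: my-project   # must contain a matching ManagedClusterSetBinding
spec:
  clusterSets:
    - spokes              # restrict selection to one bound cluster set
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            accelerator: nvidia-gpu   # illustrative label
```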

ManagedClusterSets in PaletteAI

PaletteAI automatically creates three managed cluster sets and namespaces. The managed cluster sets are automatically bound to these namespaces to enable placements to function out-of-the-box.

| Cluster Set | Bound Namespace | Additional Details |
| --- | --- | --- |
| default | managed-cluster-set-default | Includes all ManagedClusters that have not been explicitly assigned to a ManagedClusterSet via the cluster.open-cluster-management.io/clusterset label. Note that this label does not provide exclusivity: a managed cluster "assigned" to cluster set A can still be included in cluster set B via a label selector. |
| global | managed-cluster-set-global | Always includes all ManagedClusters known to the hub. |
| spokes | managed-cluster-set-spokes | Includes all ManagedClusters whose fleetconfig.open-cluster-management.io/managedClusterType label is neither hub nor hub-as-spoke. Useful for deploying foundational workloads to all spoke clusters. |
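As a hedged example of the clusterset label mentioned above, assigning a ManagedCluster to a specific cluster set is done by labeling the ManagedCluster resource; the cluster name below is a placeholder:

```yaml
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: spoke-cluster-1   # placeholder cluster name
  labels:
    cluster.open-cluster-management.io/clusterset: default
spec:
  hubAcceptsClient: true
```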

Communication and Security

Hub-spoke communication uses OCM with mutual TLS.

  • Hub-to-spoke - Workload definitions sent via ManifestWork resources
  • Spoke-to-hub - Status updates sent via OCM agent

Spoke clusters pull workload definitions from the hub cluster. The hub never pushes directly into spoke clusters. This pull-based model works well with firewalls and NAT, as spokes only need outbound connectivity to the hub.

Workload data (model weights, training data, inference requests) never passes through the hub cluster. The hub only manages control plane operations; your data stays on the spoke clusters where workloads run.

Refer to our Security page for more information on how security is handled in PaletteAI.