Create and Manage Model Deployments
A Model Deployment deploys AI/ML models to a Compute Pool for inference. PaletteAI supports two deployment approaches:
- Model as a Service (MaaS) — Quickly deploy pre-built models from Hugging Face or NVIDIA NIMs with automatic Profile Bundle selection. Ideal for demos, experimentation, and research.
- Custom Model Deployment — Bring your own model and deploy it with a manually selected Profile Bundle and infrastructure configuration. Designed for production workloads or advanced use cases that require hardware-optimized performance.
This guide describes how to create and delete Model Deployments using both approaches.
Deploy a Model (Model as a Service)
Deploy a pre-built model from Hugging Face or NVIDIA NIMs. PaletteAI automatically selects the appropriate Profile Bundle based on the Model as a Service Mappings configured in your Project settings.
Prerequisites
- UI Workflow
- YAML Workflow
- A Project in `Ready` status
- At least one model integration configured in the Project's Settings:
  - Hugging Face — A Hugging Face API token with at least `read` access, configured in the Project settings. Refer to Create and Manage Projects for setup instructions.
  - NVIDIA NGC — An NVIDIA NGC API key for authenticating with the `nvcr.io` container registry, configured in the Project settings. Refer to Create and Manage Projects for setup instructions.
- At least one Model as a Service Mapping configured in the Project's Model Management settings. The mapping links a model source and filters to a Profile Bundle.
- One of the following Compute Pool options:
  - An existing Compute Pool in `Running` status, or
  - A Profile Bundle of type Infrastructure or Fullstack if you want to create a new Compute Pool during the Model Deployment workflow
- A user with Project Editor or Admin permissions
- kubectl installed and available in your `$PATH`
- The `KUBECONFIG` environment variable set to the path of the PaletteAI hub cluster's `kubeconfig` file:
  export KUBECONFIG=<kubeconfig-location>
- A Project namespace for the Project you are deploying to
- Integration credentials configured in the Project's Settings (see the example after this list):
  - Hugging Face — A Kubernetes Secret containing a Hugging Face API token.
  - NVIDIA NGC — A Kubernetes Secret containing an NGC API key and an image pull secret for `nvcr.io`.
- A Profile Bundle that matches the model source and any Model as a Service Mappings configured in the Project
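The following commands show one way to create these Secrets with kubectl. The Secret names (huggingface-token, ngc-credentials, ngc-pull-secret) and key names are illustrative assumptions; use the names your Project's integration settings expect. The `$oauthtoken` username is the standard login for the `nvcr.io` registry.

# Hugging Face API token (illustrative Secret and key names)
kubectl create secret generic huggingface-token \
  --namespace <project-namespace> \
  --from-literal=token=<hugging-face-api-token>

# NGC API key (illustrative Secret and key names)
kubectl create secret generic ngc-credentials \
  --namespace <project-namespace> \
  --from-literal=apiKey=<ngc-api-key>

# Image pull secret for nvcr.io
kubectl create secret docker-registry ngc-pull-secret \
  --namespace <project-namespace> \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=<ngc-api-key>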
Enablement
- UI Workflow
- YAML Workflow
- Log in to PaletteAI, and then open your Project.
- In the left main menu, select Model Deployments.
- Select Deploy Model.
- In the General Info screen, set the name and metadata for your deployment.
  - Enter a Name. The name must be 3-33 characters, start with a lowercase letter, end with a lowercase letter or number, contain only lowercase letters, numbers, and hyphens, and be unique within the Project.
  - (Optional) Add a Description.
  - (Optional) Select Add link to attach external URLs (for example, a model registry or documentation link). Each link must start with `http://` or `https://`.
  - (Optional) Expand Metadata to add labels and annotations for sorting and filtering.
  Select Next.
- In the Model Setup screen, select your deployment approach and model.
  - Select Model-as-a-Service as the deployment approach.
    Info: The Model-as-a-Service option is only available when at least one model integration (Hugging Face or NVIDIA NGC) is configured in the Project settings. If no integrations are available, contact your administrator to enable them and configure Model as a Service Mappings.
  - Select the Model Repository:
    - Hugging Face — Browse and select models from the Hugging Face Hub.
    - NVIDIA NIMs — Browse and select NVIDIA Inference Microservice container images from the NGC catalog.
  - Select the Model name. To open the model selection drawer, select the Select button.
Select a Hugging Face Model
When you select Hugging Face as the model repository, the selection drawer displays models from the Hugging Face Hub.
- Use the search bar to find models by name.
- (Optional) Use the App filter to narrow results by inference framework (for example, Ollama or vLLM).
- (Optional) Use the Sort by menu to order results by Downloads, Likes, Created, or Trending.
- Select a model from the results list.
Select an NVIDIA NIMs Model
When you select NVIDIA NIMs as the model repository, the selection drawer displays available NIM container images from the NGC catalog.
- Use the search bar to find NIM models by name.
- Select a model from the results. After selecting a model, choose the Version from the available tags.
Select Next.
- In the Select Compute Pool screen, choose the compute environment where your model runs.

  | Field | Description | Required |
  |---|---|---|
  | Compute Pool type | Choose Dedicated for exclusive access to physical resources with no resource contention, or Shared to share resources across multiple workloads. | ✅ |
  | Select from existing | Select an existing Compute Pool in `Running` status from the selection drawer. | Conditional |
  | Create new | Provision a new Compute Pool as part of this deployment. An additional setup step appears after this screen. | Conditional |

  - If you selected Select from existing, choose a Compute Pool from the selection drawer.
  - If you selected Create new, you configure the new Compute Pool in the next step.
  Select Next.
- (Conditional) In the Configure Compute Pool screen, configure the infrastructure for the new Compute Pool.
  Info: This step appears only when you select Create new in the previous step. If you selected an existing Compute Pool, the wizard skips this step.
  The configuration includes control plane pools, worker pools, edge configuration, and deployment settings. These fields are the same as the Compute Pool creation wizard. Refer to Create and Manage Compute Pools for detailed field descriptions and the Compute Pool Configuration Reference for YAML-level details.
  Select Next.
- In the Configure Variables screen, configure variables for the selected Profile Bundle. The system automatically populates variables based on the model you selected (for example, model name, model repository, or NIM image and tag).
  - The variables table displays all configurable variables with Name, Value, and Description columns.
  - Required variables are marked with an asterisk (`*`) next to the name. Enter or update the Value for each variable.
  - (Optional) Select the gear icon in the top-right to open the deployment settings drawer.

    | Field | Description | Required |
    |---|---|---|
    | Namespace | The namespace where workloads are deployed. Defaults to the Project namespace. Must start and end with alphanumeric characters and can only contain lowercase letters, numbers, hyphens, and periods. | ✅ |
    | Merge variables | Controls how variables with the same name across multiple profiles are handled. When enabled (the default), each variable name appears once and the provided value applies to all profiles. When disabled, each profile source has its own row and values are set per profile. | ❌ |
    | Labels | Key-value pairs applied to the workload. Expand Metadata to configure. | ❌ |
    | Annotations | Key-value pairs applied to the workload. Expand Metadata to configure. | ❌ |

    Select Confirm.
  Select Next.
- In the Review screen, review and confirm your deployment configuration. The summary displays an overview of your general information, model selection, Compute Pool, and deployment settings.
  - Review your settings. The summary is read-only. To make changes, select a previous step in the left sidebar to navigate back.
  - Select Submit to create the Model Deployment.
Create an AIWorkload resource in your Project namespace with the `palette.ai/aiworkload-type: model` label. The AIWorkload references a Profile Bundle and either an existing Compute Pool or an inline cluster variant definition.
Deploy a Hugging Face model to an existing Compute Pool:
apiVersion: spectrocloud.com/v1alpha1
kind: AIWorkload
metadata:
  name: my-hf-model
  namespace: <project-namespace>
  labels:
    palette.ai/project: '<project-namespace>'
    palette.ai/aiworkload-type: 'model'
    palette.ai/model-deployment-type: 'HF'
spec:
  computePoolRef:
    name: '<existing-computepool-name>'
    namespace: '<project-namespace>'
  profileBundles:
    - name: '<profilebundle-name>'
      namespace: '<project-namespace>'
  workloadDeploymentConfigs:
    - workloadProfileRef:
        name: '<workload-profile-name>'
        namespace: 'mural-system'
      targetWorkload:
        name: '<deployment-name>'
        namespace: '<project-namespace>'
      variables:
        modelName: '<hugging-face-model-name>'
        modelRepo: '<hugging-face-repo>'
Deploy an NVIDIA NIM to an existing Compute Pool:
apiVersion: spectrocloud.com/v1alpha1
kind: AIWorkload
metadata:
  name: my-nim-model
  namespace: <project-namespace>
  labels:
    palette.ai/project: '<project-namespace>'
    palette.ai/aiworkload-type: 'model'
    palette.ai/model-deployment-type: 'NGC'
spec:
  computePoolRef:
    name: '<existing-computepool-name>'
    namespace: '<project-namespace>'
  profileBundles:
    - name: '<profilebundle-name>'
      namespace: '<project-namespace>'
  workloadDeploymentConfigs:
    - workloadProfileRef:
        name: '<workload-profile-name>'
        namespace: 'mural-system'
      targetWorkload:
        name: '<deployment-name>'
        namespace: '<project-namespace>'
      variables:
        ngcApiKey: '<ngc-api-key>'
        nimImage: '<nim-container-image>'
        nimTag: '<nim-image-tag>'
Apply the manifest:
kubectl apply --filename aiworkload.yaml
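If you want to catch schema errors before the resource is created, you can first run a server-side dry run. This relies on standard kubectl behavior and is not specific to PaletteAI:
kubectl apply --filename aiworkload.yaml --dry-run=server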
Key fields:
| Field | Description | Required |
|---|---|---|
| `metadata.labels["palette.ai/aiworkload-type"]` | Must be `model` to identify this as a Model Deployment. | ✅ |
| `metadata.labels["palette.ai/model-deployment-type"]` | The model source. Use `HF` for Hugging Face or `NGC` for NVIDIA NIMs. | ✅ |
| `spec.profileBundles` | References to Profile Bundles. At least one is required. | ✅ |
| `spec.workloadDeploymentConfigs` | One entry per workload to deploy. Each requires a `workloadProfileRef` and a `targetWorkload` with `name` and `namespace`. | ✅ |
| `spec.computePoolRef` | Reference to an existing Compute Pool. Mutually exclusive with `clusterVariant`. | Conditional |
| `spec.clusterVariant` | Inline Compute Pool definition (dedicated, shared, or imported). Mutually exclusive with `computePoolRef`. | Conditional |
| `spec.workloadDeploymentConfigs[].variables` | Variable overrides for the workload profile. For Hugging Face models, set `modelName` and `modelRepo`. For NVIDIA NIMs, set `ngcApiKey`, `nimImage`, and `nimTag`. | ✅ |
| `spec.hardwareRequests` | CPU, memory, and GPU requirements per architecture. Required for shared Compute Pools. | Conditional |
hardwareRequests example (required for shared Compute Pools):
spec:
  hardwareRequests:
    - architecture: AMD64
      totalCPU: 4
      totalMemory: '16 GB'
      gpu:
        - family: 'NVIDIA A100'
          gpuCount: 1
For inline Compute Pool configuration fields (clusterVariant), refer to the Compute Pool Configuration Reference.
Validate
- UI Workflow
- YAML Workflow
- In the left main menu, select Model Deployments.
- Confirm that the Model Deployment appears with status Provisioning.
- Confirm that the status changes to Running. If you created a new Compute Pool, provisioning includes both the Compute Pool and the model deployment, which may take 10-15 minutes.
- If the status remains Provisioning or changes to Failed, select the Model Deployment and review its events for errors.
- Check that the `AIWorkload` status transitions to `Running`:
  kubectl get aiworkload <name> --namespace <project-namespace> --watch
- Verify the `WorkloadDeploymentCreated` and `WorkloadDeploymentReady` conditions:
  kubectl get aiworkload <name> --namespace <project-namespace> --output jsonpath='{.status.conditions}' | jq
- If you created a new Compute Pool inline, verify the Compute Pool is also `Running`:
  kubectl get computepool --namespace <project-namespace>
- If the `AIWorkload` is stuck in `Provisioning` or shows `Failed`, check events for errors:
  kubectl describe aiworkload <name> --namespace <project-namespace>
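If you want to isolate a single condition rather than scan the full list, a small jq filter helps. This is an optional convenience that assumes jq is installed; it is not part of the product tooling:
kubectl get aiworkload <name> --namespace <project-namespace> \
  --output jsonpath='{.status.conditions}' \
  | jq '.[] | select(.type == "WorkloadDeploymentReady")'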
Import Your Own Model to Deploy (Custom Model Deployment)
Deploy a custom model with full control over the Profile Bundle and infrastructure configuration. Use this approach when you need hardware-optimized performance, a specific inference framework, or a model not available through the Model as a Service catalog.
Prerequisites
- UI Workflow
- YAML Workflow
- A Project in `Ready` status
- A Profile Bundle of type Application that defines the model workload to deploy (for example, a Profile Bundle containing a vLLM, Ollama, or custom inference server Workload Profile)
- One of the following Compute Pool options:
  - An existing Compute Pool in `Running` status, or
  - A Profile Bundle of type Infrastructure or Fullstack if you want to create a new Compute Pool during the Model Deployment workflow
- A user with Project Editor or Admin permissions
- kubectl installed and available in your `$PATH`
- The `KUBECONFIG` environment variable set to the path of the PaletteAI hub cluster's `kubeconfig` file:
  export KUBECONFIG=<kubeconfig-location>
- A Project namespace for the Project you are deploying to
- One or more Profile Bundles that define the model workload to deploy (see the lookup example after this list)
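If you need to confirm which Profile Bundles and Compute Pools exist in your Project namespace before writing the manifest, you can list them with kubectl. The `computepool` resource name appears elsewhere in this guide; the `profilebundle` resource name is an assumption and may differ in your installation:

# List Profile Bundles (resource name assumed to be profilebundle)
kubectl get profilebundle --namespace <project-namespace>

# List Compute Pools and confirm one is Running
kubectl get computepool --namespace <project-namespace>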
Enablement
- UI Workflow
- YAML Workflow
- Log in to PaletteAI, and then open your Project.
- In the left main menu, select Model Deployments.
- Select Deploy Model.
- In the General Info screen, set the name and metadata for your deployment.
  - Enter a Name. The name must be 3-33 characters, start with a lowercase letter, end with a lowercase letter or number, contain only lowercase letters, numbers, and hyphens, and be unique within the Project.
  - (Optional) Add a Description.
  - (Optional) Select Add link to attach external URLs. Each link must start with `http://` or `https://`.
  - (Optional) Expand Metadata to add labels and annotations for sorting and filtering.
  Select Next.
- In the Model Setup screen, select your deployment approach.
  - Select Custom model deployment as the deployment approach.
  Select Next.
- In the Select Profile Bundles screen, select the Profile Bundles to apply to this deployment.
  - Select Add Profile Bundle. In the selection drawer, choose a Profile Bundle from the table. Select Add Bundle.
  - If you selected an Infrastructure Profile Bundle, you must also add an Application Profile Bundle. Select Add Profile Bundle again to add it.
  - For each Profile Bundle, select the Version. If the Profile Bundle supports multiple cloud types, select the Cloud Type.
  Info: If you selected an Infrastructure Profile Bundle without an Application Profile Bundle, the wizard requires you to add one before proceeding. Fullstack Profile Bundles include application layers and do not require a separate Application Profile Bundle.
  Select Next.
- In the Select Compute Pool screen, choose the compute environment where your model runs.

  | Field | Description | Required |
  |---|---|---|
  | Compute Pool type | Choose Dedicated for exclusive access to physical resources with no resource contention, or Shared to share resources across multiple workloads. | ✅ |
  | Select from existing | Select an existing Compute Pool in `Running` status from the selection drawer. | Conditional |
  | Create new | Provision a new Compute Pool as part of this deployment. An additional setup step appears after this screen. | Conditional |

  - If you selected Select from existing, choose a Compute Pool from the selection drawer.
  - If you selected Create new, you configure the new Compute Pool in the next step.
  Select Next.
- (Conditional) In the Configure Compute Pool screen, configure the infrastructure for the new Compute Pool.
  Info: This step appears only when you select Create new in the previous step. If you selected an existing Compute Pool, the wizard skips this step.
  The configuration includes control plane pools, worker pools, edge configuration, and deployment settings. These fields are the same as the Compute Pool creation wizard. Refer to Create and Manage Compute Pools for detailed field descriptions and the Compute Pool Configuration Reference for YAML-level details.
  Select Next.
- In the Configure Variables screen, configure variables for the selected Profile Bundles. Variables are parameters defined by the workload profile, such as namespace names, Helm chart values, model paths, or resource limits.
  - The variables table displays all configurable variables with Name, Value, and Description columns. When Merge variables is disabled, the source profile name appears as a prefix in the Name column.
  - Required variables are marked with an asterisk (`*`) next to the name. Enter or update the Value for each variable.
  - (Optional) Select the gear icon in the top-right to open the deployment settings drawer.

    | Field | Description | Required |
    |---|---|---|
    | Namespace | The namespace where workloads are deployed. Defaults to the Project namespace. Must start and end with alphanumeric characters and can only contain lowercase letters, numbers, hyphens, and periods. | ✅ |
    | Merge variables | Controls how variables with the same name across multiple profiles are handled. When enabled (the default), each variable name appears once and the provided value applies to all profiles. When disabled, each profile source has its own row and values are set per profile. | ❌ |
    | Labels | Key-value pairs applied to the workload. Expand Metadata to configure. | ❌ |
    | Annotations | Key-value pairs applied to the workload. Expand Metadata to configure. | ❌ |

    Select Confirm.
  Select Next.
- In the Review screen, review and confirm your deployment configuration. The summary displays an overview of your general information, Profile Bundles, Compute Pool selection, and deployment settings.
  - Review your settings. The summary is read-only. To make changes, select a previous step in the left sidebar to navigate back.
  - Select Submit to create the Model Deployment.
Create an AIWorkload resource in your Project namespace with the `palette.ai/aiworkload-type: model` label. The AIWorkload references Profile Bundles and either an existing Compute Pool or an inline cluster variant definition.
Deploy a custom model to an existing Compute Pool:
apiVersion: spectrocloud.com/v1alpha1
kind: AIWorkload
metadata:
  name: my-custom-model
  namespace: <project-namespace>
  labels:
    palette.ai/project: '<project-namespace>'
    palette.ai/aiworkload-type: 'model'
spec:
  computePoolRef:
    name: '<existing-computepool-name>'
    namespace: '<project-namespace>'
  profileBundles:
    - name: '<profilebundle-name>'
      namespace: '<project-namespace>'
  workloadDeploymentConfigs:
    - workloadProfileRef:
        name: '<workload-profile-name>'
        namespace: 'mural-system'
      targetWorkload:
        name: '<deployment-name>'
        namespace: '<project-namespace>'
      variables:
        <variable-name>: '<value>'
Create a new dedicated Compute Pool inline:
apiVersion: spectrocloud.com/v1alpha1
kind: AIWorkload
metadata:
  name: my-custom-model
  namespace: <project-namespace>
  labels:
    palette.ai/project: '<project-namespace>'
    palette.ai/aiworkload-type: 'model'
spec:
  clusterVariant:
    dedicated:
      paletteClusterDeploymentConfig:
        cloudType: 'edge-native'
        deletionPolicy: 'delete'
      nodePoolRequirements:
        controlPlanePool:
          nodeCount: 1
        workerPools:
          - architecture: 'AMD64'
            cpu:
              cpuCount: 4
            memory:
              memory: '16 GB'
            gpu:
              family: 'NVIDIA A100'
              gpuCount: 1
      edge:
        vip: '<vip-address>'
  profileBundles:
    - name: '<profilebundle-name>'
      namespace: '<project-namespace>'
      cloudType: 'edge-native'
  workloadDeploymentConfigs:
    - workloadProfileRef:
        name: '<workload-profile-name>'
        namespace: 'mural-system'
      targetWorkload:
        name: '<deployment-name>'
        namespace: '<project-namespace>'
      variables:
        <variable-name>: '<value>'
Apply the manifest:
kubectl apply --filename aiworkload.yaml
Key fields:
| Field | Description | Required |
|---|---|---|
| `metadata.labels["palette.ai/aiworkload-type"]` | Must be `model` to identify this as a Model Deployment. | ✅ |
| `spec.profileBundles` | References to Profile Bundles. At least one is required. | ✅ |
| `spec.workloadDeploymentConfigs` | One entry per workload to deploy. Each requires a `workloadProfileRef` and a `targetWorkload` with `name` and `namespace`. | ✅ |
| `spec.computePoolRef` | Reference to an existing Compute Pool. Mutually exclusive with `clusterVariant`. | Conditional |
| `spec.clusterVariant` | Inline Compute Pool definition (dedicated, shared, or imported). Mutually exclusive with `computePoolRef`. | Conditional |
| `spec.workloadDeploymentConfigs[].variables` | Variable overrides for the workload profile. Required only if the workload profile defines mandatory variables. | Conditional |
| `spec.workloadDeploymentConfigs[].syncWithPalette` | Sync variables with Palette cluster profile variables. Default: `false`. | ❌ |
| `spec.hardwareRequests` | CPU, memory, and GPU requirements per architecture. Required for shared Compute Pools. | Conditional |
`workloadProfileRef` points to the profile definition in `mural-system` that describes the workload structure. `targetWorkload` defines where the resulting workload is deployed (name and namespace in your Project).
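If you need to discover the workload profile names available to reference, you can list them in the `mural-system` namespace. The `workloadprofile` resource name is an assumption based on the `workloadProfileRef` field name and may differ in your installation:
kubectl get workloadprofile --namespace mural-system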
For inline Compute Pool configuration fields (clusterVariant), refer to the Compute Pool Configuration Reference.
Validate
- UI Workflow
- YAML Workflow
- In the left main menu, select Model Deployments.
- Confirm that the Model Deployment appears with status Provisioning.
- Confirm that the status changes to Running. If you created a new Compute Pool, provisioning includes both the Compute Pool and the model deployment, which may take 10-15 minutes.
- If the status remains Provisioning or changes to Failed, select the Model Deployment and review its events for errors.
- Check that the `AIWorkload` status transitions to `Running`:
  kubectl get aiworkload <name> --namespace <project-namespace> --watch
- Verify the `WorkloadDeploymentCreated` and `WorkloadDeploymentReady` conditions:
  kubectl get aiworkload <name> --namespace <project-namespace> --output jsonpath='{.status.conditions}' | jq
- If you created a new Compute Pool inline, verify the Compute Pool is also `Running`:
  kubectl get computepool --namespace <project-namespace>
- If the `AIWorkload` is stuck in `Provisioning` or shows `Failed`, check events for errors:
  kubectl describe aiworkload <name> --namespace <project-namespace>
Delete Model Deployment
Delete a Model Deployment when you no longer need it. Deleting a Model Deployment removes the deployed workloads. If the Model Deployment created a new Compute Pool with deletionPolicy: "delete", the Compute Pool is also deleted. If the policy is "orphan" or the deployment references an existing Compute Pool, the Compute Pool remains.
Deleting a Model Deployment removes the deployed workloads and cannot be undone. Back up important data before you delete.
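Before you delete, you can check which deletion policy an inline dedicated Compute Pool uses. The jsonpath below follows the dedicated `clusterVariant` structure shown in the custom deployment example above; adjust the path if your deployment uses a shared or imported variant, and expect empty output if the deployment references an existing Compute Pool:
kubectl get aiworkload <name> --namespace <project-namespace> \
  --output jsonpath='{.spec.clusterVariant.dedicated.paletteClusterDeploymentConfig.deletionPolicy}'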
Prerequisites
- UI Workflow
- YAML Workflow
- A user with Project Editor or Admin permissions
- kubectl installed and available in your `$PATH`
- The `KUBECONFIG` environment variable set to the path of the PaletteAI hub cluster's `kubeconfig` file:
  export KUBECONFIG=<kubeconfig-location>
Enablement
- UI Workflow
- YAML Workflow
- Log in to PaletteAI, and then open your Project.
- In the left main menu, select Model Deployments.
- In the Model Deployment row, select the action menu and then select Delete.
- In the confirmation dialog, review the warning that this action cannot be undone.
- Select Delete to confirm, or Cancel to keep the Model Deployment.
kubectl delete aiworkload <name> --namespace <project-namespace>
Validate
- UI Workflow
- YAML Workflow
- In the left main menu, select Model Deployments.
- Verify the Model Deployment no longer appears in the list.
- Confirm the `AIWorkload` resource is removed:
  kubectl get aiworkload --namespace <project-namespace>
- If the Model Deployment created a Compute Pool inline with `deletionPolicy: "orphan"`, the Compute Pool remains. Delete it separately if it is no longer needed:
  kubectl delete computepool <name> --namespace <project-namespace>
Next Steps
After you create a Model Deployment, you can Expose Deployments to make your model accessible outside the cluster. Review the Create and Manage Projects guide to configure Hugging Face and NVIDIA NGC integrations, Model as a Service Mappings, and model access control lists. Refer to Create and Manage Compute Pools to prepare infrastructure for future deployments.