Version: v1.1.x

Model Deployments

A Model Deployment deploys AI/ML models to a Compute Pool for inference. It abstracts the infrastructure required to host and serve models, simplifying the user experience for data scientists by eliminating direct infrastructure management. It is implemented as an AIWorkload resource with the palette.ai/aiworkload-type: model label.
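As a rough illustration of the label described above, the following Python sketch filters a list of workloads down to Model Deployments. It assumes an AIWorkload is available as a Kubernetes-style manifest with labels under metadata.labels. Only the label key and its model value come from this page. The helper name, the manifest layout, and the sample workload names are hypothetical.

```python
# Label key and value taken from the text above.
WORKLOAD_TYPE_LABEL = "palette.ai/aiworkload-type"


def is_model_deployment(resource: dict) -> bool:
    """Return True if the resource carries the Model Deployment label.

    Assumption: labels live under metadata.labels, as in a
    Kubernetes-style manifest. This layout is not confirmed by the docs.
    """
    labels = (resource.get("metadata") or {}).get("labels") or {}
    return labels.get(WORKLOAD_TYPE_LABEL) == "model"


# Hypothetical sample data for illustration only.
workloads = [
    {"kind": "AIWorkload",
     "metadata": {"name": "llm-demo", "labels": {WORKLOAD_TYPE_LABEL: "model"}}},
    {"kind": "AIWorkload",
     "metadata": {"name": "other-workload", "labels": {}}},
]

model_deployments = [
    w["metadata"]["name"] for w in workloads if is_model_deployment(w)
]
print(model_deployments)
```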

PaletteAI supports two approaches for deploying models:

Model as a Service (MaaS) — Deploy pre-built models from Hugging Face or NVIDIA NIMs with automatic Profile Bundle selection. PaletteAI matches the model to a Profile Bundle using the Model as a Service Mappings configured in your Project settings. This approach is ideal for demos, experimentation, and research.
Custom Model Deployment — Deploy your own model with a manually selected Profile Bundle and infrastructure configuration. This approach is designed for production workloads or advanced use cases that require hardware-optimized performance or a specific inference framework.

Model as a Service

The Model as a Service (MaaS) workflow streamlines model deployment by automatically selecting the appropriate Profile Bundle based on mappings configured in your Project settings. Each mapping links a model source (Hugging Face or NVIDIA NGC) and optional filters to a specific Profile Bundle.

When you select a model from the catalog, PaletteAI evaluates the configured mappings and selects the Profile Bundle that matches the model source and attributes. This removes the need for manual Profile Bundle selection and ensures that the correct inference engine and infrastructure configuration are applied.
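The matching step described above can be sketched in Python as follows. This is a simplified model, not PaletteAI's actual implementation: the Mapping class, the source identifiers, the filter keys, and the bundle names are all hypothetical, and the first-match-wins ordering is an assumption that this page does not state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mapping:
    """Hypothetical shape of one Model as a Service Mapping."""
    source: str                      # e.g. "huggingface" or "ngc" (assumed identifiers)
    profile_bundle: str              # Profile Bundle to apply when the mapping matches
    filters: dict = field(default_factory=dict)  # optional attribute filters


def select_profile_bundle(model: dict, mappings: list) -> Optional[str]:
    """Return the Profile Bundle of the first mapping that matches the model.

    A mapping matches when its source equals the model's source and every
    filter key/value pair equals the corresponding model attribute.
    Assumption: mappings are evaluated in order and the first match wins.
    """
    for mapping in mappings:
        if mapping.source != model.get("source"):
            continue
        if all(model.get(k) == v for k, v in mapping.filters.items()):
            return mapping.profile_bundle
    return None


# Hypothetical mappings: the specific one is listed before the catch-all.
mappings = [
    Mapping("huggingface", "vllm-bundle", {"task": "text-generation"}),
    Mapping("huggingface", "generic-hf-bundle"),
    Mapping("ngc", "nim-bundle"),
]

chosen = select_profile_bundle(
    {"source": "huggingface", "task": "text-generation"}, mappings
)
print(chosen)
```

In this sketch, a mapping without filters acts as a catch-all for its source, so more specific mappings need to be listed first.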

Supported Model Repositories

PaletteAI integrates with two model repositories:

Hugging Face — Browse and deploy models from the Hugging Face Hub. A Hugging Face API token with at least read access must be configured in the Project settings.
NVIDIA NIMs — Browse and deploy NVIDIA Inference Microservice container images from the NGC catalog. An NVIDIA NGC API key for authenticating with the nvcr.io container registry must be configured in the Project settings.

Model integrations and Model as a Service Mappings are configured per Project. Refer to Create and Manage Projects for setup instructions.

Inference Engine Compatibility

PaletteAI uses the palette.ai/inference-engine label on Profile Bundles to validate that the compute infrastructure is compatible with the model being deployed. When you deploy a model to an existing Compute Pool, PaletteAI checks that the inference engine label on the Application Profile Bundle matches the inference engine label on the Compute Pool's Infrastructure Profile Bundle.
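The compatibility check can be pictured as a simple label comparison. The Python sketch below is illustrative only: the label key comes from this page, while the function name, the example engine values, and the choice to treat a missing label as incompatible are assumptions.

```python
# Label key taken from the text above.
INFERENCE_ENGINE_LABEL = "palette.ai/inference-engine"


def engines_compatible(app_bundle_labels: dict, infra_bundle_labels: dict) -> bool:
    """Return True if both Profile Bundles declare the same inference engine.

    Assumption: a missing label on either bundle counts as incompatible.
    The documented behavior for unlabeled bundles may differ.
    """
    app_engine = app_bundle_labels.get(INFERENCE_ENGINE_LABEL)
    infra_engine = infra_bundle_labels.get(INFERENCE_ENGINE_LABEL)
    return app_engine is not None and app_engine == infra_engine


# Hypothetical label values for illustration.
app_labels = {INFERENCE_ENGINE_LABEL: "vllm"}
pool_labels = {INFERENCE_ENGINE_LABEL: "vllm"}
print(engines_compatible(app_labels, pool_labels))
```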

For the Model as a Service workflow, Profile Bundles are auto-selected based on mappings configured in Project Settings. These mappings match model source attributes to Profile Bundle labels, including the inference engine label. Refer to Profile Bundles — Inference Engine Label for more details.

GPU Quotas

Tenants and Projects can enforce GPU quotas to prevent over-allocation. If your model's resource request exceeds the allotted quota, you cannot deploy the model. Refer to GPU Quotas for more information.
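To make the quota rule concrete, here is a minimal Python sketch. It assumes that quotas are counted in whole GPUs, that GPUs already in use count against the limit, and that a request must fit within every applicable quota (Tenant and Project). The GpuQuota class and the numbers are hypothetical. Refer to GPU Quotas for the actual accounting rules.

```python
from dataclasses import dataclass


@dataclass
class GpuQuota:
    """Hypothetical view of one quota scope."""
    scope: str   # "tenant" or "project"
    limit: int   # total GPUs allotted to the scope
    used: int    # GPUs already consumed within the scope


def can_deploy(requested_gpus: int, quotas: list) -> bool:
    """Return True only if the request fits within every quota.

    Assumption: usage plus the new request must not exceed the limit
    of any scope that enforces a quota.
    """
    return all(q.used + requested_gpus <= q.limit for q in quotas)


# Hypothetical quotas: the Project quota is the tighter constraint.
quotas = [
    GpuQuota(scope="tenant", limit=16, used=12),
    GpuQuota(scope="project", limit=8, used=6),
]

print(can_deploy(2, quotas))  # fits both scopes
print(can_deploy(4, quotas))  # exceeds the project quota
```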

Update Model Deployments

After a Model as a Service deployment is running, the in-place model changes that PaletteAI allows depend on the model source:

Hugging Face — You cannot change the Hugging Face model or revision from the deployment overview after creation. To use a different model or version, create a new Model Deployment.
NVIDIA NIMs — You can change the NIM container image tag (the deployed version) from the Model Deployment Overview tab. All available tags for the deployed NIM are listed from the NGC catalog; choosing a new tag updates the workload to that container version. For step-by-step instructions for changing a NIM tag, refer to Update the NIM container version.

For both Hugging Face and NIMs, the Deployment tab offers the same functionality: changing the Model Deployment's Profile Bundle version and editing its variables.

Which bundle layers and variables apply to your Model Deployment depends on Profile Bundle type and compute choice, as described under Supported workflows and Fullstack Profile Bundle deployment behavior.

Custom Model Deployments

Custom Model Deployments enable full control over the Profile Bundle, infrastructure configuration, and model. Use this approach when you need hardware-optimized performance, a specific inference framework, or a model not available through the Model as a Service catalog. Refer to Import Your Own Model to Deploy (Custom Model Deployment) for further information.

Resources

Refer to the following articles to learn more about how Model Deployments interact with other PaletteAI concepts:

Compute Pools - The infrastructure where AI/ML models run
Profile Bundles - Packaged application and infrastructure definitions
Projects - Provide namespace isolation and GPU quotas
Create and Manage Model Deployments - Step-by-step guide for creating Model Deployments
