
Model Deployments

A Model Deployment deploys an AI/ML model to a Compute Pool and serves it for inference. It abstracts the infrastructure required to host and serve models, so data scientists can deploy models without managing the underlying infrastructure directly. It is implemented as an AIWorkload resource with the palette.ai/aiworkload-type: model label.
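
For orientation, the sketch below shows what a model-type AIWorkload might look like. Only the palette.ai/aiworkload-type: model label is documented here; the apiVersion, names, and spec fields are illustrative assumptions rather than the actual schema.

```yaml
# Hypothetical sketch of a model-type AIWorkload.
# Only the palette.ai/aiworkload-type: model label is documented;
# the apiVersion, names, and spec fields are illustrative assumptions.
apiVersion: ai.palette.ai/v1alpha1      # assumed group/version
kind: AIWorkload
metadata:
  name: llama-3-8b-demo                 # hypothetical name
  labels:
    palette.ai/aiworkload-type: model   # documented label for Model Deployments
spec:
  # Illustrative fields only; refer to the AIWorkload reference for the real schema.
  computePool: gpu-pool-a100            # target Compute Pool (assumed field)
  profileBundle: vllm-default           # Application Profile Bundle (assumed field)
  model:
    source: huggingface                 # Hugging Face or NVIDIA NGC (assumed field)
    id: meta-llama/Meta-Llama-3-8B-Instruct
```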

PaletteAI supports two approaches for deploying models:

  • Model as a Service (MaaS) — Deploy pre-built models from Hugging Face or NVIDIA NIMs with automatic Profile Bundle selection. PaletteAI matches the model to a Profile Bundle using the Model as a Service Mappings configured in your Project settings. This approach is ideal for demos, experimentation, and research.

  • Custom Model Deployment — Deploy your own model with a manually selected Profile Bundle and infrastructure configuration. This approach is designed for production workloads or advanced use cases that require hardware-optimized performance or a specific inference framework.

Model as a Service

The Model as a Service (MaaS) workflow streamlines model deployment by automatically selecting the appropriate Profile Bundle based on mappings configured in your Project settings. Each mapping links a model source (Hugging Face or NVIDIA NGC) and optional filters to a specific Profile Bundle.

When you select a model from the catalog, PaletteAI evaluates the configured mappings and selects the Profile Bundle that matches the model source and attributes. This removes the need for manual Profile Bundle selection and ensures that the correct inference engine and infrastructure configuration are applied.
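
Mappings are configured in Project settings rather than authored as manifests, but conceptually each one pairs a model source and optional filters with a Profile Bundle. The sketch below only illustrates that shape; every field name and value is an assumption.

```yaml
# Conceptual sketch of Model as a Service Mappings (all field names assumed).
# Each entry links a model source and optional filters to a Profile Bundle.
- source: huggingface            # model repository: Hugging Face or NVIDIA NGC
  filters:
    task: text-generation        # optional model attribute filter (illustrative)
  profileBundle: vllm-default    # Profile Bundle applied when the filters match
- source: ngc
  profileBundle: nim-default     # mapping with no filters (illustrative)
```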

Supported Model Repositories

PaletteAI integrates with two model repositories:

  • Hugging Face — Browse and deploy models from the Hugging Face Hub. A Hugging Face API token with at least read access must be configured in the Project settings.

  • NVIDIA NIMs — Browse and deploy NVIDIA Inference Microservice container images from the NGC catalog. An NVIDIA NGC API key for authenticating with the nvcr.io container registry must be configured in the Project settings.

Model integrations and Model as a Service Mappings are configured per Project. Refer to Create and Manage Projects for setup instructions.
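
PaletteAI collects these credentials through Project settings, so you do not create them by hand. Purely for orientation, the standalone Kubernetes equivalent of the NGC credential is an image pull Secret for nvcr.io, where the username is the literal string $oauthtoken and the password is the NGC API key. The sketch below is generic Kubernetes, not a PaletteAI resource, and the names are hypothetical.

```yaml
# Generic Kubernetes image pull Secret for nvcr.io (not a PaletteAI resource).
# NGC uses the literal username "$oauthtoken" with the NGC API key as the password.
apiVersion: v1
kind: Secret
metadata:
  name: ngc-registry-credentials        # hypothetical name
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {
      "auths": {
        "nvcr.io": {
          "username": "$oauthtoken",
          "password": "<NGC_API_KEY>"
        }
      }
    }
```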

Inference Engine Compatibility

PaletteAI uses the palette.ai/inference-engine label on Profile Bundles to validate that the compute infrastructure is compatible with the model being deployed. When you deploy a model to an existing Compute Pool, PaletteAI checks that the inference engine label on the Application Profile Bundle matches the label on the Compute Pool's Infrastructure Profile Bundle.

For the Model as a Service workflow, Profile Bundles are auto-selected based on mappings configured in Project Settings. These mappings match model source attributes to Profile Bundle labels, including the inference engine label. Refer to Profile Bundles — Inference Engine Label for more details.
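
As a concrete illustration of the label check, the fragment below shows matching palette.ai/inference-engine labels on the two bundles. The label key is documented; the bundle names and the vllm value are assumptions.

```yaml
# Application Profile Bundle (names and the "vllm" value are assumed;
# only the palette.ai/inference-engine label key is documented).
metadata:
  name: vllm-app-bundle
  labels:
    palette.ai/inference-engine: vllm   # must match the Compute Pool's infrastructure bundle
---
# Infrastructure Profile Bundle backing the target Compute Pool.
metadata:
  name: gpu-infra-bundle
  labels:
    palette.ai/inference-engine: vllm   # matching value allows the deployment to proceed
```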

GPU Quotas

Tenants and Projects can enforce GPU quotas to prevent over-allocation. If your model's resource request exceeds the allotted quota, you cannot deploy the model. Refer to GPU Quotas for more information.
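
PaletteAI enforces these quotas at the Tenant and Project level. For orientation only, the sketch below shows the analogous standard Kubernetes pattern, a ResourceQuota that caps nvidia.com/gpu requests in a namespace; PaletteAI's own mechanism is configured in PaletteAI and may differ.

```yaml
# Standard Kubernetes analogue of a GPU quota (illustrative; PaletteAI's
# Tenant/Project quotas are configured in PaletteAI itself).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota            # hypothetical name
  namespace: project-a       # hypothetical namespace
spec:
  hard:
    requests.nvidia.com/gpu: "8"   # requests beyond the remaining quota are rejected
```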

Resources

Refer to the following articles to learn more about how Model Deployments interact with other PaletteAI concepts: