Version: v1.1.x

Install PaletteAI on Kubernetes

This guide covers installing PaletteAI on self-managed Kubernetes clusters where you have full control over the API server configuration. The deployment uses the hub-as-spoke pattern with Zot as the Open Container Initiative (OCI) registry. Use this guide if installing PaletteAI on:

  • Self-managed clusters (kubeadm, k3s, RKE, etc.), except those running on AWS EC2 instances
  • Self-hosted Kubernetes deployments
  • Edge environments
  • Any cluster where you can configure the Kubernetes API server to trust Dex as an OIDC provider
warning

If you are installing PaletteAI on AWS (IaaS or EKS) or GKE, use the dedicated guides instead:

  • AWS IaaS (self-managed Kubernetes on EC2) - Includes AWS load balancer and Traefik ingress configuration
  • AWS EKS - Includes IRSA configuration, AWS load balancer annotations, and impersonation proxy setup
  • GKE - Uses native GKE Ingress and includes impersonation proxy setup

Prerequisites

  • Use a Kubernetes cluster as the PaletteAI hub.

  • Access to the hub cluster using the built-in Kubernetes cluster-admin ClusterRole.

  • Minimum Kubernetes versions

    Cluster Type    Kubernetes Version
    Hub             >= 1.31.0
    Spoke           >= 1.31.0
  • Minimum resource requests

    Cluster Type    CPU      Memory    Storage
    Hub             3388m    2732Mi    10Gi
    Spoke           1216m    972Mi     10Gi
  • Ensure the hub cluster can reach the public AWS Elastic Container Registry (ECR) that hosts the mural and mural-crds charts.

  • Access to the hub cluster kubeconfig file.

  • Install Flux controllers on the hub cluster if you plan to use the recommended Flux-managed workflow.

  • Install the following tools on the machine you use to install or upgrade PaletteAI:

    • curl or wget to download the Helm values file.

    • A text editor, such as vi, to edit the Helm values file.

    • kubectl version >= 1.31.0.

    • Helm version >= 3.17.0 if you plan to use the manual Helm workflow instead of the recommended Flux-managed workflow.

  • Configure the hub cluster Kubernetes API server to trust Dex as an identity provider. PaletteAI deploys Dex as part of the installation. This requirement applies only to the hub cluster, not to spoke clusters. For details, refer to Configure Kubernetes API Server to Trust OpenID Connect (OIDC) Provider.

  • Your hub cluster must be able to provision load balancer services. For self-hosted or bare-metal clusters, this requires a load balancer implementation such as MetalLB. For cloud-hosted clusters, ensure the appropriate cloud controller manager is configured.

Enablement

    1. Download the latest Helm chart values file. This example uses curl.

      curl --output values.yaml --silent https://docs.palette-ai.com/resources/assets/hosted/helm/values.yaml
    2. Open the Helm chart values file in a text editor of your choice and complete the following sections. This example uses vi.

      vi values.yaml

    Global

  1. Use the global section to configure overarching settings for the PaletteAI deployment. Review and modify the following values as necessary.

    1. Set global.dns.domain to the primary domain for the deployment. Do not include a protocol. For example, use example.org, not https://example.org.

      global:
        dns:
          domain: 'example.acme.org'
    2. In global.auditLogging.basicAuth, change the default username and password for audit logging. These credentials secure the Alertmanager instance that receives audit events. You reuse them when you configure the Base64-encoded Authorization header in the alertmanager section.

      global:
        auditLogging:
          basicAuth:
            username: '<your-username>'
            password: '<your-password>'

      Refer to Audit Logging to learn more about configuring audit logging, querying audit events, and forwarding logs to long-term storage.

    3. Configure the metrics collection settings. Provide an existing, external Prometheus server that is reachable from the hub cluster and every spoke cluster. Spoke clusters use Prometheus agents to ship metrics to the server via remote_write.

      Set global.metrics.prometheusBaseUrl to the external Prometheus server URL (for example, https://your-external-prometheus:9090). Include only the protocol, host, and port. Do not include any API paths.

      global:
        metrics:
          prometheusBaseUrl: 'https://your-external-prometheus:9090'
          timeout: '5s'
          scrapeInterval: '15s'
          agentType: 'prometheus-agent-minimal'
          username: ''
          password: ''

      By default, global.metrics.agentType is set to prometheus-agent-minimal. The minimal agent configuration only collects spoke cluster CPU and GPU utilization metrics. You may change global.metrics.agentType to prometheus-agent to ship all node-exporter and dcgm-exporter metrics from spoke clusters for comprehensive observability.

      If your Prometheus server requires basic authentication, configure the username and password fields. Leave these fields blank if authentication is not required.

      Refer to Configure Prometheus Agent Monitoring for guidance on agent types, Prometheus and Grafana prerequisites, and GPU metrics.
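The formatting rules for global.dns.domain (no protocol) and global.metrics.prometheusBaseUrl (protocol, host, and port only) can be checked before installation. The following is a minimal sketch, not part of PaletteAI. The function names is_bare_domain and is_base_url are hypothetical, and the checks are deliberately loose.

```shell
# is_bare_domain (hypothetical helper): succeeds when the value has no
# protocol and no path, as expected for global.dns.domain.
is_bare_domain() {
  case "$1" in
    ''|*://*|*/*) return 1 ;;
    *) return 0 ;;
  esac
}

# is_base_url (hypothetical helper): succeeds when the value is
# protocol://host[:port] with nothing after it, as expected for
# global.metrics.prometheusBaseUrl.
is_base_url() {
  printf '%s\n' "$1" | grep -Eq '^https?://[^/?#]+$'
}

is_bare_domain 'example.acme.org' && echo 'domain ok'
is_base_url 'https://your-external-prometheus:9090' && echo 'base URL ok'
is_base_url 'https://your-external-prometheus:9090/api/v1' || echo 'base URL has a path'
```

These checks only catch the two mistakes called out above. They do not confirm that the domain resolves or that the Prometheus server is reachable.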

      tip

      If you need to set up a Prometheus server, you may find the Deploy Monitoring Stack guide helpful.

    4. Set global.instanceName to a stable identifier for this PaletteAI installation (for example, an environment or tenant name). It is used to uniquely identify metrics related to this PaletteAI installation.

      global:
        instanceName: 'prod-paletteai-east'

      Refer to Configure Prometheus Agent Monitoring for more detail.


    Alertmanager

  2. Navigate to the alertmanager section. Update credentials for the alertmanager instance based on the credentials you configured in the global section.

    You must provide a Base64-encoded string for the Authorization header. Generate the Base64-encoded string using the following command. Replace username and password with the username and password you configured in the global section.

    echo -n "username:password" | base64
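If you prefer to produce the complete header value in one step, the command above can be wrapped as in the following sketch. The function name basic_auth_header is hypothetical. It uses printf instead of echo -n because echo -n is not handled the same way by every shell, and it strips the line wrapping that some base64 builds apply to long input.

```shell
# basic_auth_header (hypothetical helper): print the Authorization header
# value for a username and password.
basic_auth_header() {
  # printf writes "user:pass" with no trailing newline; tr removes any
  # line breaks that base64 adds to its output.
  printf 'Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Placeholder credentials; substitute the values from global.auditLogging.basicAuth.
basic_auth_header 'user' 'pass'
```

With the placeholder credentials shown, the function prints Basic dXNlcjpwYXNz. Paste the output for your own credentials into the value fields of the probes.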

    The following example shows the livenessProbe and readinessProbe sections with the Base64-encoded string. Replace <your-base64-encoded-string> with the Base64-encoded string you generated.

    alertmanager:
      livenessProbe:
        httpGet:
          path: /-/healthy
          port: http
          scheme: HTTPS
          httpHeaders:
            - name: Authorization
              value: 'Basic <your-base64-encoded-string>'
      readinessProbe:
        httpGet:
          path: /-/ready
          port: http
          scheme: HTTPS
          httpHeaders:
            - name: Authorization
              value: 'Basic <your-base64-encoded-string>'

    For further instructions on accessing audit logs and configuring long-term storage, refer to Audit Logging.

    Canvas

  3. Canvas controls the user interface. Review and modify the following values as necessary.

    1. To configure the ingress for Canvas, set canvas.ingress.enabled to true. Enter your own domain name for canvas.ingress.domain, omitting the HTTP/HTTPS prefix.

      canvas:
        ingress:
          enabled: true
          annotations: {}
          ingressClassName: traefik
          domain: replace.with.your.domain # No HTTP/HTTPS prefix.
          matchAllHosts: false
          tls: []
          paths:
            - path: /ai
              pathType: ImplementationSpecific
              backend:
                service:
                  name: canvas
                  port:
                    number: 2999
    2. Set canvas.enableHTTP to true. This supports TLS termination at the load balancer. canvas.ingress.tls remains empty as a result.

      canvas:
        enableHTTP: true
    3. The last portion of the Canvas configuration is the OIDC configuration. If you defer configuring OIDC for Dex, you may do the same for Canvas and configure it later.

      In the canvas.oidc section, enter a unique string for the sessionSecret. For redirectURL, replace <your-domain> with your domain. Do not remove the /ai/callback path.

      canvas:
        oidc:
          sessionSecret: '<your-session-secret>'
          sessionDir: '/app/sessions'
          issuerK8sService: 'https://dex.mural-system.svc.cluster.local:5554/dex'
          skipSSLCertificateVerification: true
          redirectURL: 'https://<your-domain>/ai/callback'
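      One way to produce a unique string for sessionSecret is to read random bytes from the operating system. The following is a minimal sketch. The function name random_secret is hypothetical, and the 32-byte length is an assumption rather than a PaletteAI requirement. The same approach may suit the REPLACE_WITH_A_UNIQUE_STRING placeholders in dex.config.staticClients.

```shell
# random_secret (hypothetical helper): print 32 random bytes from
# /dev/urandom as a 64-character hexadecimal string.
random_secret() {
  # od renders the bytes as hex; tr removes the spaces and line breaks od adds.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  printf '\n'
}

random_secret
```

      Each call prints a different value. Generate a separate secret for each field rather than reusing one.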

      If you did not configure your Kubernetes cluster to trust Dex as an OIDC provider, then you must configure the canvas.impersonationProxy section to enable user impersonation.

      The example below shows how to configure the local Dex user admin@example.com to be mapped to an example Kubernetes group admin. Refer to our Configure User Impersonation guide to learn more about how to configure user impersonation for OIDC groups and other use cases.

      Example user impersonation setup
      canvas:
        impersonationProxy:
          enabled: true
          userMode: 'passthrough'
          groupsMode: 'map'
          userMap: {}
          groupMap: {}
          dexGroupMap:
            'admin@example.com': [ 'admin' ]

    Dex

  4. Dex authenticates users to PaletteAI through SSO. You can configure Dex to connect to an upstream OIDC provider or to a local user database. In this guide, you will configure Dex to connect to an upstream OIDC provider. If you want to configure an OIDC provider later, you can do so; however, Dex still requires basic configuration.

    1. Set dex.config.issuer to your domain. Do not remove the /dex path.

      dex:
        config:
          issuer: 'https://replace.with.your.domain/dex'
    2. You can defer this step, but we strongly recommend configuring at least one connector during installation. Set dex.config.connectors to the connectors you want to use. The Dex documentation has examples for each of the connectors.

      Below is an example of an OIDC connector that connects to AWS Cognito. The oidc type can be used for any OIDC provider that does not have a native Dex connector. Different OIDC providers may require different configurations.

      Example AWS Cognito configuration
      dex:
        config:
          connectors:
            - type: oidc
              id: aws
              name: AWS Cognito
              config:
                issuer: https://cognito-idp.us-east-1.amazonaws.com/us-east-1_xxxxxx
                clientID: xxxxxxxxxxxxxxx
                clientSecret: xxxxxxxxxxxxxxxxx
                redirectURI: https://replace.with.your.domain/dex/callback # Dex callback URL for the authorization code flow; redirects to the application callback URL
                getUserInfo: true
                userNameKey: email
                insecureSkipEmailVerified: true
                insecureEnableGroups: true
                scopes:
                  - openid
                  - email
                  - profile
                promptType: consent
                claimMapping:
                  groups: groups
    3. Proceed to the dex.config.staticClients section. Replace REPLACE_WITH_A_UNIQUE_STRING with a unique string and replace.with.your.domain with your domain. Do not remove the /ai/callback path for the mural client.

      dex:
        config:
          staticClients:
            - id: mural
              redirectURIs:
                - 'https://replace.with.your.domain/ai/callback'
              name: 'mural'
              secret: 'REPLACE_WITH_A_UNIQUE_STRING'
              public: false
              trustedPeers:
                - kubernetes
            - id: kubernetes
              redirectURIs:
                - 'https://replace.with.your.domain'
              name: kubernetes
              secret: 'REPLACE_WITH_A_UNIQUE_STRING'
              public: false
              trustedPeers:
                - mural
    4. Next, configure the dex.config.staticPasswords section in values.yaml. This is the Day 1 customization point for Dex local users and also defines the shipped default admin user. When Dex local users are enabled, Dex creates a default admin user with the credentials admin@example.com / password. We strongly recommend changing these values before installation. The hash value must be a bcrypt hash of the desired password, and the userID can be any unique string. For post-install updates and additional local-user considerations, refer to Local Dex Users.

      warning

      If you did not configure any OIDC connectors, you must configure at least one static user to access the PaletteAI UI. Without user impersonation, local Dex users inherit the same permissions as the Canvas service account. Dex does not support groups for local static users. To map local static users to Kubernetes groups, use the User Impersonation feature.

      dex:
        config:
          staticPasswords:
            - email: 'admin@example.com'
              hash: '$2a$12$Ot2dJ0pmdIC2oXUDW/Ez1OIfhkSzLZIbsumsxkByuU3CUr02DtiC.'
              username: 'admin'
              userID: '08a8684b-db88-4b73-90a9-3cd1661f5466'
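      A bcrypt hash for the hash field can be generated with htpasswd, which ships in the apache2-utils or httpd-tools package on most Linux distributions. The following is a sketch under that assumption. The function name bcrypt_hash is hypothetical, and the cost of 12 is chosen only to match the example hash above. htpasswd emits hashes with a $2y$ prefix rather than $2a$.

```shell
# bcrypt_hash (hypothetical helper): print a bcrypt hash of the given password.
bcrypt_hash() {
  if ! command -v htpasswd >/dev/null 2>&1; then
    echo 'htpasswd not found; install apache2-utils or httpd-tools' >&2
    return 1
  fi
  # -B selects bcrypt, -C sets the cost, -i reads the password from stdin,
  # -n prints the result instead of writing a file. The user name "admin"
  # is a throwaway that cut removes from the output.
  printf '%s\n' "$1" | htpasswd -BinC 12 admin | cut -d: -f2 | head -n 1
}

# Usage: bcrypt_hash '<your-password>'
```

      Wrap the resulting hash in single quotes in values.yaml, as in the example above, so that the $ characters are kept literally.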
    5. Configure the dex.ingress section to expose Dex. For host, replace replace.with.your.domain with your domain. Do not change the path. Set className to traefik (or your cluster’s IngressClass). Because TLS is terminated at the load balancer, the tls section is empty.

      dex:
        ingress:
          enabled: true
          className: 'traefik'
          annotations: {}
          hosts:
            - host: replace.with.your.domain
              paths:
                - path: /dex
                  pathType: ImplementationSpecific
          tls: []

    Flux2

  5. Set flux2.policies.create to false to disable the Flux network policies. These policies, if enabled, prevent ingress traffic from reaching their target services.

    flux2:
      policies:
        create: false
    info

    This step is not required if the hub and all spoke clusters are configured to use a common, external OCI registry. An external OCI registry is configured in the fleetConfig.spokes[*].ociRegistry and hue.ociRegistry sections of the values.yaml file.


    Ingress (Traefik)

  6. PaletteAI uses Traefik as the ingress controller. Create Ingress resources with ingressClassName: traefik (or the class your release configures) so traffic is routed correctly.

    When TLS terminates at your load balancer, Traefik should receive plain HTTP on the web entrypoint and trust forwarded client headers (for example X-Forwarded-Proto: https). Enable forwardedHeaders.insecure on the entrypoints that receive that traffic, and set websecure.targetPort: web when HTTPS is handled at the load balancer and the controller only sees HTTP. Add traefik.service annotations that match your cloud load balancer (for example ACM on AWS).

    traefik:
      enabled: true
      ports:
        web:
          forwardedHeaders:
            insecure: true
        websecure:
          targetPort: web
          forwardedHeaders:
            insecure: true

    On AWS (IaaS or EKS), attach your ACM certificate and load balancer annotations on traefik.service as described in the install guide for that platform.

    Helm Install

  7. Install PaletteAI with Flux to let Flux manage chart ordering and the Custom Resource Definition (CRD) lifecycle for both Helm charts.

    1. Create mural-crds-oci-repository.yaml for the mural-crds chart.

      cat << EOF > mural-crds-oci-repository.yaml
      apiVersion: source.toolkit.fluxcd.io/v1
      kind: OCIRepository
      metadata:
        name: mural-crds
        namespace: mural-system
      spec:
        interval: 10m
        ref:
          semver: "0.7.8-hotfix.1"
        url: oci://public.ecr.aws/mural/mural-crds
      EOF
    2. Create mural-oci-repository.yaml for the mural chart.

      cat << EOF > mural-oci-repository.yaml
      apiVersion: source.toolkit.fluxcd.io/v1
      kind: OCIRepository
      metadata:
        name: mural
        namespace: mural-system
      spec:
        interval: 10m
        ref:
          semver: "1.1.2"
        url: oci://public.ecr.aws/mural/mural
      EOF
    3. Apply both OCIRepository resources to your cluster.

      kubectl apply --filename mural-crds-oci-repository.yaml
      kubectl apply --filename mural-oci-repository.yaml
    4. Create mural-crds-helm-release.yaml for the mural-crds chart.

      cat <<'EOF' > mural-crds-helm-release.yaml
      apiVersion: helm.toolkit.fluxcd.io/v2
      kind: HelmRelease
      metadata:
        name: mural-crds
        namespace: mural-system
      spec:
        interval: 10m
        chartRef:
          kind: OCIRepository
          name: mural-crds
          namespace: mural-system
        install:
          crds: Create
        upgrade:
          crds: CreateReplace
      EOF
    5. Create mural-helm-release.yaml for the mural chart. The dependsOn field ensures that Flux installs mural-crds before mural.

      cat <<'EOF' > mural-helm-release.yaml
      apiVersion: helm.toolkit.fluxcd.io/v2
      kind: HelmRelease
      metadata:
        name: mural
        namespace: mural-system
      spec:
        interval: 10m
        chartRef:
          kind: OCIRepository
          name: mural
          namespace: mural-system
        dependsOn:
          - name: mural-crds
        values:
          # Paste the contents of your values.yaml file here.
      EOF
    6. Open mural-helm-release.yaml and replace the placeholder comment under spec.values with the contents of the values.yaml file for your environment. Keep the inserted YAML indented under spec.values.

    7. Apply both HelmRelease resources to your cluster.

      kubectl apply --filename mural-crds-helm-release.yaml
      kubectl apply --filename mural-helm-release.yaml
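    The step that pastes values.yaml under spec.values can also be scripted instead of done by hand. The following is a minimal sketch. The function name indent_values is hypothetical, and the four-space indent assumes that spec.values sits two spaces deep and is the last key in mural-helm-release.yaml.

```shell
# indent_values (hypothetical helper): indent every non-empty line read
# from stdin by four spaces so that it nests under spec.values.
indent_values() {
  # Blank lines are left untouched to avoid trailing whitespace.
  sed '/./s/^/    /'
}

# Usage, run before applying the HelmRelease resources:
#   indent_values < values.yaml >> mural-helm-release.yaml
```

    Appending leaves the placeholder comment in place, which is harmless. Review the resulting file before you apply it, because it may contain the secrets you set in values.yaml.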

    Install with Helm

    warning

    If you do not use Flux, manage the mural-crds chart separately from the mural chart. Apply or upgrade Custom Resource Definitions (CRDs) out of band before you install or upgrade the mural chart. For the manual Helm workflow, refer to Upgrade Manually.

    1. Install the mural-crds Helm chart first.

      helm install mural-crds oci://public.ecr.aws/mural/mural-crds --version 0.7.8-hotfix.1 \
      --namespace mural-system --create-namespace --wait
      Example Output
      NAME: mural-crds
      LAST DEPLOYED: Tue May 27 09:34:33 2025
      NAMESPACE: mural-system
      STATUS: deployed
      REVISION: 1
    2. Install PaletteAI from the mural chart by using your environment's values.yaml file.

      helm install mural oci://public.ecr.aws/mural/mural --version 1.1.2 \
      --namespace mural-system --create-namespace --values values.yaml --wait
      Example Output
      NAME: mural
      LAST DEPLOYED: Tue May 27 09:39:48 2025
      NAMESPACE: mural-system
      STATUS: deployed
      REVISION: 1

    DNS

  8. Once PaletteAI is deployed, fetch the external hostname or EXTERNAL-IP of the Traefik LoadBalancer service in mural-system. With the default Helm release name mural, the service is often named mural-traefik.

    kubectl get service --namespace mural-system -l app.kubernetes.io/name=traefik
    Example output
    NAME            TYPE           CLUSTER-IP   EXTERNAL-IP                                  PORT(S)                      AGE
    mural-traefik   LoadBalancer   10.96.x.x    a1b2c3d4e5f6g7.elb.us-east-1.amazonaws.com   80:3xxxx/TCP,443:3xxxx/TCP   10m
  9. Create a DNS record pointing your canvas.ingress.domain configured in values.yaml to the external address of the Traefik LoadBalancer service from the previous step. Use an A record for IP addresses or a CNAME/alias record for hostnames, depending on your DNS provider's capabilities.
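    The choice between record types can be expressed as a small check on the external address. The following is a minimal sketch. The function name dns_record_type is hypothetical, and it only distinguishes IPv4 addresses from hostnames. Whether your DNS provider uses a CNAME or an alias record for hostnames is not something it can determine.

```shell
# dns_record_type (hypothetical helper): suggest a record type for the
# external address of the Traefik LoadBalancer service.
dns_record_type() {
  case "$1" in
    '') return 1 ;;
    *[!0-9.]*) echo 'CNAME' ;;  # anything besides digits and dots: treat as a hostname
    *) echo 'A' ;;              # only digits and dots: treat as an IPv4 address
  esac
}

dns_record_type '203.0.113.10'
dns_record_type 'a1b2c3d4e5f6g7.elb.us-east-1.amazonaws.com'
```

    The first call uses a documentation-range IP address as a stand-in. The second uses the hostname from the example output above.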

You have now deployed PaletteAI on your Kubernetes cluster. The cluster trusts Dex as an identity provider. If you configured Dex with an OIDC connector, log in to PaletteAI using your Identity Provider (IdP). Alternatively, if Dex local users are enabled, refer to Local Dex Users for the default admin credentials and customization options.

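If you need to make changes to PaletteAI, review the Helm Chart Configuration page. If you installed with the manual Helm workflow, update the values.yaml file with the changes you want and run the following command to trigger an upgrade. If you installed with Flux, update spec.values in mural-helm-release.yaml and apply the file again instead.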

helm upgrade mural oci://public.ecr.aws/mural/mural --version 1.1.2 \
--namespace mural-system --values values.yaml --wait

Validate

Take the following steps to verify that PaletteAI is deployed and configured correctly.

  1. Open a browser and navigate to the domain URL you configured for PaletteAI.

  2. Log in with the default username and password. If you configured Dex with an OIDC connector, log in with your identity provider.

Next Steps

Once PaletteAI is installed on your cluster, you must integrate Palette with PaletteAI using PaletteAI's Settings resource. This resource requires a Palette tenant, project, and API key in order to communicate with Palette and deploy AI/ML applications and models to the appropriate location.

Proceed to the Integrate with Palette guide to learn how to prepare your Palette environment.