# Troubleshooting Tenants

This page provides troubleshooting guidance for common issues when creating and managing Tenants.
## Tenant Not Ready

**Symptom:** The Tenant `READY` status is `False`.

Possible causes:

- The referenced Settings resource does not exist or is not ready.
- PaletteAI cannot create the controller-created Tenant namespace (`tenant-<tenant-name>`).
- GPU usage exceeds the Tenant limits.

Resolution:

1. View the Tenant conditions.

   ```shell
   kubectl get tenant <tenant-name> --output jsonpath='{.status.conditions}' | jq
   ```

2. If `SettingsConfigured` is `False`, verify the Settings resource. Check the Settings resource in the auto-generated namespace referenced by `.spec.settingsRef.namespace` (must be `tenant-<tenant-name>`).

   ```shell
   # First, get the Settings namespace from the Tenant spec
   kubectl get tenant <tenant-name> --output jsonpath='{.spec.settingsRef.namespace}'

   # Then check the Settings resource
   kubectl get settings <settings-name> --namespace <settings-namespace>
   kubectl describe settings <settings-name> --namespace <settings-namespace>
   ```

   If the Settings resource shows an `IntegrationsConfigured` condition with `status=False` and `reason="IntegrationsNotReady"`, one or more configured integrations have invalid or missing secrets. The condition message lists which integrations are affected. See Settings Integrations Not Ready for resolution steps.

3. If `TenantNamespaceCreated` is `False`, verify the controller-created namespace and review events.

   ```shell
   # Check the controller-created namespace (tenant-<name>)
   kubectl get namespace tenant-<tenant-name>
   kubectl describe tenant <tenant-name>
   ```

   To debug namespace creation failures, first check recent events to display the actual denial reason:

   ```shell
   # If the namespace exists but the Tenant is not Ready
   kubectl get events --namespace tenant-<tenant-name> --sort-by=.lastTimestamp | tail -20

   # If the namespace does not exist, check controller logs
   # First, find the tenant controller pod (label selectors may vary by installation)
   kubectl get pods --namespace mural-system

   # Then inspect logs (example using a common label selector)
   kubectl logs --namespace mural-system --selector app=hue-controller --tail=100 | grep --ignore-case "tenant.*<tenant-name>"
   ```

   :::info
   The exact pod label selector may differ in your installation. Use `kubectl get pods --namespace mural-system` to identify the tenant controller pod and inspect its logs directly.
   :::

   Common causes for namespace creation failure:

   - The namespace already exists with conflicting ownership.
   - Pod Security Admission (PSA) policies are blocking namespace creation. Check for `pod-security.kubernetes.io` labels.
   - Admission webhooks are rejecting the namespace. Look for webhook names in events.
   - Finalizers are blocking deletion. Check for finalizers with the following command:

     ```shell
     kubectl get namespace tenant-<tenant-name> --output yaml | grep finalizers
     ```

4. If `TenantOversubscribed` is `True`, reduce GPU usage.

   ```shell
   kubectl get tenant <tenant-name> --output jsonpath='{.status.gpuUsage}' | jq
   ```
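The condition checks above can be scripted for a quick summary. The following is a minimal sketch, assuming you save the output of `kubectl get tenant <tenant-name> --output jsonpath='{.status.conditions}'` locally; the sample payload and reasons below are illustrative placeholders, and the condition types are the ones documented above.

```python
import json

# Illustrative .status.conditions payload, as returned by:
#   kubectl get tenant <tenant-name> --output jsonpath='{.status.conditions}'
conditions = json.loads("""
[
  {"type": "SettingsConfigured", "status": "True", "reason": "SettingsReady"},
  {"type": "TenantNamespaceCreated", "status": "False", "reason": "NamespaceCreationFailed"},
  {"type": "TenantOversubscribed", "status": "False", "reason": "WithinLimits"}
]
""")

def failing_conditions(conditions):
    """Return (type, reason) pairs for conditions that indicate a problem.

    Most condition types are healthy when status is "True";
    TenantOversubscribed is inverted (True means a problem).
    """
    inverted = {"TenantOversubscribed"}
    problems = []
    for cond in conditions:
        healthy = cond["status"] == ("False" if cond["type"] in inverted else "True")
        if not healthy:
            problems.append((cond["type"], cond.get("reason", "")))
    return problems

for ctype, reason in failing_conditions(conditions):
    print(f"{ctype}: {reason}")
```

Each reported condition type maps to one of the resolution steps above.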
## Cannot Delete Tenant

**Symptom:** The Tenant delete request fails due to child Projects.

**Cause:** PaletteAI blocks deletion of Tenants with child Projects.

Resolution:

1. List the Projects under the Tenant.

   ```shell
   kubectl get projects --all-namespaces --selector palette.ai/tenant-name=<tenant-name>
   ```

   :::info
   Projects are automatically labeled with `palette.ai/tenant-name` during creation. If a Project is missing this label, it may indicate a problem with the Project creation process or a Project created with an older version of PaletteAI.
   :::

2. Delete the Projects.

   ```shell
   kubectl delete project <project-name> --namespace <project-namespace>
   ```

3. Verify the child Project count is `0`.

   ```shell
   kubectl get tenant <tenant-name> --output jsonpath='{.status.childProjectCount}'
   ```

4. Delete the Tenant.

   ```shell
   kubectl delete tenant <tenant-name>
   ```
## GPU Quota Exceeded

**Symptom:** App Deployments or ComputePools remain pending due to GPU requests.

**Cause:** Total GPU usage across all Projects exceeds the Tenant limits.

Resolution:

1. Check current GPU usage.

   ```shell
   kubectl get tenant <tenant-name> --output jsonpath='{.status.gpuUsage}' | jq
   ```

2. Check the Tenant GPU limits.

   ```shell
   kubectl get tenant <tenant-name> --output jsonpath='{.spec.gpuResources}' | jq
   ```

3. Reduce GPU usage or increase the Tenant limits. To increase limits, update the Tenant manifest and apply it.

   ```yaml
   spec:
     gpuResources:
       limits:
         'NVIDIA-A100': 128
   ```

   ```shell
   kubectl apply --filename tenant.yaml
   ```
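Comparing the two JSON payloads above shows the remaining headroom per GPU type. The following is a minimal sketch with illustrative placeholder values; the field paths (`.status.gpuUsage` and `.spec.gpuResources.limits`) are the ones queried above.

```python
# Illustrative payloads, mirroring the two kubectl jsonpath queries above.
limits = {"NVIDIA-A100": 128, "NVIDIA-H100": 16}   # .spec.gpuResources.limits
usage = {"NVIDIA-A100": 120, "NVIDIA-H100": 16}    # .status.gpuUsage

def gpu_headroom(limits, usage):
    """Return remaining GPUs per type; zero or negative values explain
    why new GPU requests stay pending."""
    return {gpu: limit - usage.get(gpu, 0) for gpu, limit in limits.items()}

for gpu, free in gpu_headroom(limits, usage).items():
    print(f"{gpu}: {free} GPUs available")
```

A type with no headroom left is the one to target when reducing usage or raising limits.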
## Invalid Settings Namespace

**Symptom:** Tenant creation or update is rejected at admission time with an error message similar to: `settingsRef.namespace must be 'tenant-<tenant-name>'`.

**Cause:** PaletteAI's webhook validation enforces that `settingsRef.namespace` must equal the auto-generated tenant namespace format `tenant-<tenant-name>`. Tenant creation or update requests are rejected at admission time if the namespace does not match. This validation cannot be overridden.

Resolution:

Update the `settingsRef.namespace` field in the Tenant manifest to match the auto-generated tenant namespace format `tenant-<tenant-name>`, where `<tenant-name>` is the name of your Tenant resource.

Example:

For a Tenant named `my-tenant`, the correct configuration is:

```yaml
spec:
  settingsRef:
    name: my-settings
    namespace: tenant-my-tenant
```

Apply the corrected manifest:

```shell
kubectl apply --filename tenant.yaml
```
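You can check the rule locally before applying a manifest. The following is a minimal sketch that re-implements the admission rule stated above (`settingsRef.namespace` must equal `tenant-<tenant-name>`); the manifest dict and its values are placeholders, not output from a real cluster.

```python
# A Tenant manifest represented as a dict (placeholder values).
tenant = {
    "metadata": {"name": "my-tenant"},
    "spec": {"settingsRef": {"name": "my-settings", "namespace": "tenant-my-tenant"}},
}

def validate_settings_namespace(tenant):
    """Local re-implementation of the admission rule described above:
    settingsRef.namespace must equal 'tenant-<tenant-name>'."""
    name = tenant["metadata"]["name"]
    ns = tenant["spec"]["settingsRef"]["namespace"]
    expected = f"tenant-{name}"
    if ns != expected:
        raise ValueError(f"settingsRef.namespace must be '{expected}', got '{ns}'")

validate_settings_namespace(tenant)  # passes for the example above
```

The same check applies on update, so renaming a Tenant also requires updating `settingsRef.namespace`.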
## Project Name and Namespace Mismatch

**Symptom:** Project creation or update is rejected at admission time with an error message: `project name must match its namespace: name="<project-name>", namespace="<namespace-name>"`.

**Cause:** PaletteAI's webhook validation enforces that a Project's `metadata.name` must match its `metadata.namespace`. Project creation or update requests are rejected at admission time if the name and namespace do not match. This validation cannot be overridden.

Resolution:

Update the Project manifest so that `metadata.name` matches `metadata.namespace`.

Example:

For a Project in namespace `my-project`, the correct configuration is:

```yaml
apiVersion: spectrocloud.com/v1alpha1
kind: Project
metadata:
  name: my-project
  namespace: my-project
spec:
  # ... project spec
```

Apply the corrected manifest:

```shell
kubectl apply --filename project.yaml
```
## Settings Integrations Not Ready

**Symptom:** The Settings resource shows an `IntegrationsConfigured` condition with `status=False` and `reason="IntegrationsNotReady"`. The condition message lists which integrations have invalid or missing secrets.

**Cause:** One or more configured integrations (Palette, Hugging Face, or NVIDIA) reference secrets that are missing, have incorrect names, or lack required fields.

Resolution:

1. Check the Settings conditions to identify affected integrations.

   ```shell
   kubectl get settings <settings-name> --namespace <settings-namespace> --output jsonpath='{.status.conditions}' | jq
   ```

   The condition message identifies which integrations are failing. Example output:

   ```json
   {
     "type": "IntegrationsConfigured",
     "status": "False",
     "reason": "IntegrationsNotReady",
     "message": "The following integrations have invalid or missing secrets: Hugging Face, NVIDIA"
   }
   ```

2. Verify the referenced secrets exist in the correct namespace.

   ```shell
   kubectl get secrets --namespace <settings-namespace>
   ```

3. For Palette integrations, verify the secret exists and matches the name in the Settings spec.

   ```shell
   kubectl get secret <palette-secret-name> --namespace <settings-namespace>
   ```

4. For Hugging Face integrations, verify the secret exists and contains the required `token` field.

   ```shell
   kubectl get secret <huggingface-secret-name> --namespace <settings-namespace> --output jsonpath='{.data}' | jq
   ```

   The secret must be an Opaque Kubernetes secret with the field specified in the Settings `spec.integrations.huggingFace.apiKey.key`.

5. For NVIDIA integrations, verify both the NGC API key and image pull secrets exist.

   ```shell
   # Check the NGC API key secret
   kubectl get secret <ngc-api-key-secret-name> --namespace <settings-namespace> --output jsonpath='{.data}' | jq

   # Check the NGC image pull secret
   kubectl get secret <ngc-image-pull-secret-name> --namespace <settings-namespace>
   ```

   The API key secret must contain the field specified in `spec.integrations.nvidia.ngc.apiKey.key`.

6. Ensure secret names match exactly what is specified in the Settings resource.

   ```shell
   kubectl get settings <settings-name> --namespace <settings-namespace> --output yaml | grep --after-context=5 integrations
   ```

7. After correcting the secrets, verify the Settings resource becomes ready.

   ```shell
   kubectl get settings <settings-name> --namespace <settings-namespace>
   ```

   The `READY` column should show `True`, and the `IntegrationsConfigured` condition should have `status=True` with `reason="IntegrationsValid"`.
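For reference, a Hugging Face integration secret might look like the following sketch. The secret name, namespace, and token value are placeholders; the `token` field name must match whatever key is configured in your Settings `spec.integrations.huggingFace.apiKey.key`.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: huggingface-secret      # placeholder; must match the name referenced in the Settings spec
  namespace: tenant-my-tenant   # the auto-generated tenant namespace
type: Opaque                    # must be an Opaque secret
stringData:
  token: <huggingface-api-token>  # field name must match spec.integrations.huggingFace.apiKey.key
```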
## Tenant Admin Groups Do Not Work

**Symptom:** Users in Tenant admin groups do not have the expected permissions in Project namespaces.

Possible causes:

- OIDC groups do not match your identity provider configuration.
- PaletteAI did not create the RBAC resources.
- The Dex configuration does not match (static users).

Resolution:

1. Verify the Dex static user configuration (if applicable).

   ```shell
   kubectl get configmap dex --namespace mural-system --output yaml | grep --after-context=10 staticClients
   ```

2. Verify the RBAC resources exist in the Project namespace.

   ```shell
   kubectl get roles --namespace <project-namespace> | grep tnt-adm
   kubectl describe role prj-<project-name>-tnt-adm --namespace <project-namespace>
   ```

3. Verify the RoleBinding includes the expected groups.

   ```shell
   kubectl get rolebinding --namespace <project-namespace> | grep tnt-adm
   kubectl describe rolebinding <rolebinding-name> --namespace <project-namespace>
   ```

4. Verify group membership using impersonation (if OIDC is configured).

   ```shell
   # Test as a user in the tenant admin group
   kubectl auth can-i create projects --as=<username> --as-group=<tenant-admin-group>
   # Expected output: yes
   ```

   If the above test succeeds but real user login fails, the issue is likely an identity provider misconfiguration:

   - The user's token does not include the expected group claim. Verify the IdP group claim mapping.
   - The Kubernetes API server is not configured to pass through group claims. Verify the OIDC configuration in the API server flags.
   - The `--as-group` test results differ from the actual groups in a user's token payload. To diagnose, decode the user's JWT token and verify the groups claim matches your `tenantRoleMapping.groups` configuration.

5. If the RBAC resources are missing, delete and recreate the Project.

   :::warning
   Deleting and recreating a Project will remove all workloads, deployments, and resources in that Project's namespace. Before proceeding:

   - Back up any important data.
   - Document deployed workloads.
   - Ensure you can recreate the Project configuration.
   - Verify no production workloads are running.
   :::
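Decoding the token's groups claim (step 4 above) needs no external tools. The following is a minimal sketch: the sample token is constructed for illustration and is unsigned; the code only inspects the payload and performs no signature verification or validation.

```python
import base64
import json

def jwt_groups(token):
    """Decode a JWT payload (the second dot-separated segment) and return
    its 'groups' claim. Inspection only -- no signature verification."""
    payload_b64 = token.split(".")[1]
    # Restore base64url padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("groups", [])

# Construct a sample unsigned token for illustration.
header = base64.urlsafe_b64encode(json.dumps({"alg": "none"}).encode()).decode().rstrip("=")
claims = base64.urlsafe_b64encode(
    json.dumps({"sub": "alice", "groups": ["tenant-admins"]}).encode()
).decode().rstrip("=")
token = f"{header}.{claims}."

print(jwt_groups(token))
```

Compare the printed groups against the groups configured in `tenantRoleMapping.groups`; a mismatch points to the IdP group claim mapping rather than PaletteAI RBAC.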