Troubleshooting Tenants
This page provides troubleshooting guidance for common issues when creating and managing Tenants.
Tenant Not Ready
Symptom: The Tenant's Ready condition is False.
Possible causes:
- The referenced Settings resource does not exist or is not ready.
- PaletteAI cannot create the controller-created Tenant namespace (tenant-<tenant-name>).
- GPU usage exceeds limits.
Resolution:
- View Tenant conditions.
    kubectl get tenant <tenant-name> --output jsonpath='{.status.conditions}' | jq
- If SettingsConfigured is False, verify the Settings resource. Check the Settings resource in the namespace referenced by .spec.settingsRef.namespace (typically your user namespace, not tenant-<name>).
    # First, get the Settings namespace from the Tenant spec
    kubectl get tenant <tenant-name> --output jsonpath='{.spec.settingsRef.namespace}'
    # Then check the Settings resource
    kubectl get settings <settings-name> --namespace <settings-namespace>
    kubectl describe settings <settings-name> --namespace <settings-namespace>
- If TenantNamespaceCreated is False, verify the controller-created namespace and review events.
    # Check the controller-created namespace (tenant-<name>)
    kubectl get namespace tenant-<tenant-name>
    kubectl describe tenant <tenant-name>
  Debug namespace creation failures:
  First, check recent events to display the actual denial reason:
    # If the namespace exists but Tenant is not Ready
    kubectl get events --namespace tenant-<tenant-name> --sort-by=.lastTimestamp | tail -20
    # If namespace does not exist, check controller logs
    # First, find the tenant controller pod (label selectors may vary by installation)
    kubectl get pods --namespace mural-system
    # Then inspect logs (example using common label selector)
    kubectl logs --namespace mural-system --selector app=hue-controller --tail=100 | grep --ignore-case "tenant.*<tenant-name>"
  Info: The exact pod label selector may differ in your installation. Use kubectl get pods --namespace mural-system to identify the tenant controller pod and inspect its logs directly.
  Common causes for namespace creation failure:
  - Namespace already exists with conflicting ownership.
  - Pod Security Admission (PSA) policies blocking namespace creation. Check for pod-security.kubernetes.io labels.
  - Admission webhooks rejecting the namespace. Look for webhook names in events.
  - Finalizers may be blocking deletion. Check for finalizers with the following command:
      kubectl get namespace tenant-<tenant-name> --output yaml | grep finalizers
- If TenantOversubscribed is True, reduce GPU usage.
    kubectl get tenant <tenant-name> --output jsonpath='{.status.gpuUsage}' | jq
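The condition checks above can be combined into one pass. The following is a minimal sketch, not part of PaletteAI: the function names are hypothetical, and it assumes each entry in .status.conditions exposes the conventional Kubernetes type and status fields.

```shell
# Print "<type>=<status>" for every condition on a Tenant, one per line.
# Assumption: conditions follow the usual Kubernetes shape (type, status).
tenant_conditions() {
  kubectl get tenant "$1" \
    --output jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Read "<type>=<status>" lines on stdin and print the matching
# troubleshooting step from this page for each failing condition.
suggest_next_step() {
  while IFS='=' read -r type status; do
    case "$type=$status" in
      SettingsConfigured=False)
        echo "Check the Settings resource referenced by .spec.settingsRef" ;;
      TenantNamespaceCreated=False)
        echo "Check the tenant-<tenant-name> namespace, events, and controller logs" ;;
      TenantOversubscribed=True)
        echo "Reduce GPU usage or raise the Tenant GPU limits" ;;
    esac
  done
}
```

For example, `tenant_conditions <tenant-name> | suggest_next_step` prints one hint per failing condition and nothing when none of the three conditions is in a failing state.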
Cannot Delete Tenant
Symptom: The Tenant delete request fails due to child Projects.
Cause: PaletteAI blocks deletion of Tenants with child Projects.
Resolution:
- List the Projects under the Tenant.
    kubectl get projects --all-namespaces --selector palette.ai/tenant-name=<tenant-name>
- Delete the Projects.
    kubectl delete project <project-name> --namespace <project-namespace>
- Verify the child Project count is 0.
    kubectl get tenant <tenant-name> --output jsonpath='{.status.childProjectCount}'
- Delete the Tenant.
    kubectl delete tenant <tenant-name>
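The last two steps can be guarded so the delete request is only sent once the count reaches 0. This is a sketch under stated assumptions: delete_tenant_if_empty is a hypothetical helper, and it treats any value other than the literal 0 (including an empty result) as "not safe to delete".

```shell
# Hypothetical helper: delete a Tenant only when it reports no child Projects.
delete_tenant_if_empty() {
  tenant="$1"
  # Read the child Project count from the Tenant status.
  count=$(kubectl get tenant "$tenant" \
    --output jsonpath='{.status.childProjectCount}') || return 1
  # Refuse unless the count is exactly 0; an empty value is not trusted.
  if [ "$count" != "0" ]; then
    echo "Tenant $tenant still reports child Projects (count: ${count:-unknown}); not deleting." >&2
    return 1
  fi
  kubectl delete tenant "$tenant"
}
```

Call it as `delete_tenant_if_empty <tenant-name>`. A non-zero exit status means the Tenant was left untouched.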
GPU Quota Exceeded
Symptom: App Deployments or ComputePools remain pending due to GPU requests.
Cause: Total GPU usage across all Projects exceeds Tenant limits.
Resolution:
- Check current GPU usage.
    kubectl get tenant <tenant-name> --output jsonpath='{.status.gpuUsage}' | jq
- Check Tenant GPU limits.
    kubectl get tenant <tenant-name> --output jsonpath='{.spec.gpuResources}' | jq
- Reduce GPU usage or increase Tenant limits.
  To increase limits, update the Tenant manifest and apply it.
    spec:
      gpuResources:
        limits:
          'NVIDIA-A100': 128

    kubectl apply --filename tenant.yaml
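If the manifest file is not at hand, the same field can probably be changed in place with a JSON merge patch. The sketch below is an assumption rather than a documented PaletteAI workflow: both function names are hypothetical, and it presumes the Tenant resource accepts a merge patch on .spec.gpuResources.limits. If the Tenant is managed from a manifest or through GitOps, update the source manifest as well so the change is not reverted.

```shell
# Build a JSON merge patch that sets one GPU limit
# under .spec.gpuResources.limits (field path taken from the manifest above).
gpu_limit_patch() {
  printf '{"spec":{"gpuResources":{"limits":{"%s":%d}}}}' "$1" "$2"
}

# Hypothetical helper: apply the patch to a Tenant.
# Arguments: <tenant-name> <gpu-type> <limit>
set_tenant_gpu_limit() {
  kubectl patch tenant "$1" --type merge --patch "$(gpu_limit_patch "$2" "$3")"
}
```

For example, `set_tenant_gpu_limit <tenant-name> NVIDIA-A100 128` sends the same limit shown in the manifest fragment.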
Tenant Admin Groups Do Not Work
Symptom: Users in Tenant admin groups do not have expected permissions in Project namespaces.
Possible causes:
- OIDC groups do not match your identity provider configuration.
- PaletteAI did not create RBAC resources.
- Dex configuration does not match (static users).
Resolution:
- Verify Dex static user configuration (if applicable).
    kubectl get configmap dex --namespace mural-system --output yaml | grep --after-context=10 staticClients
- Verify RBAC resources exist in the Project namespace.
    kubectl get roles --namespace <project-namespace> | grep tnt-adm
    kubectl describe role prj-<project-name>-tnt-adm --namespace <project-namespace>
- Verify the RoleBinding includes the expected groups.
    kubectl get rolebinding --namespace <project-namespace> | grep tnt-adm
    kubectl describe rolebinding <rolebinding-name> --namespace <project-namespace>
- Verify group membership using impersonation (if OIDC is configured).
    # Test as a user in the tenant admin group
    kubectl auth can-i create projects --as=<username> --as-group=<tenant-admin-group>
    # Expected output: yes
  If the above test succeeds but real user login fails, the issue is likely an identity provider misconfiguration:
  - The user's token does not include the expected group claim. Verify IdP group claim mapping.
  - The Kubernetes API server is not configured to pass through group claims. Verify OIDC configuration in API server flags.
  - Compare --as-group test results with the actual groups returned in a user's token payload.
  To diagnose, decode a user's JWT and verify the groups claim matches your tenantRoleMapping.groups configuration.
- If RBAC resources are missing, delete and recreate the Project.
  Warning: Deleting and recreating a Project will remove all workloads, deployments, and resources in that Project's namespace. Before proceeding:
  - Back up any important data.
  - Document deployed workloads.
  - Ensure you can recreate the Project configuration.
  - Verify no production workloads are running.
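For the JWT check in the impersonation step, the token payload can be decoded locally. This is a minimal sketch: decode_jwt_payload is a hypothetical helper, it assumes a base64 command that accepts -d (as in GNU coreutils), and it does not verify the token signature.

```shell
# Hypothetical helper: print the decoded payload (second segment) of a JWT.
# The signature is NOT verified; this is for inspection only.
decode_jwt_payload() {
  # Take the second dot-separated segment and convert base64url to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

Piping the result through jq, for example `decode_jwt_payload "$TOKEN" | jq '.groups'`, shows the groups the identity provider issued. The claim name groups is an assumption here; use whichever claim your identity provider and API server are configured with, and compare the values against tenantRoleMapping.groups.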