
Troubleshooting


Kubernetes

Accessing Cluster Nodes

For clusters running Kubernetes 1.17 or later:

ssh core@nodename

For clusters running Kubernetes 1.15 or earlier:

ssh fedora@nodename
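A minimal sketch of this rule as a shell helper. The function name node_login_user is hypothetical, and you supply the cluster's Kubernetes minor version yourself (for example, from the server version reported by kubectl version). Version 1.16 is not covered by the rules above, so the helper reports an error instead of guessing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SSH login user for a cluster node,
# given the cluster's Kubernetes minor version (e.g. 17 for 1.17).
node_login_user() {
  local minor="$1"
  if [ "$minor" -ge 17 ]; then
    echo core
  elif [ "$minor" -le 15 ]; then
    echo fedora
  else
    # 1.16 is not covered by the documented rules; do not guess.
    echo "unknown login user for Kubernetes 1.$minor" >&2
    return 1
  fi
}

# Example: ssh "$(node_login_user 17)@nodename"
```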

CephFS / Manila share stuck in Pending

This can happen for several reasons, and the logs usually give more detail.

First, check the manila-provisioner logs:

kubectl -n kube-system logs deployment.apps/manila-provisioner

A common cause is a lack of quota, which the output of the command above should make clear.
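To scan the provisioner output for quota problems, you can filter it. This is a minimal sketch: the function name manila_quota_errors is hypothetical, and it assumes the relevant messages contain the word "quota" (the exact wording depends on the provisioner version), so an empty result does not rule out other causes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show manila-provisioner log lines that mention quota.
# Assumes quota errors contain the word "quota" (matched case-insensitively).
manila_quota_errors() {
  kubectl -n kube-system logs deployment.apps/manila-provisioner \
    | grep -i 'quota'
}
```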

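If the provisioner logs show no obvious cause, check the logs of the CephFS plugin pod running on the same node as the Pending pod. As an example, with the pod hub-87f785cd9-kqdwr (replace it with your own), find the pod's node, then the CephFS plugin pod on that node, then that plugin pod's container logs: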

kubectl describe pod/hub-87f785cd9-kqdwr | grep Node
Node:               cci-jupyterhub-010-5mq46zzli545-minion-8/188.184.93.39

kubectl -n kube-system get pod -o wide | grep cephfs | grep cci-jupyterhub-010-5mq46zzli545-minion-8
csi-cephfsplugin-3psw7                 2/2     Running                 0          33d   188.184.93.39    cci-jupyterhub-010-5mq46zzli545-minion-8    <none>           <none>

kubectl -n kube-system logs csi-cephfsplugin-3psw7 -c driver-registrar
...

kubectl -n kube-system logs csi-cephfsplugin-3psw7 -c csi-cephfsplugin
...
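The manual steps above can be scripted. This is a minimal sketch: the function name cephfs_plugin_logs is hypothetical; it finds the node with a JSONPath query and the plugin pod with a spec.nodeName field selector instead of grepping describe output, and it assumes the plugin pods live in kube-system with names containing cephfsplugin (skipping any whose name contains provisioner) and use the container names shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the logs of the CephFS plugin pod that runs
# on the same node as a given (Pending) pod.
# Usage: cephfs_plugin_logs <pod-name> [pod-namespace]
cephfs_plugin_logs() {
  local pod="$1" node plugin container
  local ns_args=()
  # Without a namespace argument, use the current context's namespace.
  if [ -n "$2" ]; then ns_args=(-n "$2"); fi

  # Node the pod was scheduled on (same information as "describe | grep Node").
  node="$(kubectl "${ns_args[@]}" get pod "$pod" -o jsonpath='{.spec.nodeName}')"
  if [ -z "$node" ]; then
    echo "pod $pod has no node assigned" >&2
    return 1
  fi

  # Assumed naming: plugin pods contain "cephfsplugin"; skip provisioner pods.
  plugin="$(kubectl -n kube-system get pods --field-selector "spec.nodeName=$node" -o name \
    | grep cephfsplugin | grep -v provisioner | head -n 1)"
  if [ -z "$plugin" ]; then
    echo "no cephfs plugin pod found on node $node" >&2
    return 1
  fi

  # Logs of both containers, as in the manual steps.
  for container in driver-registrar csi-cephfsplugin; do
    echo "=== $plugin ($container) ==="
    kubectl -n kube-system logs "$plugin" -c "$container"
  done
}
```

For example, run cephfs_plugin_logs hub-87f785cd9-kqdwr, adding the namespace as a second argument if the pod is not in your current namespace.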

If none of these logs points to the cause, open a Service Desk ticket and attach all of the logs above.

On lxplus, creating resources with validation enabled may fail with an error like this:

kubectl create --validate -f https://gitlab.com/kubernetes/kubernetes/raw/9eaf1aa38f40b1009352a3a5436fdb729b044917/test/fixtures/doc-yaml/user-guide/walkthrough/pod-nginx.yaml
error: error validating "https://gitlab.com/kubernetes/kubernetes/raw/9eaf1aa38f40b1009352a3a5436fdb729b044917/test/fixtures/doc-yaml/user-guide/walkthrough/pod-nginx.yaml": error validating data: link /afs/cern.ch/user/r/rbritoda/.kube/schema/v1.5.2/schema794544141 /afs/cern.ch/user/r/rbritoda/.kube/schema/v1.5.2/api/v1/schema.json: invalid cross-device link; if you choose to ignore these errors, turn validation off with --validate=false

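This happens because kubectl creates hard links in its schema cache under ~/.kube/schema, and AFS does not allow hard links between different directories (hence "invalid cross-device link"). To work around it, you can disable validation: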

kubectl create --validate=false ...

Alternatively, keep validation and point kubectl at a schema cache directory outside AFS:

mkdir -p /tmp/$USER/.kube/schema
kubectl create --schema-cache-dir=/tmp/$USER/.kube/schema -f ...
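If you need this workaround often, it can be wrapped in a function. This is a minimal sketch: the name kubectl_tmp_schema is hypothetical, it keeps --schema-cache-dir right after the subcommand as in the command above, and it only helps with older kubectl clients that still accept that flag (such as the one in the error message, which uses a v1.5.2 schema).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a kubectl subcommand with its schema cache in /tmp,
# outside AFS, so validation does not need hard links in the AFS home directory.
# Usage: kubectl_tmp_schema <subcommand> [args...]
kubectl_tmp_schema() {
  if [ $# -lt 1 ]; then
    echo "usage: kubectl_tmp_schema <subcommand> [args...]" >&2
    return 2
  fi
  local subcmd="$1" cache="/tmp/$USER/.kube/schema"
  shift
  mkdir -p "$cache" || return 1
  # Place the flag after the subcommand, e.g. "create --schema-cache-dir=... -f ...".
  kubectl "$subcmd" --schema-cache-dir="$cache" "$@"
}

# Example: kubectl_tmp_schema create -f pod.yaml
```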

Last update: March 12, 2021