GPU
Warning
This documentation is deprecated; please check here for its new home.
Clusters in the CERN cloud container service have built-in support to detect and configure NVIDIA GPUs. Follow these instructions to request access and quota for GPU resources.
This is officially supported for Kubernetes clusters >= 1.18.x.
Configuration
Clusters with GPU resources should have the following label set:
- nvidia_gpu_enabled=true
That's it; the cluster deployment will handle the detection and configuration of GPU nodes.
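For example, the label can be passed at cluster creation time with the OpenStack CLI. A minimal sketch, assuming a Magnum deployment; the cluster, template, and flavor names below are placeholders, so use the ones available in your project:

```shell
# Create a cluster with GPU detection enabled via the label.
# "gpu-cluster", the template, and the flavor are illustrative names.
openstack coe cluster create gpu-cluster \
  --cluster-template kubernetes-1.18.x \
  --node-count 2 \
  --flavor m2.large \
  --labels nvidia_gpu_enabled=true
```

An existing cluster cannot usually change this label in place; it is evaluated during deployment of the nodes.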
Usage
This is workload dependent, but in general the container image should have the required NVIDIA drivers available.
If the image expects the drivers to be available on the host (as the default tensorflow image does), you can instead bind mount the drivers that are installed on the cluster nodes under /opt/nvidia-driver. Example using a Pod with the tensorflow image:
apiVersion: v1
kind: Pod
metadata:
  name: tf-gpu
spec:
  containers:
  - name: tf
    image: tensorflow/tensorflow:latest-gpu
    command: ["sleep", "inf"]
    resources:
      limits:
        nvidia.com/gpu: 1
    env:
    - name: PATH
      value: "/bin:/usr/bin:/usr/local/bin:/opt/nvidia-driver/bin"
    - name: LD_LIBRARY_PATH
      value: "/opt/nvidia-driver/lib64"
    securityContext:
      privileged: true
      seLinuxOptions:
        type: spc_t
    volumeMounts:
    - name: nvidia-driver
      mountPath: /opt/nvidia-driver
  volumes:
  - name: nvidia-driver
    hostPath:
      path: /opt/nvidia-driver
      type: DirectoryOrCreate
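Assuming the manifest above is saved as tf-gpu.yaml (a file name chosen for this example), you can deploy it and confirm that the GPU nodes advertise the nvidia.com/gpu resource:

```shell
# Deploy the Pod from the manifest above
kubectl apply -f tf-gpu.yaml

# GPU nodes should list nvidia.com/gpu under Capacity/Allocatable
kubectl describe nodes | grep -A1 'nvidia.com/gpu'
```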
Check that it all works from a Python shell:
$ kubectl exec -it tf-gpu -- bash
root@tf-gpu:/# python
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
...
True
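As an additional sanity check, nvidia-smi should also work from inside the container, since the bind-mounted driver directory is on the PATH set in the Pod spec above:

```shell
# Query the GPU directly; nvidia-smi comes from /opt/nvidia-driver/bin
kubectl exec tf-gpu -- nvidia-smi
```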