GPU

Clusters in the CERN cloud container service have built-in support to detect and configure NVIDIA GPUs. Follow these instructions to request access and quota for GPU resources.

This is officially supported for Kubernetes clusters >= 1.18.x.
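
You can confirm the version of a running cluster with kubectl (output format varies slightly across client versions):

$ kubectl version --short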

Configuration

Clusters with GPU resources should have the following label set:

  • nvidia_gpu_enabled=true
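
For example, the label can be set when creating the cluster with the OpenStack CLI; a sketch assuming the standard Magnum client, where the cluster and template names below are placeholders:

$ openstack coe cluster create gpu-cluster \
    --cluster-template kubernetes-1.18.2-1 \
    --labels nvidia_gpu_enabled=true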

That's it: the cluster deployment handles the detection and configuration of GPU nodes.
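
Once the nodes are ready, the GPUs show up as an allocatable resource; a quick way to verify:

$ kubectl describe nodes | grep nvidia.com/gpu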

Usage

This is workload dependent, but in general the container image should have the required NVIDIA driver libraries available.

If the image expects the drivers to be available on the host (as the default tensorflow image does), you can instead bind mount the drivers installed on the cluster nodes under /opt/nvidia-driver. Here is an example Pod using the tensorflow image:

apiVersion: v1
kind: Pod
metadata:
  name: tf-gpu
spec:
  containers:
    - name: tf
      image: tensorflow/tensorflow:latest-gpu
      command: ["sleep", "inf"]
      resources:
        limits:
          # Request a single GPU for this container
          nvidia.com/gpu: 1
      env:
        # Make the host driver binaries and libraries visible to TensorFlow
        - name: PATH
          value: "/bin:/usr/bin:/usr/local/bin:/opt/nvidia-driver/bin"
        - name: LD_LIBRARY_PATH
          value: "/opt/nvidia-driver/lib64"
      securityContext:
        # Required to access the host driver through the hostPath mount
        privileged: true
        seLinuxOptions:
          type: spc_t
      volumeMounts:
        - name: nvidia-driver
          mountPath: /opt/nvidia-driver
  volumes:
    # Drivers installed by the cluster deployment on each GPU node
    - name: nvidia-driver
      hostPath:
        path: /opt/nvidia-driver
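
Assuming the manifest above is saved as tf-gpu.yaml (the filename is arbitrary), create the Pod and wait for it to reach Running:

$ kubectl apply -f tf-gpu.yaml
$ kubectl get pod tf-gpu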

Check it all works from a Python shell:

$ kubectl exec -it tf-gpu -- bash
root@tf-gpu:/# python
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
...
True
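
Since the Pod's PATH includes /opt/nvidia-driver/bin, nvidia-smi provides a quicker sanity check of the mounted driver:

$ kubectl exec -it tf-gpu -- nvidia-smi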
