GPU Overview

GPU resource requests are handled slightly differently from what is described in the Projects section. If you need GPUs, the first step is to open a ticket with the GPU Platform Consultancy functional element. The consultants will help you decide which of the services below best suits your needs.

Services


OpenStack Project with GPU Flavors

This option is identical to the one described in the Projects section, except that GPU flavors will be assigned to your project. You can then launch instances with GPUs. The available flavors are:

Flavor Name   GPU          RAM      vCPUs   Disk      Ephemeral   Comments
g1.xlarge     V100         16 GB    4       56 GB     96 GB       -
g1.4xlarge    V100 (4x)    64 GB    16      80 GB     528 GB      -
g2.xlarge     T4           16 GB    4       64 GB     192 GB      -
g2.5xlarge    T4           168 GB   28      160 GB    1200 GB     -
g3.xlarge     V100S        16 GB    4       64 GB     192 GB      -
g3.4xlarge    V100S (4x)   64 GB    16      128 GB    896 GB      -
g4.p1.40g     A100 (1x)    120 GB   16      600 GB    -           AMD CPUs
g4.p2.40g     A100 (2x)    240 GB   32      1200 GB   -           AMD CPUs
g4.p4.40g     A100 (4x)    480 GB   64      2400 GB   -           AMD CPUs
vg1.xlarge    T4 (vGPU)    16 GB    4       64 GB     192 GB      Specific configuration here
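
As a rough sketch of launching an instance with one of these flavors, assuming the standard `openstack` command-line client is installed and configured for your project: the helper below only prints the command so it can be reviewed first. The function name and the image, key pair and instance names are hypothetical placeholders, not values from this documentation.

```shell
# Print (but do not run) an "openstack server create" command for a GPU flavor.
# $1: flavor, $2: image, $3: key pair, $4: instance name
gpu_server_create_cmd() {
    echo "openstack server create --flavor $1 --image $2 --key-name $3 $4"
}

# Hypothetical values; replace them with ones valid in your project.
gpu_server_create_cmd g1.xlarge my-image my-keypair my-gpu-vm
```

Once the printed command looks right, it can be copied into a shell where the OpenStack client is authenticated.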

Note: Appropriate GPU drivers have to be installed (detailed here).

Note: Baremetal nodes with GPUs are also possible in certain cases. Please open a ticket for these requests.

Policies

GPUs are scarce and expensive resources, and the PCI-passthrough model limits IT's ability to monitor whether they are used efficiently, since such monitoring has to be done on the guests. GPU resources can be allocated for testing periods of up to 4 months, after which the resources are reclaimed and a usage report is expected. Longer loan times are possible but require a justification and management approval.

Container Service Clusters

After having GPU resources allocated to your OpenStack project, you can deploy clusters with GPUs by setting a label (explained here).

Batch Service GPU jobs

The Batch service at CERN already allows the submission of GPU jobs (examples here). Besides jobs in the typical batch-system form, Batch also supports Docker and Singularity jobs as well as interactive jobs (including running GUI applications).
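
To give an idea of what a GPU job request might look like, here is a minimal sketch that assumes the Batch service is driven by HTCondor submit files. The file names and the `gpu_job.sh` executable are hypothetical; `request_gpus` is the standard HTCondor submit command for asking for GPUs. The linked examples remain the authoritative reference.

```shell
# Write a minimal HTCondor submit description that requests one GPU.
# gpu_job.sh and the output file names are hypothetical placeholders.
cat > gpu_job.sub <<'EOF'
executable   = gpu_job.sh
output       = gpu_job.$(ClusterId).$(ProcId).out
error        = gpu_job.$(ClusterId).$(ProcId).err
log          = gpu_job.$(ClusterId).log
request_gpus = 1
queue
EOF

# Submit from a node that has the HTCondor client tools:
#   condor_submit gpu_job.sub
```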

GitLab (Continuous Integration)

A number of shared runners in CERN GitLab offer GPUs.

Check here for configuration information and examples.

lxplus

The lxplus service offers lxplus-gpu.cern.ch for shared GPU instances - with limited isolation and performance.

VM Configuration


When using GPUs directly in virtual machines you need to handle driver installation and configuration.

Driver Installation

Note: Virtual GPU driver installation is different (see here).

To install NVIDIA drivers, open the CUDA Toolkit Downloads page and select the options that match your system. As the installer type, we recommend choosing the 'network' option. Once all options are selected, the page displays a box with concise installation instructions.

As a rule of thumb, the drivers are correctly installed if you can successfully run 'nvidia-smi' in a terminal (Linux) or if the GPU model you have been assigned appears in the Device Manager under Display adapters (Windows).
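
On Linux, the check above can be scripted. The following is a minimal sketch, not an official tool: the function name and its messages are made up for illustration, and the optional argument exists only so the check can be tried on a machine without a GPU.

```shell
# Report whether the NVIDIA driver tools are present and respond.
# $1 (optional): command to probe, defaults to nvidia-smi.
check_gpu_driver() {
    tool="${1:-nvidia-smi}"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "driver tools not found: $tool"
        return 1
    fi
    # nvidia-smi exits with a non-zero status when it cannot reach the driver.
    if "$tool" >/dev/null 2>&1; then
        echo "driver OK"
    else
        echo "driver installed but not communicating with the GPU"
        return 1
    fi
}

check_gpu_driver || true
```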

For more detailed instructions, such as pre- and post-installation actions, see the Installation Guide for Linux or the Installation Guide for Microsoft Windows.

Troubleshooting

Drivers do not find the GPU

Depending on the GPU used, GPU drivers on CS8 and CS9 guests (and possibly others) sometimes fail to initialise the GPU due to PCI bus address space issues. CC7 guests usually work fine. The root cause is under investigation. There are two known workarounds:

  • Boot the guest in BIOS mode instead of UEFI. See the documentation for how to change this.
  • Add the kernel boot options pci=realloc pci=nocrs at boot time.
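
For the second workaround, a small sketch can confirm whether the options are active on the running kernel. The helper name is hypothetical, and the `grubby` hint in the comment assumes a grubby-based distribution such as CentOS; adapt it to your boot loader setup.

```shell
# Succeed if every given option appears in a kernel command line.
# $1: command line string (for example the contents of /proc/cmdline)
# remaining arguments: options to look for
has_boot_options() {
    cmdline=" $1 "
    shift
    for opt in "$@"; do
        case "$cmdline" in
            *" $opt "*) ;;       # option present, keep checking
            *) return 1 ;;       # option missing
        esac
    done
    return 0
}

if has_boot_options "$(cat /proc/cmdline 2>/dev/null)" pci=realloc pci=nocrs; then
    echo "workaround options are active"
else
    echo "workaround options are not active"
    # To add them persistently on grubby-based systems (assumption), as root:
    #   grubby --update-kernel=ALL --args="pci=realloc pci=nocrs"
fi
```

A reboot is needed before newly added boot options take effect.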

Virtual GPUs

Note: Running Windows is not supported with the current license.

For the vGPUs to operate at full capacity, licensing is required. For CC7, CS8 and CS9 we offer an rpm package which installs the required software and takes care of obtaining a license lease. It should also work on Red Hat Enterprise Linux Server.

For puppet managed machines, simply include

include gpu

For other operating systems or non-centrally managed machines please get in touch with the cloud team by opening a support call.

Installing CUDA Toolkit in a vGPU VM

The first step in installing the CUDA Toolkit is to look up, in this table, the latest CUDA version that is compatible with the deployed vGPU software (currently deployed vGPU software release: 13.0). You can then find the corresponding CUDA Toolkit download link in the downloads archive.

From the downloads page, pick the runfile installer after selecting your target OS. Not using the runfile can result in deploying an unsupported version of CUDA and overriding the vGPU driver.

During the interactive installation of the runfile, it is important to deselect the driver install option. Alternatively, you can run the installer non-interactively with the following flags:

$ sudo <CudaInstaller>.run --silent --toolkit --samples

Please check the detailed installation steps as there are relevant pre- and post-installation actions (such as installing g++ and altering the PATH environment variable).
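
As an illustration of the PATH-related post-installation step, here is a minimal sketch. It assumes the toolkit was installed under the usual runfile prefix `/usr/local/cuda`; adjust `CUDA_HOME` if your installation lives elsewhere.

```shell
# Assumed default install prefix; override CUDA_HOME if yours differs.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Prepend the toolkit's binaries and libraries to the search paths.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# nvcc ships with the toolkit; if it is found, print its version.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi
```

Adding the two `export` lines to a shell start-up file such as `~/.bashrc` makes the setting persistent across sessions.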

To uninstall a runfile-based installation, simply run:

$ cuda-uninstaller

GPU accelerated Docker containers

The NVIDIA Container Toolkit is required to run GPU accelerated Docker containers (not required for Kubernetes). Installation instructions are available here.
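
Once the toolkit is installed, a quick smoke test is to run `nvidia-smi` inside a throwaway container. The sketch below assumes Docker 19.03 or later (for the `--gpus` option); the function name is hypothetical and the stock `ubuntu` image is used purely as an example.

```shell
# Run nvidia-smi inside a temporary container with all GPUs exposed.
# Assumes Docker and the NVIDIA Container Toolkit are already set up.
gpu_container_smoke_test() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found"
        return 1
    fi
    docker run --rm --gpus all ubuntu nvidia-smi
}

# Usage (not run automatically, since it may pull an image):
#   gpu_container_smoke_test
```

If the container prints the same GPU listing as `nvidia-smi` on the host, the container runtime can see the GPUs.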


Last update: October 19, 2022