
Known issues and workarounds

This chapter lists known issues in the CERN cloud and describes how to fix or work around them.

Rebuilding UEFI-based VMs (e.g. RHEL9) with non-UEFI images (e.g. CC7)

A bug in the OpenStack service prevents users from rebuilding UEFI-based virtual machines with images that are not UEFI-enabled.

Based on their configuration, the public images published in OpenStack fall into the following two categories:

  • UEFI images: ALMA8, RHEL8, ALMA9 and RHEL9.
  • non-UEFI images: Windows, CC7 and CernVM.

When this happens, the rebuild fails with the error "Requested operation is not valid: cannot undefine domain with nvram".

If your machine gets into this state, contact the Cloud Infrastructure support line.
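To avoid getting into this state, you can check before a rebuild whether the current and the target image use the same firmware type. This is a minimal sketch that assumes UEFI images carry the standard Glance image property hw_firmware_type=uefi; we have not confirmed that every public CERN image sets it, so treat a "bios" result as "not tagged as UEFI" rather than as a guarantee. The function names are hypothetical helpers, not part of any CERN tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the firmware type of two images before a rebuild.
# Assumption: UEFI images carry the Glance property hw_firmware_type=uefi.

# Print "uefi" if the image is tagged as UEFI, "bios" otherwise.
image_firmware() {
  if openstack image show "$1" -f value -c properties | grep -q "hw_firmware_type[^,]*uefi"; then
    echo "uefi"
  else
    echo "bios"
  fi
}

# Succeed only if both images report the same firmware type.
same_firmware() {
  [ "$(image_firmware "$1")" = "$(image_firmware "$2")" ]
}

# Usage (image and server names are placeholders):
#   same_firmware CURRENT_IMAGE NEW_IMAGE && openstack server rebuild --image NEW_IMAGE myvm
```

The check only gates the rebuild; it does not repair a machine that is already stuck, which still requires the support line.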

CC7 machine clock on localtime

Issue

During an internal audit of the cloud infrastructure service, we discovered that some instances have a misconfigured clock.

All Linux images are configured to keep the hardware clock provided by the hypervisor in UTC. The affected instances, however, were booted from an image with a bug that set the hardware clock to local time.

If these machines are recreated, either by one of our campaigns or directly by the end user, they will lose Kerberos-based network access, because their clock will be off and Kerberos rejects clients whose clock is skewed.

Fix

To fix the clock, follow these steps:

  • log into the VM
  • edit the file /etc/adjtime and replace LOCAL with UTC
  • reboot the VM
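The steps above can also be scripted. This is a minimal sketch that assumes the standard three-line /etc/adjtime layout, where the third line is either UTC or LOCAL; it keeps a backup copy and leaves the file untouched if it is already set to UTC. The function name fix_adjtime is hypothetical, and it has to run as root on the affected VM.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch the hardware clock mode in adjtime from LOCAL to UTC.
# The file path can be overridden for testing; it defaults to /etc/adjtime.
fix_adjtime() {
  local file="${1:-/etc/adjtime}"
  # In adjtime, the third line records whether the hardware clock is UTC or LOCAL.
  if [ "$(sed -n 3p "$file")" = "LOCAL" ]; then
    cp -p "$file" "$file.bak"          # keep a backup of the original file
    sed -i '3s/^LOCAL$/UTC/' "$file"   # GNU sed in-place edit, as shipped with CC7
    echo "changed"
  else
    echo "unchanged"
  fi
}

# Usage, as root on the affected VM (reboot only if the file was changed):
#   [ "$(fix_adjtime)" = "changed" ] && reboot
```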

CC7 instance configured to static network

Issue

While preparing the recreation campaign, we identified an inconsistency in how CC7 images configure the network: older images rely on the metadata service, while newer ones rely on DHCP.

Fix

To configure the network in a common and consistent way, following the recommendation of the Linux Team, please review the network configuration of your VM. To switch it to DHCP, run the following commands as root on your CC7 VM:

# disable cloud-init managing the network
cat > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg << EOF
network: {config: disabled}
EOF

# initscripts don't like this file to be missing.
cat > /etc/sysconfig/network << EOF
NETWORKING=yes
NOZEROCONF=yes
EOF

# simple DHCP config for eth0, not tied to the build hardware (no HWADDR or UUID)
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
NAME="eth0"
DEVICE="eth0"
ONBOOT="yes"
NETBOOT="yes"
IPV6INIT="yes"
BOOTPROTO="dhcp"
TYPE="Ethernet"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
DHCPV6_DUID="llt"
USERCTL="yes"
PEERDNS="yes"
PERSISTENT_DHCLIENT="1"
EOF
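To review the configuration before or after running the commands above, you can check how eth0 obtains its address and whether cloud-init network management is disabled. This is a minimal sketch, assuming an initscripts-based CC7 VM with a single eth0 interface; both function names are hypothetical. Since initscripts only read the ifcfg file when the interface is brought up, a reboot is likely needed for the new settings to take effect.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to review the network setup of a CC7 VM.

# Print the BOOTPROTO value of an ifcfg file (e.g. "dhcp"), or "unset" if absent.
eth0_bootproto() {
  local file="${1:-/etc/sysconfig/network-scripts/ifcfg-eth0}"
  local value
  # Take the last BOOTPROTO= line and strip any surrounding quotes.
  value=$(grep '^BOOTPROTO=' "$file" 2>/dev/null | tail -n 1 | cut -d= -f2- | tr -d "\"'")
  echo "${value:-unset}"
}

# Succeed if some cloud-init drop-in file disables network configuration.
cloud_init_network_disabled() {
  local dir="${1:-/etc/cloud/cloud.cfg.d}"
  grep -qs 'network: *{config: *disabled}' "$dir"/*.cfg
}

# Usage:
#   eth0_bootproto    # should print "dhcp" once the commands above have been applied
#   cloud_init_network_disabled && echo "cloud-init network config disabled"
```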

Cannot access RHEL/ALMA machines via SSH

Red Hat deprecated DSA keys starting with RHEL 8. Depending on how your existing keypair was created, it may use the deprecated DSA algorithm.

OpenStack supports multiple keypairs, so if you are affected we recommend creating an additional keypair using RSA:

lxplus8$ ssh-keygen -t rsa -f ~/.ssh/keypair_rsa
lxplus8$ openstack keypair create --public-key ~/.ssh/keypair_rsa.pub rsa-keypair
+-------------+-------------------------------------------------+
| Field       | Value                                           |
+-------------+-------------------------------------------------+
| fingerprint | d5:ae:f0:6b:d2:50:48:d1:1a:28:a2:6c:8b:b5:11:18 |
| name        | rsa-keypair                                     |
| user_id     | fernandl                                        |
+-------------+-------------------------------------------------+

Once the keypair is created, specify it when creating new machines so that they are configured for SSH access with the new key.
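To find out whether an existing key is affected, you can inspect the key type recorded in its public key file: OpenSSH writes DSA public keys with the type ssh-dss. This is a minimal sketch; the function name is hypothetical, and the server create line is only an illustration with placeholder image, flavor and server names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a public key file holds a DSA key.
is_dsa_pubkey() {
  # OpenSSH public keys start with the key type; DSA keys use "ssh-dss".
  [ "$(awk 'NR == 1 { print $1 }' "$1")" = "ssh-dss" ]
}

# Usage: list DSA public keys in ~/.ssh
#   for key in ~/.ssh/*.pub; do is_dsa_pubkey "$key" && echo "$key uses DSA"; done
#
# Then create a VM with the new RSA keypair (IMAGE, FLAVOR and myvm are placeholders):
#   openstack server create --image IMAGE --flavor FLAVOR --key-name rsa-keypair myvm
```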

Microsoft Windows

Boot error 0xc0000001 or 0xC00000BB

We have recently seen Windows virtual machines fail to boot with error code 0xc0000001 or 0xC00000BB. The usual methods for repairing the Windows boot process do not appear to fix these issues.

The root cause appears to be related to the use of nested virtualisation (e.g. Hyper-V or Windows Subsystem for Linux) combined with the CPU model that the virtualisation layer exposes to the virtual machine.

If you experience this issue, contact the Cloud Infrastructure support line, who can help update the CPU model of the machine.

Windows VM does not start up after a hypervisor reboot or intervention

Recent Windows versions (specifically Windows 11) require a TPM (Trusted Platform Module) when booting via Secure Boot. The virtual TPM of the VM is protected by a credential that is typically owned by the user who created the machine. When such a VM goes down, e.g. due to an issue on the hypervisor, the system may be unable to access this credential, and the VM fails to boot.

If this happens, reboot the machine yourself, using your own credentials. This should fix the issue.
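The reboot can be done from the command line with your own OpenStack credentials, for example from lxplus. This is a sketch under two assumptions: that your CLI environment is authenticated as yourself rather than as a service account, and that a hard reboot is appropriate because a guest that failed to boot may not react to a soft (ACPI) reboot request. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hard-reboot a VM using the credentials of the calling user.
# Assumption: the OpenStack CLI is already authenticated as you (e.g. on lxplus).
restart_as_self() {
  local server="$1"
  # Show the current state first, for reference.
  echo "Status before reboot: $(openstack server show "$server" -f value -c status)"
  # A hard reboot power-cycles the VM even if the guest OS is not responding.
  openstack server reboot --hard "$server"
}

# Usage (server name is a placeholder):
#   restart_as_self my-windows-vm
```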