Known issues and workarounds
This chapter lists known issues in the CERN cloud and explains how to work around them.
Rebuilding UEFI-based VMs (e.g. RHEL9) with non-UEFI images (e.g. CC7)
A bug in the OpenStack service prevents users from rebuilding UEFI-based virtual machines with images that are not UEFI-enabled.
Based on their configuration, the public images published in OpenStack fall into two categories:
- UEFI images: ALMA8, RHEL8, ALMA9 and RHEL9.
- non-UEFI images: Windows, CC7 and CernVM.
The error shown is "Requested operation is not valid: cannot undefine domain with nvram".
If the machine gets into this state, users should reach out to the Cloud Infrastructure support line.
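To avoid getting into this state, you can compare the firmware type of the current image and the target image before rebuilding. The sketch below assumes that UEFI images carry the standard OpenStack image property hw_firmware_type=uefi; we have not confirmed that every public image sets this property, so treat a "not UEFI" result with care. The helper names (props_are_uefi, image_is_uefi, safe_rebuild) are hypothetical; only the openstack image show and openstack server rebuild commands are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: refuse to rebuild a UEFI-based server with a non-UEFI image.
# Assumption: UEFI images set the image property hw_firmware_type=uefi.

# props_are_uefi PROPERTIES_TEXT
# Succeeds if the text printed by `openstack image show -f value -c properties`
# declares UEFI firmware. Accepts both hw_firmware_type='uefi' and
# 'hw_firmware_type': 'uefi' styles of output.
props_are_uefi() {
  printf '%s\n' "$1" | grep -Eq "hw_firmware_type['\"]?[[:space:]]*[:=][[:space:]]*['\"]?uefi"
}

# image_is_uefi IMAGE (name or ID)
image_is_uefi() {
  props_are_uefi "$(openstack image show -f value -c properties "$1")"
}

# safe_rebuild SERVER CURRENT_IMAGE NEW_IMAGE
safe_rebuild() {
  local server=$1 current=$2 new=$3
  if image_is_uefi "$current" && ! image_is_uefi "$new"; then
    echo "Refusing: $server uses a UEFI image but $new is not UEFI" >&2
    return 1
  fi
  openstack server rebuild --image "$new" "$server"
}
```

For example, safe_rebuild my-vm RHEL9-IMAGE CC7-IMAGE would stop before calling the rebuild, where my-vm and the image names are placeholders.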
CC7 machine clock set to local time
Issue
During an internal audit of the cloud infrastructure service, we discovered that some instances have a misconfigured clock.
All Linux images are configured to use UTC for the hardware clock provided by the hypervisor. The affected instances, however, were booted from an image with a bug in its clock configuration that set the clock to local time instead.
If these machines are recreated, either by one of our campaigns or directly by the end user, they will lose network access through Kerberos, which rejects clients whose clock is skewed by more than a few minutes.
Fix
To fix the clock, follow these steps:
- Log into the VM.
- Edit the file /etc/adjtime and replace LOCAL with UTC.
- Reboot the VM.
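The edit in the steps above can be scripted. This is a minimal sketch, assuming the usual three-line /etc/adjtime layout where the third line holds the clock mode (UTC or LOCAL); fix_adjtime is a hypothetical helper that takes the file path as an optional argument so it can be tried on a copy first. Run it as root.

```shell
#!/usr/bin/env bash
# fix_adjtime [FILE]: switch the clock mode line of an adjtime file from LOCAL to UTC.
# FILE defaults to /etc/adjtime. Prints "changed" or "unchanged".
fix_adjtime() {
  local file=${1:-/etc/adjtime}
  # The third line of the file holds the clock mode.
  if [ "$(sed -n '3p' "$file")" = "LOCAL" ]; then
    cp -p "$file" "$file.bak"         # keep a backup of the original file
    sed -i '3s/^LOCAL$/UTC/' "$file"  # only touch the clock-mode line
    echo "changed"
  else
    echo "unchanged"
  fi
}

# Example: reboot only if the file was actually changed.
#   [ "$(fix_adjtime)" = "changed" ] && reboot
```

On systemd-based systems such as CC7, timedatectl set-local-rtc 0 is the standard way to tell the system that the hardware clock is in UTC and should have the same effect, although we have not verified it against this specific issue.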
CC7 instances configured with a static network
Issue
While preparing the recreation campaign, we identified an inconsistency in how CC7 images configure the network: some of them rely on the metadata service, while the newer ones rely on DHCP.
Fix
To configure the network in a common and consistent way, and following the recommendation of the Linux Team, please review the network configuration of your VM. To do so, run the following commands as root on your CC7 VM:
# disable cloud-init managing network
cat >> /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg << EOF
network: {config: disabled}
EOF
# initscripts don't like this file to be missing.
cat > /etc/sysconfig/network << EOF
NETWORKING=yes
NOZEROCONF=yes
EOF
# simple eth0 config, again not hard-coded to the build hardware
cat > /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
NAME="eth0"
DEVICE="eth0"
ONBOOT="yes"
NETBOOT="yes"
IPV6INIT="yes"
BOOTPROTO="dhcp"
TYPE="Ethernet"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
DHCPV6_DUID="llt"
USERCTL="yes"
PEERDNS="yes"
PERSISTENT_DHCLIENT="1"
EOF
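After running the commands above, you may want to confirm that the files are in place before applying them. The sketch below is a hypothetical check (check_cc7_network is not an existing tool); it takes an optional root directory so it can be tried against a copy of the filesystem, and it only looks for the two settings that matter most here.

```shell
#!/usr/bin/env bash
# check_cc7_network [ROOT]: verify that cloud-init no longer manages the network
# and that eth0 uses DHCP. ROOT defaults to the real filesystem.
check_cc7_network() {
  local root=${1:-}
  local status=0
  grep -qF 'network: {config: disabled}' \
    "$root/etc/cloud/cloud.cfg.d/99-disable-network-config.cfg" 2>/dev/null \
    || { echo "cloud-init still manages the network"; status=1; }
  grep -q '^BOOTPROTO="dhcp"' \
    "$root/etc/sysconfig/network-scripts/ifcfg-eth0" 2>/dev/null \
    || { echo "eth0 is not configured for DHCP"; status=1; }
  return $status
}

# Example: check the files, then reboot so the new configuration is applied.
#   check_cc7_network && reboot
```

Rebooting is the conservative choice; restarting the network service over SSH may drop your session.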
Cannot access RHEL/ALMA machines via SSH
Red Hat deprecated DSA keys starting with RHEL 8. Depending on how your existing keypair was created, it may use the deprecated DSA algorithm.
OpenStack supports multiple keypairs, so if you are affected we recommend creating an additional keypair using RSA:
lxplus8$ ssh-keygen -t rsa -f ~/.ssh/keypair_rsa
lxplus8$ openstack keypair create --public-key ~/.ssh/keypair_rsa.pub rsa-keypair
+-------------+-------------------------------------------------+
| Field | Value |
+-------------+-------------------------------------------------+
| fingerprint | d5:ae:f0:6b:d2:50:48:d1:1a:28:a2:6c:8b:b5:11:18 |
| name | rsa-keypair |
| user_id | fernandl |
+-------------+-------------------------------------------------+
Once the keypair is created, you can specify it when creating new machines so that it is configured for SSH access.
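To find out whether an existing public key is affected, you can look at its key type: the first field of an OpenSSH public key line is the algorithm, and DSA keys use ssh-dss. The helper below is a hypothetical sketch built on that format; the commented example uses the rsa-keypair created above, with IMAGE, FLAVOR and my-new-vm as placeholders.

```shell
#!/usr/bin/env bash
# is_dsa_key PUBKEY_FILE: succeed if the OpenSSH public key in the file is a DSA key.
is_dsa_key() {
  local type rest
  # Read the first line: "<type> <base64 key> [comment]".
  read -r type rest < "$1"
  [ "$type" = "ssh-dss" ]
}

# Example (placeholders in capitals):
#   is_dsa_key ~/.ssh/id_dsa.pub && echo "deprecated DSA key, create an RSA keypair"
#   openstack server create --image IMAGE --flavor FLAVOR \
#       --key-name rsa-keypair my-new-vm
```

ssh-keygen -l -f FILE also prints the key type, in parentheses at the end of its output, if you prefer to check by hand.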
Microsoft Windows
Boot error 0xc0000001 or 0xC00000BB
We have recently seen Windows virtual machines fail to boot with the error codes 0xc0000001 or 0xC00000BB. The usual methods for repairing the Windows boot process do not seem to fix these issues.
The root cause appears to be related to the use of nested virtualisation (e.g. Hyper-V or Windows Subsystem for Linux) combined with the CPU model exposed to the virtual machines by the virtualisation layer.
If users experience this issue, they should reach out to the Cloud Infrastructure support line, who can help update the CPU model of the machines.
Windows VM does not start up after a hypervisor reboot or intervention
Recent Windows versions (specifically Windows 11) require a TPM (Trusted Platform Module) when booting with Secure Boot. The secret protecting the VM's virtual TPM is typically owned by the user who created the machine. When such a VM goes down, e.g. due to an issue on the hypervisor, the system may be unable to access this secret when restarting it, and the VM then fails to boot.
If this happens, reboot the machine yourself, using your own credentials. This should fix the issue.
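A minimal sketch of this reboot, assuming you are the user who created the VM: it compares the creator recorded on the server with the user of your current OpenStack token before rebooting. The helper name reboot_as_owner is hypothetical, and the choice of a hard reboot is an assumption on our side (a VM stuck at boot may not react to a soft reboot).

```shell
#!/usr/bin/env bash
# reboot_as_owner SERVER: hard-reboot SERVER only if the current OpenStack
# user is the one who created it.
reboot_as_owner() {
  local server=$1 owner me
  owner=$(openstack server show -f value -c user_id "$server") || return 1
  me=$(openstack token issue -f value -c user_id) || return 1
  if [ "$owner" != "$me" ]; then
    echo "$server was created by $owner, not by you ($me)" >&2
    return 1
  fi
  openstack server reboot --hard "$server"
}
```

If the check fails, the person who created the machine is the one who should reboot it.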