Reboot VM on failure
Virtual machines can be configured to reboot automatically in common failure scenarios. The standard techniques described below are supported on the CERN private cloud.
Reboot after panic on Linux
Using the kernel parameter kernel.panic, you can set the machine to reboot a given number of seconds after a kernel panic.
kernel.panic = 30
A value of 0 means that the machine will not reboot automatically.
The setting can be made persistent by editing the /etc/sysctl.conf file, or by using a configuration management tool such as Puppet to set the sysctl parameters, as described in the configuration management user guide.
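As a minimal sketch of the manual approach, the helper below sets a key in a sysctl configuration file, replacing an existing entry or appending a new one. The function name set_sysctl_param is hypothetical and introduced only for illustration; it is assumed to be run as root when pointed at /etc/sysctl.conf.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): set "KEY = VALUE" in a sysctl file.
# Replaces an existing uncommented entry for KEY, or appends one if none exists.
set_sysctl_param() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally in the regular expressions.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        # Rewrite through a temporary file, keeping the original file's permissions.
        sed -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return $status
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

On a VM this might be called as set_sysctl_param /etc/sysctl.conf kernel.panic 30, followed by sysctl -p to load the file without waiting for a reboot.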
Using hardware watchdog on Linux
Linux provides a facility called a watchdog timer for monitoring that a VM is responsive. The VM is rebooted automatically unless the timer is reset at regular intervals.
This covers kernel panics and can also detect a blocked kernel.
In OpenStack, this feature is enabled by setting the property hw_watchdog_action on the image used to boot the VM.
$ openstack image set --property hw_watchdog_action=reset 106cbee3-49ea-4241-a74d-413008251b4a
+------------------+--------------------------------------------------------------------------------------------------+
| Field | Value |
+------------------+--------------------------------------------------------------------------------------------------+
| checksum | c26f04a69c1229279f518c83858e36fe |
| container_format | bare |
| created_at | 2014-09-17T10:27:51 |
| deleted | False |
| deleted_at | None |
| disk_format | qcow2 |
| id | 106cbee3-49ea-4241-a74d-413008251b4a |
| is_public | False |
| min_disk | 0 |
| min_ram | 0 |
| name | centos7 |
| owner | 841615a3-ece9-4622-9fa0-fdc178ed34f8 |
| properties | {u'hw_watchdog_action': u'reset', u'hw_scsi_model': u'virtio-scsi'} |
| protected | False |
| size | 343368704 |
| status | active |
| updated_at | 2014-12-11T07:24:32 |
| virtual_size | None |
+------------------+--------------------------------------------------------------------------------------------------+
The full list of possible settings is in the OpenStack CLI guide. Running openstack image show on the image will give the current settings. If the property is not set, watchdog monitoring will not be activated on VMs booted from this image.
Once this is set, a VM can be booted in the usual way from this image. The setting cannot be changed for a running VM.
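Before booting a VM, the property can be checked with a small filter over the openstack image show output, sketched below. The helper name image_watchdog_action is hypothetical, and the parsing assumes the property is printed as hw_watchdog_action followed by a single-quoted value, as in the output above; the exact layout varies between client versions.

```shell
#!/bin/sh
# Hypothetical helper: read "openstack image show" output on stdin and print
# the value of hw_watchdog_action. Returns non-zero if the property is absent.
image_watchdog_action() {
    # Accepts both {u'hw_watchdog_action': u'reset'} and hw_watchdog_action='reset'.
    action=$(sed -n "s/.*hw_watchdog_action[^a-z]*u\{0,1\}'\([a-z]*\)'.*/\1/p")
    [ -n "$action" ] && printf '%s\n' "$action"
}
```

A possible invocation is openstack image show 106cbee3-49ea-4241-a74d-413008251b4a | image_watchdog_action, which for the image above would be expected to print the configured action.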
Testing the watchdog
Once a VM has been booted from an image with the watchdog property set, the watchdog can be enabled inside the VM.
With the watchdog enabled, a device /dev/watchdog should exist. If it is not present, the image settings above should be checked.
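The device check can be scripted as in the sketch below. The helper name watchdog_device_present is hypothetical; it only tests that the path is a character device and says nothing about whether the daemon is configured.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the watchdog character device is visible.
# The path defaults to /dev/watchdog and can be overridden for other setups.
watchdog_device_present() {
    dev=${1:-/dev/watchdog}
    [ -c "$dev" ]
}

if watchdog_device_present; then
    echo "watchdog device found"
else
    echo "watchdog device missing: check hw_watchdog_action on the image"
fi
```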
The watchdog monitoring software should be installed as follows:
yum install -y watchdog
The watchdog can then be enabled by editing /etc/watchdog.conf and uncommenting the watchdog device line:
watchdog-device = /dev/watchdog
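If the edit needs to be scripted rather than done by hand, something like the following sketch could be used. The helper name enable_watchdog_device is hypothetical, and it assumes the packaged /etc/watchdog.conf already contains a commented-out watchdog-device line.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the watchdog-device line in a watchdog.conf file.
enable_watchdog_device() {
    conf=${1:-/etc/watchdog.conf}
    tmp=$(mktemp) || return 1
    # Strip the leading '#' from the watchdog-device line only; other
    # commented settings (for example watchdog-timeout) are left untouched.
    sed 's/^[[:space:]]*#[[:space:]]*\(watchdog-device[[:space:]]*=.*\)$/\1/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Run as root with no argument, this would edit /etc/watchdog.conf in place; grep '^watchdog-device' /etc/watchdog.conf can then confirm that the line is active.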
The configuration can be tested using:
wd_identify --config /etc/watchdog.conf
To run the daemon, start it and use chkconfig to enable it at the next reboot:
service watchdog start
chkconfig watchdog on