Reboot VM on failure

Virtual machines can be configured to reboot automatically in common failure scenarios. The standard techniques described below are supported on the CERN private cloud.

Reboot after panic on Linux

Using the kernel parameter kernel.panic, you can set the machine to reboot a given number of seconds after a kernel panic.

kernel.panic = 30

A value of 0 means that it will not reboot.

This can be implemented by modifying the /etc/sysctl.conf file or using a configuration management tool such as Puppet to set the sysctl parameters as described in the configuration management user guide.
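As a minimal sketch of the /etc/sysctl.conf approach, the helper below sets or replaces the kernel.panic line in a sysctl-style file. The function name set_kernel_panic and the scratch file are hypothetical and used only for illustration; the demonstration deliberately runs against a temporary copy rather than the live configuration.

```shell
#!/bin/sh
# Hypothetical helper: set or replace kernel.panic in a sysctl-style file.
set_kernel_panic() {
    conf_file=$1
    seconds=$2
    tmp_file=$(mktemp)
    # Keep every line except an existing kernel.panic assignment.
    if [ -f "$conf_file" ]; then
        grep -v '^[[:space:]]*kernel\.panic[[:space:]]*=' "$conf_file" > "$tmp_file" || true
    fi
    printf 'kernel.panic = %s\n' "$seconds" >> "$tmp_file"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}

# Demonstrate on a scratch file (sample contents), not the live /etc/sysctl.conf.
scratch_sysctl=$(mktemp)
printf 'kernel.panic = 0\nvm.swappiness = 10\n' > "$scratch_sysctl"
set_kernel_panic "$scratch_sysctl" 30
cat "$scratch_sysctl"
```

On a real VM the same helper could be pointed at /etc/sysctl.conf as root, followed by sysctl -p to load the new value without a reboot.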

Using hardware watchdog on Linux

Linux provides a facility, called a watchdog timer, for monitoring that a VM is responsive. The timer reboots the VM automatically unless it is reset at regular intervals.

This covers kernel panics and can also detect a blocked kernel.

In OpenStack, this feature is enabled by setting the property hw_watchdog_action on the image used to boot the VM.

$ openstack image set --property hw_watchdog_action=reset 106cbee3-49ea-4241-a74d-413008251b4a
+------------------+--------------------------------------------------------------------------------------------------+
| Field            | Value                                                                                            |
+------------------+--------------------------------------------------------------------------------------------------+
| checksum         | c26f04a69c1229279f518c83858e36fe                                                                 |
| container_format | bare                                                                                             |
| created_at       | 2014-09-17T10:27:51                                                                              |
| deleted          | False                                                                                            |
| deleted_at       | None                                                                                             |
| disk_format      | qcow2                                                                                            |
| id               | 106cbee3-49ea-4241-a74d-413008251b4a                                                             |
| is_public        | False                                                                                            |
| min_disk         | 0                                                                                                |
| min_ram          | 0                                                                                                |
| name             | centos7                                                                                          |
| owner            | 841615a3-ece9-4622-9fa0-fdc178ed34f8                                                             |
| properties       | {u'hw_watchdog_action': u'reset', u'hw_scsi_model': u'virtio-scsi'}                              |
| protected        | False                                                                                            |
| size             | 343368704                                                                                        |
| status           | active                                                                                           |
| updated_at       | 2014-12-11T07:24:32                                                                              |
| virtual_size     | None                                                                                             |
+------------------+--------------------------------------------------------------------------------------------------+

The full list of possible settings is in the OpenStack CLI guide. The current settings of an image can be displayed with

openstack image show 106cbee3-49ea-4241-a74d-413008251b4a

If the property is not set, watchdog monitoring will not be activated on VMs booted from this image.

Once this is set, a VM can be booted in the usual way from this image. The setting cannot be changed for a running VM.

Testing the watchdog

Once a VM has been booted from an image with the watchdog property set, the watchdog daemon can be enabled inside the VM.

With the property set, a device /dev/watchdog should exist in the VM. If it is not present, check the image settings above. Install the watchdog monitoring software as follows:

yum install -y watchdog
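The check for the /dev/watchdog device mentioned above can be scripted. The following is a rough sketch only; the function name check_watchdog_device is hypothetical, and it tests nothing more than the presence of a character device at the given path.

```shell
#!/bin/sh
# Hypothetical helper: report whether a watchdog character device is present.
check_watchdog_device() {
    dev=${1:-/dev/watchdog}
    if [ -c "$dev" ]; then
        echo "found watchdog device: $dev"
    else
        # Diagnostics go to stderr so callers can capture stdout cleanly.
        echo "no watchdog device at $dev; check hw_watchdog_action on the image" >&2
        return 1
    fi
}

if check_watchdog_device /dev/watchdog; then
    echo "ready to configure the watchdog daemon"
fi
```

A non-zero return status suggests the VM was booted from an image without the hw_watchdog_action property.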

The watchdog is then enabled by editing /etc/watchdog.conf and uncommenting the watchdog-device line:

watchdog-device = /dev/watchdog
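For unattended setups, the edit to /etc/watchdog.conf could be automated along the following lines. This is a sketch under assumptions: the function name enable_watchdog_device is hypothetical, the sample file contents are illustrative, and the pattern assumes the line is commented out with a single leading '#'.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the watchdog-device line in a watchdog.conf-style file.
enable_watchdog_device() {
    conf_file=$1
    tmp_file=$(mktemp)
    # Strip a leading '#' (and any whitespace after it) from the watchdog-device line only.
    sed 's/^#[[:space:]]*\(watchdog-device[[:space:]]*=\)/\1/' "$conf_file" > "$tmp_file"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
}

# Demonstrate on a scratch file (sample contents), not the live /etc/watchdog.conf.
scratch_wd=$(mktemp)
printf '#watchdog-device = /dev/watchdog\n#max-load-1 = 24\n' > "$scratch_wd"
enable_watchdog_device "$scratch_wd"
cat "$scratch_wd"
```

Other commented-out options in the file are left untouched, so the helper can be run repeatedly without side effects.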

The configuration can be tested using:

wd_identify --config /etc/watchdog.conf

To run the daemon, start the service and use chkconfig to enable it at the next reboot.

service watchdog start
chkconfig watchdog on