One of the basic principles of cloud computing is that infrastructure can fail. The computer centre infrastructure, the hardware, the operating system and the network all have failure scenarios. With cloud computing and a modern application architecture, these failures are handled at the application level using some of the techniques described in this chapter, such as:
- Restarting VMs on operating system failure
- Load balancing
- Placing virtual machines in different compute availability zones of the computer centre
- Placing volumes in different storage availability zones of the computer centre
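
As a rough illustration of the placement idea in the list above, the following Python sketch assigns a set of virtual machines to availability zones in round-robin order, so that the loss of one zone affects only a fraction of the service. It is a minimal sketch and not the CERN tooling: the function `spread_across_zones`, the zone names and the VM names are all hypothetical, and no cloud API is called.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(vm_names, zones):
    """Assign each VM name to a zone in round-robin order.

    Hypothetical helper for illustration only; it plans a placement
    and does not create anything in a real cloud.
    """
    if not zones:
        raise ValueError("at least one availability zone is required")
    zone_cycle = cycle(zones)
    return {name: next(zone_cycle) for name in vm_names}


# Zone and VM names below are assumptions, not real CERN zone names.
zones = ["zone-a", "zone-b", "zone-c"]
vms = [f"web-{i}" for i in range(5)]

placement = spread_across_zones(vms, zones)
per_zone = Counter(placement.values())

for name, zone in placement.items():
    print(f"{name} -> {zone}")
```

With a plan like this, each VM would then be created with its assigned zone, and a load balancer in front of the VMs would keep serving from the remaining zones if one of them became unavailable.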
Also, the presentation Achieving maximum uptime: architecting resilient services in the CERN Cloud Infrastructure gives a technical overview of how to build resilient services on the CERN Cloud Infrastructure.
Last update: November 11, 2021