High availability in cloud computing prevents a SPOF


You can’t control cloud hardware the way you can in a private data center, so approach app deployment decisions differently to build resiliency and elasticity on IaaS.

By default, when an administrator creates a VM in Microsoft Azure or another public cloud infrastructure, there is no real protection against downtime from a single point of failure (SPOF). If the physical node in that cloud fails, the VM will restart on a separate physical node. The user cannot control when or where that happens, and must instead make deployment decisions that protect the VM’s workloads.

There are issues — and fixes — to know about high availability in cloud computing before an enterprise deploys its applications.

Editor’s note: This article uses Microsoft Azure infrastructure as a service (IaaS) throughout as an example. While the information provided here is meant as a general guide, check with your public cloud provider of choice for specifics relating to the infrastructure.

Zoned for high availability

Azure designs rest on a few core cloud concepts. Availability zones, which appear as single locations within Azure, are essentially groups of physical data centers in close geographic proximity to one another. An admin working on a cloud application deployment can therefore configure the system to fail over to an alternate data center within the zone in the event of a large-scale outage.

Within each data center, clusters group thousands of networked physical nodes. When one of these nodes fails, its hosted VMs restart on another physical node within the cluster. A public cloud site can contain hundreds, or even thousands, of clusters.

Just as hypervisors in private data centers require patching, so do hypervisors that run on physical blade servers in public clouds. That’s bad news for a single machine that hosts a cloud app deployment. Several times a year, the VM must move while the underlying hypervisor is upgraded. Administrators do not have to worry about maintenance affecting services from the cloud provider, such as Microsoft’s DNS and Azure Active Directory, but the customer’s workloads are their responsibility.

Azure handles the issue of outages for customer VMs with availability sets. An availability set enables an administrator to allocate several machines, such as front-end web servers, so that they land on separate hardware rather than on one shared node. This dispersion creates high availability in cloud computing architectures because the front-end service can handle requests regardless of a single failure or server update. Microsoft defines availability sets as logical groupings of VMs; availability sets give the cloud provider information on how the cloud application deployment is structured.

Any administrator who wants to hold Microsoft to its 99.95% uptime service-level agreement should use these groups for availability. One of the biggest lessons an administrator needs to absorb for high availability in cloud computing is to treat hosting infrastructure like cattle, not pets. Administrators must design for a single node to fail with minimal repercussions on the cloud app deployment’s functionality.
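To put the 99.95% figure in concrete terms, the short sketch below converts an uptime percentage into a downtime budget. The function name and the 30-day and 365-day periods are illustrative choices, not part of any provider's SLA wording; check the actual agreement for how downtime is measured.

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Return the downtime budget, in minutes, for an uptime SLA over a period."""
    unavailable_fraction = 1 - sla_percent / 100
    return unavailable_fraction * period_hours * 60


# Illustrative periods: a 30-day month and a 365-day year.
monthly = allowed_downtime_minutes(99.95, 30 * 24)
yearly = allowed_downtime_minutes(99.95, 365 * 24)

print(f"{monthly:.1f} minutes per 30-day month")
print(f"{yearly / 60:.2f} hours per 365-day year")
```

Under these assumptions, 99.95% uptime leaves roughly 21.6 minutes of downtime per month, or about 4.38 hours per year.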

Disks are presented in Azure differently than they are in on-premises managed IT infrastructures. Even storage nodes need upgrades, so a disk that the administrator sees as a single volume is, in fact, replicated at least three times locally in an Azure data center. Additional levels of redundancy are available at a hefty cost; high availability in cloud computing is not free.

Availability sets take fault domains and upgrade domains into account for cloud app deployments. A fault domain is a group of infrastructure items that could be affected by a single issue, such as localized power failure. An upgrade domain comprises machines grouped to provide patching in a controlled manner.
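The effect of spreading VMs across both kinds of domain can be shown with a toy model. This is a minimal sketch, not Azure's actual placement logic: the round-robin assignment, the domain counts and the VM names are all assumptions made for illustration.

```python
def place_vms(vm_names, fault_domains=2, upgrade_domains=5):
    """Assign each VM a (fault_domain, upgrade_domain) pair round-robin.

    Round-robin is an illustrative stand-in for the provider's real placement.
    """
    return {
        name: (i % fault_domains, i % upgrade_domains)
        for i, name in enumerate(vm_names)
    }


def survivors(placement, failed_fault_domain=None, patching_upgrade_domain=None):
    """Return the VMs still serving while one domain is down or being patched."""
    return sorted(
        name
        for name, (fd, ud) in placement.items()
        if fd != failed_fault_domain and ud != patching_upgrade_domain
    )


placement = place_vms(["web0", "web1", "web2", "web3"])

# A localized power failure takes out fault domain 0.
print(survivors(placement, failed_fault_domain=0))      # ['web1', 'web3']

# The provider patches upgrade domain 1.
print(survivors(placement, patching_upgrade_domain=1))  # ['web0', 'web2', 'web3']
```

In both cases at least one web server stays up, which is the property the availability set is meant to provide. A single VM outside a set has no such guarantee.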

An availability set helps ensure that a cloud-hosted service keeps handling the requests it receives, even while an individual node fails or is patched.

VMs cannot exist in isolation and still serve traffic, so the VMs in a given availability set rely on load balancers for networking control. Microsoft’s Azure load balancing ensures that traffic is routed to live servers — not failed ones or those in the midst of an upgrade.
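The routing behavior described above can be sketched as a round-robin balancer that skips unhealthy backends. This is a simplified illustration, not Azure's load balancer: the class name, the `mark` and `pick` methods and the backend names are hypothetical, and a real balancer would learn health from probes rather than manual calls.

```python
from itertools import cycle


class HealthAwareBalancer:
    """Round-robin over backends, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.health = {b: True for b in backends}
        self._ring = cycle(backends)
        self._size = len(backends)

    def mark(self, backend, healthy):
        """Record a health result (stands in for a real health probe)."""
        self.health[backend] = healthy

    def pick(self):
        """Return the next healthy backend, trying each one at most once."""
        for _ in range(self._size):
            candidate = next(self._ring)
            if self.health[candidate]:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = HealthAwareBalancer(["web0", "web1", "web2"])
lb.mark("web1", False)  # web1 has failed or is mid-upgrade

picks = [lb.pick() for _ in range(4)]
print(picks)  # ['web0', 'web2', 'web0', 'web2']
```

Traffic keeps flowing to web0 and web2 while web1 is out, and web1 rejoins the rotation as soon as it is marked healthy again.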

Make high availability a reality

The bad news is that you cannot simply add an existing VM to an availability set after the initial cloud application deployment. An existing VM must be destroyed and recreated inside the set.

Cloud services are all about tiers (web tier, app tier, database tier), and each tier should have its own availability set. Don’t mix and match when designing a cloud app deployment; all the machines within a set should be identical and serve the same purpose.
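A deployment plan can be checked against this one-tier-per-set rule before anything is created. The sketch below is a hypothetical planning aid, not an Azure API: the `VM` record, its fields, and the set and size names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical planning record; fields are illustrative, not Azure's schema."""
    name: str
    tier: str
    size: str
    availability_set: str


def mixed_sets(vms):
    """Return names of availability sets whose members differ in tier or size."""
    shapes = {}
    for vm in vms:
        shapes.setdefault(vm.availability_set, set()).add((vm.tier, vm.size))
    return sorted(name for name, kinds in shapes.items() if len(kinds) > 1)


plan = [
    VM("web0", "web", "small", "as-web"),
    VM("web1", "web", "small", "as-web"),
    VM("app0", "app", "medium", "as-app"),
    VM("db0", "database", "large", "as-app"),  # deliberately misplaced
]

print(mixed_sets(plan))  # ['as-app']
```

Here the database VM has been put in the app tier's set, so the check flags `as-app`. Moving it to its own set would make the plan pass.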

To create an availability set, make a new machine in the Azure resource management console. Use managed disks to give Azure VMs persistent, secured disk storage replicated in three places. Select the option to create an availability set, and if this is a new availability set, give it a useful name. Repeat as needed to build the cloud application deployment.

Once the servers are created in the availability set, address the load balancer. Without the load balancer, no front-end IP will provide services. There are several ways to provide load balancing, including with PowerShell. This is the cloud operations world, where code is a first-class citizen.

High availability in cloud computing is as attainable as it is at your local data center, but it won’t happen on its own. Ensure availability of service with your cloud deployment choices. Administrators moving into the cloud need to understand that it is no longer about servers, but about services.
