What Your Business Can Learn from Cloud Outages

One of the biggest concerns about the infrastructure-as-a-service model is the loss of control and management that comes with moving into a multi-tenant environment. Though it has been over a year now, the Amazon Elastic Compute Cloud (EC2) outage will live on and provide important lessons for businesses as they look to cloud computing for their IT future.

While there have been high-profile cloud outages before, the scale and length of Amazon's unexpected downtime, as well as the profile of some of the clients that were dragged down with it, make it all the more significant. Because of the outage, many high-profile websites went offline for several days and lost millions of dollars.

In April 2011, server problems at Amazon's data center partially disabled or knocked out popular websites including Reddit, Foursquare, Netflix, Quora and HootSuite. Then in August 2011, a lightning strike in Dublin, Ireland, took down service at some key European cloud computing hubs, affecting users of Amazon's and Microsoft's cloud services. Then again in June 2012, massive thunderstorms took Amazon's U.S. East data center down while other cloud services hosted in the same area kept running. This outage, the second that month, took down Netflix, Instagram, Pinterest, and Heroku. If Amazon is the market-leading cloud service provider, many of us wonder why its data center did not cut over to generator power when others did.

Many cloud computing vendors rely on platforms like Amazon's to host their own offerings, while other companies with deep pockets (Microsoft, Rackspace, Google, Hewlett-Packard) compete with Amazon directly. Amazon, however, is having difficulty keeping its data centers online.

There are ways to mitigate some of the potential challenges of an outage like Amazon's. With some care and forethought, small businesses can still turn to the cloud as a way to save time and money.

Before rushing into any new cloud infrastructure deal, take the following steps to mitigate the risk of infrastructure-as-a-service failure.

  1. Plan to fail. You cannot, and should not, rely on a single solution forever. You need to develop detailed cloud breakdown scenarios and perform recovery run-throughs. "Put your risk-mitigation strategy firmly in place before moving into the cloud environment," says Phil Fersht, founder of outsourcing analyst firm HfS Research.
  2. Keep some expertise in house. One of the allures of cloud sourcing is the notion that you no longer have to maintain internal knowledge of the technologies that support as-a-service solutions. However, you still need that knowledge to prepare for and react to cloud problems. If you lack it, consider hiring consultants to create a disaster recovery and business continuity plan.
  3. Test that plan. Then test it again. You can easily create a staged environment that mirrors production and test your systems by killing running services and evaluating how your system performs under failure. "The cloud is the perfect place to test failures in a completely staged environment," says Donald Flood, vice president of engineering for Bizo.
  4. Create internal back-up options. IT leaders must maintain internal contingency capabilities. Few businesses can afford extended downtime, and an internal back-up is the most direct way to limit it. It took about two days for Amazon to locate and repair the problems at its data center in northern Virginia. When U.S. Tennis Association CIO Larry Bonfante began to notice application sluggishness, he and his team migrated the USTA's critical systems to their own server.
  5. Reexamine your sourcing strategy. Your own IT staff need to get smart about how the cloud works, or you risk losing control over your own IT environment. As more services get built on top of cloud computing infrastructures, a seemingly isolated outage can have a domino effect, taking down many services or an entire application environment, says Fersht. IT leaders have embraced multi-sourcing, but that model can make cloud continuity confusing.
  6. Don't be cheap. The ROI of redundancy investments skyrockets in cloud collapse scenarios. Many of the companies affected by Amazon's failure could not or would not pay to run parallel systems in the cloud. Critical data should be replicated across multiple availability zones and backed up or live-replicated across regions; active servers should be distributed geographically, and there should be enough active capacity to shift locations should one data center implode, advises Thorsten von Eicken, CTO and co-founder of cloud management vendor RightScale. "Of course all this has costs, so each business needs to determine which costs are justified for each service being offered," he adds.
  7. Put your provider on the hook. Make sure your cloud vendors have some skin in the game with a contract that ties outages to service levels. "If you are subcontracting to a third-party cloud provider, ensure they are responsible for these outages and can't [absolve] themselves of responsibility," Fersht says.
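
The advice in step 6 about keeping "enough active capacity to shift locations" can be checked with simple arithmetic. The Python sketch below is only an illustration under stated assumptions: the function name, the zone names, and the capacity and load figures are all hypothetical and do not come from Amazon or any other provider. For each zone, it asks whether the remaining zones could carry peak load if that one zone went down.

```python
from typing import Dict


def survives_zone_loss(capacity: Dict[str, float], peak_load: float) -> Dict[str, bool]:
    """For each zone, report whether the other zones together could
    still carry peak_load if that single zone failed."""
    total = sum(capacity.values())
    return {zone: (total - cap) >= peak_load for zone, cap in capacity.items()}


# Hypothetical zones and capacities (requests per second), for illustration only.
zones = {"zone-a": 400.0, "zone-b": 400.0, "zone-c": 300.0}

# Prints one True/False entry per zone.
print(survives_zone_loss(zones, peak_load=700.0))
```

A False entry means that losing that zone would leave too little capacity. At that point the trade-off von Eicken describes applies: either pay for more standby capacity or accept degraded service for that workload.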

At easySERVICE™, we want to improve your backup strategy. And we believe that the safest way to store your data is through a multi-layered strategy involving multiple storage locations and real-time backups. But should disaster strike anyway, you can trust easySERVICE™ to recover your data and help you get back to business.

Source: PC World, CIO