Apigee and the AWS Maintenance Reboot: All Clear

Apigee Product Team
Oct 09, 2014

On Sept. 24th, our cloud hosting provider Amazon Web Services notified us that it would roll out an urgent and critical patch to hosts that would entail a maintenance reboot of EC2 instances from Sept. 26 through Sept. 30.

This all happened because of a hypervisor vulnerability. Amazon scrambled to patch it and did so within a week, across hundreds of thousands of servers. The amount of operational work AWS did to make that happen sets a pretty high bar; that's why Apigee keeps its cloud there. At the same time, we marshalled a pretty big response to ensure we were prepared in case anything unexpected happened.

A big part of our architecture is the ability to route API traffic to the nearest region, to let customers choose their geographic region, and to keep data consistent across regions so that customers can still get access if one region is down. Our customers don't want to see any downtime, so the high reliability and redundancy provided by Cassandra and ZooKeeper have become core to our technology stack.
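To make the routing idea concrete, here is a minimal sketch of region selection with failover. It is an illustration only, not Apigee's actual routing code: the `Region` type, the `pick_region` function, the latency figures, and the region names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical view of one hosting region as seen by a router."""
    name: str
    latency_ms: float  # assumed: measured latency from the caller to this region
    healthy: bool      # assumed: False while the region is down or rebooting


def pick_region(regions: Sequence[Region], preferred: Optional[str] = None) -> Region:
    """Choose where to send API traffic.

    Honors the customer's preferred region when it is healthy; otherwise
    falls back to the healthy region with the lowest latency.
    """
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    if preferred is not None:
        for region in candidates:
            if region.name == preferred:
                return region
    return min(candidates, key=lambda r: r.latency_ms)


# Example: the closest region is mid-reboot, so traffic moves to the next closest.
regions = [
    Region("us-east", 20.0, healthy=False),
    Region("us-west", 70.0, healthy=True),
    Region("eu-west", 120.0, healthy=True),
]
chosen = pick_region(regions)
```

In this sketch a rebooting region simply drops out of the candidate list, which is why the data has to be consistent across regions for the fallback to be useful.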

We're glad to report that our customers didn't experience any service disruptions during the reboot. In this video, Apigee's Greg Brail (@GBrail), Sridhar Rajagopalan (@sri_at_apigee), and Bala Kasiviswanathan (@BalaK) discuss how Apigee prepared and how our products were architected to handle these kinds of situations.
