Handling Powerball Night: Scaling Patch.com in 48 Hours to Survive Being #1 in Google Results
Patch.com had a problem many would like to have after the first Powerball drawing: the Google juice was flowing so fast that standard datacenter links (~2gbps) were saturating. And, because links were maxing out, we couldn't even know just how much more capacity we needed for the next drawing. We had a couple days to prepare -- on a site that requires nearly instant invalidation to keep its content fresh.
There weren't many options we could turn around so quickly, so we talked to CDN providers and investigated other traffic-management options. It wasn't just an issue of getting a solution set up but also configuring, testing, and transitioning over with time to spare. This is the 48-hour story of surviving Google's catapult into the top 100 sites.
In this presentation, we will cover:
- Services and configurations you can deploy on short notice for site scaling, including costs and time-to-availability
- Running HTTPS at scale (and avoiding man-in-the-middle attacks when communicating with the origin)
- Configuring different cache lifetimes along the stack to allow precise invalidation control -- while still handling spikes elegantly
- Origin shields: balancing geographic proximity with hitting the cache
- How HTTP-standard Surrogate-Control headers manage differing cache lifetimes along the stack
- Using Varnish's grace mode (in this case via Fastly) as a Plan B if the origin experiences issues
- At each stage, making this all work well with Drupal's own cache invalidation hooks and header system