Faster and More Scalable than Memcache/Redis: Tiered Storage with LCache
Scalable caching in Drupal is broken. Once cache access saturates a network link, the main options are Memcache sharding (which has broken coherency during and after network splits) and Redis clustering (immature in multi-master and as complex as MySQL replication in master/replica modes).
We can do better. We can have better performance, scale, and operational simplicity. We just need to take a lesson from multicore processor architectures and their use of L1/L2 caches. Drupal doesn't even need full-scale coherency management; it just needs the cache writes on an earlier request to be guaranteed readable on a later request.
Inspired by how processors handle core-local caches and the design of Pantheon's Valhalla file system, I've written a library and module called LCache with the following properties:
- Uses the database for canonical cache storage and coherency management. This is the L2.
- Opportunistically uses APCu (which is local to each PHP-FPM pool and absent for CLI) as an L1.
- Before Drupal bootstraps, it freshens APCu (if present) using L2's causality and sequencing information.
- Falls back to just using the database cache if APCu isn't present or functional.
- There is nothing extra to deploy or configure, just enabling the APCu extension for PHP.
- Support for tag-based invalidation.
Initial results show 20% less page-generation time than with datacenter-local (but not host-local) Redis access. This is primarily because there's only one round-trip to the cache for each request, and that request only returns data for cache updates.
In this presentation, I will cover:
- The coherency protocol, including some discussion of vector clocks
- Benchmarks
- The unit testing model for the cache coherency implementation
- How fallback to pure database caching works
- Why core should adopt this as the default cache implementation in, say, Drupal 8.3 or 8.4.