Brian Bulkowski, Co-founder and CTO of Aerospike
Sunil Sayyaparaju, Tech Lead, Aerospike
Internet environments for consumer-facing applications routinely demand high throughput and sub-millisecond latencies for read/write transactions against terabytes of data, and service-level agreements demand 100% uptime. This session will review 10 proven practices for ensuring the high performance and availability that interactive Internet applications demand—even during power outages or natural disasters. These real-world lessons come from supporting the large-scale, multiple data center deployments of CTOs delivering platforms for the high-stakes ad sector, where speed means responses in 5 milliseconds or less, scale ranges from 200,000 to 2 million TPS against terabytes of data, and downtime is not an option. The lessons include:
#1. When scaling, keep the architecture simple, so there are fewer points of failure. For instance, load balancers may fail at high transaction rates even as the database is cruising.
#2. Provide full end-to-end automation. People make mistakes, and anything that’s not automated will have production issues.
#3. Keep the system asynchronous; otherwise one small failure will quickly snowball into an avalanche of degradation.
#4. Keep metrics of everything, because scale tends to creep up from behind, and no one wants to be caught blind.
#5. Ensure full intra-data center redundancy because servers fail…often.
#6. Extend full data redundancy across multiple data centers, so storms like Sandy don’t put operations out of commission.
#7. Have a back-up plan for a remote graceful shutdown that accounts for IP-based security.
#8. Make sure code is testable, so there’s a way to let the world know what’s going on.
#9. Divide intelligence into online and offline, so all the heavy lifting with predictive modeling is offline.
#10. Use the right data management tool for the job; too often “all-in-one” means mediocre for all.
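As a hypothetical sketch of lesson #3 (not code from the session; the function names and the 5 ms budget, borrowed from the abstract's latency target, are assumptions): issuing reads asynchronously with a strict per-call timeout lets one slow backend call degrade gracefully instead of snowballing into stalled callers.

```python
import asyncio
from typing import Optional

async def slow_read(key: str, delay: float) -> str:
    """Stand-in for a network/database read with simulated latency."""
    await asyncio.sleep(delay)
    return f"value-for-{key}"

async def read_with_timeout(key: str, delay: float,
                            timeout: float = 0.005) -> Optional[str]:
    try:
        # Enforce a 5 ms budget; a timeout yields None instead of blocking.
        return await asyncio.wait_for(slow_read(key, delay), timeout)
    except asyncio.TimeoutError:
        return None

async def main():
    # Both reads run concurrently; the slow one times out independently,
    # so it cannot hold up the fast one.
    return await asyncio.gather(
        read_with_timeout("a", delay=0.001),   # well within budget
        read_with_timeout("b", delay=0.050),   # exceeds budget
    )

fast, slow = asyncio.run(main())
print(fast, slow)  # the fast read succeeds; the slow one returns None
```

The same shape applies to lesson #6: an individual call failing or timing out produces a bounded, local degradation rather than a cascading one.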
Dinner will be provided.