The growth of ephemeral infrastructure has had an astounding impact on IT, from slashing the cost of product development to obliterating traditional notions about the lines between engineering disciplines. Today, distributed systems architecture is composed of a dizzying array of components, like message queues and stream processing systems, that were unheard of in infrastructure design five years ago. In this talk, we'll examine Librato's system architecture, which is implemented entirely on AWS and relies heavily on the Apache Storm and Cassandra projects to sustain several hundred thousand HTTP POST operations per second and store around 10 billion data points. We'll give infrastructure details, recounting our initial design decisions as well as the scaling challenges that have forced us to refactor our storage designs. Finally, we'll share the system and performance metrics that we've found useful in monitoring our storage infrastructure.
About Dave Josephsen
As the developer evangelist for Librato, Dave Josephsen hacks on tools and documentation, writes about statistics, systems monitoring, alerting, metrics collection and visualization, and generally does anything he can to help engineers and developers close the feedback loop in their systems. He's written books for Prentice Hall and O'Reilly, speaks shell, Go, C, Python, Perl and a little bit of Spanish (in that order), and has never lost a game of Calvinball.