This meetup focuses on scalability and the technologies that enable handling large amounts of data: Hadoop, HBase, distributed NoSQL databases, and more!
The focus is not only on technology but also on everything surrounding it, including operations, management, business use cases, and more.
We've had great success in the past and are growing quickly! Previous guests were from Twitter, LinkedIn, Amazon, Cloudant, Microsoft, 10gen/MongoDB, and more.
This month's guests:
Shu Wu, Facebook - 700 Million Videos in 4 Days
For Facebook’s 10 year anniversary, we produced 700 million personalized “lookback” videos. Without any marketing or promotions within Facebook, these videos were shared by over 100M people in the first two days. To make this happen, we scanned through 4 billion years of timelines and pre-rendered about 11 petabytes of videos. I’ll talk about the algorithms used to select the stories, how the videos were rendered and the systems used to scale it. The development time for this project was 3 weeks, including a few days for the infrastructure to compute and store the output. If there’s time, I’ll talk about how we built the system to “edit” the videos in real-time.
Claudiu Barbura, Atigeo - xPatterns on Spark, Shark, Tachyon and Mesos
xPatterns is a big data analytics platform that enables rapid development of enterprise-grade analytical applications through built-in APIs and tools, driven from a management console with data, application, and system monitoring. We will showcase the entire lifecycle of one of the xPatterns applications built for our largest production customer (20 billion healthcare records, 200 TB of compressed HDFS data) while evolving our infrastructure from Hadoop and Hive to Spark, Shark, Tachyon, and Mesos. We will provide detailed ELT pipeline stats with lessons learned (Hive vs. Shark vs. Shark with Tachyon), along with live demos of: Jaws, our REST SharkServer and web console for exploring the warehouse through Shark queries; Mesos providing resource management for multiple workloads (Hadoop/Hive, Spark, Jaws); the Export to NoSQL API console, which generates geo-replicated APIs for real-time access to Cassandra data exported from the warehouse through Spark jobs; the Referral Provider Network, a user-facing dashboard application (D3.js); and finally the monitoring and instrumentation consoles (Nagios, Ganglia, and Graphite).
Our format is flexible: we usually have two speakers who each talk for about 30 minutes, followed by Q&A and discussion (about 45 minutes per talk in total), and we finish by 8:45.
There'll be beer afterwards, of course!
WhitePages, [masked]th Avenue #1600, Seattle, WA
Rock Bottom Brewery.
Doors open 30 minutes ahead of show-time. Please show up at least 15 minutes early out of respect for our first speaker.