- Serving Direct Messages At Twitter Scale
I'm happy to announce Twitter will once again be hosting our meetup in June. If you're not familiar with the Twitter office in Boulder, they are doing some really great work with some really big data. Chris Hayes and Mark Henderson from the Boulder Twitter team will be giving "lightning talks" on the Twitter infrastructure for handling Direct Messages. Come and hear some of the challenges and solutions the local Twitter team is working on.

Agenda
6:00 – 6:45 - Socialize over food and drinks
6:45 – 7:00 - Welcome, opening remarks and announcements
7:00 – 8:30 - Chris Hayes, Mark Henderson & Shai Wilson - Lightning Talks
8:30 – 9:00 - Networking

About the event
Twitter’s Direct Messaging backend engineering team will deliver 3 lightning talks covering the DM engineering and product landscape.
1. When a server is overloaded by its clients, it’s a common reaction to begin rejecting traffic and expect the clients to back off in response. But what if you’re a client, and only a well-defined subset of your traffic is getting rejected? There are strategies for backing off only the subset, and Chris will explain one such strategy in use at Twitter today.
2. Twitter delivers a massive number of Direct Messages every day. How do we reliably deliver every single message while maintaining consistent state across multiple clients and downstream systems? Mark will introduce the reliable delivery architecture behind Direct Messages.
3. Compared to pure infra projects, product projects have more stakeholders. There are engineers, researchers, the PM, executives, designers, lawyers, and many other parties who care about how the feature gets implemented. Everyone also has a different angle; however, working on a product feature requires quick decision making and a process for reaching consensus.

About the Speakers
Chris Hayes
Chris Hayes has worked on myriad products and technologies over the years, from front-end to back-end and from media editing/flowchart tools to database abstraction layers. But his passion has always been for human interactions and psychology, and he always tries to think about how the users will respond to a feature or use it unexpectedly.

Mark Henderson
Mark has been designing distributed backend systems since 2012. Before Twitter, he helped keep Amazon's fulfillment backend running, including integrations such as Kiva robotics and Amazon Fresh. He more recently worked on Google's payment processing systems. Outside the 9-5, he's an avid swing dancer in the Denver area.

Shai (pronounced shay) Wilson
Shai has been at Twitter for almost 3 years. She’s also made appearances at the United Nations, the Internet Archive, and Square. What really draws her into a particular project is how it aims to connect people. She sees user experience and personal detail as having the potential to make a much bigger impact.
- The evolution of low latency analytics at Risk Management Solutions (RMS)
I'm excited to announce that Chris George is going to come and speak about the big data work his team has been doing for years over at Risk Management Solutions. Chris has a ton of experience with big data technologies, and RMS is a contributor to the big data open source community through projects like Apache Arrow. Come hear how Chris and his team have built a low latency query system and how their approach has evolved over time. There will be a sponsored food truck from “Big Daddy's Texas BBQ” as well as soda and beer.

Zoom link for the meetup: https://rms.zoom.us/j/472255273

Agenda
6:00 – 6:45 - Socialize over food and drinks
6:45 – 7:00 - Welcome, opening remarks and announcements
7:00 – 8:30 - Chris George - The evolution of low latency analytics at RMS
8:30 – 9:00 - Networking

About the event
Learn about the evolution of RMS' low latency analytics, from persistent Spark contexts to building our own query engine with Apache Arrow, for cases where pre-computed dimensions are not an option and the system still has to be cost effective. This talk continues one that Chris gave at the Big Data Conference 2 years ago. See how we started as heavy contributors to and users of Apache Kudu and have since moved on to Parquet and Snowflake with our own custom query engine. See real numbers and code as well as the pitfalls of the various approaches to this challenge.

About the Speaker
Chris George is Senior Director and Architect of Core Platform at RMS. He has worked with big data technologies for over 15 years, including building legal citation detection and storage systems, large scale affiliate attribution and payment systems, real-time metrics for large scale ad tech brokerage and, most recently, low latency insurance exposure systems. Chris has used a variety of big data technologies in production settings and brings that experience to the challenges at RMS.
- How Alphabet's Verily plans to ride the health data tsunami
I am excited to have Jay Gengelbach come talk about what Verily Life Sciences is working on right here in Boulder! For those of you that don't know, Verily is an Alphabet company focused on using technology to better understand health, as well as prevent, detect, and manage disease. Jay is the site lead for Verily Boulder, and I had the good fortune to work for Jay briefly at Google. He is incredibly smart, a great communicator and an even better leader. Come learn about the work the local Verily team is doing and how they are building solutions that solve important health care problems now and scale to solve them for everyone! Come on, how often do you get to hear about the future of health care being created right here in Boulder? This is a great opportunity to get insight into some impressive work being done and how a seasoned Google infrastructure leader thinks about and approaches the problems. Parking details are at the end of the announcement.

Agenda
6:00 – 6:45 - Socialize over food and drinks
6:45 – 7:00 - Welcome, opening remarks and announcements
7:00 – 8:30 - Jay Gengelbach - How Alphabet's Verily plans to ride the health data tsunami
8:30 – 9:00 - Networking

About the event
Can your phone's accelerometer detect early warning signs of Parkinson's disease? Can your Fitbit detect heartbeat irregularities that could signal an impending heart attack? Sensors get faster and cheaper all the time, which means that the range of health-relevant data that can be cheaply acquired via wearables or passive household sensors is starting to explode. Verily Life Sciences dreams of a world where health management moves beyond the walls of clinics and hospitals and into your homes and daily lives. In this world, diseases can be caught in their earliest stages, when they are the most manageable.

This dream isn't without its challenges, though. When the scope of your health data isn't just captured by the results of an annual blood test and sporadic clinical notes by your doctors but by a stream of high-fidelity round-the-clock sensor data, or by the expansive results of sequencing your entire genome (plus your ever-changing microbiome), then the complete collection of your health data is no longer something that you could carry around on a single DVD. Come hear about how Verily is preparing for a world where there could be terabytes of data on every human's health, and what we hope becomes feasible with data sets of that size.

About the Speaker
Jay Gengelbach is the site lead for Verily Boulder. This Alphabet company opened its presence in Colorado in 2018, and is expanding its local presence while still being housed in the offices of its sibling company, Google. Jay is a 12-year veteran of Google infrastructure teams, and is bringing his infrastructure expertise to Verily's software organization as it seeks to scale up its software offerings to more customers, more patients, more locations, and more data. We're hiring! Check out http://verily.com/careers for our open roles!

Jay's a native of the Midwest who failed to fall in love with the chaos of Silicon Valley. He moved to Colorado on a whim in 2008 and has been putting down roots ever since. He lives in Boulder with his wife, 3 kids, and 2 miniature schnauzers.

Parking Logistics
There's ample parking on site underneath the Google buildings. Use the employee entrance accessible from 30th Street and turn right into the garage entrance. There will either be an attendant there to buzz you into the garage or you can press the call button to let security know you're there for the event. Once parked, find the elevator to building B and take it to the first floor reception area, from which you'll find the event space. (You'll need an employee badge to access building A or any other floors of building B.)

Alternatively, if on foot or arriving by transit, you can use the main entrance, found off the courtyard between the two buildings. There is a call button outside to get buzzed into reception.
- Strava Data + Scale + Community = Impact
About the Event
Kate, a Senior Product Analyst at Strava, will be sharing how Strava works with big data at scale in order to impact each and every athlete's experience on the platform. She will show how Strava leverages Snowflake and AWS to build and share out their “Year in Sport” videos created for each athlete and how that same data is used to help athletes train to their greatest potential.

Agenda
6:00 – 6:45 - Socialize over food and drinks
6:45 – 7:00 - Welcome, opening remarks and announcements
7:00 – 8:30 - Kate Treadwell - Data + Scale + Community = Impact
8:30 – 9:00 - Networking

About the Speaker
Kate Treadwell is a Senior Product Analyst at Strava. Having worked in the data industry for over 15 years, Kate has a wealth of experience in every aspect of data processes. She has worked with companies such as Nike, Google, Airbnb, Pinterest, and many more to help them build, understand, and leverage their data. Today she works at Strava to help everyday athletes better understand themselves and their strengths.
- Big Data Transformation: Moving from Hadoop and data-streaming to micro-batch
About the Event
Over the last year Sovrn has shifted our big data workloads from bare-metal hosted Hadoop to micro-batch, leveraging updated ETL and data lake constructs in AWS S3 and EMR. We will discuss our transformation, key learnings, and our roadmap for evolution.

Agenda
6:00 – 6:45 - Socialize over food and drinks
6:45 – 7:00 - Welcome, opening remarks and announcements
7:00 – 8:30 - Kyle, Nick & Rob - Big Data Transformation @ Sovrn
8:30 – 9:00 - Networking

About the Speakers
Kyle Gilliland, VP of Platform Engineering at Sovrn Holdings
Kyle has a deep background in driving transformation across software development teams through leading DevOps, Cloud, and now Big Data transformations. He has experience across industries in various technology roles across financial, retail, education, and adTech based companies. He is a driven and transparent leader who loves to build diverse teams focused on solving interesting problems. https://linkedin.com/in/kyle-gilliland

Nick Vedder, Data Science Consultant for Slalom Consulting LLC
Nick Vedder was the technical lead on Sovrn's Data Science team during their migration of services to AWS. He designed and developed the workflow model that powers all of Sovrn's Data Science jobs, including their real time price floor optimization algorithm, using Airflow, EMR, and Docker. He was also a key developer for Sovrn's DaaS product and has recently worked on developing their Identity Graph capabilities. He has an academic background in econometrics and applies that discipline to forecasting, causal analysis, and testing frameworks. https://linkedin.com/in/nick-vedder

Rob Cuthbertson, Director of Engineering for the Data Platform at Sovrn Holdings
Rob has worked in Ad Tech for 8 years, and with a string of innovative teams and startups for decades prior. Rob has worked in Telecom, Energy, NLP/ML analytics, and consulting. He loves to lead amazing engineering teams based on a culture of continuous learning. https://linkedin.com/in/ColoradoRob
- Introducing the Data River: Apache Druid is the next analytics platform
Oracle Broomfield Office: Bldg 1 Conference Room 1 and 2
About the Event
Have you heard the term "Data River" before? If not, this meetup should be a great introduction to Data Rivers. Daniel Rose, formerly of Qubole, is on the leading edge of what is new and interesting in the big data world. Apache Druid is the newest tool in your analytics toolkit. It is a distributed, column-oriented OLAP database with native SQL support. It delivers sub-second ad-hoc queries against both streaming and batch (Hadoop) data. Come and learn about Data Rivers and how you can use Apache Druid to build your own. Gian Merlino, the Druid founder, will be speaking about Druid and Imply. We will also have one or more engineering teams talking about how they are implementing Druid today.

Agenda
6:00 – 6:15 - Socialize over food and drinks
6:15 – 6:30 - Welcome, opening remarks and announcements
6:30 – 7:15 - Gian Merlino - Apache Druid PMC
7:15 – 8:00 - Nate Vogel and Andy Amick - Druid at Charter
8:00 – 8:30 - Networking

About the Speakers
Gian Merlino
Gian Merlino is an Apache Druid (incubating) PMC member and a co-founder of Imply. Previously, Gian led the data ingestion team at Metamarkets and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.

Nate Vogel
Nate has nearly 20 years’ experience developing software and 10 managing teams. A majority of that time has been focused on large scale, distributed systems architectures to manage the end to end lifecycle (ingest to data mart/warehouse) of considerable data sets and real time reporting requirements. He currently works at Charter on the Product Intelligence team, managing three Platform teams responsible for building the backbone of Charter Product’s reporting infrastructure. His team has recently invested substantial time and capital developing a proof-of-concept Imply Druid cluster to address the ever-growing need for real time and historical (years) reporting across many custom aggregates.

Andy Amick
Andy Amick has over 20 years of software development experience. For the last 13 years, he has focused on data warehouse and analytics solutions. He currently works at Charter on the Product Intelligence team, leading the development of a real-time analytics pipeline.
- Usage of Apache Phoenix Statistics to Scale the Salesforce Platform
About the Event
Hey wait! Did you know Salesforce has an office in Louisville? Well, now you do! What's even better is Brian Esserlieu is doing some really great work with Big Data out of the Louisville office. Come hear how Salesforce uses Apache Phoenix to scale the Salesforce Platform. If you've been hoping for a deep technical discussion you won't be disappointed!

Agenda
6:00 – 6:15 - Socialize over food and drinks
6:15 – 6:30 - Welcome, opening remarks and announcements
6:30 – 8:00 - Apache Phoenix - Brian Esserlieu
8:00 – 8:30 - Networking

About the Presentation
Salesforce's Platform Data Services team has been in charge of introducing massively scalable data solutions to Salesforce's internal teams, as well as exposing this technology to external customers on the Salesforce Platform through custom big objects. Customers can now create custom business objects capable of storing billions of records, all while being able to use all of the standard APIs they've been using for years. A major feature of the big object offering is the ability to run queries on big objects that have real time, low latency requirements, such as those needed by UI components. This talk will cover how Salesforce is using Apache Phoenix's statistics feature to greatly expand our query support to help us strike a balance between deterministic query times and potentially unbounded scans.

About the Speaker
Brian Esserlieu
Brian Esserlieu is a Lead Member of Technical Staff currently working on the Platform Data Services team at Salesforce. He has been at the company for over 6 years, and has been on his current team for 4 1/2. He has built Salesforce's first big data features from conception to being fully released to the public. We continue to expose other big data concepts like map-reduce, compute, and more to customers through the Salesforce platform.

I'm a Colorado native who has spent the past 15 years in California, but I have finally returned this year! I've worked in several technology areas, including a mobile app development startup (pre-smartphone era!) and a government defense subcontracting company. I eventually went back to school and graduated from the University of California, San Diego 6 years ago, and have been happily working for Salesforce since.
- LogRhythm’s Road to Cloud Analytics
About the Event
I'm excited to have Joel Holsteen from LogRhythm speak at our October meetup! For those of you that don't know LogRhythm, they are a security intelligence company that relies on tons of data to provide security analytics. Come hear Joel talk about how they built one of their products, CloudAI, and all of the challenges along the way.

Agenda
6:00 – 6:15 - Socialize over food and drinks
6:15 – 6:30 - Welcome, opening remarks and announcements
6:30 – 8:00 - LogRhythm’s Road to Cloud Analytics - Joel Holsteen
8:00 – 8:30 - Networking

Presentations
Over the past few years, LogRhythm has been developing a cloud analytics platform, CloudAI, to perform security focused analytics on customer data. We have been faced with the challenge of taking a go-to-market product and scaling it for the future while reducing operational overhead. Along the way we have learned a lot about the challenges that come with building a scalable multi-tenant distributed architecture. I want to discuss problems that we faced while scaling CloudAI, such as the tension between latency and correctness of results, the tradeoffs between batch and stream processing, and the difficulties surrounding out of order data.

About the Speaker
Joel Holsteen
Joel Holsteen is a senior software engineer at LogRhythm and is the technical lead on the cloud analytics team. He has played a key role building and scaling CloudAI over the past three years and has worked on the entire stack to bring CloudAI to market. He is an advocate of technologies such as Apache Beam and is passionate about building distributed analytics solutions.
- KSQL: Stream Processing Was Never This Easy
I'm excited to have Tim Berglund come and talk about KSQL! Tim is a great speaker with incredible technical depth. Come and hear Tim talk about KSQL and do a deep dive into the technical details with live coding on live streaming data. How cool is that? For those of you hoping for a deep technical dive this should be just the ticket! Looking forward to seeing everyone!

Agenda
6:00 – 6:15 - Socialize over food and drinks
6:15 – 6:30 - Welcome, opening remarks and announcements
6:30 – 8:00 - KSQL: Stream Processing Was Never This Easy - Tim Berglund
8:00 – 8:30 - Networking

About the talk
Kafka now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax. Come to this talk for an overview of KSQL with live coding on live streaming data.

About the speaker
Tim is a teacher, author, and technology leader with Confluent, where he serves as the Senior Director of Developer Experience. He can frequently be found speaking at conferences in the United States and all over the world. He is the co-presenter of various O’Reilly training videos on topics ranging from Git to Distributed Systems, and is the author of Gradle Beyond the Basics. He tweets as @tlberglund, blogs very occasionally at http://timberglund.com, is the co-host of the http://devrelrad.io podcast, and lives in Littleton, CO, USA with the wife of his youth and their youngest child, the other two having mostly grown up.
- The Amazon Infrastructure team - Big Data of Data Centers
Did you know that the Amazon AWS Infrastructure team has a presence here in Colorado? I didn't, which is why I'm unbelievably excited to have the Amazon AWS Infrastructure team come and talk about the great things they are working on. The team will talk about the big data of data centers. It takes a lot of power and cooling equipment to run a modern data center. Come learn about how this equipment is used, what data it produces, and how that data is used to enable and improve the operation of the data center. This should be a fantastic talk! Please join us and learn more about Amazon AWS Infrastructure and the team. Looking forward to seeing you there!

Agenda
6:00 – 6:15 - Socialize over food and drinks
6:15 – 6:30 - Welcome, opening remarks and announcements
6:30 – 8:00 - The Big Data of Data Centers - Jamey Wood
8:00 – 8:30 - Networking

About the speaker
Jamey Wood is a Senior Manager in the AWS Infrastructure organization within Amazon Web Services. His team is responsible for writing and running software that applies modern Internet of Things technologies to inventory, monitor, and manage critical infrastructure devices in Amazon data centers (which, in turn, house everything from EC2 and S3 to Amazon.com). Prior to his time at Amazon, Jamey was the CTO of Wayin. There, he drove the development of a system that processed approximately one billion social media messages every day.