R+ at Scale, Google & Apache Beam


Details
This meetup focuses on engineering large systems for scalability. Our current focus is on technologies for doing data science at scale: distributed systems, machine learning, AI, blockchain, databases, and more!
We are heavily focused on deep, technical talks. No marketing pitches, no lightweight use-case discussions. We want to see architecture diagrams and code, and hear real stories from the trenches.
Besides distributed systems and Big Data, we're also interested in hearing about high-performance engineering techniques and futuristic technologies.
We've had great success in the past, and are growing quickly! Previous guests were from Facebook, Twitter, LinkedIn, Amazon, Cloudant, Microsoft, MongoDB, and others. We love hearing from practitioners.
This month's guests:
Esin Saka, PhD. Software Engineer at Microsoft
R+ at Scale
Almost one month ago, we announced the general availability of Azure Data Lake (https://azure.microsoft.com/en-us/blog/the-intelligent-data-lake/), an environment for storing huge datasets and running on-demand analytics that instantly scales to users' needs. The announcement included Azure Data Lake Analytics (https://azure.microsoft.com/en-us/services/data-lake-analytics/), the first cloud analytics job service where users can easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python, and .NET over petabytes of data. It has rich built-in cognitive capabilities such as image tagging, emotion detection, face detection, deriving meaning from text, and sentiment analysis, and it can be extended to any type of analytics.
In our talk, we will focus on R support and share examples.
Dr. Esin Saka has worked at Microsoft since 2011 and was the founding Vice Chair of the ACM SIGKDD Seattle Chapter. Her research interests include distributed machine learning, Web mining, recommender systems, swarm intelligence, multi-agent systems, and genetic programming.
Frances Perry, PhD. Google Engineer, Committer on Apache Beam (incubating)
Fundamentals of Stream Processing with Apache Beam
Apache Beam (unified Batch and strEAM processing!) is a new Apache incubator project. Originally based on years of experience developing Big Data infrastructure within Google (such as MapReduce, FlumeJava, and MillWheel), it has now been donated to the OSS community at large. Come learn the fundamentals of out-of-order stream processing, and how Beam’s powerful tools for reasoning about time greatly simplify this complex task.
Beam provides a model that allows developers to focus on the four important questions that must be answered by any stream processing pipeline:
- What results are being calculated?
- Where in event time are they calculated?
- When in processing time are they materialized?
- How do refinements of results relate?
Because the model cleanly separates these questions from runtime characteristics, Beam programs are portable across multiple runtime environments, both proprietary (e.g., Google Cloud Dataflow) and open-source (e.g., Apache Flink, Apache Spark, and others).
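To make the four questions concrete, here is a minimal, illustrative sketch using the Beam Java SDK; the class name, element values, and windowing/trigger choices are our own, not taken from the talk. The window answers "where in event time," the trigger answers "when in processing time," the accumulation mode answers "how refinements relate," and the Count transform is the "what."

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class StreamingCountSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Toy input: a few timestamped user IDs standing in for an unbounded event stream.
    PCollection<String> events = p.apply(Create.timestamped(
        TimestampedValue.of("alice", new Instant(0)),
        TimestampedValue.of("bob", new Instant(30_000)),
        TimestampedValue.of("alice", new Instant(65_000))));

    PCollection<KV<String, Long>> perUserCounts = events
        // Where in event time: one-minute fixed windows.
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
            // When in processing time: speculative firings every 30 seconds,
            // an on-time firing when the watermark passes the end of the window,
            // and one more firing per late element for up to one hour.
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(30)))
                .withLateFirings(AfterPane.elementCountAtLeast(1)))
            .withAllowedLateness(Duration.standardHours(1))
            // How refinements relate: each new pane accumulates prior elements.
            .accumulatingFiredPanes())
        // What is being computed: a count per element (here, per user ID).
        .apply(Count.perElement());

    p.run().waitUntilFinish();
  }
}

Because the model is independent of the runner, the same program can be submitted to, for example, Cloud Dataflow, Flink, or Spark by changing the --runner pipeline option and adding the corresponding runner dependency.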
Our format is flexible: we usually have two speakers who talk for ~30 minutes each and then do Q+A plus discussion (about 45 minutes per talk), finishing by 8:45.
There'll be beer afterwards, of course!
Meetup Location:
Whitepages (http://maps.google.com/maps?q=1301+5th+Avenue+%231700%2C+Seattle%2C+WA), 1301 5th Avenue #1600, Seattle, WA
After-beer Location: Rockbottom, courtesy of the Greythorn team.
Doors open 30 minutes ahead of show-time. Please show up at least 15 minutes early out of respect for our first speaker.
Parking is available in the building and is valet only; cost is $8.00 after 6pm (enter on Union between 4th & 5th). Additional parking is available in the Hilton Parking Garage; cost is $8.00 after 5pm (enter on 6th Ave between University and Union). There is also street parking downtown.
