Women in Big Data Apache Spark Meetup @ Databricks


Details
Please join us for an evening at the Bay Area Apache Spark Meetup, hosted and moderated by Maddie Schults (https://www.linkedin.com/in/maddieschults/) and Vida Ha (https://www.linkedin.com/in/vidaha/) from Databricks (https://databricks.com/), featuring talks on diversity and technology by women educators and engineers working in data science, computer science, and education.
Thanks to Databricks (https://databricks.com) for hosting and sponsoring this meetup.
Agenda:
6:00 - 6:30 pm Mingling & Refreshments
6:30 - 6:40 pm Welcome, opening remarks, announcements, acknowledgments, and introductions
6:40 - 7:15 pm Tech Talk 1 by Colleen Lewis
7:15 - 7:50 pm Community Tech Talk 2 by Kay Ousterhout
7:50 - 8:25 pm Databricks Tech Talk 3 by Sue Ann Hong
8:25 - 8:45 pm More Mingling & Networking
Tech-Talk 1: Fitting in CS when the stereotypes don't fit
Abstract: The stereotypes of computer scientists just aren't flattering. Probably every computer scientist can think of dimensions of the stereotype that just don't fit. Why do these stereotypes of computer scientists matter? And how might we change them and the tech industry more broadly? Learn about how Harvey Mudd College went about changing the culture of CS to go from a major with about 10% women in 2006 to over 50% in 2016.
Bio: Dr. Colleen Lewis is an Assistant Professor of CS at Harvey Mudd College where she has taught classes in both CS and social justice since 2012. Colleen frequently speaks about diversity and inclusion and has spoken at Amazon, Qualcomm, City National Bank, TurnItIn, Grace Hopper, SXSWedu, LA TechWeek, CalTech, USC, Rice, Northwestern, and the British Computing Society. She has conducted four workshops and given 33 invited talks focused on diversity and inclusion. Colleen is featured in the documentary Code: Debugging the Gender Gap.
Colleen’s research is focused on how people learn CS and how people feel about learning CS. Half of her 20 peer-reviewed publications focus on diversity and inclusion within CS. At Grace Hopper in 2016, Colleen won the Denice Denton Emerging Leader Award for her work promoting diversity and inclusion. Colleen's research is funded by a $750k grant from the NSF.
Tech-Talk 2: Apache Spark Performance: Past, Future, and Present
Abstract: Apache Spark performance is notoriously difficult to reason about. Spark’s parallelized architecture makes it difficult to identify bottlenecks when jobs are running, and as a result, users often struggle to determine how to optimize their jobs for the best performance. This talk will take a deep dive into techniques for identifying resource bottlenecks in Spark. I’ll begin with the past, and discuss instrumentation that was added to Spark to measure how long jobs spend waiting on disk and network I/O. Next, I’ll discuss future-looking work from the research community that explores an alternative architecture for Spark based on using single-resource monotasks. Using monotasks makes it trivial for users to understand bottlenecks and predict their workloads’ performance under different hardware and software configurations. This future-looking approach requires a radical re-architecting of Spark’s internals, so I’ll end with the present, and describe how lessons from that work could be applied to Spark today to give users much more information about the performance of their workloads.
Bio: Kay Ousterhout is an Apache Spark PMC member and a recent UC Berkeley Ph.D. graduate. Kay’s Ph.D. research focused on understanding and improving the performance of large-scale data analytics frameworks. In the Spark project, Kay has focused on improving scheduler performance.
Tech-Talk 3: Deep Learning Pipelines: Enabling AI in Production
Abstract: Deep learning has shown tremendous successes, yet it often requires a lot of effort to leverage its power. Existing deep learning frameworks require writing a lot of code to run a model, let alone in a distributed manner. Deep Learning Pipelines is an Apache Spark package that builds on the Spark MLlib Pipelines API to make practical deep learning simple. Leveraging Spark, Deep Learning Pipelines scales out many compute-intensive deep learning tasks. In this talk, we discuss the philosophy behind Deep Learning Pipelines, as well as the main tools it provides, how they fit into the deep learning ecosystem, and how they demonstrate Spark's role in deep learning.
Bio: Sue Ann Hong is a software engineer on the Machine Learning team at Databricks, where she contributes to MLlib and the Deep Learning Pipelines library. She earned her Ph.D. at CMU, studying machine learning and distributed optimization, and has worked as a software engineer at Facebook in Ads and Commerce.
See you there!
