Bay Area Apache Spark & Women in Big Data @ Databricks HQ, SF

Hosted by Jules S. Damji

Details

Please join us for an evening of Bay Area Apache Spark and WiBD (https://www.womeninbigdata.org/blog/) Meetup tech talks from women in engineering, hosted and moderated by Maddie Schults (https://www.linkedin.com/in/maddieschults/) from Databricks (https://databricks.com/).

Thanks to Databricks (https://databricks.com) for hosting and sponsoring this meetup.

Agenda:

6:00 - 6:30 pm Mingling & Refreshments

6:30 - 6:40 pm Welcome: opening remarks, announcements, acknowledgments, and introductions

6:40 - 7:15 pm Holden Karau: Bringing a Jewel (as a starter) from the Python world to the JVM with Apache Spark, Arrow, and Spacy

7:15 - 7:50 pm Anya Bida: Just enough DevOps for Data Scientists (Part II)

7:50 - 8:25 pm Shan He: Creating Beautiful and Meaningful Visualizations with Big Data

8:25 - 8:45 pm More Mingling & Networking

Tech-Talk 1: Bringing a Jewel (as a starter) from the Python world to the JVM with Apache Spark, Arrow, and Spacy

Abstract:
With the new Apache Arrow integration in PySpark 2.3, it is now starting to become reasonable to look to the Python world and ask “what else do we want to steal besides TensorFlow?”, or, as a Python developer, to look and say “how can I get my code into production without it being rewritten into a mess of Java?”

Regardless of your specific side(s) in the JVM/Python divide, collaboration is getting a lot faster, so let's learn how to share! In this brief talk we will examine sharing some of the wonders of Spacy with the Java world, which still has a somewhat lackluster set of options for NLP.

Bio: Holden Karau (https://www.linkedin.com/in/holdenkarau/)

Tech-Talk 2: Just enough DevOps for Data Scientists (Part II)

Abstract: Imagine we have Ada, our data science intern. Let's run through a very simple word-count Spark job and find a handful of potential failure points. Dozens of failures can and should happen when running Spark jobs on commodity hardware. Given a basic foundation of infrastructure-level expectations, this talk gives Ada tools to ensure her job isn’t caught dead. Once the simple example job runs reliably, with the potential to scale, our data scientist can apply the same toolset to focus on some more interesting algorithms. Turn SNAFUs into successes by anticipating and handling infrastructure failures gracefully.

Note: this talk is a Spark-focused extension of Part I, "Just Enough DevOps For Data Scientists", from Scale by The Bay 2018:

https://www.youtube.com/watch?v=RqpnBl5NgW0&t=19s

Bio: Anya Bida (https://www.linkedin.com/in/anyabida/)

Tech-Talk 3: Creating Beautiful and Meaningful Visualizations with Big Data

Abstract:
At Uber, location data is our biggest asset. How do we create data visualizations with rich location data, render a million points of events in the blink of an eye, and, most importantly, derive insights from them? In this presentation, you'll get a behind-the-scenes look at the tools and data visualizations we use at Uber to inform business decisions. I will walk us through an overview of the data visualization process with a case study, and discuss how and why we built our own visualization tool to visualize location data in a more meaningful way. I will also show that you can create beautiful visualizations, but for them to be useful, you have to understand the information you are designing.

Bio: Shan He (https://www.linkedin.com/in/shan-he-25400b16/)

Bay Area Spark Meetup
Databricks, Inc HQ
160 Spear St, Floor 13 · San Francisco, CA