
Stealing The Jewels From Python: Putting PySpark code into production...

Hosted By
Dean W. and 2 others

Details

Join us to hear our special guest, Holden Karau:

With the new Apache Arrow integration in PySpark 2.3, it is starting to become reasonable to look to the Python world and ask, “What else do we want to steal besides TensorFlow?” or, as a Python developer, to look and ask, “How can I get my code into production without it being rewritten into a mess of Java?”

Regardless of your specific side(s) in the JVM/Python divide, collaboration is getting a lot faster, so let's learn how to share! In this brief talk we will examine sharing some of the wonders of spaCy with the Java world, which still has a somewhat lackluster set of options for NLP.

Bio:

Holden is a transgender Canadian open source developer advocate with a focus on Apache Spark, Beam, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer and PMC member on Apache Spark and a committer on the SystemML and Mahout projects. Prior to joining Google as a Developer Advocate, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work she enjoys playing with fire, riding scooters, and dancing.

Chicago Spark Users
IBM office, Hyatt Center, 6th floor
71 S. Wacker · Chicago, IL