Keeping Spark on Track: Best practices using Apache Spark in Production

This event is catered thanks to the good people @databricks

Miklos Christine is a solutions architect at Databricks, where he helps customers deploy and use Apache Spark to build batch and streaming applications. Miklos was previously a systems engineer at Cloudera, where he helped strategic customers deploy and use the Apache Hadoop ecosystem in production. He has contributed to several projects in the open source community and holds a BS in Electrical Engineering and Computer Sciences from the University of California, Berkeley.

The purpose of this talk is to share best practices learned while developing Apache Spark workflows in production across various industries. Apache Spark is a popular distributed processing framework that lets organizations analyze multiple streams of data for machine learning and exploratory SQL workloads. I'll cover debugging tips, best practices for transforming datasets, and integration with existing libraries such as NumPy and pandas. Code examples will be shared in Python.
