On February 10, meet up with Python / Data enthusiasts and learn more about Dataswarm and Ibis.
We are soliciting lightning and full-length talks for the coming year. Please submit a 5-minute lightning talk here (https://docs.google.com/forms/d/1LtV839ktupRboMUSXlXqoJ9lLFvpe-TZtLCf2q6jUpY/viewform?usp=send_form). Please submit a 20-, 30-, or 45-minute tech talk proposal here (https://docs.google.com/forms/d/1pqYxgSghDLDPbzO9O_2fFJulqNQNRc8jgz33SIh9WZs/viewform).
Scientific data set management: A lesson learned from building the Classical Language Toolkit by Kyle Johnson
Building a Hexastore in Python by Daniel Pyrathon
Using DeepDive for meta-analysis by Eva Vivalt
Talk #1 Dataswarm
Abstract: At Facebook, data is used to gain insights for existing products and drive development of new products. In order to do this, engineers and analysts need to seamlessly process data across a variety of backend data stores. Dataswarm is a framework for writing data processing pipelines in Python. Using an extensible library of operations (e.g. executing queries, moving data, running scripts), developers programmatically define dependency graphs of tasks to be executed. Dataswarm takes care of the rest: distributed execution, scheduling, and dependency management. The talk will cover high-level design, example pipeline code, and plans for the future.
Bio: Mike Starr is a software engineer in Facebook’s Data Infrastructure organization. Over the past three years, Mike has worked on Facebook's distributed scheduler, “Chronos,” and its ETL solution, “Dataswarm.” Prior to Facebook, Mike came from Wisconsin (land of beer, brats, and cheese), where he earned his B.S. in Computer Science and Computer Engineering at UW-Madison.
Talk #2 Using Python at Scale for Data Science
Abstract: While Python is a de facto language for modern data engineering and data science, Python development has been confined to local data processing, thereby limiting its users to smaller data sets. Historically, to address bigger data workloads, Python developers have had to extract samples or aggregates, forcing compromises in data fidelity, adding ETL costs, and ultimately leading to a loss of productivity and addressable use cases. Ibis, a new open source data analytics framework for Python developers, has the goal of enabling the Python data ecosystem (NumPy, pandas, etc.) to operate efficiently at Hadoop scale. To enable high-performance Python at scale without the age-old JVM interoperability problems, Ibis takes advantage of unique synergies between Python and Impala, the leading open source MPP analytical query engine. In this talk, Ibis creator Wes McKinney will demo the current capabilities of Ibis as well as explain its roadmap.
Bio: Wes McKinney is the creator of Ibis and of pandas.
6:00p - Check-in and mingle, with pizza and beer provided by our generous sponsor Yelp!
7:05p - Welcome
7:10p - Lightning Talks and Announcements while lightning speakers set up
7:35p - Talk 1 and Q&A
8:20p - Talk 2 and Q&A
9:30p - Doors close
Please take note of the important check-in details at Yelp:
1. Doors open at 6:00pm to allow enough time for the check-in process. Before 6:15pm, please wait outside without blocking the building entrance. The waiting list will be admitted beginning at 6:45pm. Doors close at 7:30pm.
2. Please update the name on your account to reflect your FIRST NAME and LAST NAME. Yelp security will be checking IDs downstairs. If your name on Meetup.com is not the name on your ID, then please enter your full name here (https://docs.google.com/forms/d/1d_oPoxjcAQzOJqozHIzVuFNnOYi7CDrzouywq4U9SUo/edit).
3. Since alcohol will be served at the event, we ask that any underage attendees RSVP directly to the meetup organizers.
4. Waiting list folks will be allowed into the event AFTER we admit all confirmed attendees.
5. Unfortunately, Yelp cannot safekeep bicycles. Please park your bike on the street.
Yelp is generously providing food, drinks, and beer in addition to their venue space.