Data Engineering With XFrames


109 people went



Because Hadoop Summit and Spark Summit are both happening in June, we have moved the June Seattle Spark Meetup up by a week, to June 3rd. Even better, we will be meeting in the new Seattle offices of Galvanize!


xFrames is a new Python library for easily manipulating structured data at scale. It is especially well suited for data scientists and Python application developers, who can write compact and straightforward code to clean and explore large datasets, write rules-based applications, and prepare features for machine learning.

xFrames is built on Apache Spark and takes advantage of Spark's rich file input and output capabilities, as well as its efficient large-scale data operations. xFrames provides an expressive, pandas-like set of operators and functions, so users work directly with their data and do not need to understand Spark's underlying data model. It works especially well in interactive IPython notebooks.

xFrames is packaged both as a Python library and as a Docker container that includes all dependencies, so it can run on Linux, Mac OS, and Microsoft Windows with minimal setup. You can get started by running it locally, or configure xFrames to use a Spark cluster.

About the Author:

Charles Hayden is a software engineer at Atigeo in Bellevue. Prior to that he worked at Microsoft, the New York Times, and Bell Laboratories.