Big Data, Analytics, Optimization and More Data Oh My!

Hosted By
Grace L. and 2 others
Details

Update: add your questions for the speakers to the Google Moderator page (http://goo.gl/vlzoEQ)

September's presentation night on 9/11 will have 4 talks about Big Data and Analytics - from prediction to search to optimization and scaling.

  1. Prediction (for numerical data) using Python by Luke Gotszling - Lightning talk

This talk will be a quick introduction to numerical prediction given historical data using Python. We will look at how we can make predictions using linear regression, EMA (exponential moving average) and other approaches.
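As a rough illustration of the two techniques named above (not the speaker's own material), here is a minimal pure-Python sketch. The function names `ema` and `linear_forecast`, the smoothing factor `alpha`, and the sample `history` values are all assumptions made for this example.

```python
def ema(values, alpha=0.5):
    """Exponential moving average.

    Each output blends the newest value with the previous average;
    a larger alpha gives more weight to recent observations.
    """
    if not values:
        raise ValueError("values must not be empty")
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def linear_forecast(values, steps_ahead=1):
    """Fit y = slope * t + intercept by least squares over t = 0..n-1,
    then extrapolate steps_ahead points past the last observation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return slope * (n - 1 + steps_ahead) + intercept


# Hypothetical history, e.g. a server metric sampled at regular intervals.
history = [10.0, 12.0, 14.0, 16.0]
smoothed_history = ema(history, alpha=0.5)
next_value = linear_forecast(history, steps_ahead=1)
```

The last element of the EMA series can serve as a simple one-step forecast for noisy data, while the fitted line suits data with a steady trend.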

Luke is the co-founder and CEO of finally.io (http://finally.io/), a service offering predictive analytics for monitoring servers and cloud infrastructure. Previously, he was the first employee at about.me (http://about.me/) (acquired in 2010). Outside of Python, Luke's passions include NoSQL, machine learning, and running.

  2. Explore your data with ElasticSearch by Honza Kral - Beginner

ElasticSearch is a powerful open source search and analytics engine that makes data easy to explore. We will use real data to demonstrate how ElasticSearch's real-time analytics and visualization tools can help you make sense of your application. We will also demonstrate how ElasticSearch can provide the infrastructure for features beyond search, such as automatic content categorization, user-defined categories and even real-time alerts.

Honza is a Python programmer and Django core developer. Since he is scared of the bright and shiny world of browsers, designers, and users, he prefers to stay buried deep in the infrastructure code and provide others with the tools to do the actual site-building. He works on the Python drivers at ElasticSearch.

  3. Building Data Infrastructure at SocialCode by Will Larson - Intermediate

SocialCode is an analytics and optimization shop. Our biggest constraint during development is our ability to share data across teams and between functions (back and forth between engineers and data scientists). This talk looks at our evolution from a monolithic Django/Python and MySQL shop to a service-oriented architecture (Django Tastypie) with an underlying data pipeline, built on Kafka, for reliably aggregating content. The talk ends with a discussion of our performance-driven decision to move from Python to Java for the lowest layers of the pipeline, and of the HTTP interfaces we built to abstract the interactions between the Python and Java layers.

Will Larson is the VP of Platform Technology at SocialCode, a social analytics and advertising startup working on the Facebook and Twitter platforms. He previously led engineering at Digg and, before that, worked at Yahoo!.

  4. A Billion Rows per Second: Metaprogramming Python for Big Data - Advanced

The mainstream paradigms for processing large amounts of data, such as MapReduce and NoSQL, are based on distributed computing and massive horizontal scalability. Since Google published the original MapReduce paper in 2004, the performance of a single high-end server has grown by a factor of 50.

In this talk, we show how AdRoll uses Python to squeeze the last bit of performance out of a single high-end server for interactive analysis of terabyte-scale datasets. This feat is made possible by Numba, a new NumPy-aware dynamic Python compiler based on LLVM. Thanks to Python, the system can provide a very expressive and developer-friendly API while keeping the complexity of the implementation in check. The talk should be relevant to anyone interested in Big Data and High-Performance Computing using Python.

Ville is a Principal Engineer at AdRoll. Previously, Ville was the CEO of Bitdeli, a big data startup that provided a platform for analyzing event streams using user-defined Python code. Ville is also the original author of the open-source Disco MapReduce framework, which has been powering Python-based big data work in various companies since 2008.

Logistics:

Doors will open at 6:15pm to allow enough time to check in. People on the waiting list will be admitted AFTER all confirmed attendees.

If you will be bringing a guest, please provide us with their first and last name here (http://goo.gl/lV74N9).

The program begins at 7:00pm sharp and should last until 8:45pm. We will need to vacate Eventbrite by 9:00pm.

As usual, I look forward to meeting you. Feel free to ping me with questions or suggestions.

Grace

P.S. Please sign up to give a lightning talk (https://docs.google.com/forms/d/1LtV839ktupRboMUSXlXqoJ9lLFvpe-TZtLCf2q6jUpY/viewform#start=openform) at a future presentation night.

San Francisco Python Meetup Group