
Seattle Scalability Meetup - Overkill Analytics

Hosted by clive b. and 2 others

Details

This meetup focuses on Scalability and technologies to enable handling large amounts of data: Hadoop, HBase, distributed NoSQL databases, and more!

The focus is not only on technology but also on everything surrounding it, including operations, management, business use cases, and more.

We've had great success in the past, and are growing quickly! Previous guests were from Twitter, LinkedIn, Amazon, Cloudant, Microsoft, 10gen/MongoDB, and more.

This month's guests:

Claudiu Barbura, VP of Engineering, Analytics Platform, Ubix.io

Overkill Analytics on High Dimensional Feature Spaces

In our quest for data science automation, we have learned many lessons that I am going to share in this session.

Fewer slides, and a detailed demo of the data science behind our Outbrain Kaggle competition submission, performed from DSL Workbench, the notebook we built for exploratory data analysis. DSL is the fluent and expressive API we created to expose data, metadata, and services from our data science platform. I will compare multiple approaches to feature engineering and feature reduction, as well as full-feature-space training using OKA (OverKill Analytics) techniques. Where spark.ml/spark.mllib could not perform on high-dimensional sparse feature spaces, we used Spark to distribute scikit-learn, VW, TensorFlow, and R packages, and produced ensemble models and prediction tables that still yield highly accurate predictions.
I will cover and show concrete examples of composite and progressive modeling, high-dimensional and sparse feature engineering, and the primitives we built for handling sparse data beyond what Spark and SciPy support.

While I'll focus on data science at scale, I will also touch on infrastructure aspects, sharing tips and tricks we learned about the underlying technology stack: Scala, Python, Spark, HDFS, Cassandra, Elasticsearch, ZooKeeper, VW, etc.

Short bio:

Claudiu is VP of Engineering, Analytics Platform at ubix.ai (http://ubix.ai/), where he leads the development of the data and advanced analytics services that enable AutoCurious to automate and scale data science for mass insight consumption. He was formerly at Atigeo, where he architected the xPatterns big data platform.

Our format is flexible: we usually have 2 speakers who each talk for ~30 minutes, followed by Q&A and discussion (about 45 minutes per talk in total), and we finish by 8:45.

There'll be beer afterwards! (not hosted)

Meetup Location:

Whitepages (http://maps.google.com/maps?q=1301+5th+Avenue+%231700%2C+Seattle%2C+WA), 1301 5th Avenue #1600, Seattle, WA

After-beer Location:

Yard House (4th and Pike), 1501 4th Ave #118, Seattle, WA 98101
Doors open 30 minutes ahead of show-time. Please show up at least 15 minutes early out of respect for our first speaker.

Parking is available in the building and is valet only; the cost is $8.00 after 6 pm (enter on Union between 4th and 5th). Additional parking is available in the Hilton Parking Garage for $8.00 after 5 pm (enter on 6th Ave between University and Union). Street parking is also available downtown.
