7th Spark London Meetup @ Strata + Hadoop World

Name: 7th Spark London Meetup @ Strata + Hadoop World
Start: 2015-05-05T18:30:00+01:00
End: 2015-05-05T21:30:00+01:00
Location: Hilton London Metropole

Hosted by Martin G. and Francesco B.

Apache Spark+AI London

Details

We are excited to announce the 7th Spark London meetup, which will take place after the first day of the Strata + Hadoop World Conference.

We will have two talks followed by a hard-hitting panel session!

The talks will be by Patrick Wendell (co-founder of Databricks) and Deenar Toraskar (Big Data Platform Development Manager/Architect at Deutsche Bank).

The panel session will see Sean Owen (co-author of 'Advanced Analytics with Spark'), Tom White (author of 'Hadoop: The Definitive Guide'), and Steve Loughran (Hortonworks and Apache Software Foundation) join Patrick Wendell on the stage. Our topic will be 'The Future of Spark'. (Bios are at the bottom.)

Please tweet #sparklondon with all the Spark questions you need answered!

Thanks again to Cloudera for all the support.

Talks start at 7pm

BIO AND SYNOPSIS

Patrick Wendell

Patrick Wendell is an engineer at Databricks as well as a founding committer and PMC member of Apache Spark. In the Spark project, Patrick has acted as release manager for several Spark releases, including Spark's recent 1.3 release. Patrick also maintains several subsystems of Spark's core engine.

Before helping start Databricks, Patrick obtained an M.S. in Computer Science at UC Berkeley. His research focused on low-latency scheduling for large-scale analytics workloads. He holds a B.S.E. in Computer Science from Princeton University.

Talk: "Spark DataFrames: Simple and Fast Analysis of Structured Data"

Abstract: This talk will provide a technical overview of Spark’s DataFrame API. First, we’ll review the DataFrame API and show how to create DataFrames from a variety of data sources such as Hive, relational databases, or structured file formats like Avro. We’ll then give example user programs that operate on DataFrames and point out common design patterns. The second half of the talk will focus on the technical implementation of DataFrames, such as the use of Spark SQL’s Catalyst optimizer to intelligently plan user programs, and the use of fast binary data structures in Spark’s core engine to substantially improve performance and memory use for common types of operations.

Deenar Toraskar

Talk: "Simple, fast and flexible Value At Risk (VaR) aggregations and reporting using Spark SQL"

Abstract: Value at Risk (VaR) is a risk measure widely used by risk managers. VaR is not simply additive: the VaR of a portfolio is generally not the sum of the VaRs of its parts. This poses unique challenges for reporting VaR at any aggregate level, such as portfolio or business line, because traditional database aggregation functions don't work. Hive complex data types and Spark SQL user-defined functions can be used very effectively to provide simple, fast and flexible VaR aggregation.
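
As a rough illustration of why VaR is not additive, the Python sketch below assumes a historical-simulation approach in which each desk carries a vector of scenario P&L values. The function names, the simplified percentile convention and all the numbers are hypothetical and are not taken from the talk. The point is only that the P&L vectors have to be summed scenario by scenario before the VaR is taken, which an ordinary SUM over per-desk VaR figures cannot do.

```python
from typing import List


def historical_var(pnl: List[float], confidence: float = 0.99) -> float:
    """Return VaR as a positive loss from a vector of scenario P&L values.

    Simplified convention (an assumption for this sketch): take the k-th
    worst scenario, where k is the tail size rounded to at least 1.
    """
    ordered = sorted(pnl)  # worst P&L first
    tail_size = max(1, round((1.0 - confidence) * len(ordered)))
    return -ordered[tail_size - 1]


def aggregate_pnl(vectors: List[List[float]]) -> List[float]:
    """Sum P&L vectors scenario by scenario (element-wise)."""
    return [sum(values) for values in zip(*vectors)]


# Hypothetical P&L for two desks over the same 10 scenarios.
desk_a = [-10.0, 5.0, 3.0, -2.0, 4.0, 6.0, 1.0, -1.0, 2.0, 3.0]
desk_b = [4.0, -12.0, 2.0, 3.0, -1.0, 5.0, 2.0, 1.0, -3.0, 2.0]

# Naive approach: add up the stand-alone VaR figures.
sum_of_vars = historical_var(desk_a, 0.9) + historical_var(desk_b, 0.9)

# Correct approach: aggregate the vectors first, then compute VaR.
combined_var = historical_var(aggregate_pnl([desk_a, desk_b]), 0.9)

print(sum_of_vars)   # 22.0
print(combined_var)  # 7.0
```

Here the two desks lose money in different scenarios, so the combined VaR (7.0) is far smaller than the sum of the stand-alone figures (22.0). Keeping the full vector per position, for example in an array-typed column, is what makes it possible to aggregate at any level.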

Panel Biographies:

Sean Owen

Sean is Director of Data Science at Cloudera. He holds a Bachelor's degree in Computer Science from Harvard and an MBA in Entrepreneurial Management from LSE, and previously worked as a senior engineer at Google. Sean is co-author of "Advanced Analytics with Spark".

Tom White

Tom has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He is currently an engineer at Cloudera. He is the author of "Hadoop: The Definitive Guide" for O'Reilly. Tom has a Bachelor's degree in Mathematics from the University of Cambridge, and a Master's degree in History and Philosophy of Science from the Universities of Leeds, UK, and Florence, Italy.

Steve Loughran

Steve is an expert in distributed systems and in the problems and technologies of datacentre-scale applications, especially their deployment and testing. He works at Hortonworks, building the future datacentre-scale operating system for the world's datacentres.
His current areas of research are the emergent open source datacentre-OS stack based on Hadoop, cloud infrastructures, and deployment of the Hadoop stack on cloud platforms, along with the key operational challenges of testing, scalability, reliability, and operations, and of making applications agile enough to work in a dynamic infrastructure.
He is an active committer at Apache Hadoop and an inactive committer on the Apache Ant and Axis open source projects.

Patrick Wendell

Co-founder of Databricks (http://www.databricks.com/), PhD student in computer science at Berkeley. Previously, Patrick worked in the Computer Science department at UC Berkeley, focusing on data-intensive large-scale computing. Before Berkeley, he was at Princeton as an undergrad, where he worked on Internet measurement and content distribution. Over the last five years Patrick has held internships at several technology companies, including Google, Cloudera, and Conviva. Patrick has a long-standing interest in high-tech entrepreneurship.

Apache Spark+AI London

Evolution AI

Man Group

G-Research

ArcticDB
