[SF] Scalable Elasticsearch-Spark Connector, Spark SQL/DataFrames, Data Sources API

Details

Scheduled to coincide with the start of the Elasticon 2016 Conference (https://www.elastic.co/elasticon/conf/2016/sf) in San Francisco!

Abstract

We've invited the authors of the elasticsearch-spark connector (https://github.com/elastic/elasticsearch-hadoop) from Elastic (http://www.elastic.co), including Costin Leau, to give a deep dive into their implementation of the Spark SQL Data Sources API.

We'll also give a quick overview of the somewhat-hidden Spark SQL Data Sources API, as well as DataFrames and Datasets.

Special thanks to Rackspace (https://www.rackspace.com/) for offering up their awesome office space to host this meetup.

Agenda

6:30-7:00pm: Arrive and Mingle

7:00-7:15pm: Announcements and Updates (Chris Fregly and Kimberly Palmer)

7:15-7:30pm: Quick Overview of the somewhat-hidden Spark SQL Data Sources API, as well as DataFrames and Datasets (Chris Fregly, IBM Spark Tech Center)

7:30-8:30pm: Code-level Deep Dive into the elasticsearch-spark connector from the Elastic developers themselves! (Costin Leau, Elastic)

8:30-9:00pm: Spark + Elasticsearch in Action (Urvish Mahida, Loggly)

9:00pm: De-mingle and Leave

Relevant Links

More information on the Elasticsearch-Spark integration:

https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html

Amazon Elasticsearch Service:

https://aws.amazon.com/blogs/aws/new-amazon-elasticsearch-service/

The elasticsearch-spark connector is an implementation of the Spark SQL Data Sources API, similar to the following:

1) https://github.com/fluxcapacitor/pipeline

2) https://github.com/databricks/spark-csv

3) https://github.com/databricks/spark-avro

4) https://github.com/datastax/spark-cassandra-connector