Big Data Meetup - November 2015

Hosted By
Arató B.

Details

This time the main topic will be Apache Spark, and our guest speaker will be Chris Fregly from IBM.

Talks:

Spark After Dark 1.5: Complete, End-to-End, Big Data Reference Pipeline using Spark, Cassandra, ElasticSearch, Redis, Zeppelin, and Docker.

Combining the most popular material from his Advanced Apache Spark Meetup, Chris Fregly will demo a complete, end-to-end big data reference pipeline built with widely used big data tools, including Spark, Cassandra, ElasticSearch, and Redis.

The talk will cover some of the following at a high level:

  1. Building a Scalable and Performant Spark SQL/DataFrames Data Source Connector such as Spark-CSV, Spark-Cassandra, Spark-ElasticSearch, and Spark-Redshift

  2. Speeding Up Spark SQL Queries using Partition Pruning and Predicate Pushdowns with CSV, JSON, Parquet, Avro, and ORC

  3. Tuning Spark Streaming Performance and Fault Tolerance with KafkaRDD and KinesisRDD

  4. Maintaining Stability during High Scale Streaming Ingestion using Approximations and Probabilistic Data Structures from Spark, Redis, and Twitter's Algebird

  5. Building Effective Machine Learning Models using Feature Engineering, Dimension Reduction, and Natural Language Processing with MLlib/GraphX, ML Pipelines, DIMSUM, Locality Sensitive Hashing, and Stanford's CoreNLP

  6. Tuning Core Spark Performance by Acknowledging Mechanical Sympathy for the Physical Limitations of OS and Hardware Resources such as CPU, Memory, Network, and Disk with Project Tungsten, Asynchronous Netty, and Linux epoll

Speakers:

Chris Fregly is a Principal Data Solutions Engineer at the newly formed IBM Spark Technology Center, an Apache Spark contributor, a Netflix Open Source committer, the organizer of the global Advanced Apache Spark Meetup, and the author of the upcoming book Advanced Spark.

Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.

When Chris isn’t contributing to Spark and other open source projects, he’s creating book chapters, slides, and demos to share knowledge with his peers at meetups and conferences throughout the world.

Schedule:

18:30 Doors open
19:00 Talks begin
21:00 Meetup finishes

This will be an English-language event. EPAM will provide the venue and catering.

Budapest Data & Analytics Meetup
EPAM HQ
Futó utca 47-53 · Budapest