Spark Meetup at Google with Databricks and IBM on Monday, October 26, 2015


Details
Hello everyone,
We are pleased to invite you to the Spark meetup on Monday, October 26 at Google (8 rue de Londres, Paris) at 6:00 pm.
We will be hosting three great speakers, some of whom are coming from the US, to talk about the latest developments around Spark.
• 6:00-6:15 : Welcome
• 6:15-6:45 : Google Dataproc by Sébastien Agnan, Cloud Platform Sales Engineer at Google, and Vincent Heuschling, General Manager of Affini-Tech
Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig, and Hive service https://cloud.google.com/dataproc/
Sébastien Agnan joined Google for Work in 2012 and is now the technical lead for the Google Cloud Platform offering (IaaS/PaaS/Big Data) in Southern Europe. A specialist in cloud architectures, he helps Google for Work customers design innovative solutions using new cloud technologies and architectures such as big data, mobile backends, and real-time bidding. A graduate of ESEO with a specialization in information systems architecture, Sébastien was an architect and then a pre-sales engineer at Oracle before joining Google.
Vincent Heuschling is the founder of Affini-Tech, a company dedicated to big data solutions. He leads a team of data engineers who help customers build their big data platforms. As a Google Cloud partner, Affini-Tech uses the Google Cloud Platform every day to run big data solutions such as Hadoop, Spark, and Cassandra.
• 6:45-7:30 : Deep dive into Project Tungsten: Bring Spark closer to bare metal by Reynold Xin, Co-Founder of Databricks, key Spark Committer
Project Tungsten focuses on substantially improving the efficiency of memory and CPU for Spark applications, to push performance closer to the limits of modern hardware.
This effort includes three initiatives:
- Code generation: using code generation to exploit modern compilers and CPUs
- Cache-aware computation: algorithms and data structures to exploit memory hierarchy
- Memory Management and Binary Processing: leveraging application semantics to manage memory explicitly and eliminate the overhead of JVM object model and garbage collection
Project Tungsten will be the largest change to Spark’s execution engine since the project's inception. In this talk, we will give an update on its progress and dive into some of the technical challenges we are solving.
Reynold Xin is a committer and PMC member of Apache Spark. He is also a co-founder of Databricks and oversees architectural directions for Spark. Before Databricks, he was pursuing a Ph.D. at the University of California, Berkeley AMPLab, where Spark was born.
• 7:30-8:15 : Spark After Dark 1.5 by Chris Fregly, Principal Data Solutions Engineer at IBM Spark Technology Center in San Francisco
Combining the most popular and technically-deep material from his wildly popular Advanced Apache Spark Meetup, Chris Fregly will provide code-level deep dives into the latest performance and scalability advancements within the Apache Spark Ecosystem by exploring the following:
- Building a Scalable and Performant Spark SQL/DataFrames Data Source Connector such as Spark-CSV, Spark-Cassandra, Spark-ElasticSearch, and Spark-Redshift
- Speeding Up Spark SQL Queries using Partition Pruning and Predicate Pushdowns with CSV, JSON, Parquet, Avro, and ORC
- Tuning Spark Streaming Performance and Fault Tolerance with KafkaRDD and KinesisRDD
- Maintaining Stability during High Scale Streaming Ingestion using Approximations and Probabilistic Data Structures from Spark, Redis, and Twitter's Algebird
- Building Effective Machine Learning Models using Feature Engineering, Dimension Reduction, and Natural Language Processing with MLlib/GraphX, ML Pipelines, DIMSUM, Locality Sensitive Hashing, and Stanford's CoreNLP
- Tuning Core Spark Performance by Acknowledging Mechanical Sympathy for the Physical Limitations of OS and Hardware Resources such as CPU, Memory, Network, and Disk with Project Tungsten, Asynchronous Netty, and Linux epoll
- Demos
This talk features many interesting and audience-interactive demos - as well as code-level deep dives into many of the projects listed above.
All demo code is available on GitHub at the following link: https://github.com/fluxcapacitor/pipeline/wiki
In addition, the entire demo environment has been Dockerized and made available for download on Docker Hub at the following link: https://hub.docker.com/r/fluxcapacitor/pipeline/
Chris Fregly is a Principal Data Solutions Engineer for the newly-formed IBM Spark Technology Center, an Apache Spark Contributor, a Netflix Open Source Committer, as well as the Organizer of the global Advanced Apache Spark Meetup and author of the upcoming book, Advanced Spark. Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.
When Chris isn’t contributing to Spark and other open source projects, he’s creating book chapters, slides, and demos to share knowledge with his peers at meetups and conferences throughout the world.
• 8:15-9:30 : Networking
Please register so that we can make sure the logistics run smoothly.
Many thanks to Google for lending us their room and providing the drinks and buffet.
The HUG France team http://hugfrance.fr @hugfrance (http://www.twitter.com/hugfrance)
