I hope you are all as excited as I am to get the Maryland Spark meetup back into action for 2018! To kick off the year, we have put together an outstanding evening of networking and hands-on, code-heavy talks! The sessions and expert speakers have been selected to provide you with a no-fluff evening of actionable information. This is a privately funded event focused on delivering quality technical information, so ABSOLUTELY NO RECRUITERS.
Databricks, the principal company behind Spark, will present and demonstrate the new and exciting features for the upcoming Spark 2.3 release. GridGain will present and demonstrate how to increase the scale and speed of Spark with the integration of Apache Ignite. Please also note the venue change back to the Arundel Hotel. I can’t wait to see everyone for an awesome evening!
5:00 – 6:00 – Networking, Happy Hour, and Refreshments
6:00 – 7:00 – Databricks presents upcoming features in Apache Spark 2.3
7:00 – 8:00 – GridGain presents Apache Ignite for streaming analytics

Talks and Demonstrations:
Databricks Delta and new features in Apache Spark 2.3
This talk presents new and upcoming features in Apache Spark 2.3 and Databricks. Come learn about new Structured Streaming features such as stream-to-stream joins and low-latency continuous processing, as well as updates to the cost-based SQL optimizer and support for ML Pipeline scoring.
In the second half of the talk, we demonstrate Databricks Delta through notebooks. Delta adds transactional support and performance optimizations to the Databricks Runtime, which allows for faster, consistent, and reliable support for concurrent batch and streaming workloads in cloud data lakes.
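To give a flavor of the stream-to-stream joins mentioned above, here is a minimal, hedged sketch of what the Spark 2.3 API looks like. It uses the built-in "rate" source so it runs without any external systems; the column names (`impressionId`, `clickTime`, etc.) are purely illustrative, not part of any real dataset.

```scala
// Sketch of a Spark 2.3 stream-to-stream join (illustrative names, assumes
// a local Spark installation; not from the talk itself).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

val spark = SparkSession.builder
  .appName("StreamStreamJoinSketch")
  .master("local[*]")
  .getOrCreate()

// Two synthetic streams sharing a key ("value") plus event-time columns.
val impressions = spark.readStream.format("rate")
  .option("rowsPerSecond", "5").load()
  .select(col("value").as("impressionId"), col("timestamp").as("impressionTime"))

val clicks = spark.readStream.format("rate")
  .option("rowsPerSecond", "5").load()
  .select(col("value").as("clickImpressionId"), col("timestamp").as("clickTime"))

// Inner join on the key, constrained to a one-hour event-time window so the
// engine can bound the state it keeps for each side.
val joined = impressions.join(clicks, expr(
  """impressionId = clickImpressionId AND
     clickTime BETWEEN impressionTime AND impressionTime + interval 1 hour"""))

val query = joined.writeStream.format("console").start()
```

In production you would also declare watermarks on both streams so old state can be dropped; the speakers will no doubt cover the details.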
Apache Spark and Apache Ignite: Make streaming analytics real with in-memory computing
Streaming analytics software, and Apache Spark in particular, has entered the spotlight as companies try to improve the customer experience and other business outcomes in real time through automation. It is arguably the biggest improvement any company can make to its business, achieved by intelligently processing data streams, identifying important events, and acting on them in near real time by kicking off a simple action or a more sophisticated business process. One challenge is how to process all this data, which has grown 50x in the last decade alone, in seconds rather than hours. Another is how to apply new types of intelligence, from simple regressions to machine learning, to improve outcomes. The answer several early innovators have turned to is the combination of Apache Spark and Apache Ignite, which pairs in-memory computing with the collocation of data and computation. In this talk we will drill down into several examples using Apache Spark and Apache Ignite. Learn how Apache Spark integrates with Apache Ignite through standard Spark APIs, and how Spark benefits from processing data in-memory in Apache Ignite.
Specifically, in this session we will demonstrate:
- how to use Ignite as an in-memory database for Spark applications
- how to perform streaming analytics by deploying a Spark streaming pipeline
- how to process data stored in Ignite with Spark RDDs and DataFrames
- how to speed up SQL queries by leveraging the Ignite SQL engine and indexing
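As a rough preview of the RDD/DataFrame integration the demos above will cover, here is a hedged sketch using Ignite's Spark DataFrame data source (part of the `ignite-spark` module). The table name and config-file path are placeholder assumptions, not from the talk.

```scala
// Sketch of reading an Ignite SQL table into a Spark DataFrame via the
// Ignite data source. Assumes the ignite-spark module is on the classpath
// and an Ignite node config at the (illustrative) path below.
import org.apache.ignite.spark.IgniteDataFrameSettings._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("IgniteDataFrameSketch")
  .master("local[*]")
  .getOrCreate()

val persons = spark.read
  .format(FORMAT_IGNITE)                           // Ignite data source
  .option(OPTION_CONFIG_FILE, "ignite-config.xml") // Ignite node configuration
  .option(OPTION_TABLE, "person")                  // SQL table stored in Ignite
  .load()

// Filters and projections can be pushed down to Ignite's SQL engine,
// which is where the indexing speedups in the last bullet come from.
persons.filter("age > 30").select("name", "age").show()
```

The same Ignite caches can also be exposed to Spark as shared RDDs via `IgniteContext`, which is presumably what the RDD portion of the demo will show.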
Ameet Kini, Ph.D., is a Resident Solutions Architect at Databricks, where he is privileged to help Federal agencies adopt and succeed with Databricks and Spark. In recent roles, Ameet has worked as a researcher, developer, and architect building production big data applications in the geospatial and graph analysis domains, supporting agencies across the Federal government. He is an early adopter of Spark, having used it since the pre-Structured-API days of v0.6. Ameet received a Ph.D. in Computer Science from the University of Wisconsin-Madison in the area of large-scale data management and a B.S. in Computer Science from UMBC.
Akmal Chaudhri, Ph.D., is GridGain’s Technology Evangelist. His role is to help build the global Apache Ignite community and raise awareness through presentations and technical writing. Akmal has over 25 years of experience in IT and has previously held roles as a developer, consultant, product strategist, and technical trainer. He has worked for several blue-chip companies such as Reuters and IBM, as well as the big data startups Hortonworks (Hadoop) and DataStax (Cassandra NoSQL database).