IBM is hosting a developer conference. Essentially, the conference is ‘By Developers, for Developers’ and is built around open technologies.
It will be held February 20-22 at Moscone West.
We have a phenomenal list of speakers for the Spark community, as well as participation by the TensorFlow, R, and other communities.
https://developer.ibm.com/indexconf/communities/
Register using the promo code below to get free access.
Follow this link: https://www.ibm.com/events/wwe/indexconf/indexconf18.nsf/Registration.xsp?open
Select “Attendee” as your Registration Type
Enter your Registration Promotion Code*: IND18FULL
Promotion Code expires February 12th, 11:59PM Pacific
Government Owned Entities (GOEs) are not eligible
*If you have previously registered, please reach out to Jeff Borek ([masked]) to take advantage of the new discount code.*
Detailed Agenda for Spark Community Day Feb 20th
2:00 - 2:30
What the community does in the coming release Spark 2.3.
Sean Li, Apache Spark committer & PMC member from Databricks
Many great features have been added to Apache Spark. This talk previews the new features and updates in the upcoming Spark 2.3 release.
2:30 - 3:00
Data Warehouse Features in Spark SQL
Ioana Ursu, IBM lead contributor on Spark SQL
This talk covers advanced Spark SQL features for data warehousing, such as star-schema optimizations and support for informational constraints. A star schema consists of a fact table referencing a number of dimension tables. Fact and dimension tables are in a primary key – foreign key relationship. Spark can use an informational or statistical constraint to improve query performance.
Building an Enterprise/Cloud Analytics Platform with Jupyter Notebooks and Apache Spark
Frederick Reiss, Chief Architect of the IBM Spark Technology Center
Data scientists are becoming a necessity for every company in today's data-centric world, and with them comes the requirement to provide a flexible and interactive analytics platform that exposes notebook services at web scale. In this session we will describe our experience and best practices in building the platform, in particular how we built the Enterprise Gateway that enables all the notebooks to share the Spark cluster's computational resources.
3:45 - 4:15
The State of Spark MLlib and New Scalability Features in 2.3
Nick Pentreath, Spark committer & PMC member
This talk will give an overview of Spark’s machine learning library, MLlib. The new 2.3 release of Spark brings some exciting scalability enhancements to MLlib, which we will explore in depth, including parallel cross-validation and performance improvements for larger-scale datasets from adding multi-column support to the most widely used Spark transformers.
4:15 - 5:15
Spark and AI
Nick Pentreath & Fred Reiss
This session will be an open discussion of the role of Spark within the AI landscape, and what the future holds for AI / deep learning on Spark. In recent years specialized systems (such as TensorFlow, Caffe, PyTorch and MXNet) have been dominant in the domain of AI and deep learning. While there are a few deep learning frameworks that are Spark-specific, these frameworks are often separate from Spark, and the ease of integration and the feature set exposed vary considerably.
5:15 - 6:00
HSpark: Enable Spark SQL Queries on NoSQL HBase Tables
Bo Meng, IBM Spark contributor, and Yan Zhou, IBM Hadoop Architect
HBase is a NoSQL data source that allows flexible data storage and access mechanisms. Leveraging Spark’s highly scalable framework and programming interface, we added SQL capability to HBase and an easy-to-use interface for data scientists and traditional analysts. We will discuss how we implemented HSpark by leveraging the Spark SQL parser, mapping different data types, pushing down predicates to HBase, and improving query performance.