A Primer into Jupyter, Spark on HDInsight, and Office 365 Analytics with Spark


Details
We have an exciting set of Spark @ Microsoft sessions!
Machine Learning and Automatic Visualizations on Spark via Jupyter Notebook
Come see how we have built support into Jupyter Notebook so that you can get an automatic Spark context that will generate data visualizations with no extra work. You’ll see how we’ve explored a dataset and developed some intuitions on it thanks to automatic visualizations. We will also show you the Spark Machine Learning pipeline we’ve built and how you can query it for predictions. Furthermore, learn how you can do all of this locally in the cluster or in a remote installation of Jupyter.
Speaker: Alejandro Guerrero Gonzalez
Alejandro is a software engineer at Microsoft who has worked on Big Data technologies since 2013. He is currently working to bring Spark to as many people as possible by extending the Jupyter notebook to integrate with Spark and produce automatic visualizations. He shares his passion for data with a team that works with customers who are building solutions on top of Spark.
Spark on HDInsight Revealed
Since HDInsight launched Spark clusters last year, the HDInsight Spark team’s mission has been to make Spark easy to use and production-ready. In the process, we have explored many open source technologies such as Livy, Jupyter, and Zeppelin. In this talk, we will demo top customer features, dive deep into the HDInsight Spark architecture, and share what we learned from building the perfect cluster.
Speakers: Judy Nash and Lin Chan
Judy Nash is a software engineer on the HDInsight team at Microsoft, working on bringing big data technology to Azure. A Spark and Ambari contributor, she is a key developer in delivering Spark on HDInsight’s Windows and Linux offerings.
Lin is a senior software engineer on the HDInsight team at Microsoft, working on bringing big data technology to Azure. After 8+ years on SQL Server working in storage and allocation, he is venturing into the big data world and delivering Spark on HDInsight.
Coming straight out of Spark Summit 2016 East
Using Spark to power the Office 365 Delve Organization Analytics
Keeping millions of paying users happy: how the Office 365 Customer Fabric helps us attract, retain and engage users. Office 365 is one of the world’s biggest subscription services, and expectations are high as customers trust us with their most critical documents, communications and workflows. Keeping everyone happy and creating opportunities for us to delight our users is no easy task. We’ve leveraged Cassandra and Spark, among other systems, to build our "Customer Fabric", where inferences are generated to drive outreach, satisfaction, upselling, and acquisition, as well as predictive user interfaces. We believe deep user understanding is strategic and key to sustained growth and engagement. To achieve it, the Office 365 Customer Fabric team deployed Apache Spark, Apache Kafka, and Cassandra in our production environment. With high-velocity, high-frequency data and 10 TB stored a day, we needed a platform that could handle our diverse big data requirements. In this session, we will discuss how we used Spark to meet our need for both real-time and batch-mode analytics. This includes user- and customer-level aggregations, MLlib techniques for ticket classification, and real-time ingestion of data.
Speakers: Olga Ivanova and Arun Jayandra
Olga Ivanova: Lead PM on the Office 365 Customer Fabric team at Microsoft, on a mission to keep millions of paying users happy with the help of our Customer Fabric big data products. Prior to this, Olga led the Customer Lifecycle team in Office 365 Enterprise Cloud, building solutions for large enterprise customers going through divestitures, M&As and other migrations, while working closely with each large customer in transition. Olga joined Microsoft 10 years ago after graduating from the University of Pennsylvania's M&T program.
Arun Jayandra: Lead developer on the Office 365 Customer Fabric team, working on building a platform to enable big data applications. Previously, Arun was a lead developer in Azure Active Directory, responsible for building the web services used by the Office 365 user management portal, and owned the PowerShell module for managing O365 AAD data in the cloud. Prior to joining Microsoft 10 years ago, Arun gained extensive experience working at multiple startup companies in CRM and payment processing systems.
