Intro to Apache Spark for the Business Analyst


Details
https://www.youtube.com/watch?v=svxBcBNcV8A
Lions and tigers and Spark, oh my! It's hard enough keeping up with the explosion in data, but just keeping track of the tools is a challenge. What is Big Data? How do I become a data scientist? How can I leverage the cloud? (What is the cloud?) These are all tough questions for anyone to answer, let alone the business analyst who does not have a strong programming and technology background. Put your mind at ease - we are here to help.
This talk will introduce the open source processing engine, Spark, highlighting not only its awesome power but also how it fits within the larger data landscape. You will learn why Spark was developed to crunch through Big Data, what MapReduce is, and why Spark can beat the pants off of it in terms of performance and ease of use. You will learn how non-programmers can get started with Spark (warning: there is no escaping code, but you can do it, we promise), where you can find great tutorials on Spark, and how you can use Spark in the cloud with IBM.
At the end of this talk you will be able to firmly place Spark in the Big Data ecosystem and articulate to your colleagues how data processing platforms have evolved to handle large amounts of data. You will know how to get started using Spark and be comfortable enough with Spark syntax to write a few lines of code like the boss you are. This talk will be fast and furious, but fun. Fasten your seatbelts and get ready to learn about Spark.
Agenda
Introduction to Big Data
• Why are we here?
• What tools do we use? (What is this Hadoop thing I keep hearing about?)
• Why did Spark evolve?
Introduction to Spark
• What makes Spark special? (Why use it?)
• Why is it faster than MapReduce?
• How easy is it to use?
Getting Started with Spark
• Installing a Docker container from Big Data University
• Introducing the command line
• Writing our first lines of Spark with the Scala API
Learning More About Spark
• Free resources available
• Big Data University
• IBM Cloud
https://www.youtube.com/watch?v=JfqJTQnVZvA
IBM is committing to the Apache Spark project with investments in design-led innovation and broad-scale education programs to promote open source innovation and accelerate intelligence into every application.
Apache® Spark™ is an open-source cluster computing framework that uses in-memory processing to run analytic applications up to 100 times faster than technologies on the market today. Developed in the AMPLab at UC Berkeley, Spark can help reduce data interaction complexity, increase processing speed and enhance mission-critical applications with deep intelligence.
