Cowerkshop - "Hadoop - Introduction to processing Big Data"


Details
PLEASE RSVP FOR THIS WORKSHOP AT http://goo.gl/Xp8bqG
Workshop Cost - $25 - please enroll at http://goo.gl/Xp8bqG
Course Details: This workshop gives students time to become familiar with Hadoop, its core components, and associated applications.
The students will be systematically led through various Hadoop topics with plenty of opportunity to experiment on their own.
They will learn about Hadoop, its architecture, and its two foundations: the Hadoop Distributed File System (HDFS) and MapReduce. This will provide a solid foundation for diving deeper into Hadoop or introducing Hadoop within the workplace.
Expected Length: 2.5 hours
Prerequisites:
Download and install the Cloudera QuickStart VM
http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html
Agenda:
3pm-3:30pm
Introduction
Introductory discussion of students' backgrounds, expectations, the software to be installed, and the agenda.
Big Data
What data challenges have given rise to products like Hadoop and to NoSQL products?
Big Data and modern data challenges
Scale Out vs. Scale Up
Internet Scale
Offline Batch vs. Online Transactions
Big Data and NoSQL
Hadoop Introduction
Hadoop Distributions – vendor landscape
Comparison to Traditional Equivalent Products
Why Use Hadoop and Common Use Cases
Hadoop Overview
An overview of Hadoop, its architecture, the architectural changes between major versions, and the vast Hadoop Eco-System.
Architecture
Hadoop 1.0 Architecture
Hadoop 2.0 Architecture
Execution modes
Single
Pseudo-distributed
Distributed
Hadoop Eco-System
HBase
Hive
Pig
Sqoop
Exercise – install the VM and run through exercises that demonstrate Hadoop and its components.
3:30pm-4:30pm
HDFS In-Depth
This module continues the HDFS section from the last module and takes an in-depth look at HDFS, its architecture, and its use.
Structure and Architecture
HDFS Commands
Importing / Exporting Data
Example Usage on command line
Exercise – file manipulation using Java
4:30pm-5:30pm
MapReduce In-Depth
Theory and application
MapReduce model for processing data
Mapper
Reducer
Shuffle and Sort
YARN (MapReduce v2)
Daemons
MapReduce1 vs. YARN
Submitting MapReduce jobs
Exercise – Creating and Running a MapReduce Job
Using Pig
Exercise – using Pig to analyze data
Questions and Wrap-Up
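For a feel of the MapReduce model listed in the agenda (mapper, shuffle and sort, reducer), here is a minimal sketch in plain Python. It is an illustration only: it runs on a single machine, does not use Hadoop or any of its APIs, and the function names (mapper, shuffle_and_sort, reducer, word_count) and the sample input are assumptions made up for this example, not workshop material.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["big data big ideas", "data at scale"]
    for word, count in word_count(sample):
        print(word, count)
```

On a real cluster the map and reduce steps run in parallel across many machines and the framework handles the shuffle; the sketch only shows how data flows between the three phases.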
About the Instructor:
Brian Enochson is an independent software developer, consultant, and trainer living on the Jersey Shore. He spends his time working on high-throughput applications and tackling NoSQL, Big Data, and machine learning problems. Passionate about helping others through writing, mentoring, and training, Brian also loves to learn from others and their experiences, which is the main reason he likes to get out and present to people. He is currently working as a consultant for a few organizations, helping them with their software development and big data solutions. Brian has an M.S. in computer information systems from Boston University.