Cowerkshop - "Hadoop - Introduction to processing Big Data"

Name: Cowerkshop - "Hadoop - Introduction to processing Big Data"
Start: 2014-03-08T15:00:00-05:00
End: 2014-03-08T17:30:00-05:00
Location: Cowerks at the Lakehouse

Hosted by Bret M.

Jersey Shore Tech

Details

PLEASE RSVP FOR THIS WORKSHOP AT http://goo.gl/Xp8bqG

Workshop Cost - $25 - please enroll at http://goo.gl/Xp8bqG

Course Details: This workshop gives students time to become familiar with Hadoop, its core components, and associated applications.
Students will be led systematically through various Hadoop topics, with plenty of opportunity to experiment on their own.

Students will learn about Hadoop, its architecture, and its two foundations: the Hadoop Distributed File System (HDFS) and MapReduce. This will provide a solid base for diving deeper into Hadoop or introducing Hadoop within the workplace.

Expected Length: 2.5 hours

Prerequisites:
Download and install the Cloudera QuickStart VM
http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html

Agenda:

3pm-3:30pm
Introduction
Introductory discussion of students' backgrounds, expectations, the software to be installed, and the agenda.

Big Data
What data challenges have given rise to products like Hadoop, as well as NoSQL products?

Big Data and modern data challenges

Scale Out vs. Scale Up

Internet Scale

Offline Batch vs. Online Transactions

Big Data and NoSQL

Hadoop Introduction
Hadoop Distributions – vendor landscape
Comparison to Traditional Equivalent Products
Why Use Hadoop and Common Use Cases

Hadoop Overview

An overview of Hadoop, its architecture, changes in architecture between major versions, and a discussion of the vast Hadoop ecosystem.

Architecture

Hadoop 1.0 Architecture

Hadoop 2.0 Architecture

Execution modes

Single

Pseudo-distributed

Distributed

Hadoop Ecosystem

HBase

Hive

Pig

Sqoop

Exercise – install VM, run through various exercises to demonstrate Hadoop and the various components.

3:30pm-4:30pm
HDFS In-Depth
This module continues the HDFS section from the previous module and takes an in-depth look at HDFS, its architecture, and its use.
Structure and Architecture

HDFS Commands

Importing / Exporting Data

Example Usage on command line

Exercise – file manipulation using Java

4:30pm-5:30pm

MapReduce In-Depth

Theory and application

MapReduce model for processing data

Mapper

Reducer

Shuffle and Sort

YARN (MapReduce v2)

Daemons

MapReduce1 vs. YARN

Submitting MapReduce jobs

Exercise – creating and running a MapReduce job
Using Pig

Exercise – using Pig to analyze data
Questions and Wrap-Up
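
For a preview of the MapReduce module (Mapper, Shuffle and Sort, Reducer), here is a minimal sketch of the model as a word count in plain Python. It is not Hadoop code and not the workshop's exercise material: the function names and the sample input are assumptions made for illustration, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle and sort: group values by key, then order the groups by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order over an in-memory list of lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, chosen only to show repeated words.
    sample = ["big data big clusters", "data moves to the cluster"]
    for word, count in word_count(sample).items():
        print(word, count)
```

In Hadoop itself, the map and reduce steps run in parallel on many machines and the framework performs the shuffle and sort between them; the sketch only shows how data flows through the three phases.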

About the Instructor:

Brian Enochson is an independent software developer, consultant, and trainer living on the Jersey Shore. He spends his time working on high throughput applications and tackling NoSQL, Big Data, and machine learning problems. Passionate about helping others through writing, mentoring, and training, Brian also loves to learn from others and their experiences, which is the main reason he likes to get out and present to people. He is currently working as a consultant for a few organizations, helping them with their software development and big data solutions. Brian has an M.S. in computer information systems from Boston University.
