Hello Hadoopers,
RSVPs are now open for the March Bay Area Hadoop User Group at Yahoo!'s Sunnyvale campus. Please note that the location has changed:
Building C, Second Floor, Classroom 5
It's on the same campus: just cross the street and walk past Building D to Building C.
- 6:00 - 6:20 - Socializing and Beers
- 6:20 - 6:50 - Preview to the Hadoop Security Release (Owen O'Malley, Yahoo!)
- 6:50 - 7:20 - MapReduce Online (Tyson Condie, University of California, Berkeley)
- 7:20 - 7:50 - High level distributed programming with Clojure, Cascading, and Hadoop (Bradford Cross, Flightcaster)
- QnA and Open Discussion
Session details are available below.
Looking forward to seeing you there!
Dekel

MapReduce Online
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to disk before it is consumed. In this talk, I will describe a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well.

The Hadoop Online Prototype (HOP) is our modified version of the Hadoop MapReduce framework with pipelining support. It enables online aggregation, which allows users to see "early returns" from a job as it is being computed. HOP also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop, and can run unmodified user-defined MapReduce programs in both pipelined and traditional blocking modes.

Bio: Tyson Condie is a Ph.D. student at the University of California, Berkeley, whose research focuses on data management and distributed systems. He has been advised by Prof. Joseph M. Hellerstein since entering the Berkeley Ph.D. program in 2004. His thesis at Berkeley focuses on designing and developing distributed system software in a high-level declarative language. Prior to Berkeley graduate school, he was at Stanford University, where he earned a Master's degree in Computer Science under Prof. Hector Garcia-Molina. His industry experience includes research internship positions at Intel and Yahoo! as well as full-time development positions at Sybase and Oracle.

High level distributed programming with Clojure, Cascading, and Hadoop
Presenter: Bradford Cross
Flightcaster built a scalable machine learning system in Clojure wrapping Cascading and Hadoop. The infrastructure that wraps Cascading/Hadoop and its configuration/deployment to EC2 clusters is all written in Clojure. Come and see how much simpler and more fun your life can be.
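
For anyone curious about the "early returns" idea in the MapReduce Online abstract above, here is a minimal single-process sketch in Python. It is not HOP code and does not use any Hadoop API; the function names (map_words, pipelined_reduce) and the snapshot_every parameter are made up for illustration. It only shows the general shape of the idea: the reduce side consumes map output as it streams in and periodically reports a running result, instead of waiting for all map output to be materialized first.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical map step: emit (word, 1) pairs one at a time.

    Yielding pairs lazily stands in for pipelining map output to the
    reducer rather than writing it all out before it is consumed.
    """
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def pipelined_reduce(
    pairs: Iterable[Tuple[str, int]], snapshot_every: int
) -> Iterator[Tuple[int, Dict[str, int], bool]]:
    """Hypothetical reduce step with online aggregation.

    Yields (pairs_seen, running_totals, is_final). Snapshots with
    is_final=False are the "early returns"; the last tuple is the
    complete answer, as in a blocking run.
    """
    totals: Counter = Counter()
    seen = 0
    for key, value in pairs:
        totals[key] += value
        seen += 1
        if seen % snapshot_every == 0:
            yield seen, dict(totals), False  # early, approximate result
    yield seen, dict(totals), True  # final result over all input


if __name__ == "__main__":
    lines = ["a b a", "b a"]
    for seen, totals, is_final in pipelined_reduce(map_words(lines), 2):
        label = "final" if is_final else "early"
        print(label, seen, totals)
```

In this toy version the early snapshots are simply partial counts; how HOP estimates progress or accuracy for real jobs is a topic for the talk itself.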