October 2011 HUG Agenda:
- 6:00 - 6:30 - Socialize over food and beer(s)
- 6:30 - 7:00 - Fail-Proofing Hadoop Clusters with Automatic Service Failover
- 7:00 - 7:30 - Incremental Processing in Hadoop
- 7:30 - 8:15 - Dodd-Frank Financial Regulations and Hadoop
Fail-Proofing Hadoop Clusters with Automatic Service Failover: With the increased use of Hadoop comes an increased need for critical safeguards, especially in the financial industry, where any data loss could mean millions in penalties.
What happens when part of a Hadoop cluster goes down? How do Hadoop-based solutions for the financial industry cope with NameNode failures? We will share the failover issues we've encountered and best practices for continuous health monitoring, with a focus on the financial industry.
In addition, we will cover ZooKeeper-based failover for NameNode and other related SPOF services (e.g., JobTracker, Oozie, Kerberos).
Presenter: Michael Dalton, Zettaset Inc.
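The ZooKeeper-based failover mentioned above is commonly built on ephemeral sequential znodes: each candidate service (e.g., a standby NameNode) creates one, the candidate holding the lowest sequence number is active, and when it dies its znode vanishes and the next-lowest candidate takes over. A minimal Python sketch of that election rule, simulated in memory with no live ZooKeeper ensemble (the znode naming and `elect_leader` function are illustrative, not from the talk):

```python
def elect_leader(znodes):
    """Given the surviving ephemeral sequential znodes (e.g.
    'nn-0000000003'), return the one with the lowest sequence
    number -- that candidate acts as the active NameNode."""
    if not znodes:
        return None
    return min(znodes, key=lambda z: int(z.rsplit("-", 1)[1]))

# Three NameNode candidates have registered; the lowest wins.
candidates = ["nn-0000000002", "nn-0000000001", "nn-0000000003"]
print(elect_leader(candidates))  # nn-0000000001

# The active node crashes: its ephemeral znode disappears and
# watchers are notified; the next-lowest candidate takes over.
candidates.remove("nn-0000000001")
print(elect_leader(candidates))  # nn-0000000002
```

The same pattern generalizes to the other SPOF services listed above (JobTracker, Oozie): each maintains a session with ZooKeeper, and session expiry automatically triggers re-election.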
Incremental Processing in Hadoop: A Hadoop cluster offers limited resources in terms of CPU, disk, and network bandwidth. Further, a Hadoop cluster is typically shared among users, with multiple jobs running concurrently and competing for resources. In such an environment, a Map-Reduce job whose input data scales to petabytes is bound to consume excessive resources, incur long delays, negatively impact other jobs, and reduce overall cluster throughput.
I present an extension of the Map-Reduce execution model (as implemented in Hadoop) that allows incremental processing, wherein a job can add input on the fly as required. The job may begin as a small job that processes a limited subset of the data. As data flows through the system, useful statistics become available that help decide what additional input (if any) needs to be processed. Job expansion is governed by user-defined policies that dictate the job's growth according to the available resources on the cluster. I share encouraging results from an experimental evaluation under single- and multi-user workloads.
Presenter: Raman Grover, PhD Student UC Irvine
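The grow-as-you-go idea in this abstract can be sketched in a few lines: process one partition of input at a time, update running statistics, and let a user-defined policy decide whether more input should be added. This is a toy in-memory simulation of the control loop, not the speaker's Hadoop implementation; `run_incremental` and the policy shown are illustrative:

```python
def run_incremental(partitions, policy, process):
    """Start with a small subset of the input and grow the job:
    after each partition, a user-defined policy inspects the
    running statistics and decides whether to add more input."""
    stats = {"partitions_done": 0, "records_seen": 0, "sum": 0.0}
    for part in partitions:
        result = process(part)
        stats["partitions_done"] += 1
        stats["records_seen"] += result["count"]
        stats["sum"] += result["total"]
        if not policy(stats):
            break  # statistics are good enough; stop growing the job
    return stats

# Illustrative policy: keep adding partitions until at least
# 1,000 records have been sampled (or the input is exhausted).
policy = lambda s: s["records_seen"] < 1000
process = lambda part: {"count": len(part), "total": sum(part)}

partitions = [list(range(400)) for _ in range(10)]
final = run_incremental(partitions, policy, process)
print(final["partitions_done"])  # 3 (stops at 1,200 records)
```

In the real system the policy would also consult available cluster resources, as the abstract notes, rather than only the statistics computed so far.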
Dodd-Frank Financial Regulations and Hadoop: The Dodd-Frank Act signifies the biggest US regulatory change in several decades. According to experts, Dodd-Frank will have a substantial influence over an estimated 8,500 investment managers and all of the 10–12 US exchanges and alternative execution networks.
This presentation offers an implementation perspective on the new regulations that specifically relate to the central clearing of OTC derivatives, and the repercussions for confirmation/settlement flows, real-time reporting, and risk management. A trading platform and repository will need direct access to exchanges. This eliminates layers of risk by removing redundant data keying and duplication. Straight-through processing facilitates integration of front-to-back-office systems and has the additional benefit of helping prevent illegal market-manipulation practices by providing the necessary audit trail. Big Data analytics with cloud-based Hadoop, HBase, and Hive, along with BI tools, will be necessary for straight-through processing and real-time reporting.
Presenter: Shyam Sarkar, AyushNet and Suvradeep Rudra, AyushNet
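At its core, the straight-through-processing point above means each trade is keyed once and carried through confirmation, settlement, and reporting, with every touch recorded for the audit trail. A toy Python sketch of deduplicating re-keyed trade records while logging an audit trail (the record fields and `ingest` function are illustrative, not from any specific trading platform):

```python
def ingest(records):
    """Keep one canonical record per trade id and log every
    submission -- duplicates included -- as the audit trail that
    real-time reporting and regulators would draw on."""
    book, audit = {}, []
    for rec in records:
        tid = rec["trade_id"]
        if tid in book:
            audit.append(("duplicate", tid, rec["source"]))
        else:
            book[tid] = rec
            audit.append(("booked", tid, rec["source"]))
    return book, audit

records = [
    {"trade_id": "T1", "source": "front-office", "notional": 5e6},
    {"trade_id": "T1", "source": "back-office", "notional": 5e6},  # re-keyed copy
    {"trade_id": "T2", "source": "front-office", "notional": 2e6},
]
book, audit = ingest(records)
print(len(book))   # 2 unique trades
print(audit[1])    # ('duplicate', 'T1', 'back-office')
```

At exchange scale, the `book` would live in a store like HBase keyed by trade id, and the `audit` log would feed Hive tables queried by BI and reporting tools.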
Yahoo Campus Map:
Location on Wikimapia: