Announcing a new Meetup for the New York Hadoop User Group!
What: [NYC Hadoop 7/21] Debugging MapReduce (Cloudera); News/Blog Analysis (Stony Brook)
When: July 21,[masked]:30 PM
Price: $5.00 per person
Where: ContextWeb, 17th floor
22 Cortlandt Street
New York, NY 10007
We will have 2 presentations:

"Debugging MapReduce Programs Locally" - Aaron Kimball (Cloudera)
Debugging in a distributed environment poses significant challenges. Failures occur on remote machines, which makes it difficult to attach debuggers to processes and requires collecting logs from across the cluster to understand what went wrong. Distributing code to a cluster is also slow, making debugging a time-consuming process. But many problems can be diagnosed locally, given proper test tools. This talk will review some practices for local debugging of MapReduce programs and introduce MRUnit, a unit test framework developed by Cloudera to address the specific needs of unit testing Mapper and Reducer implementations.

"News and Blog Analysis with the Lydia system" by Mikhail Bautin (Stony Brook University)
Processing large-volume historical news datasets is computationally intensive.
To avoid the bottleneck of the single relational database in our legacy "Lydia" text analysis system and to dramatically increase the scale of our news analysis, we have designed a new data aggregation and processing architecture. Its components include text processing and derived-statistics processing phases, an on-demand data retrieval server, and a user interface web application.
The new Lydia architecture, code-named 'Freedonia', is based on Hadoop. It contains scalable versions of duplicate article detection, entity popularity and sentiment time series calculation, cross-document co-referential entity name identification, and statistic aggregation across groups of co-referential entities and other types of groups. The data is accessible to social scientists through a web interface, and advanced users can access our data services programmatically through an API.
Learn more here: http://www.meetup.com/Hadoop-NYC/calendar/10780094/