Details

NOTE: The meeting will start at 18:00 as usual (in the past, 8 PM was chosen by mistake in the web UI on this page).

Title: Finding a needle in a stack of needles - adding Search to the Hadoop Ecosystem

Speaker: Wolfgang Hoschek

Abstract:

Apache Hadoop is enabling organizations to collect larger, more varied data, but once it is collected, how will it be found? Your users expect to be able to search for information using simple text-based queries, regardless of data location, size, and complexity.

How do they quickly find information that has just been created, or that has been stored for months or even years? Cloudera Search engineer Wolfgang Hoschek will present the team's solution to this problem. What architecture is necessary to search HDFS and HBase? How were Apache Solr, Lucene, Flume, MapReduce, HBase, and Morphlines integrated to allow near-real-time and batch indexing of documents? Which problems have been solved, and what is still to come? Join us for an exciting discussion on this new technology.

Bio:

Wolfgang Hoschek is a Software Engineer on the Platform and Cloudera Search team. He is a committer on the Apache Flume and Apache Lucene/Solr projects, a committer on the Kite project, and the lead developer on Morphlines. He is a former CERN fellow and a former Computer Scientist at Lawrence Berkeley Lab. He has 15+ years of experience in large-scale distributed systems, data-intensive computing, and real-time analytics. He received his Ph.D. from the Technical University of Vienna, Austria.