Solr, Spark, & Hive

Name: Solr, Spark, & Hive
Start: 2015-01-21T18:00:00-08:00
End: 2015-01-21T20:00:00-08:00
Location: Cloudera

Hosted by Alea A.

SFBay Apache Lucene/Solr Meetup

Details

Join us at Cloudera on Wednesday, January 21, for the presentations below, along with food, drinks, and the usual networking with other Solr enthusiasts. Hope to see you there!

Presentations will begin at 6:30 pm.

Ingesting HDFS data into Solr using Spark: Presented by Wolfgang Hoschek, Cloudera

Abstract: Apache Solr on Hadoop is enabling organizations to collect, process, and search larger, more varied data. Apache Spark is making a large impact across the industry, changing the way we think about batch processing and replacing MapReduce in many cases. But how can production users easily migrate ingestion of HDFS data into Solr from MapReduce to Spark? How can they update and delete existing documents in Solr at scale? And how can they easily build flexible data ingestion pipelines? Cloudera Search Software Engineer Wolfgang Hoschek will present an architecture and solution to these problems. How were Apache Solr, Spark, Crunch, and Morphlines integrated to allow for scalable and flexible ingestion of HDFS data into Solr? Which problems are solved, and what's still to come? Join us for an exciting discussion on this new technology.

Speaker: Wolfgang Hoschek is a Software Engineer at Cloudera working on the Hadoop Platform and Cloudera Search team. He is a committer on the Apache Flume and Apache Lucene/Solr projects, a committer on the Kite project, a committer on the HBase Indexer project, and the lead developer on Morphlines. He is a former CERN fellow, a former Computer Scientist at Lawrence Berkeley Laboratory, and a former Senior Software Engineer at Skytide. He has 15+ years of experience in large-scale distributed systems, data-intensive computing, and real-time analytics. He received his Ph.D. in Computer Science from the Technical University of Vienna, Austria.

Integrating Hive and Solr for Efficient Analytical Queries: Presented by Hrishikesh Gadre, Cloudera

Abstract: Apache Hive is open-source data warehouse software that provides a SQL-like query interface for analyzing and managing large datasets residing in Hadoop. Apache Solr on Hadoop, on the other hand, provides an enterprise search platform supporting features such as full-text search, hit highlighting, faceted search, dynamic clustering, and many more. While Apache Hive is suitable for querying structured data, Solr’s search engine heritage makes searching and navigating text a very efficient process. In this talk, we will discuss the integration between Hive and Solr, which enables efficient execution of analytics use cases where textual data is a dominant navigational element.

Speaker: Hrishikesh Gadre is a software engineer at Cloudera working on Cloudera Search. Prior to Cloudera, Hrishikesh worked for virtualization giant VMware for more than three years, building a next-generation network/security virtualization platform. He has a master’s degree in Computer Engineering from Rutgers University, New Jersey, specializing in large-scale distributed systems.
