Cambridge Search Meetup - Hadoop & Solr

After the successful hackday (by the way, we're still hoping to get the application we built there live: can you help?), we're back to our usual evening Meetup. Our first talk is by Tom White:

"Recently Apache Solr has been integrated into the Hadoop ecosystem to provide full-text search at 'big data' scale. This talk will give an overview of how Cloudera has tackled integrating Solr into the Hadoop ecosystem, and will highlight some of the design decisions and future plans. Learn how Solr is going to get closer to Hadoop, which contributions are going to which project, and how you should consider tackling search at Hadoop scale in the future."

Tom White is a Software Engineer at Cloudera, has been an Apache Hadoop committer since February 2007 (and is a PMC Member), and is a member of the Apache Software Foundation. He is also the author of the best-selling O'Reilly book, "Hadoop: The Definitive Guide." Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.

For our second talk I'll be discussing some of the work we've done recently for media monitoring companies, and in particular how we've developed a way of applying tens of thousands of stored queries to a document in around a second. We'll shortly be open-sourcing some of the core technology behind this idea (based on a branch of Apache Lucene). We're hoping this will be useful not just for those who need to monitor incoming documents, but for automatic classification and categorisation as well.
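The core idea behind applying stored queries to a document can be illustrated with a toy sketch. This is not Flax's implementation (which builds on a branch of Apache Lucene and is not shown here); it is a hypothetical Python illustration, assuming each stored query is a simple AND of lowercase terms and documents are tokenised on whitespace. The trick it shows is indexing the queries themselves, so that an incoming document is only checked against queries that could possibly match it, rather than against all tens of thousands.

```python
from collections import defaultdict


class QueryMonitor:
    """Hypothetical sketch: stored AND-queries matched against incoming documents."""

    def __init__(self):
        self.queries = {}              # query id -> set of required terms
        self.index = defaultdict(set)  # anchor term -> ids of queries indexed under it

    def add_query(self, qid, terms):
        required = {t.lower() for t in terms}
        if not required:
            raise ValueError("a stored query needs at least one term")
        self.queries[qid] = required
        # Index the query under just one of its terms: any document matching
        # the query must contain that term as well. The longest term is used
        # as a crude stand-in for "rarest".
        anchor = max(required, key=len)
        self.index[anchor].add(qid)

    def match(self, text):
        doc_terms = set(text.lower().split())
        # Pre-filter: collect only queries whose anchor term is in the document.
        candidates = set()
        for term in doc_terms:
            candidates |= self.index.get(term, set())
        # Verify: evaluate the full query only for the candidates.
        return sorted(qid for qid in candidates if self.queries[qid] <= doc_terms)


monitor = QueryMonitor()
monitor.add_query("q1", ["hadoop", "solr"])
monitor.add_query("q2", ["lucene"])
monitor.add_query("q3", ["solr", "cloudera"])
print(monitor.match("Cloudera integrates Solr with Hadoop"))  # ['q1', 'q3']
```

A real system would choose anchor terms from term statistics and would also have to handle phrases, wildcards and OR clauses, all of which this sketch ignores.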

PLEASE NOTE: we have a new venue this time, as the usual place is booked up. We'll have a bar tab and some free nibbles as usual. We'll be in the room at the top of the stairs.


  • Charlie H.

    Thanks to the sterling efforts of Matt Pearce and Tom Mortimer, we have developed some of the software we built on the day into a search engine for UK MPs' tweets.

    February 4, 2014

  • Vladimir

    The details of how the publishing industry hinders information from being analysed, and how search techniques help to overcome this, were fascinating. That said, I agree that machine learning (e.g. neural networks) should be considered, at least to reduce the amount of manual work: for example, by setting an acceptance/rejection threshold below which a person reviews and confirms the outcome of the algorithm.

    All the best at Dublin, Charlie!

    September 13, 2013

  • Charlie H.

    Great to see you all, and thanks to Tom White for speaking. I usually review our Meetups on our Flax blog, but it seems odd to review myself, so feel free to do so yourselves!

    September 13, 2013

Our Sponsors

  • Flax

    organising, drinks, snacks

  • Amazon CloudSearch

    Drinks and nibbles for the meetup at Enterprise Search Europe 2013

  • LucidWorks

    Sponsorship of the Lucene/Solr Hackday on 26th July 2013

  • Intelblocks

    Sponsorship of the April 2014 Enterprise Search Europe Social

  • Heliosearch

    Sponsorship of the April 2014 Enterprise Search Europe Social

  • DataStax

    Sponsorship of the Cambridge Search Meetup on May 14th 2014

  • Elasticsearch

    Free food, drinks and swag for the Elasticsearch Hackday
