bol.com's multifactor Hadoop-based recommender + Hadoop warehousing with Impala

Time to meet up and talk about Hadoop-related topics again! bol.com has kindly offered to host us in their Utrecht-based office and will also provide us with our beloved pizzas and drinks. We have two talks: one by Barrie Kersbergen, who works on recommendations at bol.com, and a second talk which will be announced shortly.

Agenda:

• 18.00: Arrive, eat, drink, socialise

• 19.00: First talk by Barrie Kersbergen, software engineer @ bol.com

A real world multifactor recommender system @bol.com

Creating recommendations using scalable technology such as Pig, Hive or Mahout is one thing, but what does it take to develop an operating recommender system? Which customer behavior factors are crucial, and what is the impact of the visual presentation of the item being recommended, or of the way customers move from one item to another? Some questions related to using these customer behavior factors in a multifactor recommender system are: What should be the level of personalization? How does the real-time behavior of the customer constrain the recommender system? How do we measure success when everything is constantly changing? In this talk I will address these questions and give insight into the multifactor recommender system of online retailer bol.com. I will also share our experiences with building this recommender system and with the quality of its output.

• 19.45: Short break

• 20.00: Second talk by Graham Gear, engineer @ Cloudera

Building a Hadoop Warehouse with Impala

Impala (impala.io) raises the bar for SQL query performance on Apache Hadoop. With Impala, you can query Hadoop data – including SELECT, JOIN, and aggregate functions – in real time to do BI-style analysis. As a result, Impala makes a Hadoop-based enterprise data hub function like an enterprise data warehouse for native Big Data.

During this talk we'll explore how Impala's architecture supports query speed over Hadoop data that not only convincingly exceeds that of Hive on MapReduce/Spark/Tez, but also that of a proprietary analytic DBMS over its own native columnar format. We will explain the current state of, and roadmap for, Impala's analytic SQL functionality and provide an example configuration and benchmark suite that demonstrate how Impala offers a high level of performance, functionality, and ability to handle a multi-user workload, while retaining Hadoop’s traditional strengths of flexibility and ease of scaling.

• 20.45: Drink and socialise some more

• ????: Doors close. Everybody out.

  • Chantal C.

    Hey, I'm new to the group. Are you interested in a free seminar about support4research: https://www.surf.nl/agenda/2014/12/seminar-s4r-what-surf-can-do-for-research/index.html

    November 3

  • Anurag S.

    Given the great discussion after the meetup and the knowledge of the other participants, I would be glad to hear their stories at upcoming meetups. Sharing knowledge and sharp discussions are the whole point of these meetups.

    June 27, 2014

  • tzolov

    For example a competitor product (HAWQ) claims to be 6x faster than Impala for the same TPC-DS benchmark, running the complete benchmark set. (http://bit.ly/1mDZTq7, http://www.gopivotal.com/sites/default/files/SIGMODMay2014HAWQAdvantages.pdf)
    Although this sounds like yet another pitch (and it could be), the point I am trying to make is that I would appreciate more objective and informative talks. Admiration for Friso, Barrie, Niels and the bol.com team for organizing this event!

    June 27, 2014

  • tzolov

    Gary made a nice Impala introduction. IMHO, though, the talk didn't go further than a product pitch. There were some misleading claims, like the statement that Impala is ANSI SQL compliant, except for some features like subqueries, ... For me this means not compliant.
    The benchmarks seemed to have been selected in a way to favor the pitch. Only 21 of the 115 TPC-DS benchmark queries were selected. The explanation that this was done for manageability reasons doesn't sound serious. At least not for a technical audience. The benchmark standards exist for a purpose. One has to run the complete benchmark set and honestly report the results, including the queries that could not run or have crashed. This is what I would expect from a tech talk.

    June 27, 2014

  • Daniel D.

    The first presentation (by Barrie) could've been more in-depth. What kind of algorithms do they use in their recommender system(s)? What technologies and libraries do they use? Also a tip for Barrie: work on your stage presence. You seem like a nice but somewhat timid guy. You could've spoken more loudly and told your story with more passion. Do you like what you're doing? I couldn't really tell :)
    Niels' presentation was more technical, which I really liked, but to be honest, I think he was too self-conscious about his new "invention". Part of me wondered if modifying the Lambda Architecture was really necessary, or whether it was part of carving out some fame for Niels :) In general, I would have liked to hear more about business validation for all those complex ideas. The booking.com guys were spot-on with their questions ;)
    The Cloudera talk was great in that Gary really knew his stuff and is experienced at presenting! Would've liked to see more best practices, not only a product pitch :)

    4 · June 26, 2014

    • Subbu

      Also, look around the internet and check out the presentations given about automated recommendation engines and compare them with the talk given. If you find that my reservations are uncalled for, please feel free to flame me. As anyone who has done any form of mathematical modelling might/would confirm, the real secret sauce is always how the parameters are tweaked based on simulations and not the actual algorithms. Hence I have a strong reservation when the presenters won't even go into top-level algorithms, citing proprietary knowledge. I also understand that hosting 120+ attendees is not easy, and kudos to Bol.com and Friso for arranging it, but there is only so much that (really) good free food can make up for lack of content in the presentation.

      2 · June 27, 2014

    • Daniel D.

      Spot on about time not being free! And I also strongly believe that people should be able to be critical of each other. The arguments that are presented in this discussion are real :) If we cannot be critical of one another, what use is a Meetup? It drives away all learning potential, which frankly is what I came for.

      June 27, 2014

  • Anthony G.

    I liked both presentations. The first one was more about an interesting idea and the second one about an interesting open source product. I especially would like to thank bol.com for organising the event (with food and drinks) for that many people.

    June 27, 2014

  • Anurag S.

    Join us at the Data in Action Meetup at the Backbase office in Amsterdam:
    http://www.meetup.com/Amsterdam-Data-in-Action-Meetup/events/190198892/

    1 · June 26, 2014

    • Marcel M.

      Thanks for pointing this out, because they changed the date.

      June 26, 2014

  • Anurag S.

    I liked the meetup because we had the right balance between technology and business applications. In particular, it was great to hear about how Bol.com makes their recommendation system work. I can understand that e-commerce companies cannot disclose everything about their recommendation technology because it is a key tool for them to compete.
    The Cloudera talk was just OK for me because I have heard about Impala and benchmark numbers from Cloudera at all the major big data events. The speaker from Cloudera, Gary, articulated his points very well.
    The food at Bol.com was so good that I forgot for a while that I was in Holland.

    3 · June 26, 2014

  • Barrie K.

    Hi Daniel, unfortunately the microphone was lost, my voice still needs recovering :)

    The goal of my presentation was to give insight into what it takes to develop an operating recommender system. The actual algorithms are not shared on purpose for obvious reasons, but you can get Git access to the codebase via: https://banen.bol.com/vacature/software-engineer-for-non-dutch-speaker/ :)

    As I tried to explain, we use an SOA, consisting of Spring Java technology to handle the real-time part, with a commercial DB as persistence store. All behavioral analytics are custom designed and built for our use cases. Popular open-source libraries like Mahout and Apache Math are used to support our algorithms.

    In batch we use Hadoop MapReduce jobs to do deep analysis and tasks that are not time-critical. Like the real-time part, this is all custom built for our use cases. Cheers, Barrie

    7 · June 26, 2014

    • Daniel D.

      Thanks for the insights Barrie! It makes the picture somewhat more clear :) I'll definitely keep the offer for Git access in mind ;)

      June 26, 2014

  • Mark G.

    Very interesting meetup that nicely represents the diversity of our field, highlighting both business perspectives and IT perspectives. For a next meetup I would really be interested in how people go about bridging the gap between these perspectives. E.g. Impala makes it possible to allow people from a more business background to apply their methodological knowledge on Big Data, but is this the right way? Similarly the bol.com solution seemed very technology driven with a number of untested assumptions about aspects of visitor behavior.

    1 · June 26, 2014

  • Nesko J.

    Thanks to Barrie, Niels, Gary and Friso. Well done.
    Not many of us are able to host 127 people for a meetup. Cheers.

    1 · June 26, 2014

  • Marcel M.

    Thanks Niels and Barrie for hosting the meetup.
    Props to Bol.com too.
    And multi-thanks to Friso for organizing such excellent events.
    Unfortunately there was not enough time to have in-depth presentations.
    Perhaps next time?

    June 26, 2014

  • Geert Van L.

    I enjoyed crossing the border for these sessions. If you are interested in Spark, you can check out our next bigdata.be meetup - http://www.meetup.com/bigdatabe/events/189310212/

    June 26, 2014

  • Rob D.

    Thanks Niels & Barrie @ Bol.com for hosting the meetup. You set the bar high for the next one!

    June 26, 2014

  • Arun S.

    Real great. Met great people and learnt quite a bit.

    June 25, 2014

  • Jorn E.

    It was awesome! Nice food, nice people! Impala presentation could use a little spice though ;)

    June 25, 2014

  • rahim

    Location wise :(

    June 25, 2014

  • Marti K.

    Good presentations
    Fine meet & greet

    June 25, 2014

  • Jorn E.

    127 People, going to be a busy meetup :-P

    June 25, 2014
