bol.com's multifactor Hadoop based recommender + Hadoop warehousing with Impala

Hosted by Friso van V.

Details

Time to meet up and talk about Hadoop-related topics again! bol.com (https://banen.bol.com/) has kindly offered to host us in their Utrecht-based office and will also provide our beloved pizzas and drinks. We have two talks: one by Barrie Kersbergen, who works on recommendations at bol.com, and a second by Graham Gear of Cloudera on building a Hadoop warehouse with Impala.

Agenda:

• 18.00: Arrive, eat, drink, socialise

• 19.00: First talk by Barrie Kersbergen, software engineer @ bol.com

A real-world multifactor recommender system at bol.com

Creating recommendations using scalable technology such as Pig, Hive or Mahout is one thing, but what does it take to develop an operational recommender system? Which customer behavioural factors are crucial, what is the impact of the visual presentation of the recommended items, and how do customers move from one item to another? Using these behavioural factors in a multifactor recommender system raises further questions: What level of personalization is appropriate? How does the customer's real-time behaviour constrain the recommender system? How do we measure success when everything is constantly changing? In this talk I will address these questions and give insight into the multifactor recommender system of online retailer bol.com. I will also share experiences from building this recommender system and from assessing the quality of its output.

• 19.45: Short break

• 20.00: Second talk by Graham Gear, engineer @ Cloudera

Building a Hadoop Warehouse with Impala

Impala (http://impala.io/) raises the bar for SQL query performance on Apache Hadoop. With Impala, you can query Hadoop data – including SELECT, JOIN, and aggregate functions – in real time to do BI-style analysis. As a result, Impala makes a Hadoop-based enterprise data hub function like an enterprise data warehouse for native Big Data.

During this talk we'll explore how Impala's architecture delivers query speed over Hadoop data that convincingly exceeds not only that of Hive on MapReduce/Spark/Tez, but also that of a proprietary analytic DBMS over its own native columnar format. We will explain the current state of, and roadmap for, Impala's analytic SQL functionality and provide an example configuration and benchmark suite that demonstrate how Impala offers a high level of performance, functionality, and ability to handle a multi-user workload, while retaining Hadoop’s traditional strengths of flexibility and ease of scaling.

• 20.45: Drink and socialise some more

• ????: Doors close. Everybody out.

Netherlands Hadoop User Group
Bol.com
Keulsekade 189 · Utrecht