Almost 2 years ago bol.com decided to move towards an Elasticsearch-powered search engine. But how do you approach such a project? Who do you involve, and what do you need to (not) do? The engineers at bol.com would like to share their experiences with this migration in 4 short talks:
Prologue: Overall project and execution, by Maarten Roosendaal (IT architect @ bol.com):
Where do you start such an endeavor? Everything and everyone is constantly changing, but you have to start somewhere. In this section we will give an overview of our migration strategy, the choices we made, and the challenges we faced.
Part 1: Varnishing Search Performance, by Volkan Yazici (software engineer @ bol.com):
Searching is “peanuts”. You set up your Elasticsearch cluster (or better, find a SaaS partner) and start shooting your search queries against it. Well... Not really. If we put the biblical data ingestion story aside, it won't take long to realize that even moderately complicated queries can become a bottleneck for those aiming for <50ms query performance. Combine a couple of aggregations, double that for facets of range type, add your grandpa's boosting factors to the scoring, and there you go; now you are a search query performance bottleneck owner too! Maybe I am exaggerating a bit. Why not just start throwing some caches in front of it? Hrm... We actually thought of that and did so. It brought a mountain of problems along with it, though, and that is where my story begins.
Part 2: To update or not to update? To re-index or not to re-index? By Mary Gouseti (software engineer @ bol.com):
One thing that is evident when we look at the search problem is that it is very diverse. Apart from the searching itself, we also have non-functional requirements such as data size, querying patterns, latency requirements and many others. As a result, the usages of Elasticsearch are also very diverse. I have experience with only two Elasticsearch systems so far, and their differences surprised me. However, they had one similarity: Elasticsearch doesn't like to be pushed to its limits one way or the other. So let's see how the choices about keeping the index up to date pushed our Elasticsearch clusters to their limits, and how they affected resiliency and bug fixing.
Part 3: Templating and user testing new queries, by Byron Voorbach (Luminus):
When replacing Endeca, we chose to mimic its search functionality to make for an easier transition to our new ES engine. Now that we have (almost) finished our transition, it's time to start working on some new features and some new queries! Sounds good, but how do we know for sure whether our new queries are actually improving the engine? We need to test them! This talk is about how we apply A/B testing and templating to test out new queries.
18:00 – 18:30 Dinner
18:30 – 20:00 Presentations
20:00 – 20:30 Drinks
About the Speakers:
Maarten Roosendaal has been working at bol.com since 2010 as an IT Architect with a focus on scalable search, SEO and external-facing APIs.
Mary Gouseti has been swimming in the Elasticsearch waters at bol.com since 2014.
Volkan Yazici has been working as a Java plumber in the area of search and browse at bol.com since 2014.
Byron Voorbach is an Elasticsearch expert working at Luminus and helping bol.com with the migration project.