Getting Elasticsearch to Play Nicely with Foursquare's Offline Machine Learning and Aggregation Ecosystem
Foursquare has a diverse machine learning pipeline producing useful features to enhance search, ranging from quality scores on venues and lists, to natural language processing on tips and menus, to photo classification, to aggregation of features into high-level intent indicators. These features are built offline each day from new data and are regularly introduced, overhauled, or deprecated by engineers and data scientists. Ensuring that our search engine has timely access to the latest features and is easily extensible for developers to test and prototype new features, while remaining reliable and performant for production search, has presented interesting challenges and will be the focus of this talk.
We will discuss how we leverage our offline infrastructure to regenerate indexes using the latest features from source and learned data, how we set up our cluster to sandbox offline index construction, how we roll out these indexes to replace current production indexes (handing over live writes and reads and incorporating recent updates), and some of the decisions we had to make in structuring our indexes and queries to work in this environment. We will also address some of the challenges we have faced integrating Elasticsearch's live ad hoc search and aggregation features with specialized high-performance retrieval indexes constructed offline.
Nate Folkert is a long-time engineer at Foursquare who stumbled into managing and ultimately completely rearchitecting the Elasticsearch infrastructure after inadvertently breaking it. Besides life-logging, gamification of everyday experiences, and torturing data, his passions are his wife and two little girls, and insufferable dilettantism. He is currently recovering from GDPR PTSD by scheming about ways Foursquare can use its data to delight users. You can find him under the handle nfolkert on most social media old people use.
Data Science in the Elastic Stack
Michael will present an overview of Elastic's machine learning capabilities.
As we know, data science work can be messy, fractured, and challenging as data volumes increase. This session will explore how the Elastic Stack can offer a single destination for data ingestion and exploration, time series modeling, and communication of results through data visualizations by focusing on a few sample data sources.
We will also explore new functionality offered by Elastic machine learning, in particular an integration with our APM solution.
Trained as a mathematician, Michael Hirsch started his career with no development experience. His first task: "model the world in a relational database." Over the last 7 years Michael has established himself as a data scientist, with a focus on building end-to-end systems. In his career, he has built machine-learning-powered platforms for clients including Nike, Samsung, and Marvel, and approaches his work with the idea that machine learning is only as useful as the interfaces that users interact with.
Currently, Michael is a Product Engineer for Machine Learning at Elastic. He focuses on tailoring Elastic's ML offering to customer use cases, as well as integrating machine learning capabilities across the entire Elastic Stack.