Real time search and insights with Apache Kafka

Hosted By
Hannes S. and 2 others
Details

Streaming link: http://goo.gl/AruLGK

6.30pm - Doors open, Food + Drinks, Network

7.00pm - Talk - "Real-time full-text search for streaming data with Luwak, Kafka and Samza" by Alan Woodward

Traditionally, search works like this: you have a large corpus of documents, and users write ad-hoc queries to find documents within that corpus. Documents may change from time to time, but on the whole, the corpus is fairly stable.

However, with fast-changing data, it can be useful to turn this model on its head, and search over a stream of documents as they appear. For example, companies may want to detect whenever they are mentioned in a feed of news articles, or a Twitter user may want to see a continuous stream of tweets for a particular hashtag.

In this talk, we describe open source tools that enable search on streams. Luwak is a Lucene-based library for running many thousands of stored queries against each incoming document, with optimizations that make this process efficient. Samza is a stream processing framework based on Kafka that allows real-time computations to be distributed across a cluster of machines. We show how Luwak and Samza can be combined into an efficient and scalable streaming search engine.
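The idea of matching stored queries against each arriving document can be illustrated with a minimal Python sketch. This is not Luwak's API: the class `StoredQueryMatcher`, its methods, and the query names are hypothetical, and queries are simplified to "all of these terms must appear". The term index stands in for the kind of preselection step that avoids running every stored query on every document.

```python
from collections import defaultdict


class StoredQueryMatcher:
    """Toy reverse search: register queries once, then match each incoming document.

    Hypothetical illustration only; it does not use or mirror Luwak's API.
    """

    def __init__(self):
        self.queries = {}                   # query id -> set of required terms
        self.term_index = defaultdict(set)  # term -> ids of queries that need it

    def register(self, query_id, terms):
        required = {t.lower() for t in terms}
        self.queries[query_id] = required
        for term in required:
            self.term_index[term].add(query_id)

    def match(self, text):
        doc_terms = set(text.lower().split())
        # Preselect: only queries sharing at least one term with the document.
        candidates = set()
        for term in doc_terms:
            candidates |= self.term_index.get(term, set())
        # Verify: a query matches when all of its terms occur in the document.
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)


matcher = StoredQueryMatcher()
matcher.register("kafka-news", ["apache", "kafka"])
matcher.register("samza-news", ["samza"])
matcher.register("lucene-news", ["lucene"])

print(matcher.match("Apache Kafka and Samza power streaming search"))
# ['kafka-news', 'samza-news']
```

In a streaming deployment, a matcher like this would presumably run inside each stream processing task, with documents arriving from a Kafka topic; that wiring is omitted here.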

Alan Woodward is a director of Flax, the Cambridge-based open source search specialists, and a committer to the Apache Lucene/Solr project. Alan is the author of Luwak, Flax's scalable stored search library used by companies such as Bloomberg and Infomedia A/S in their large-scale media monitoring systems. Alan has also recently built log analytics systems using Logstash, Apache Kafka and Elasticsearch.

7.45pm - Break

8.00pm - Talk - "Streaming customer insights at British Gas Connected Homes with Kafka, Spark and Cassandra" by Josep Casals

In this talk we'll show the inner workings of Connected Home's data platform. Connected Home is a Centrica company and the UK's foremost player in the connected homes market. The platform is designed to stream real-time service insights such as excessive consumption alerts, optimised schedules and failure detection. The talk will include an overview of the architecture and its use of Apache Kafka, Apache Spark and Cassandra, followed by streaming and analytics use cases. Josep is lead data engineer at British Gas Connected Homes.
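As a rough illustration of what an excessive consumption alert on a stream might look like, here is a minimal Python sketch. It is not Connected Home's method: the class `ConsumptionAlert`, the window size, and the threshold factor are all assumptions made for the example. It flags a reading that exceeds a multiple of the home's recent average.

```python
from collections import deque


class ConsumptionAlert:
    """Flag readings well above a home's recent average (hypothetical rule)."""

    def __init__(self, window=3, factor=2.0):
        self.window = window  # number of past readings used as the baseline
        self.factor = factor  # alert when reading > factor * baseline average
        self.history = {}     # home id -> deque of recent readings

    def observe(self, home_id, kwh):
        recent = self.history.setdefault(home_id, deque(maxlen=self.window))
        alert = False
        # Only judge a reading once a full window of history is available.
        if len(recent) == self.window:
            baseline = sum(recent) / self.window
            alert = kwh > self.factor * baseline
        recent.append(kwh)
        return alert


alerts = ConsumptionAlert(window=3, factor=2.0)
for reading in [1.0, 1.0, 1.0, 5.0]:
    print(reading, alerts.observe("home-42", reading))
```

In a platform like the one described, logic of this kind would more plausibly run as a distributed streaming job over readings delivered through Kafka; the sketch keeps state in memory only to show the rule itself.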

Apache Kafka London
Centrica Connected Home
20 Rathbone Place, W1T 1HY · London