Solr/Lucene Meetup July 2016


Details
Session #1: "Parallel SQL and Streaming Expressions in Apache Solr 6" -- Shalin Shekhar Mangar
Apache Solr is a powerful search and analytics engine with features such as full-text search, faceting, joins and sorting, and it can handle large amounts of data across a large number of servers. However, with all that power and scalability comes complexity. Solr 6 supports a Parallel SQL feature which provides a simplified, well-known interface to your data in Solr, performs key operations such as sorts and shuffling inside Solr for massive speedups, and provides best-practices based query optimization. By leveraging the scalability of SolrCloud and a clever implementation, it allows you to throw massive amounts of computation power behind analytical queries.
In this talk, we will explore the why, what and how of Parallel SQL and its building block, Streaming Expressions, in Solr 6, with a hint of the exciting new developments around this feature.
Shalin Shekhar Mangar
Shalin Shekhar Mangar has been an Apache Lucene/Solr committer since 2008 and is a member of the Lucene/Solr project management committee. He worked at AOL for five years on vertical search, content management systems, social/community platforms and anti-spam systems, as well as AOL WebMail's Inbox Search system, which uses a highly customized version of Apache Solr to serve tens of millions of users and more than a billion index/search operations a day. He currently works at LucidWorks Inc. on Apache Solr, mostly on the SolrCloud side of things.
Session #2: "Bitter-sweet taste of scaling with Solr" -- Harshvardhan Shrivastava
Flipkart uses Solr as the indexing and query engine for its log-aggregation and storage service (internally called LogSvc). Today all of Flipkart’s engineering logs, at a scale of 2.2 million events per second, go through a data pipeline that used to serve 40K/s; a shard-performance-aware indexing algorithm made this possible without increasing the cluster size. All of this data can be queried in near real time (< 2 min) over a rolling window of 4 hours. The same log events are stored in HDFS for long-term archival and large-scale analytics, but Solr is the only interface for near-real-time log-based debugging.
Running an indexing service at this scale and keeping the underlying infrastructure up 24/7 on a private cloud (a 265-node SolrCloud cluster) is an interesting engineering proposition. We at the Flipkart CloudPlatform team faced many challenges in the process, and some required non-trivial effort, ranging from GC tuning and expensive-query protection to distributed race-condition fixes. This talk is the story of what happens at that mind-numbing scale and our experience dealing with it every day.
Harshvardhan Shrivastava
Harshvardhan is an SDE-2 on the Infrastructure team at Flipkart. He works with the Log service team, which aims to aggregate logs across all systems and services in Flipkart and provide storage and indexing services on top of them. He is interested in high-performance, high-efficiency, low-cost computing and all things scale.
