- Featuring Open Distro for Elasticsearch and Solr w/ Kubernetes
(Carl Meadows and Adithya Chandra) Open Distro for Elasticsearch: Performance Analyzer and PerfTop. Elasticsearch is challenging to provision for a workload because its performance is hard to model. It is also hard to troubleshoot when performance problems arise (and there are several!). Attend this meetup to learn about Performance Analyzer, a feature offered with Open Distro for Elasticsearch. Performance Analyzer instruments Elasticsearch and exposes metrics with diagnostic capabilities. We will talk about the design and implementation of Performance Analyzer and give a live demo of how it can be used together with Open Distro PerfTop to troubleshoot a cluster issue. (Lucidworks) Scaling with K8s: Open Orchestration for Solr's Second Movement. Kubernetes gives us a common language to declare how a distributed application should be installed, configured, and maintained in production. Solr offers the community a mature, distributed application for full-text search in the most demanding deployment scenarios. Lucidworks presents a template for Solr users to embark on their journey of autoscaling and cloud orchestration in the cloud-native era.
- Using Solr's Tagger to Improve Relevancy and More
Hear from Erik Hatcher, Apache Lucene-Solr committer and co-founder of Lucidworks, as he shares practical tips for improving your search relevancy. Using the Solr tagger, key attributes of a query expression can be extracted and used to boost or filter results. If you elect to follow along, you can spin up Solr on your devices, in the cloud on a single node, or in a Kubernetes cluster. There will be a demo, food and beer.
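As a rough illustration of the idea in this talk, the sketch below tags a query against a small in-memory dictionary and turns the matches into filter clauses. It is a toy stand-in in plain Python, not Solr's tagger implementation; the dictionary entries, the field names, and the `tag_query` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary mapping known phrases to (field, value) pairs.
# In Solr these entries would live in an indexed collection, not in code.
DICTIONARY: Dict[str, Tuple[str, str]] = {
    "san francisco": ("city", "San Francisco"),
    "new york": ("city", "New York"),
    "pizza": ("cuisine", "pizza"),
}


def tag_query(query: str, dictionary: Dict[str, Tuple[str, str]]):
    """Return (filters, leftover_terms) using greedy longest-phrase matching."""
    tokens = query.lower().split()
    max_len = max((len(phrase.split()) for phrase in dictionary), default=0)
    filters: List[str] = []
    leftover: List[str] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                field, value = dictionary[phrase]
                filters.append(f'{field}:"{value}"')
                match_len = n
                break
        if match_len:
            i += match_len
        else:
            # Untagged words stay in the free-text part of the query.
            leftover.append(tokens[i])
            i += 1
    return filters, leftover


filters, leftover = tag_query("cheap pizza in San Francisco", DICTIONARY)
print(filters)
print(leftover)
```

The extracted clauses could then be sent as filter or boost queries, with the leftover words kept as the main query text.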
- SolrCloud AutoScaling
Join us for an evening of networking and Solr Autoscaling discussions at Reddit HQ! Apache Solr committer Varun Thacker from Lucidworks will share some of the goals of Solr's Autoscaling framework, and Reddit Software Engineer Jerry Bao will share how the Reddit team continues to scale its search infrastructure using features built on Solr. Food & drinks will be provided. Hope to see you there! --Talks-- --Solr's AutoScaling Framework, Presenter: Varun Thacker, Solr Committer, Lucidworks-- The goal of Solr's AutoScaling framework is for search clusters to be able to grow to a trillion documents without much human intervention. We'll discuss practical use cases for keeping the cluster healthy, fault tolerant, and performing optimally. For example, we'll cover how to use the framework for: - Effectively managing disk space by setting triggers and sending out alerts. - Maintaining a minimum replication factor when nodes go down. We'll also use rules to make sure the replicas are spread out, thus maximizing fault tolerance. The talk will cover how to use the suggestions endpoint to find the violations in your cluster. --Search Infra @ Reddit: Challenges in Scaling to Millions of Cat Posts, Presenter: Jerry Bao, Software Engineer, Reddit-- As Reddit continues to grow year after year, Search has become a vital part of the Reddit experience, both internally and externally. Come learn about the tools we’ve used, challenges we’ve faced, and lessons we’ve learned rebuilding and scaling our search infrastructure to the next million cat posts.
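To make the trigger and rule scenarios from the autoscaling talk more concrete, here is a minimal sketch that builds two request bodies in Python. The endpoint paths and field names follow the Solr 7.x autoscaling API as best understood here and should be checked against the reference guide for your Solr version; the helper function names are hypothetical, and nothing is sent over the network.

```python
import json

# Assumed endpoint paths from the Solr 7.x autoscaling API; verify for your version.
AUTOSCALING_PATH = "/api/cluster/autoscaling"
SUGGESTIONS_PATH = "/api/cluster/autoscaling/suggestions"


def node_lost_trigger(wait_for: str = "120s") -> dict:
    """Payload asking Solr to compute and execute a plan after a node is lost."""
    return {
        "set-trigger": {
            "name": "node_lost_trigger",
            "event": "nodeLost",
            "waitFor": wait_for,
            "enabled": True,
            "actions": [
                {"name": "compute_plan", "class": "solr.ComputePlanAction"},
                {"name": "execute_plan", "class": "solr.ExecutePlanAction"},
            ],
        }
    }


def spread_replicas_policy() -> dict:
    """Cluster policy: fewer than 2 replicas of each shard on any single node."""
    return {
        "set-cluster-policy": [
            {"replica": "<2", "shard": "#EACH", "node": "#ANY"}
        ]
    }


trigger_body = json.dumps(node_lost_trigger(), indent=2)
policy_body = json.dumps(spread_replicas_policy(), indent=2)
print(trigger_body)
print(policy_body)
```

Under these assumptions, each body would be POSTed to `AUTOSCALING_PATH`, and a GET on `SUGGESTIONS_PATH` would then list any violations along with proposed fixes.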
- Fusion Day San Francisco with Reddit and Uber
***Please register at the following page to request your spot at this event*** http://www.cvent.com/d/sgqd1g Lucidworks is excited to bring you Fusion Day San Francisco, a half-day seminar to learn how companies like Reddit and Uber are innovating more, enabling employees to work smarter, and connecting users to the content they love through personalized digital experiences. Join us on Wednesday, August 8 to learn how to forge data and behavior to connect people through common interests and ideas at work and at play. Breakfast & Lunch will be provided. Featured Speakers: - Senior Engineering Manager at Reddit - Engineering Manager at Uber - Director of Fusion Product at Lucidworks - CEO of Lucidworks Date: Wednesday, August 8, 2018 Time: 8:00am - 1:30pm Location: The W Hotel, 181 3rd St · San Francisco, CA *************** Important: Registration for this event is NOT through Meetup. Those interested in attending must register via the event page here: http://www.cvent.com/d/sgqd1g ***************
- Upcoming Developments in Solr 7
Join us for an evening of networking and Solr discussions with Solr committers we have visiting from out of town! Steve Rowe and Andrzej Białecki will share some of the recent developments they have been working on in the upcoming version of Solr, and Alexander Kanarsky will give an overview of Lucidworks Fusion features built on Solr. Food & drinks will be provided. Hope to see you there! Talks-- Testing Autoscaling Framework on Large Clusters in Super-Human Time, by Andrzej Białecki OpenNLP Integration Points in Lucene/Solr, by Steve Rowe Lucidworks Fusion Overview, by Alexander Kanarsky Speakers-- --Andrzej Białecki, Engineer, Lucidworks Andrzej has been actively involved in Open Source since 1997. Currently he’s an Apache Lucene/Solr PMC Member. He’s also the author of a popular Lucene index inspection utility – Luke. --Steve Rowe, Senior Software Engineer, Lucidworks Steve is a Member of the Apache Software Foundation, and a committer and PMC member on the Lucene/Solr Project. Prior to joining Lucidworks in 2012, he spent 10 years working on NLP as a Research Software Engineer at the Center for Natural Language Processing at Syracuse University. --Alexander Kanarsky, Senior Software Engineer for Fusion, Lucidworks Alexander is a Senior Software Engineer at Lucidworks, working on developing Lucidworks Fusion. Prior to joining Lucidworks, he led the Backend Search team at Trulia (now part of Zillow Group), scaling up Trulia's Solr-based search infrastructure. He was also one of the core developers of Zantaz (later Autonomy) Digital Safe, a Lucene-based, petabyte-scale system that was the world's largest private email and messaging archive.
- Fusion Day San Francisco
Lucidworks Fusion is built with the power of Apache Solr & Apache Spark and provides everything you need to build and deploy intelligent search applications that wow your customers and empower your employees. Join us for Fusion Day in San Francisco to see first-hand how Fusion can help you develop powerful search apps and cut months off your development cycle. Stick around after lunch for a hands-on Lucidworks Fusion training that will help you become a Fusion pro! *************** Important: Registration for this event is NOT through Meetup. Those interested in attending must register via the event page here: http://www.cvent.com/d/b5qdkn ***************
- Performance & Ease of Use Enhancements in Solr
It's been some time since we've gathered our SF Solr folks for a Meetup! Hope you can join us for a special session to hear from Solr committers Ishan Chattopadhyaya, Noble Paul, and Andrzej Białecki about new and upcoming enhancements for performance and ease of use in Solr. Food & drinks will be provided. Upcoming Optimizations in Solr: Presented by Ishan Chattopadhyaya Overview of upcoming performance improvements to Solr, including in-place updates of numeric docValues fields, the addition of PointField types, and more. Speaker: Ishan Chattopadhyaya is a Solr Committer and engineer at Lucidworks. Prior to working at Lucidworks, Ishan worked on the Yahoo! Search team, Multimedia Search team, and Shopping Vertical Search team. Ishan started his career with MapQuest (AOL) search, building their single-line search backend with Apache Lucene. Ishan has contributed to the development of the authentication and authorization framework of Apache Solr. Modern APIs for Solr: Presented by Noble Paul Hear about Solr API v 2.0 for ease of use and consistency. Speaker: Noble has 16 years of experience building software in Java, including application servers and customer-facing applications. Noble joins us after working at AOL for 7 years, where he designed and built the search solution for AOL mail and worked on various other projects including e-commerce and infrastructure software. Noble has been a committer on Apache Lucene/Solr since 2009. Solr Metrics Teaser: Presented by Andrzej Białecki Solr committer Andrzej Białecki will give a teaser of upcoming metrics for Solr and also some experiments he has done with Solr + Raspberry Pi. Speaker: Andrzej Białecki has 20 years of experience in software engineering, ranging from system integration and OS development to information retrieval and the standardization of e-commerce models. He’s been actively involved in Open Source since 1997. He’s an Apache Lucene/Solr PMC Member and the author of a popular Lucene index inspection utility – Luke. Andrzej holds an MS in Electrical Engineering from Warsaw University of Technology, Poland.
- Solr as a SparkSQL DataSource
Join us at BlackRock for food, drinks, networking, and a discussion about the open source project that exposes Solr as a SparkSQL data source. Agenda: 6:00pm - 6:30pm: Eat, drink, network 6:30pm - 7:30pm: Presentation and Q&A Solr as a SparkSQL DataSource: Presented by Kiran Chitturi, Lucidworks Solr has been adopted by all major Hadoop platform vendors as the de facto standard for big data search because of its ability to scale to meet even the most demanding workflows. As more organizations seek to leverage Spark for big data analytics and machine learning, the need for seamless integration between Spark and Solr emerges. In this presentation, Kiran Chitturi introduces an open source project that exposes Solr as a SparkSQL DataSource. Attendees will come away with a solid understanding of common use cases, access to open source code, and performance metrics to help them develop their own large-scale search and discovery solution with Spark and Solr. Specifically, Kiran covers the following topics: + Using deep-paging cursors, streaming result sets, and intra-shard splitting to maximize read performance when constructing RDDs from Solr queries + High-volume reads into Spark using DocValues and Solr’s streaming API + Data-locality optimizations when Solr and Spark executors are co-located on the same host + Writing DataFrames to Solr + Writing to Solr from Spark streaming jobs + Using Solr/Lucene Analyzers to perform text analysis in Spark ML pipelines When discussing big data, especially search on big data, it’s important to establish performance metrics. For instance, how many docs per second can be indexed from Spark to Solr using this framework? Or, how many rows can be processed per second when reading data from Solr into Spark? Kiran concludes his presentation by showing read/write performance metrics achieved using a 10-node Spark / SolrCloud cluster running on YARN in EC2. Speaker: Kiran Chitturi is a software developer at Lucidworks. 
He works on Lucidworks' enterprise product Fusion and currently leads the spark-solr project ( https://github.com/Lucidworks/spark-solr ).
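One of the techniques listed for this talk, deep-paging cursors, can be sketched independently of Spark. The loop below follows Solr's `cursorMark` / `nextCursorMark` convention: start at `*`, sort on the unique key, and stop when the cursor stops changing. It is an illustrative Python sketch, not code from spark-solr; the `fetch_page` callable and the in-memory fake fetcher are hypothetical stand-ins for an HTTP request to a Solr search handler, and the `id` field is assumed to be the unique key.

```python
from typing import Callable, Dict, Iterator, List


def iterate_with_cursor(fetch_page: Callable[[Dict[str, str]], dict],
                        query: str = "*:*",
                        rows: int = 2) -> Iterator[dict]:
    """Yield all documents by following cursorMark until it stops changing."""
    cursor = "*"  # documented starting value for a new cursor
    while True:
        params = {
            "q": query,
            "rows": str(rows),
            "sort": "id asc",  # cursors need a sort that includes the unique key
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor


def make_fake_fetcher(ids: List[str]) -> Callable[[Dict[str, str]], dict]:
    """Hypothetical in-memory stand-in for a request to Solr."""
    ordered = sorted(ids)

    def fetch(params: Dict[str, str]) -> dict:
        cursor = params["cursorMark"]
        rows = int(params["rows"])
        remaining = ordered if cursor == "*" else [i for i in ordered if i > cursor]
        page = remaining[:rows]
        # Real cursor marks are opaque strings; this fake reuses the last id seen.
        next_cursor = page[-1] if page else cursor
        return {"response": {"docs": [{"id": i} for i in page]},
                "nextCursorMark": next_cursor}

    return fetch


docs = list(iterate_with_cursor(make_fake_fetcher(["d", "a", "c", "b", "e"])))
print([d["id"] for d in docs])
```

Passing the fetcher in as a parameter keeps the paging logic testable without a running cluster; a real fetcher would issue the request and return the parsed JSON response.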
- Developing Scalable Search at PlayStation
Join us on Tuesday, April 19, for an evening of networking with the San Francisco Solr community and the talk below from Ai Sasho and Alvin Peng of Sony. Developing Scalable Search at PlayStation: Presented by Alvin Peng and Ai Sasho, Sony Interactive Entertainment Sony Interactive Entertainment will present how the PlayStation team used Lucene and Solr to provide search across a user base of over 200 million gamers. The speakers, Ai and Alvin, were key engineers in the PlayStation 4's success and will share what they have learned in the PS4's fast-growing ecosystem. Speakers: Alvin Peng is a Sr. Software Engineer at Sony Interactive Entertainment who works on scalable web services, media sharing, and search. He was previously a Sr. Dev Manager at a startup, and before that he was a Speech Synthesis and Recognition researcher at Microsoft. Ai Sasho is a Sr. Software Engineer at Sony Interactive Entertainment, working on social features on the PlayStation 4. Prior to joining Sony, she worked in research, analyzing and mining biological data. She is passionate about technologies that use big data to improve social interactions within gaming communities.
- SolrCloud Cluster Management APIs
Join us for an evening of networking and a SolrCloud discussion with Solr committer Anshum Gupta. Food and drinks will be provided by BlackRock. Hope to see you there! 6:00pm - 6:30pm: Networking, Food, Drinks 6:30pm - 7:30pm: Presentations 7:30pm - 8:00pm: Wrap up SolrCloud Cluster Management APIs Apache Solr is widely used by organizations to power their search platforms and often supports multiple users. Many cluster management APIs were introduced over the last few releases, allowing users to manage operations ranging from replica placement to forcing leader elections via API calls. At the end of this talk, intermediate Solr users will understand what's available and when they can avoid direct interference with the system, leading to more stable clusters and lower chances of nodes going down. Speaker: Anshum Gupta is a Lucene/Solr committer and PMC member with over 10 years of experience with search and related technologies. He recently joined the search team at IBM Watson, where he works on stretching the limits of and improving SolrCloud, something he has been involved with for a few years now. Prior to this, he was a part of the open source team at Lucidworks and was also the co-creator of AWS CloudSearch - the first search-as-a-service offering by AWS.
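For readers unfamiliar with the kind of call this talk covers, the sketch below builds two Collections API URLs in Python: one to add a replica and one to force a leader election. `ADDREPLICA` and `FORCELEADER` are real Collections API actions, but the base URL, the collection name `products`, the shard name, and the `collections_api_url` helper are assumptions made for illustration; the code only constructs the URLs and does not contact a cluster.

```python
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, **params: str) -> str:
    """Build a Collections API URL of the form .../admin/collections?action=..."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


BASE = "http://localhost:8983/solr"  # assumed local Solr address

# Ask Solr to place one more replica of shard1 (node chosen by Solr).
add_replica = collections_api_url(
    BASE, "ADDREPLICA", collection="products", shard="shard1")

# Ask Solr to force a leader election for a shard that has lost its leader.
force_leader = collections_api_url(
    BASE, "FORCELEADER", collection="products", shard="shard1")

print(add_replica)
print(force_leader)
```

Going through the API in this way, rather than editing cluster state by hand, is the kind of indirect management the talk recommends.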