The Apache Solr Smart Data Ecosystem
Experimenters,
As promised earlier, we will host Trey Grainger (https://www.linkedin.com/in/treygrainger) of Lucidworks (https://lucidworks.com/), who will present the latest developments in Apache Solr (http://lucene.apache.org/solr/), the wonderful information retrieval platform.
In Trey's words:
The Apache Solr (http://lucene.apache.org/solr/) Smart Data Ecosystem
Search engines, and Apache Solr in particular, are quickly shifting the focus away from "big data" systems storing massive amounts of raw (but largely unharnessed) content, to "smart data" systems where the most relevant and actionable content is quickly surfaced instead. Apache Solr is the blazing-fast and fault-tolerant distributed search engine leveraged by 90% of Fortune 500 companies. As a community-driven open source project, Solr brings in diverse contributions from many of the top companies in the world, particularly those for whom returning the most relevant results is mission critical.
Out of the box, Solr includes advanced capabilities like learning to rank (machine-learned ranking), graph queries and distributed graph traversals, job scheduling for processing batch and streaming data workloads, the ability to build and deploy machine learning models, and a wide variety of query parsers and functions allowing you to very easily build highly relevant and domain-specific semantic search, recommendations, or personalized search experiences. These days, Solr even enables you to run SQL queries directly against it, mixing and matching the full power of Solr's free-text, geospatial, and other search capabilities with a prominent query language already known by most developers (and which many external systems can use to query Solr directly).
Due to the community-oriented nature of Solr, the ecosystem of capabilities also spans well beyond just the core project. In this talk, we'll also cover several other projects within the larger Apache Lucene/Solr ecosystem that further enhance Solr's smart data capabilities: bi-directional integration of Apache Spark and Solr's capabilities, large-scale entity extraction, semantic knowledge graphs for discovering, traversing, and scoring meaningful relationships within your data, auto-generation of domain-specific ontologies, running SPARQL queries against Solr on RDF triples, probabilistic identification of key phrases within a query or document, conceptual search leveraging Word2Vec, and even Lucidworks' own Fusion project which extends Solr to provide an enterprise-ready smart data platform out of the box.
We'll dive into how all of these capabilities can fit within your data science toolbox, and you'll come away with a really good feel for how to build highly relevant "smart data" applications leveraging these key technologies.
-----------------------------------------------
Trey (https://www.linkedin.com/in/treygrainger) is the SVP of Engineering at Lucidworks (https://lucidworks.com/), co-author of the book Solr in Action (https://www.manning.com/books/solr-in-action), and a researcher on numerous data science and information retrieval publications. He previously served as Director of Engineering at CareerBuilder, developing their large-scale search, recommendations, and data analytics products. Trey holds degrees in computer science and business from Georgia Tech (master's) and Furman University (bachelor's), and also completed master's-level study in information retrieval and web search at Stanford University.
Courtesy of Improving, we will provide pizza, beer, and soda at Improving's fabulous Plano facility (http://improving.com/location/dallas).
Live streams of the event will be posted here in the meetup.
I would like a volunteer to operate a Facebook Live stream in addition to the usual Periscope live stream (which I will announce from @dfwdatascience).