Optimizing Multilingual Search

  • December 17, 2014 · 6:30 PM

Multilingual search requires the developer to address challenges that don’t exist in the monolingual case. In Solr, a robust multilingual search engine requires different analysis chains for each language because each language has its own logic for tokenization, lemmatization, stemming, synonyms, and stop words. To make multilingual search even harder, query strings are typically no longer than a handful of words, making language identification of query strings more difficult, or at worst ambiguous even to a human (“pie” could be an English or Spanish query). We’ll explore the breadth of Solr schema and configuration options available to a multilingual search application developer to balance functionality, performance, and complexity. We’ll dive deep into specific experiments with a practical application.
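
As a rough illustration of the ambiguity described above (not taken from the talk), one possible approach is to search every plausibly matching per-language field when language identification of a short query is uncertain. The sketch below assumes hypothetical field names (`text_en`, `text_es`, `text_de`), each presumed to have its own analysis chain in the schema, and assumes language scores are supplied by some external detector. It only builds request parameters for Solr's eDisMax query parser and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical mapping from language code to a per-language Solr field.
# Each field is assumed to be configured with its own analysis chain.
LANGUAGE_FIELDS = {"en": "text_en", "es": "text_es", "de": "text_de"}


def build_query_params(query, language_scores, min_score=0.2):
    """Build eDisMax params that search each plausible language field.

    language_scores is assumed to come from an external language
    detector, e.g. {"en": 0.5, "es": 0.45} for the query "pie".
    """
    fields = [
        f"{LANGUAGE_FIELDS[lang]}^{score:.2f}"  # boost field by detector score
        for lang, score in sorted(language_scores.items())
        if lang in LANGUAGE_FIELDS and score >= min_score
    ]
    if not fields:
        # No confident guess: fall back to all language fields, unboosted.
        fields = sorted(LANGUAGE_FIELDS.values())
    return {"q": query, "defType": "edismax", "qf": " ".join(fields)}


params = build_query_params("pie", {"en": 0.5, "es": 0.45, "de": 0.05})
print(urlencode(params))
```

Searching several fields trades some precision and query cost for recall, which is one instance of the balance between functionality, performance, and complexity that the abstract mentions.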

Speaker Bio: David Troiano 

David Troiano is a Principal Software Engineer at Basis Technology who develops the services and applications that consume the core natural language processing products that Basis delivers.  Over the past five years, he has worked on content search, discovery, and recommendation systems built on Lucene / Solr, with an eye toward scalability and performance.  He also has professional experience with machine learning and predictive analytics tools in the quantitative finance industry.  David holds a bachelor’s degree in Computer Science from Harvard College.


  • Carlos V.

    Here is the full link to the presentation: http://www.slideshare.net/basistech/optimizing-multilingual-search-david-troiano2. Hope everyone enjoyed the talk!

    January 2

  • Ian G.

    We're glad David's talk was well received. His slides have been posted to SlideShare: optimizing-multilingual-search-david-troiano

    December 31

  • Jim O.

    Great job. Great topic, talk, and Q&A. Happy Holidays.

    December 17

  • Evan M.

    new topic for me and people were very friendly

    December 17

  • Jack K.

    Sigh... I have a conflict, so I'll miss this meetup. Anybody who wants to meet up informally to talk Solr/Lucene (or DataStax Enterprise Search/Solr) over the next few weeks should let me know!

    December 17

  • Taj H.

    I would like to connect with someone at this event with expertise in building faceted, semantic search to search and match resumes with jobs using Lucene, Solr or ElasticSearch. [masked] or [masked]

    December 17

  • Carlos V.

    For those of you who just need to know: the user group meeting will be on the 44th floor of 1515 Broadway. Hope to see you all there!

    December 16

  • odoncaoa

    Really interested in gaining an insider's insight into multilingual development. Curious about the degree to which cross-language semantics may have been involved, and the solutions that might have been devised as well. Long-time NYC Semantic Web group member ;^) Would also be very interested in learning about the details of the morphological analysis process(es) employed by Rosette Linguistics, which interrogates pages, paragraphs, and sentences of (Mandarin) Chinese logographic text, ultimately determining the actual words being represented along the way; and the internals of the Chinese translation process that could then be at work.

    August 9
