
Distributed Computing with Hydra and HPCC Systems

We will follow the two-speaker format again for this meetup. The meetup will feature two great open source distributed systems that aren't named Hadoop! Matt Abrams from AddThis will be presenting Hydra. The second talk will discuss large-scale entity extraction using LexisNexis' High Performance Computing Cluster (HPCC Systems).

AddThis is sponsoring the beer for this event and LexisNexis will be sponsoring the pizza.  LexisNexis has also provided a Kindle Fire HDX that we will be raffling off at the event!

Hydra: An Introduction

Matt Abrams will present a practical introduction to using Hydra for distributed data processing. This talk will focus on technical execution rather than the low-level design and theory behind Hydra. Several examples of different types of Hydra jobs and queries will be discussed in order to demonstrate Hydra's capabilities. The goal of this talk is to give the audience a core understanding of what Hydra is and how to use it to solve data challenges. The examples discussed during the presentation will be available for download so that you can replicate the experiments in your own development environments.

Title: Large-scale Entity Extraction and Probabilistic Record Linkage

Short Description: Large-scale entity extraction, disambiguation and linkage in Big Data can challenge the traditional methodologies developed over the last three decades. Entity linkage, in particular, is a cornerstone for a wide spectrum of applications, such as Master Data Management, Data Warehousing, Social Graph Analytics, Fraud Detection and Identity Management. Traditional rule-based heuristic methods usually don't scale properly, are language-specific and require significant maintenance over time.

We will introduce the audience to the use of probabilistic record linkage, also known as specificity-based linkage, on Big Data to perform language-independent, large-scale entity extraction, resolution and linkage across diverse sources. We will also present a live demonstration reviewing the different steps required during the data integration process (ingestion, profiling, parsing, cleansing, standardization and normalization), and show the basic concepts behind probabilistic record linkage in a real-world application.
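
To give a rough feel for the idea before the talk, here is a minimal Python sketch of specificity-based scoring. It is an illustration only, not the method used by SALT or HPCC Systems: the function names, the sample records, and the choice of -log2(frequency) as the weight are all assumptions made for this example. The intuition is that agreeing on a rare value (an unusual surname) is stronger evidence that two records describe the same entity than agreeing on a common one.

```python
import math
from collections import Counter

def field_specificities(records, field):
    """Map each value of `field` to -log2(share of records holding that value).

    Hypothetical helper: rarer values get larger weights.
    """
    counts = Counter(r[field] for r in records if r.get(field))
    total = sum(counts.values())
    return {value: -math.log2(n / total) for value, n in counts.items()}

def match_score(a, b, specificities):
    """Sum the specificity of every field on which the two records agree."""
    score = 0.0
    for field, weights in specificities.items():
        if a.get(field) and a.get(field) == b.get(field):
            score += weights[a[field]]
    return score

# Made-up sample data, for illustration only.
records = [
    {"first": "john", "last": "smith", "city": "dayton"},
    {"first": "john", "last": "smith", "city": "dayton"},
    {"first": "john", "last": "jones", "city": "dayton"},
    {"first": "zelda", "last": "quigley", "city": "dayton"},
]

specificities = {
    field: field_specificities(records, field)
    for field in ("first", "last", "city")
}

# Records 0 and 1 agree on a rarer surname, so they score higher
# than records 0 and 2, which only share a common first name and city.
same_person = match_score(records[0], records[1], specificities)
different_person = match_score(records[0], records[2], specificities)
```

In this sketch a pair would be linked when its score exceeds some threshold chosen for the data set. A value shared by every record, such as the city here, contributes nothing to the score, which is why no language-specific rules are needed to tell informative fields from uninformative ones.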


Joe Barter is a Consulting Software Engineer for LexisNexis Special Services, where he is involved in a variety of “big data” projects pertaining to LexisNexis’ High Performance Computing Cluster (HPCC Systems). Joe has worked extensively with the platform since 2008 and with HPCC Systems’ Scalable Automated Linking Technology (SALT) since 2010. Joe’s expert knowledge of SALT has enabled him to develop solutions that address complex entity disambiguation and non-obvious relationship problems, particularly in his role as a key contributor to Smart View.

Joe is a graduate of the University of Dayton with over 25 years of software development experience. While he cut his coding teeth with IBM’s 360/370 assembly, his primary development languages are currently SALT, ECL, and Java.  He has a passionate interest in employing advanced analytics on “big data” to produce actionable information. Other interests include Machine Learning and Natural Language Processing.


  • Dallin Y.

    Hey Matt, sorry I cannot make it tonight. I must have shared the dog bowl with my Labrador 'cause we both got the stomach bug overnight. Let me know if you can post the notes or any video. Thanks and Laterz. :)

    March 4, 2014

  • Matt A. is back online and the snow is clearing. We are still on for tonight's meetup. Hope to see you here! Please update your RSVP if you are unable to attend.

    March 4, 2014

Our Sponsors

  • AddThis

    AddThis is providing the space and drinks (free beer!)

  • O'Reilly

    Free Books

  • Mesosphere

    Build and run scalable and fault-tolerant applications.
