Large-scale entity extraction, disambiguation and linkage in Big Data can challenge the traditional methodologies developed over the last three decades. Entity linkage, in particular, is a cornerstone for a wide spectrum of applications, such as Master Data Management, Data Warehousing, Social Graph Analytics, Fraud Detection and Identity Management. Traditional rule-based heuristic methods usually do not scale well, are language-specific and require significant maintenance over time.
We will introduce the audience to probabilistic record linkage, also known as specificity-based linkage, on Big Data, as a way to perform language-independent, large-scale entity extraction, resolution and linkage across diverse sources. We will also present a live demonstration that reviews the steps required during the data integration process (ingestion, profiling, parsing, cleansing, standardization and normalization) and shows the basic concepts behind probabilistic record linkage in a real-world application.
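To give a flavor of the idea behind specificity-based linkage, here is a minimal sketch in Python. It is not the HPCC Systems implementation; the function names, the toy records and the weighting formula (log2 of the record count divided by the value frequency) are assumptions chosen for illustration. The point it shows is that agreement on a rare value is stronger evidence of a match than agreement on a common one, with no language-specific rules involved.

```python
import math
from collections import Counter

def specificity_weights(records, fields):
    """For each field, map every value to log2(N / frequency).

    Rare values get high weights, common values get low weights.
    (Illustrative formula, assumed for this sketch.)
    """
    n = len(records)
    weights = {}
    for field in fields:
        counts = Counter(r[field] for r in records)
        weights[field] = {v: math.log2(n / c) for v, c in counts.items()}
    return weights

def match_score(a, b, fields, weights):
    """Sum the specificity of every field on which two records agree exactly."""
    score = 0.0
    for field in fields:
        if a[field] == b[field]:
            score += weights[field][a[field]]
    return score

# Hypothetical toy data: already cleansed and standardized to lowercase.
RECORDS = [
    {"first": "john", "last": "smith", "city": "miami"},
    {"first": "john", "last": "smith", "city": "miami"},
    {"first": "anna", "last": "smith", "city": "tampa"},
    {"first": "raul", "last": "smith", "city": "miami"},
    {"first": "olga", "last": "zabrowski", "city": "tampa"},
    {"first": "olga", "last": "zabrowski", "city": "tampa"},
    {"first": "mary", "last": "jones", "city": "miami"},
    {"first": "luis", "last": "perez", "city": "tampa"},
]
FIELDS = ["first", "last", "city"]
WEIGHTS = specificity_weights(RECORDS, FIELDS)

if __name__ == "__main__":
    # Two records that agree on a rare surname outscore two that agree on a common one.
    print(match_score(RECORDS[4], RECORDS[5], FIELDS, WEIGHTS))
    print(match_score(RECORDS[0], RECORDS[1], FIELDS, WEIGHTS))
```

A production system would typically add approximate matching, handling of missing values, penalties for disagreement and a blocking step so that not every pair of records is compared; none of that is shown here.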
Bio (Speaker changed!):
Arjuna Chala is the Sr. Director of Technology for LexisNexis, HPCC Systems. Arjuna is primarily responsible for leading the development of the next generation of big data tools for the HPCC Systems Platform. Specifically, Arjuna is leading the effort to develop tools for exploratory data analysis, data streaming and business intelligence. In addition, Arjuna is the primary technology liaison for the system integrators that partner with HPCC Systems. In this role, a large part of Arjuna's focus is on spreading the technology in international markets such as China, Brazil, Europe and India.