We are excited to continue our "Invited Speakers Series". Our speakers are a Data Scientist and a Data Architect from Spiceworks. Our sponsor for this event is SparkCognition.
Topics:
1. An NLP Approach to Grouping Similar Questions in Community Forums
2. Methods of Identity Mapping and Related Data Quality
Speakers:
1. Natalie Durgin - Senior Data Scientist and Mathematician at Spiceworks
2. Marian (Misty) Nodine - Data Architect at Spiceworks
About the Speakers:
1. Natalie works as a Data Scientist at Spiceworks. During her three-year tenure, she has worked on a diverse set of problems, including lookalike models, recommendation algorithms, graph models of user behavior, email delivery optimization, offline model evaluation methods, on-network click prediction, and natural language processing in community question answering forums. She earned her Ph.D. in mathematics at Rice University, where she studied the algebraic geometry of spaces used to classify geometric data. She serves on the steering committee of the BIG Math Network and is passionate about helping mathematicians transition from academia to jobs in business, industry, and government. After hours, she participates in an academic research collaboration on compressed sensing and enjoys rock climbing and dabbling in microscopy.
2. Misty is the Data Architect at Spiceworks, where she has been working to transform the company's existing data pipelines into a unified, real-time pipeline that serves customized IT content to users of Spiceworks products. She also works with the data scientists to help analyze and predict user behavior.
Misty has deep experience in data science. Before joining Spiceworks, she was the Lead Data Scientist at StepOne, Inc. She was also a founder of Elastic Knowledge, where she advised early-stage startups on how to leverage machine learning to improve their products. She has many years of experience in government and industry research as well, including serving as Principal Investigator on a DARPA contract and as technical lead on several other projects.
Misty received her Ph.D. in Computer Science from Brown University and her S.B. and S.M. in EECS from the Massachusetts Institute of Technology. She has authored over 30 peer-reviewed technical papers. In her free time she likes to hike and knit.
Detailed Topics and Agenda:
Topic 1: A survey of several NLP problems and techniques for solving them.
We briefly survey several natural language problems that arise in Community Question Answering forums and explore a solution to one of them proposed by Charlet and Damnati at the International Workshop on Semantic Evaluation (SemEval) 2017. Their approach is highly modular: several unsupervised methods feed into a supervised method that produces the final results. Turning to a variant of the problem that arises at Spiceworks, we will walk through applying various parts of their solution, highlighting the advantages of the modular approach as well as some pitfalls. Our variant of the problem ends with an unsupervised clustering step, and we will also walk through the decision-making process behind applying these techniques.
Topic 2: Cleaning data and maintaining data quality to build accurate machine learning models.
Clean data is essential for building accurate machine learning models. When we build a customer profile for personalization, recommendation, or advertising, however, we often collect data about a person from multiple disparate sources and use identity mapping techniques to link those sources together.
In this talk, we will examine methods of creating identity maps, including matching identities both within your organization and across organizations. We will also look at methods for evaluating and improving the underlying quality of your identity maps.