Topic Modeling with Python


Details
Abstract:
Topic modelling is an unsupervised machine learning method that helps us discover hidden semantic structures in a collection of texts, allowing us to learn topic representations of the documents in a corpus. It can be applied to a wide range of unstructured data, such as research papers, medical notes, and call transcripts. In this talk, we will go through the key steps of using Python for topic modelling, i.e., identifying which topics are likely discussed in a document. In particular, we will cover Latent Dirichlet Allocation (LDA), a widely used topic modelling technique, and apply LDA to convert call transcripts into a set of topics.
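For readers who want a feel for the technique before the talk, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is not the speaker's code: the function name fit_lda, the hyperparameter values, and the four toy "transcripts" are all made up for illustration, and a real project would more likely use a library such as gensim or scikit-learn.

```python
import random

def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (hypothetical helper).

    docs is a list of token lists. Returns (theta, top_words):
    theta[d][k] is the estimated share of topic k in document d,
    top_words[k] lists the three most frequent words assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []                                            # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(vocab_size), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(num_topics)
    ]
    return theta, top_words

# Made-up toy "call transcripts", already lowercased and free of stop words.
transcripts = [
    "claim denied appeal claim status",
    "appeal claim denied status claim",
    "premium payment bill due payment",
    "bill payment premium due bill",
]
docs = [t.split() for t in transcripts]
theta, top_words = fit_lda(docs, num_topics=2)

for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
for d, row in enumerate(theta):
    print(f"transcript {d}: {[round(p, 2) for p in row]}")
```

Each row of theta is a probability distribution over topics for one transcript, which is the "topic representation" of a document described above. Topic numbering is arbitrary, so the topics have to be interpreted through their top words.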
Speaker Bio:
Qian Chen is a data scientist at Anthem, the second-largest health insurer in the US, where he supports the Program Integrity team in detecting medical fraud and recovering overpaid claims. Projects that he has worked on include:
- Building Retroactive Rate Change models on government-based data (Medicare & Medicaid) to detect overpaid outpatient and inpatient claims.
- Using a Latent Dirichlet Allocation (LDA) model to analyze Medicare call transcripts and classify them into different topics.
- Using commercial data to detect Anthem participating providers that incorrectly submit claims through the out-of-network channel.
Prior to joining Anthem, Chen worked at Allstate Insurance as a data scientist for about 3 years. He supported the Claims department, Allstate’s largest, and helped build the first automated system for processing the medical bills of Allstate’s insureds. The system can automate the processing of 20% of bills. Additionally, he led a project to predict how likely Allstate was to win a litigated case and to advise claims adjusters on using different strategies for easy and tough cases.
Qian Chen has a master’s degree in Statistics from the University of Chicago and a bachelor’s degree in Statistics from George Washington University. Fun facts about him: he has two dogs, a golden retriever and a corgi, who live with him and his wife in the South Loop. In his leisure time, he enjoys taking his dogs hiking in the forest preserve or lingering in the city.
6:00 p.m. to 6:30 p.m. is social time. The seminar will start at 6:30 p.m.
Our Sponsor: Metis Chicago ( https://www.thisismetis.com/ )
