E-mails, Facebook statuses, WhatsApp messages… We all leave textual traces everywhere, every day. Text analysis, including authorship identification and anomaly detection, has long been a critical competency in forensic investigations. However, the volume of text data to be analyzed is growing rapidly, and multi-language litigation is becoming increasingly common. Traditional approaches are becoming costly and impractical, and finding a “smoking gun” is increasingly difficult. Manual analysis alone is no longer enough to discover patterns and trends in the data. This talk introduces the basics of forensic linguistics, along with state-of-the-art ML methods and tools for the automated analysis of unstructured content in electronic discovery and crime detection. Real-life examples and a demo are included.

Kateřina is a data scientist with a background in natural language processing, focusing on the semantic analysis of textual data. She previously worked as a product developer and business consultant specializing in text analytics in the big data domain, and she is now involved in forensic data analysis at Deloitte. Kateřina received her PhD in computational linguistics from MFF UK in Prague. Her research focuses mainly on sentiment analysis and information extraction. She lectures on Linguistic Applications at Charles University in Prague and Palacký University Olomouc.

