For our November Data Science DC, we're thrilled to have two speakers talk about machine learning algorithms and how they can be socially responsible, or not. Mike Williams from Fast Forward Labs in NYC will be talking about how supervised learning algorithms can amplify existing inequality, and Prof. Lisa Singh from Georgetown will present her research on the digital trails we leave on the web. Much of the Big Data revolution is our new ability to analyze extensive, detailed data generated by people and their behavior -- these presentations will provide you with insights into what you can do better to leverage this data in a responsible way.
NOTE: We'll be at Pew Research this month! Thanks to them for hosting!
• 6:30pm -- Networking, Empanadas, and Refreshments
• 7:00pm -- Introduction, Announcements
• 7:15pm -- Presentations and Discussion
• 8:30pm -- Data Drinks (TBA)
This talk will use the example of sentiment analysis to show that supervised machine learning has the potential to amplify the voices of the most privileged people in society. A sentiment analysis algorithm is considered 'table stakes' for any serious text analytics platform in social media, finance, or security. As an example of supervised machine learning, I'll show how these systems are trained. But I'll also show that they have the unavoidable property that they are better at spotting unsubtle expressions of extreme emotion. Such crude expressions are used by a particularly privileged group of authors: men. In this way, brands that depend on sentiment analysis to 'learn what people think' inevitably pay more attention to men. But the problem doesn't stop with sentiment analysis: at every step of any model building process, we make choices that can introduce bias, enhance privilege, or break the law! I'll review these pitfalls, talk about how you can recognise them in your own work, and touch on some new academic work that aims to mitigate these harms.
Mike Williams is a research engineer at Fast Forward Labs (http://www.fastforwardlabs.com/), which helps organizations accelerate their data science and machine intelligence capabilities by profiling near future technologies from academia and elsewhere, producing reports on their development and prototypes demonstrating their application. He has a PhD in astrophysics from Oxford, and did postdocs at the Max Planck Institute in Munich and at Columbia University. Follow Mike on Twitter @mikepqr (https://twitter.com/mikepqr).
Helping Users Understand Their Web Footprint
With the growth of online social networks and social media sites, the increase in dynamic web content, and the popularity of digital communication, more and more public information about individuals is available on the Internet. This talk will present a novel information exposure detection framework that generates and analyzes the web footprints users leave across the social web. Our approach uses probabilistic operators, free text attribute extraction, and a population-based inference engine to generate the web footprints. Using a web footprint, the framework then quantifies a user’s level of information exposure relative to others and makes suggestions on which attributes to remove or hide. After presenting the framework, I will show an evaluation that quantifies information exposure on public profiles from Google+, LinkedIn, FourSquare, and Twitter. Finally, the talk will conclude with a brief discussion about data privacy and ethics.
Dr. Lisa Singh (http://people.cs.georgetown.edu/~singh/) is the Graduate Director and an Associate Professor in the Department of Computer Science at Georgetown University. Her research interests are in data science, data mining, and databases. She currently has funding from NSF to study privacy on the web (adversarial inference), dolphin social structures with the Shark Bay Dolphin Research project (graph databases, visual analytics and social mining) and forced migration with the Institute for the Study of International Migration, LLNL, York University and others (text mining, graph mining, event detection). Dr. Singh received her B.S.E. degree from Duke University and her M.S. and Ph.D. degrees from Northwestern University. She has served on many organizing and program committees, including KDD, ICDM, ICDE, and SIGMOD, and is currently involved in different organizations related to women in computing and computational thinking education for K-12.
This event is sponsored by Pew Research (http://www.pewresearch.org/), Statistics.com (http://bit.ly/12YljkP), Hired (https://hired.com/), Elder Research (http://datamininglab.com/), and Booz Allen Hamilton (https://www.boozallen.com/consulting/strategic-innovation/nextgen-analytics-data-science). (Would your organization like to sponsor too? Please get in touch!)