Data Mining for Patterns That Aren't There

This is a past event

520 people went

Details

For our September Meetup, we're thrilled to have two speakers talk about how to deal with a common issue in statistical analysis: finding spurious patterns in your data, and making poor decisions or poor predictions as a consequence. This will be an outstanding opportunity to learn about best practices in data mining from two highly experienced practitioners.

Abstract: Repetitive, computer-intensive modeling can lead to overfitting and model underperformance. But this is not simply a technical problem with the modeling process; it is part of a larger and more complex statistical and philosophical phenomenon. Awareness of the broader context can give you a deeper understanding of, and more confidence in, your models. Peter Bruce, President of the Institute for Statistics Education at Statistics.com, will introduce the issue with some probability examples and a discussion of the "lack of replication" problem in scientific research. Gerhard Pilcher, Vice President and Senior Scientist at Elder Research, will continue with a discussion of the "vast search effect," illustrations from the business and government worlds, and a summary of remedies and best practices.
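For readers who want a feel for the "vast search effect" before the talk, here is a minimal Python sketch. It is not taken from the speakers' material: the function names, sample size, number of candidates, and random seed are all illustrative assumptions. It searches many pure-noise predictors for the one that correlates best with a pure-noise target, then checks that same predictor against fresh holdout data.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def best_spurious_predictor(n_rows=50, n_candidates=1000, seed=0):
    """Search random predictors for the best match to a random target.

    Hypothetical helper for illustration only. Every series is independent
    noise, so any correlation found is spurious by construction.
    Returns (correlation on the searched data, correlation on holdout data)
    for the winning candidate.
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0, 1) for _ in range(n_rows)]

    target_train, target_test = noise(), noise()
    best_train_r, best_test_r = 0.0, 0.0
    for _ in range(n_candidates):
        cand_train, cand_test = noise(), noise()
        r = pearson(cand_train, target_train)
        if abs(r) > abs(best_train_r):
            # Keep the winner so far, and score it on data it never saw.
            best_train_r = r
            best_test_r = pearson(cand_test, target_test)
    return best_train_r, best_test_r


if __name__ == "__main__":
    train_r, test_r = best_spurious_predictor()
    print(f"best correlation found by search: {train_r:+.2f}")
    print(f"same predictor on holdout data:   {test_r:+.2f}")
```

With a thousand candidates and only fifty rows, the winning correlation on the searched data is typically sizable even though nothing real is there, while the holdout correlation is just another random draw. Scoring the selected model on data that played no part in the search is one common remedy for this effect.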

Agenda:

6:30pm -- Networking, Empanadas, and Refreshments

7:00pm -- Introduction

7:15pm -- Presentations and discussion

8:30pm -- Post-presentation conversations

8:45pm -- Adjourn for Data Drinks (Circa, 22nd & I St.)

Bios:

Peter Bruce (http://www.linkedin.com/pub/peter-bruce/14/345/321) is the founder and President of The Institute for Statistics Education at Statistics.com (http://bit.ly/12YljkP), and worked previously at Cytel Software Corporation, which specializes in software and services for clinical trials. Early in his career he served with the U.S. Foreign Service. He is a co-author of "Data Mining for Business Intelligence" (Wiley, 3rd ed. forthcoming), "Introductory Statistics, a Resampling Approach" (pre-press), and a number of journal articles. He was a co-developer of Resampling Stats software, and was instrumental in the launch of XLMiner, a data mining add-in for Excel.

Gerhard Pilcher (http://datamininglab.com/senior-leadership/gerhard-pilcher) currently leads the Washington, DC office and all federal civil work for Elder Research, Inc. (http://datamininglab.com/). Gerhard also serves on Advisory Boards for the NCSU Institute for Advanced Analytics and the GWU Department of Decision Sciences Master of Science program. He is a visiting lecturer at Georgetown University and teaches a three-day Business Knowledge Series course on Data Mining through the SAS Institute. Gerhard has an MS in Analytics (Institute for Advanced Analytics, NCSU) and a BS in Computer Science from North Carolina State University.

Parking:

For those driving, we encourage you to find parking for this event via our sponsor, ParkMe (http://www.parkme.com/). ParkMe will help you find the closest, cheapest parking, and has iPhone (https://itunes.apple.com/us/app/parkme-parking-find-cheapest/id417605484?mt=8) and Android (https://play.google.com/store/apps/details?id=com.parkme.consumer) apps.