Data Mining for Patterns That Aren't There

For our September Meetup, we're thrilled to have two speakers talk about how to deal with a common issue in statistical analysis: finding spurious patterns in your data, and making poor decisions or poor predictions as a consequence. This will be an outstanding opportunity to learn about best practices in data mining from two highly experienced practitioners.

Abstract: Repetitive, computer-intensive modeling can lead to overfitting and model underperformance. But this is not simply a technical problem with the modeling process; it is part of a larger and more complex statistical and philosophical phenomenon. Awareness of the broader context can give you a deeper understanding of, and more confidence in, your models. Peter Bruce, President of the Institute for Statistics Education at Statistics.com, will introduce the issue with some probability examples and discussion of the "lack of replication" problem in scientific research. Gerhard Pilcher, Vice President and Senior Scientist at Elder Research, will continue with discussion of the "vast search effect," illustrations from the business and government worlds, and a summary of remedies and best practices.
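For readers who want a feel for the "vast search effect" before the talk, here is a minimal sketch. It is not taken from either speaker's material. It assumes a made-up setup: a target and 500 candidate predictors that are all pure random noise, with the function names, row counts, and seed chosen only for illustration. Searching the candidates for the best in-sample correlation tends to turn up one that looks meaningful, and the same predictor typically shows little or no relationship on rows that were held out of the search.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def vast_search_demo(n_rows=40, n_features=500, seed=0):
    """Search many noise features for the one best correlated with a noise target.

    Returns (index of the winning feature, its correlation on the rows used
    for the search, its correlation on held-out rows).
    """
    rng = random.Random(seed)
    total = 2 * n_rows
    # Target and features are independent noise: there is no real pattern.
    target = [rng.gauss(0, 1) for _ in range(total)]
    features = [[rng.gauss(0, 1) for _ in range(total)] for _ in range(n_features)]

    search_rows = slice(0, n_rows)       # rows used while hunting for a pattern
    holdout_rows = slice(n_rows, total)  # rows never seen during the hunt

    # The "vast search": keep whichever feature looks best on the search rows.
    best = max(
        range(n_features),
        key=lambda j: abs(pearson(features[j][search_rows], target[search_rows])),
    )
    r_search = pearson(features[best][search_rows], target[search_rows])
    r_holdout = pearson(features[best][holdout_rows], target[holdout_rows])
    return best, r_search, r_holdout


if __name__ == "__main__":
    best, r_search, r_holdout = vast_search_demo()
    print(f"best feature: {best}")
    print(f"correlation on searched rows: {r_search:+.2f}")
    print(f"correlation on held-out rows: {r_holdout:+.2f}")
```

Holding out rows like this is only one of the possible remedies; the sketch makes no claim about which remedies the speakers will recommend.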

Agenda:

6:30pm -- Networking, Empanadas, and Refreshments

7:00pm -- Introduction

7:15pm -- Presentations and discussion

8:30pm -- Post-presentation conversations

8:45pm -- Adjourn for Data Drinks (Circa, 22nd & I St.)

Bios:

Peter Bruce is the founder and President of The Institute for Statistics Education at Statistics.com, and worked previously at Cytel Software Corporation, which specializes in software and services for clinical trials. Early in his career he served with the U.S. Foreign Service. He is a co-author of "Data Mining for Business Intelligence" (Wiley, 3rd ed. forthcoming), "Introductory Statistics, a Resampling Approach" (pre-press), and a number of journal articles. He was a co-developer of Resampling Stats software, and was instrumental in the launch of XLMiner, a data mining add-in for Excel.

Gerhard Pilcher currently leads the Washington, DC office and all federal civil work for Elder Research, Inc. Gerhard also serves on Advisory Boards for the NCSU Institute for Advanced Analytics and the GWU Department of Decision Sciences Master of Science program. He is a visiting lecturer at Georgetown University and teaches a three-day Business Knowledge Series course on Data Mining through the SAS Institute. Gerhard has an MS in Analytics (Institute for Advanced Analytics, NCSU) and a BS in Computer Science from North Carolina State University.

Parking:

For those driving, we encourage you to find parking for this event via our sponsor, ParkMe. ParkMe will help you find the closest, cheapest parking, and has iPhone and Android apps.


  • Doug_S

    I would have liked Gerhard Pilcher to go into more detail about how to diagnose overfitted models, likely situations in which important variables are unaccounted for, and other such pathologies, but I wouldn't want to guess how much statistical geekiness the audience would tolerate. I assume he had the same problem. We might want to have another session just on diagnosing deficiencies in data mining models. He and Peter Bruce did say enough to get a lively discussion started, and some follow-up conversations afterward, so I'd call it a success. On the other hand, about half the audience left immediately upon conclusion of the formal presentations, so maybe they wanted more than they got.

    1 · October 1, 2013

    • A former member

      You said "go into more detail about how to diagnose overfitted models, likely situations in which important variables are unaccounted for, and other such pathologies". I agree with this completely and it is the reason I was a bit disappointed. I did however enjoy the experience (and speakers) on the whole and would gladly attend the next meeting. I also met some interesting new people...

      October 1, 2013

    • Aaron S.

      I wonder if the technical level of the presentation affects people's decisions about whether to go to the bar afterward; are you more likely to want a beer after an easy lecture or a hard lecture? I suspect the null hypothesis should not be rejected on this one... Deserves further study? :)

      1 · October 1, 2013

  • Doug_S

    Maybe next time begin by asking the audience how much statistical background they have, so the presenter knows what basics have to be restated and where s/he can skip forward to more advanced material. Also, you can get away with going faster and deeper if people know the slides will be available afterward.

    October 1, 2013

  • Gauthier

    - Interesting topics with nice examples from very good speakers.
    - The room was very cold!!

    October 1, 2013

  • Harlan H.

    If anyone would like to write an event review for the Data Community DC blog, please let me know! (Click on my name to email me.) We'll have the slides and audio available.

    October 1, 2013

    • Eunice

      Hi Harlan, I would be glad to write an event review if we are still seeking one and have sent you an email with my contact information, as well. Thanks!

      1 · October 1, 2013

  • Tony O.

    I thought these were good high-level presentations by both Peter and Gerhard. They were able to keep the presentations interesting and went into a little more detail during the Q&A session afterward. For those interested in Data Community DC's R Statistical Programming workshops on October 19th that Harlan mentioned during the intro, you can find more details and RSVP at http://bit.ly/19ePwzw

    October 1, 2013

  • Alex P.

    Anyone get a final picture of the blue-dot-sticker review board?

    October 1, 2013

  • Nevin H.

    Top notch, short and sweet. Could have maybe snuck in another speaker or case study

    October 1, 2013

  • Brand N.

    I listened to Gerhard Pilcher's talk at the Predictive Analytics World Government Conference recently:
    http://semanticommunity.info/Analytics/Predictive_Analytic_World_Government_2013#Gerhard_Pilcher_You_may_have_a_lot_o_fdata... and hoped for more details as well.

    I wrote a story for a Data Science DC Post:
    http://semanticommunity.info/Analytics/Predictive_Analytic_World_Government_2013#Story

    There are more details in the recent NAS Report on Frontiers in Massive Data Analysis:
    http://semanticommunity.info/Big_Data_at_NIST/Frontiers_in_Massive_Data_Analysis#Story

    that was discussed at NIST Big Data WG Workshop today:
    http://bigdatawg.nist.gov/home.php

    NIST is hosting a Data Science Symposium, November 18-19:
    http://www.nist.gov/itl/iad/data-science-symposium-2013.cfm

    and technical abstracts are due by October 4th.

    October 1, 2013

  • Jerome Y.

    Perhaps there may be more technical details at Gerhard Pilcher's web site:
    www.datamininglab.com.

    October 1, 2013

  • A former member

    Disappointed, I felt the topic was covered somewhat superficially. In fairness though, the topic needs more than 2 hours....

    October 1, 2013

  • Andrea

    Could have been a bit more technical

    3 · September 30, 2013

    • Aman S.

      Couldn't agree more. Although the speakers are very prominent, I too was expecting something more technical.

      September 30, 2013

  • Miriam H.

    It would be great to hear a little about what was covered. Recording, PPT, anything?

    2 · September 30, 2013

  • William

    Also missing out due to work...

    September 30, 2013

  • Jim B.

    Sorry, I am going to have to bail on this great event because of my day job.

    September 30, 2013

    • John Mark M.

      I too had to miss this. Any chance a recording will be posted?

      1 · September 30, 2013

  • Laxman N.

    I am interested in Data Analytics / visualization on Data Mining

    September 25, 2013

  • Lance L.

    Can't wait to meet you guys.

    1 · September 9, 2013

  • Fola

    Coming

    September 5, 2013

  • Brand N.

    I hope you can attend our conference next Tuesday and Wednesday where our work with the Cray Graph Computer will be presented on Tuesday at 3:45 pm: http://www.afei.org/events/3A03...

    Speaking of Neo4j gaining leverage in the government, I have arranged for our Semantic Medline Data Science Team to help Wes and Geoff use that data set in Neo4j for the next meetup (October 22nd if possible) and helped my son get started with it in his work for the Department of Justice.

    September 5, 2013

  • Mike T.

    There for the empanadas and shrinkage

    September 4, 2013

  • Mike T.

    There for the empanadas and some shrinkage

    September 4, 2013

200 went
