Chicago Machine Learning Meetup: Demystifying Dimensionality Reduction

  • Apr 9, 2013 · 5:30 PM

The Talk

Datasets come in all shapes and sizes. Some are tall and skinny (many samples, few variables), some are short and wide (fewer samples, more variables), and some are both tall and wide (common for network graphs). In many cases, the number of variables becomes too large to manage effectively in memory.

Dimensionality Reduction is a common method of modeling data with a smaller representation that preserves item similarity (as measured by the "distance" between two samples).
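To make that concrete (an editor's sketch, not part of the original announcement): the snippet below builds a low-rank data matrix in NumPy, keeps only its top singular vectors, and checks that pairwise distances survive the reduction. All names and sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples in 1,000 dimensions, but with only 10 true degrees
# of freedom (sizes invented for illustration)
X = rng.normal(size=(100, 10)) @ rng.normal(size=(10, 1000))

# Truncated SVD: keep only the top k singular vectors
k = 10
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_small = U[:, :k] * s[:k]          # 100 x 10 stand-in for 100 x 1000 data

# Pairwise distances between samples survive the reduction
def pdist(A):
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

print(np.allclose(pdist(X), pdist(X_small)))  # True: k matched the true rank
```

With k equal to the true rank the distances match exactly; real data is only approximately low-rank, so in practice a smaller k trades memory for a controlled amount of distortion.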

Unfortunately, the techniques generally employed in Dimensionality Reduction come with intimidating and unwieldy names such as Singular Value Decomposition, Principal Component Analysis, Latent Dirichlet Allocation, K-Means Clustering, Latent Semantic Analysis, Random Projections, etc. Many of the techniques involve the same underlying mathematics; they were simply developed independently for different domains.
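As one concrete case of that shared mathematics (again an editor's sketch, with made-up data): Principal Component Analysis is exactly the SVD of the mean-centered data matrix, so the two "different" techniques produce the same axes.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))      # 200 samples, 50 variables (made up)

# PCA "by hand": eigendecomposition of the sample covariance matrix
Xc = X - X.mean(axis=0)             # center each variable
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
axes_pca = eigvecs[:, ::-1]         # principal axes, largest variance first

# The same axes via SVD of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
axes_svd = Vt.T                     # right singular vectors, same ordering

# Identical up to the arbitrary sign of each axis
print(np.allclose(np.abs(axes_pca), np.abs(axes_svd)))  # True
```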

Jeff will attempt to demystify the topic a bit by explaining what it is, why you would use it, and many of the common underlying themes, terms, and approaches in everyday English.

About Jeff

Jeff Hansen is a Senior Data Engineer at Think Big Analytics. Yet another physicist gone rogue, Jeff enjoys herding goats in France, scuba diving in Thailand, and factoring prime numbers on his couch when he isn't busy building Big Data systems. He received a BA in Physics and German from Washington University in St. Louis. As a frugal autodidact, Jeff has since paid institutions to take a number of courses for fun, but he now prefers to do most of his learning for free online with the help of coursera.com, oyc.yale.edu and webcast.berkeley.edu.

Comments

  • A former member

    I re-recorded the video this morning and I've been told the new recording has been uploaded over the old one -- http://thinkbiganalytics.com/resources/recent_big_data_events/jeff-hansen-demystifying-dimensionality-reduction/

    April 22, 2013

  • A former member

    Apparently there was a problem with the audio in the recording. Sorry about that; I'll be re-recording it this weekend and posting a new copy.

    April 19, 2013

  • Daniel C.

    Jeff - I am getting a lot of static and no voiceover in that video. Anything wrong with my setup?

    April 19, 2013

  • A former member

    For anybody who was interested, I still need to get the actual materials online somewhere, but I gave a revised version of the presentation online yesterday and it's available for replay on Think Big's website -- http://thinkbiganalytics.com/resources/recent_big_data_events/jeff-hansen-demystifying-dimensionality-reduction/

    April 19, 2013

  • Simon H.

    I agree, very good talk. One of the attendees referenced the Python gensim library as an excellent one for exploring the SVD algorithm and related algorithms (http://radimrehurek.com/gensim/). Although mainly used for topic modelling and creating semantic spaces in natural language processing tasks, the LSI component is identical to SVD and could be used purely to do SVD. The underlying numeric routines are C-backed and extremely fast; I can do LSA (SVD) on reasonably sized corpora with tens to hundreds of thousands of words in a few seconds on a weedy netbook. (A minimal usage sketch appears after the comment thread.)

    April 11, 2013

  • Avi N.

    I really appreciated that lecture. I've always wanted an intuitive explanation of linear algebra and it was very nicely provided. Useful as well.

    April 11, 2013

  • A former member

    By the way, here is the stackoverflow page I mentioned with Ted Dunning's response -- http://stackoverflow.com/questions/4951286/svd-for-sparse-matrix-in-r -- and here is the paper he referenced that I also mentioned (70 or so pages, but a good read): http://arxiv.org/abs/0909.4061 (a sketch of its randomized SVD approach appears after the comment thread).

    April 10, 2013

  • A former member

    For those of you who couldn't attend, I'll be giving a similar talk as an online webinar next Thursday -- http://thinkbiganalytics.com/about/big_data_events/big_data_webinars/ -- and I'll make sure to get a revised version of the materials posted after that.

    April 10, 2013

  • Paul K.

    Please put the materials online!

    April 10, 2013

  • Christopher S.

    Last minute change of plans. I hope everyone enjoys Jeff's presentation!

    April 9, 2013

  • Miao L.

    Sorry, I got sick and can't make it. Please transfer my spot to others if they need it.

    April 9, 2013

  • Benjamin

    Looks like I won't be able to make it after all :(

    April 8, 2013

  • Aaron

    I have a class that conflicts with the timing.

    April 6, 2013

  • Suraj S

    'Suraj Shrestha' is my full name. Thanks.

    April 4, 2013

  • Christopher S.

    Excited to hear the talk!

    March 29, 2013
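A minimal sketch of the gensim LSI usage Simon H. mentions above, using the standard gensim API (corpora.Dictionary, doc2bow, models.LsiModel); the toy corpus and the num_topics value are invented purely for illustration. The LSI model is a truncated SVD of the term-document matrix, so the projected vectors below are exactly the kind of reduced representation discussed in the talk.

```python
from gensim import corpora, models

# Toy corpus: each document is a list of tokens (invented for illustration)
texts = [
    ["human", "computer", "interface"],
    ["survey", "user", "computer", "system", "response", "time"],
    ["graph", "minors", "trees"],
    ["graph", "trees", "survey"],
]

dictionary = corpora.Dictionary(texts)                 # token <-> id mapping
corpus = [dictionary.doc2bow(text) for text in texts]  # sparse bag-of-words

# LSI = truncated SVD of the term-document matrix; num_topics is the rank
lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)

# Each document projected into the 2-dimensional latent space
for doc in corpus:
    print(lsi[doc])
```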
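And a rough NumPy sketch of the randomized SVD idea from the Halko/Martinsson/Tropp paper linked in the thread (http://arxiv.org/abs/0909.4061): sample the range of the matrix with a random Gaussian projection, then take an exact SVD in that small subspace. The matrix sizes, rank, and oversampling amount here are arbitrary choices for illustration.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD via random range-finding (Halko et al., 2009)."""
    rng = np.random.default_rng(seed)
    # Sketch the range of A with a random Gaussian test matrix
    Y = A @ rng.normal(size=(A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Y)            # orthonormal basis for the sketched range
    # Project A onto that basis and decompose the small matrix exactly
    B = Q.T @ A                       # (k + oversample) x n, cheap to SVD
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Exactly rank-10 test matrix, 2000 x 500 (made up for the demo)
rng = np.random.default_rng(1)
A = rng.normal(size=(2000, 10)) @ rng.normal(size=(10, 500))

U, s, Vt = randomized_svd(A, k=10)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err)  # ~1e-15: the random sketch captured the whole range
```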
