
(Easy) High performance text processing in Machine Learning

This month we have Daniel Krasner presenting "(Easy) High performance text processing in Machine Learning".

Note: Pivotal will be hosting. However, drinks and snacks are not provided as our group is just too large to sponsor that. So please respect the space and don't touch the fridges.


This talk covers rapid development of high performance scalable text processing solutions for tasks such as classification, semantic analysis, topic modeling and general machine learning. We demonstrate how Python modules, and in particular the Rosetta Python library, can be used to process, clean, tokenize, extract features, and finally build statistical models with large volumes of text data. The Rosetta library focuses on creating small and simple modules (each with command line interfaces) that use very little memory and are parallelized with the multiprocessing package. We will touch on LDA topic modeling and different implementations thereof (Vowpal Wabbit and Gensim). The talk will be part presentation and part “real life” example tutorial.
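As a rough illustration of the pattern described above (small text-processing functions parallelized with the multiprocessing package), here is a minimal sketch that uses only the Python standard library. It does not use or reproduce the Rosetta API; the names `tokenize`, `count_tokens`, and `corpus_counts`, the regex-based tokenizer, and the `chunksize` default are all assumptions made for this example.

```python
import re
from collections import Counter
from multiprocessing import Pool

# Hypothetical tokenizer rule for this sketch: runs of lowercase letters/digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def count_tokens(text):
    """Return a bag-of-words Counter for a single document."""
    return Counter(tokenize(text))


def corpus_counts(docs, processes=1, chunksize=100):
    """Sum token counts over an iterable of document strings.

    With processes=1 the work runs serially in this process; otherwise the
    documents are distributed across a multiprocessing.Pool. In both cases
    per-document counts are merged as they arrive instead of being
    collected in a list first.
    """
    total = Counter()
    if processes == 1:
        for counts in map(count_tokens, docs):
            total.update(counts)
        return total
    with Pool(processes=processes) as pool:
        # Result order does not matter for a sum, so imap_unordered is enough.
        for counts in pool.imap_unordered(count_tokens, docs, chunksize=chunksize):
            total.update(counts)
    return total
```

Calling `corpus_counts(docs, processes=4)` would spread tokenization over four worker processes. On platforms that start workers with "spawn" (Windows, and macOS by default), that call should sit under an `if __name__ == "__main__":` guard. The resulting counts are the kind of bag-of-words features that could then be passed to an LDA implementation such as Gensim or Vowpal Wabbit.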


Daniel Krasner is a research scholar with the “Declassification Project” at Columbia University and the co-founder of KFit Solutions, a data science consulting firm. His current interests and work focus on high performance statistical solutions in text and natural language processing. He is the co-creator of “Rosetta,” an open source Python text processing library. In addition, Daniel works with a number of hedge funds in the city, building financial modeling and decision support systems. Previously, Daniel was the chief data scientist at Sailthru, an email and behavioral analytics platform, a senior researcher at Johnson Research Labs, and a professor teaching Applied Data Science in the Columbia University statistics department. Prior to entering the world of data science, Daniel was a researcher at the Mathematical Sciences Research Institute in Berkeley and an assistant professor of mathematics at UCLA. He holds a PhD in mathematics from Columbia University.


  • Nitin k.

    Excellent presentation in simple language. Can we have the slides and a link to the presentation, Paul....
    Thanks in advance.

    February 23, 2014

  • John Peter S.

    Good talk. Really good. I also will contact you for the slides. Thanks to Paul and Pivotal.

    February 21, 2014

  • A former member

    Great talk, it was very clear. Would it be possible to download the slides?

    February 21, 2014

  • Gary G.

    Very nice and understandable presentation with practical Python engineering advice and advanced thoughts on the use of -- and interfacing to -- a suitable LDA library.

    February 20, 2014

  • Niels B.

    Can't seem to find the Rosetta library... anyone have the link?

    February 20, 2014

  • Iordan S.

    Can't make it unfortunately. Releasing the seat.

    February 20, 2014

  • Paul D.

    Also, the nice people at are sponsoring captioning for accessibility for tonight. Go here for that:

    February 20, 2014

  • Paul D.

    To all of those that don't make it through the waiting list, Pivotal will be live streaming the event here:

    Load that page and a link should show up when it starts. I'll be kicking things off around 7:05 PM

    February 20, 2014

    • Daren

      Will it be saved for later viewing?

      February 20, 2014

    • Paul D.

      yes, it will be recorded

      February 20, 2014

  • H L.

    I can't make it tonight. I am giving up my seat.

    February 20, 2014

  • @aaronchall

    Remember to release your RSVP's if you can't make it!

    February 20, 2014

  • Farhan A.

    A work obligation came up and I won't be able to make it any more :(

    February 20, 2014

  • A former member

    Any updates on the wait list? I really want to attend the event

    February 20, 2014

  • Chakri

    Hi, is this session hands-on? What software is needed apart from Python? Is there anything on GitHub to get started?


    February 19, 2014

  • BigData L.

    Same here. Last time someone was recording in the Pivotal Labs I think. Hoping you can do the same and then post.

    February 19, 2014

  • shilpa

    Can I have a video of the event? I cannot attend because of the distance.

    February 19, 2014

  • James Q

    Will a video or slides be made available? Turns out I will not be able to make this one :(

    February 17, 2014

    • Paul D.

      Yes, for this one we should have video

      February 18, 2014

  • A former member

    I would love to attend this event since I am starting on using sentiment detection from news feeds

    February 16, 2014

  • Gene E.

    Do some textmining myself

    February 8, 2014
