Text mining with R

Name: Text mining with R
Start: 2010-10-19T18:30:00-05:00
End: 2010-10-19T21:30:00-05:00
Location: UCLA Boelter Hall Room 9413

Hosted By

Szilard P.

Details

At this meeting, Rob Zinkov will talk about text mining with R.

The talk will be accompanied by demos, and we'll also have time to analyze some of YOUR data. So, if you have data suitable for text mining, please consider sharing it with us (please contact Szilard in advance).

We'd also like to have some discussion of text mining, so if you have worked in this field, please consider joining in (e.g. what applications you have used it for, what kinds of data, which R tools you have used, etc.).

Finally, we can accommodate a few short (5-10 min) talks on text-mining-related topics. If you have experience in text mining, please consider giving a short presentation (please contact Szilard if you'd like to do so).

UPDATE: See the list of short talks below.

The abstract for Rob's talk: "R isn't just good for analyzing large datasets, building web applications, GIS, and stunning visualizations. You can also use R for text mining. In my presentation I will show how you can use R and various libraries in CRAN to do a wide array of common text mining and natural language processing tasks. These include topic modeling, summarization, entity extraction, and sentiment analysis. Bring your own unstructured data for an exciting live demo."

Rob's bio: Rob Zinkov is a Ph.D. student in Computer Science at USC/ISI. He is also a software developer at Digisynd in Burbank.

Short talks:

Ryan Rosario: Accessing R from Python using RPy2

Abstract: Many prefer scripting languages such as Python, Perl and Ruby for text mining. The typical workflow for an R user in this position is to perform the text mining in one of these languages (in my case Python), write the results to disk, and then read them into R for modeling. The RPy2 package avoids this break in the workflow. RPy2 lets the user call R from within Python and makes it easy to translate between Python and R data types, so the user has full access to R without leaving Python. In this short presentation, I will show an example of using Python and the NLTK (Natural Language Toolkit) library, and how easy it is to pass data seamlessly to R without ever leaving Python. I will also discuss why someone might want to do text mining in Python and rely on R only for modeling and analysis.

Bio: Ryan is a Ph.D. candidate in the Department of Statistics at UCLA and recently received his M.S. in Computer Science from UCLA. He also works as a data scientist and researcher for the Rubicon Project, a web ad optimization company in West LA.

Please RSVP as places are limited.

Venue and starting time as usual: UCLA Boelter Hall Room 9413 (the lab), 6:30pm

Important message for first-time attendees: Please consult the detailed venue information given for the first meeting, as many people had difficulty finding the place the first time: https://www.meetup.com/LAarea-R-usergroup/calendar/10154603/
