"Official" May 2014 Meetup


Details
Please note: the new date is Monday, 5/12!
Agenda:
6:30 Networking and pizza
7:00 Announcements
7:05 Andrew Defries: Cheminformatics
7:20 Mark Rabkin: Transforming R to PMML
7:45 Jaimyoung Kwon: R and Python
Details
R and Python: There and Back Again
I have been a happy user of R since my statistics PhD days, but I have also been having a lot of fun with Python. Python has emerged as a great tool for data science as well as a general-purpose programming language. In this presentation, I will try to answer these questions, among others, from the perspective of a statistician who works in the big data space: "Do data scientists need to learn Python, R, or both, and why?", "When is it best to use Python, R, or something else?", and "If not R, what would Python replace, then?" As a bonus, I will throw some Hadoop considerations into the mix.
Biography
Jaimie (Jaimyoung) Kwon is Director of Data Mining at Advertising.com, a division of AOL Platforms. Since joining the company in 2007, he has worked on various projects that leverage petabytes of online advertising log data to provide value to e-marketers and advertisers. Among other things, he oversaw the development and launch of (a) a reporting platform for advertising audience insights and (b) a user-level campaign optimization and targeting platform using machine learning algorithms. He holds a PhD in Statistics from UC Berkeley and has produced dozens of academic papers and presentations on applying statistics to various large-data problems.
#-----------------------------
Title: Cheminformatics, Chemical Space and R
Advances in the biological and genomic sciences enable us to ask deep questions of big data. To attack these large problems, share insight, and foster collaboration, scientists are increasingly using UNIX and R along with open source packages for data analysis. In cheminformatics, chemical compounds are represented in computer-readable formats that capture one or more features, such as chemical formula (1-D representation), structure in space (2-D/3-D), physicochemical properties (cLogP, bond donors, etc.), or annotation. Cheminformatics tools available in R from Bioconductor (ChemmineR, fmcsR, eiR) were used to analyze pesticides used in California from 1991 to 2011. Several comparisons were performed, such as chemical similarity tests that led to inferences about on-target (pest) and off-target selectivity. These methods and results will be presented.
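The abstract mentions chemical similarity tests without naming a measure. A common choice in cheminformatics is the Tanimoto coefficient on binary fingerprints: the number of features two compounds share divided by the number of distinct features across both. The Python sketch below assumes that measure; the feature labels, compound names, and helper functions are hypothetical stand-ins, not ChemmineR's descriptors or API.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient for binary fingerprints stored as feature sets:
    shared features divided by total distinct features."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: no evidence of similarity
    return len(a & b) / len(union)


def rank_by_similarity(query: Set[str],
                       library: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Score every library compound against the query, most similar first."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy fingerprints; each string stands for one structural feature.
library = {
    "compound_A": {"ring6", "C-Cl", "C=O", "C-N"},
    "compound_B": {"ring6", "C-Cl", "P=S", "P-O"},
    "compound_C": {"C-C", "C-O", "C=O"},
}
query = {"ring6", "C-Cl", "C=O"}

for name, score in rank_by_similarity(query, library):
    print(f"{name}: {score:.2f}")
# prints compound_A: 0.75, compound_B: 0.40, compound_C: 0.20 (one per line)
```

Real fingerprints (for example, atom-pair or substructure keys) would replace the string sets, but the scoring step stays the same: a high score against a known active compound suggests similar target behavior, which is the kind of inference the talk describes.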
Andrew Defries background
My area of expertise is chemical biology, where we inquire about relationships between small-molecule chemicals and biological processes.
#------------------------------------------------
Transforming R to PMML
This talk will address how to convert R models to PMML using the "pmml" and "pmmlTransformations" packages, and will discuss the benefits of doing so, which include:
- Overcoming R's memory and speed limitations
- Deploying models in minutes, not months
- Making many predictive models operational at once
- Using multiple models to deploy ensembles, segmentation, and chaining

We will also discuss how today's technology not only enables models to work with RDBMS and NoSQL databases but also enables real-time scoring against in-flight data.
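PMML is an XML format, so a converted model is a plain document that any PMML-aware engine can score without R. As a rough illustration, here is a hand-built minimal PMML 4.2 RegressionModel for a linear model, written in Python with only the standard library, plus a tiny scorer that evaluates it. This is not the output of the R "pmml" package, which generates PMML directly from fitted R objects; the helper names, field names, and coefficients are hypothetical, and a production document would typically carry more metadata (header details, value ranges, missing-value handling).

```python
import xml.etree.ElementTree as ET
from typing import Dict

# PMML 4.2 namespace; registered as the default so output uses plain tag names.
PMML_NS = "http://www.dmg.org/PMML-4_2"
ET.register_namespace("", PMML_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def linear_model_to_pmml(target: str, intercept: float,
                         coefficients: Dict[str, float]) -> str:
    """Hypothetical helper: serialize target = intercept + sum(coef * field)
    as a minimal PMML RegressionModel document."""
    root = ET.Element(_q("PMML"), version="4.2")
    ET.SubElement(root, _q("Header"), description="illustrative linear model")

    # Declare every field the model touches: the target plus each predictor.
    fields = [target, *coefficients]
    dictionary = ET.SubElement(root, _q("DataDictionary"),
                               numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, _q("DataField"), name=name,
                      optype="continuous", dataType="double")

    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    ET.SubElement(schema, _q("MiningField"), name=target, usageType="predicted")
    for name in coefficients:
        ET.SubElement(schema, _q("MiningField"), name=name)

    # One RegressionTable holds the intercept and one NumericPredictor per input.
    table = ET.SubElement(model, _q("RegressionTable"), intercept=repr(intercept))
    for name, coef in coefficients.items():
        ET.SubElement(table, _q("NumericPredictor"), name=name,
                      coefficient=repr(coef))
    return ET.tostring(root, encoding="unicode")


def score(pmml_text: str, record: Dict[str, float]) -> float:
    """Hypothetical scorer: evaluate the RegressionTable for one input record."""
    ns = {"p": PMML_NS}
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", ns)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", ns):
        result += float(predictor.get("coefficient")) * record[predictor.get("name")]
    return result


# Usage with made-up coefficients: y = 1.5 + 2.0*x1 - 0.5*x2
doc = linear_model_to_pmml("y", 1.5, {"x1": 2.0, "x2": -0.5})
print(score(doc, {"x1": 3.0, "x2": 4.0}))  # → 5.5
```

Because the scorer reads only the XML, the model definition is decoupled from the R session that produced it, which is the property behind the deployment benefits listed above.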
Mark Rabkin Bio
Mark, currently Director of Business Development for Zementis, has led business development and sales teams at Apple, Coopers & Lybrand Consulting, and Staples, as well as at several venture-capital-backed start-ups. Mark is a graduate of San Francisco State University and holds an MBA from the Johnson Graduate School of Management at Cornell University, where he was awarded the Kidd Grant for Entrepreneurship.
