
"Official"­ May 2014 Meetup

Please note: the new date is Monday 5/12!!!

6:30 Networking and pizza
7:00 Announcements
7:05 Andrew Defries: Cheminformatics
7:20 Mark Rabkin: Transforming R to PMML
7:45 Jaimyoung Kwon: R and Python


R and Python: There and Back Again

I have been a happy user of R since my statistics PhD days, but I have also been having a lot of fun with Python. Python has emerged as a great tool for doing data science as well as a general-purpose programming language. In this presentation, I will try to answer these questions from the perspective of a statistician who works in the big data space: "Do data scientists need to learn Python, R, or both, and why?", "When is it best to use Python, R, or something else?", and "If not R, what would Python replace, then?", among others. I will throw some Hadoop considerations into the mix as a bonus.


Jaimie (Jaimyoung) Kwon is Director of Data Mining at, a division of AOL Platforms. Since joining the company in 2007, he has worked on various projects that leverage petabytes of online advertising log data to provide value to e-marketers and advertisers. Among others, he oversaw the development and launch of (a) a reporting platform for advertising audience insights and (b) a user-level campaign optimization and targeting platform using machine learning algorithms. He holds a PhD in Statistics from UC Berkeley and has dozens of academic papers and presentations on the application of statistics to various large-data problems.


Title: Cheminformatics, Chemical Space and R
Advances in the biological and genomic sciences enable us to ask deep questions of big data. To attack these large problems, share insight, and foster collaboration, scientists are increasingly using UNIX and R along with open source packages for data analysis. In cheminformatics, chemical compounds are represented in computer-readable formats that capture one or more features, such as chemical formula (1-D representation), structure in space (2-D/3-D), physicochemical properties (cLogP, bond donors, etc.), or annotation. Cheminformatics tools available in R from Bioconductor (ChemmineR, fmcsR, eiR) were used in the analysis of pesticides used in California from 1991 to 2011. Several comparisons were performed, such as chemical similarity tests that led to inferences about on-target (pest) and off-target selectivity. These methods and results will be presented.
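
To give a feel for what a chemical similarity test can look like, here is a minimal Python sketch of the Tanimoto coefficient, a common similarity measure over sets of structural features. It is an illustration only, not the method from the talk: the function name `tanimoto`, the two compounds, and their feature sets are made up for the example, and real tools derive the features from the actual structures.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: shared features / all features."""
    if not a and not b:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical feature sets (e.g. bond types present in each compound).
# These are invented for illustration and do not describe real pesticides.
compound_x = {"C-C", "C-O", "C=O", "C-N"}
compound_y = {"C-C", "C-O", "C-Cl"}

score = tanimoto(compound_x, compound_y)
print(f"Tanimoto similarity: {score:.2f}")  # 2 shared / 5 total = 0.40
```

A score near 1 means the two compounds share most of their features; a score near 0 means they share few.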

Andrew Defries background
My area of expertise is chemical biology, where we inquire about relationships between small molecule chemicals and biological processes.

Transforming R to PMML
This talk will address how to convert R models to PMML using the "pmml" and "pmmlTransformations" packages and discuss the benefits of doing so, which include:

* Overcoming R's memory and speed limitations
* Deploying models in minutes, not months
* Making many predictive models operational at once
* Using multiple models to deploy ensembles, segmentation, and chaining

We will also discuss how today's technology enables models not only to work with RDBMS and NoSQL databases but also to score in real time against in-flight data.

Mark Rabkin Bio
Mark, currently Director of Business Development for Zementis, has led business development and sales teams at Apple, Coopers & Lybrand Consulting, and Staples, as well as at several venture capital backed start-ups. Mark is a graduate of San Francisco State University and holds an MBA from the Johnson Graduate School of Management at Cornell University, where he was awarded the Kidd Grant for Entrepreneurship.


  • Gary M.

    Huge spotlight that wasn't used but was there anyways? Yes! PA system? No! Ok, it was still cool of HB to host, and the talks were good.

    May 13, 2014

  • Russell S.

    I found three notebooks left after the event - I've had two claimed, but one has not been yet! Please get in touch via meetup if you have lost a notebook. We're open Mon-Fri for you to come by and pick it up.


    May 13, 2014

  • Jim S.

    Excellent presentations.

    May 13, 2014

  • Katherine A.

    Thank you hosts and speakers! I believe there was an announcement about something happening at PayPal- can whoever that was share the details again? Thank you!

    May 12, 2014

    • Andrew P.

      ACM 2014 Data Camp not up yet - that's what I heard anyway....

      May 12, 2014

    • Stephen

      ACM Data Science Camp, Sat Oct 25, at Paypal/eBay North. Website coming shortly. Call for speakers.

      May 13, 2014

  • Stephen

    Great talks on a variety of topics. Andrew Defries' talk was superb.

    May 13, 2014

  • Stephen

    Does anyone want to carpool up from southbay? Contact me offlist if so. Leaving Mountain View 5:15pm.

    May 12, 2014

  • Robert S.

    Dan Bikle has a class for using machine learning for the stock market.
    Dan has spent years studying the stock market combined with machine learning algorithms. I am porting Dan's awesome work to R and am looking for others who wish to contribute during his next classes.
    His class uses CentOS and Postgres and the associated MADlib statistical library. I think R may be a simpler choice. I created the first feature set in R during the last class and will work on adding his machine learning algorithms during this week's and next week's classes. Here is the R I wrote last week:...
    If you know R well, join this Saturday and help improve the R port.

    April 28, 2014
