Enabling Large Scale In-database Analytics with R across a Cluster


Details
Meet from 5:45pm for a 6pm talk.
Talk Outline: As data volumes continue to grow and companies become increasingly aware of the commercial potential of analyzing their data, it has become apparent that any solution claiming to meet these needs must have a number of key characteristics:
Scalability: The ability to ingest, store, organize and join large amounts of data, and to grow without loss of performance.
In-database execution: The ability to execute a wide variety of computationally challenging algorithms over large datasets without extensive data movement.
Adaptability: The flexibility to cope with evolving data formats, new data sources and newly developed programming frameworks without compromising performance.
This presentation will demonstrate how R can be run across a Linux cluster using the Aster nCluster database, and will discuss the implications of exposing the algorithmic richness of R to large-scale datasets. We will also touch on running other open source languages, such as Python and Perl, across a cluster, and on how MapReduce jobs can be easily written and deployed via the Aster IDE.
About our speaker: Ross is Chief Data Scientist at Teradata and currently works with major clients throughout Australia and New Zealand to help them exploit the value of ‘big data’. He specializes in deployments involving non-relational, semi-structured data and in analyses such as path analysis, text analysis and social network analysis. Previously, Ross was deputy headmaster of John Colet School for 18 years before working as a SAS analyst, as a business development manager at Minitab Statistical Software, and as founder and lead analyst at datamilk.com.
Ross Farrelly has a BSc (Hons, first class) in pure mathematics from Auckland University, a Master of Applied Statistics from Macquarie University and a Master of Applied Ethics from the Australian Catholic University.