Past Meetup

Enterprising R - Richard Volpato

33 people went

Details

Enterprising R

The R Project has delivered a software platform that offers not only effective statistical tools but also an expressive language sustained by a vibrant and clever community of ‘data dons’ (see r-project.org and r-bloggers.com). I will describe and demonstrate elements of a corporate renovation in which R will routinely provide a ‘weightings regime’ to apply to surveys. For R to function in a corporate environment it has to be more than a set of tools on a desktop: it needs to be a ‘service’ that can communicate securely and reliably with other systems, schedule and prioritise requests for its services and, of course, enable users to select data, choose the transformations to apply to it, view the results and verbalise the relevant implications.

Beyond a specialised R package (CAL-R), the components of this system include: MySQL as the data store; Django as the application server, dealing with workflow, page delivery and tracking; and Django-Piston, which makes the whole service available to the business as a ‘RESTful service’. This all connects to R via PypeR (a piping system that maps R objects into Python objects and vice versa). The load and composition of tasks for CAL-R are controlled by a queuing/scheduling system based on RabbitMQ and Django Celery. The formatting and user manipulation of objects on the page (e.g. drag and drop, select) is delivered via jQuery (a JavaScript library), while Protovis (a JavaScript implementation of ‘ggplot’-style graphics) renders custom interactive graphs in the browser (note: not just a PDF!).
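As a rough illustration of the "schedule and prioritise" idea, here is a minimal Python sketch that uses only the standard library. It is not the system described above, which relies on RabbitMQ and Django Celery; the names `WeightingRequest`, `RequestQueue` and `mean_worker` are hypothetical, and the worker is a plain Python stand-in for the call that would be passed to R.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass(order=True)
class WeightingRequest:
    """A hypothetical request for a weighting run; lower priority runs first."""
    priority: int
    sequence: int  # tie-breaker so equal priorities run in submission order
    survey_id: str = field(compare=False)
    payload: Any = field(compare=False)


class RequestQueue:
    """In-memory stand-in for a message broker plus task scheduler."""

    def __init__(self) -> None:
        self._heap: List[WeightingRequest] = []
        self._counter = itertools.count()

    def submit(self, survey_id: str, payload: Any, priority: int = 10) -> None:
        request = WeightingRequest(priority, next(self._counter), survey_id, payload)
        heapq.heappush(self._heap, request)

    def run_all(self, worker: Callable[[Any], Any]) -> List[Tuple[str, Any]]:
        """Process every queued request in priority order and collect results."""
        results = []
        while self._heap:
            request = heapq.heappop(self._heap)
            results.append((request.survey_id, worker(request.payload)))
        return results


def mean_worker(values: List[float]) -> float:
    # Assumption: in a real deployment this is where the request would be
    # handed to R; here a simple mean stands in for the statistical work.
    return sum(values) / len(values)
```

A caller would `submit` requests with different priorities and then call `run_all(mean_worker)`; urgent requests (lower numbers) are processed first, and requests with equal priority keep their submission order.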

To build such an enterprise-wide service, I have used a number of techniques, including iterative development, class-based code (including R generic functions and nested R environments), Kanban to manage workflow, Subversion for version control, and R to bootstrap various weighting scenarios.
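To make the bootstrapping of a weighting scenario concrete, here is a small Python sketch under stated assumptions. The abstract does not say which weighting method CAL-R uses, so simple post-stratification (population share divided by sample share) is assumed here, and every function name is hypothetical. The talk describes doing this step in R; Python is used only for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def poststratify(strata: Sequence[str], population_shares: Dict[str, float]) -> List[float]:
    """One weight per respondent: population share / sample share of their stratum."""
    counts = Counter(strata)
    n = len(strata)
    return [population_shares[s] / (counts[s] / n) for s in strata]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_weighted_mean(
    values: Sequence[float],
    strata: Sequence[str],
    population_shares: Dict[str, float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Approximate 95% percentile interval for the weighted mean."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        # Resample respondents with replacement, then recompute the weights
        # on the resample so the weighting step is part of what is bootstrapped.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_values = [values[i] for i in idx]
        sample_strata = [strata[i] for i in idx]
        weights = poststratify(sample_strata, population_shares)
        estimates.append(weighted_mean(sample_values, weights))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper
```

Running `bootstrap_weighted_mean` with different `population_shares` gives a quick sense of how sensitive a survey estimate is to the chosen weighting scenario. A stratum missing from a resample simply contributes nothing to that replicate, which is a simplification a production system would want to handle more carefully.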

This kind of work is becoming both feasible and, I will argue, necessary across many corporate settings, because the vastness of the ‘data deluge’ swamping organisations leaves them drowning in data rather than ‘metabolising’ it into information about themselves and what lies or lurks beyond them. Data remains a deeply misconstrued cultural resource. The expressiveness of R and the flexibility of other scripting languages (e.g. Python, JavaScript and Ruby) now enable the information carried (buried!) within data to be released in digestible portions targeting distinct users.

Finally, the particular practical setting of this deployment of R has further generic value for any public utility mode of operation. In this century, with data-centric programming releasing the full capacity of data to reveal destinies, the public utility model for getting things done (in banking, ecological management, creative ecologies and transport, to name but a few) will prove to be far more widely applicable, efficient and fair than hitherto imagined possible. However, this will be achieved by accomplishments in specific settings, not just rhetorical flourishes about ‘public goods’.

Biography

Richard Volpato, born in Italy, studied mostly sociology and social research at ANU and Cambridge (UK). He has introduced the delights of data to thousands of students at the Universities of Tasmania (mostly), Melbourne and Canberra, and at ANU. He also participated in the Australian Consortium for Social and Political Research Inc (ACSPRI) summer schools, bringing data analytic skills to graduate students and public servants. He has consulted across a large number of industries, notably forestry, tourism, superannuation, education, health, religion and urban/regional planning. Over the last decade he has been involved in open source software, helping with the development of Sprints to complete the open source platform Zope 3, and leading (via Zope Corporation) the successful tender for the Victorian Government's planning scheme amendments system (ZAPP: Zope Amendment Production Platform). He also developed the idea of ‘members as corporate resources’ within superannuation. In his current role at the Copyright Agency as Manager of Data Quality and Analysis, he works to improve data quality, its value and the systems used to acquire and process it.