February 12, 2013 · 6:15 PM
We have a short turnaround time this month as we welcome Bryan Lewis to discuss SciDB.
About the talk:
SciDB is an open-source database that organizes data in n-dimensional arrays.
Interesting SciDB features include parallel processing, distributed storage, ACID transactions, efficient sparse array storage, and native linear algebra operations.
The "scidb" package for R provides two general ways to interact with SciDB from R:
1. By running database queries from R and transferring data using data.frame iterators.
2. Through a sparse n-dimensional array object class for R inspired by the bigmemory package. The arrays mimic standard R arrays, but operations on them are performed by the SciDB engine. Data are materialized to R only when requested.
We illustrate using SciDB with R through a few examples, including computing a truncated singular value decomposition of a large matrix and bi-clustering large arrays using the biclust package.
Bryan Lewis has worked with R for a number of years and is the author of several R packages, including irlba, rredis, doRedis, websockets, and bigalgebra. He is the chief data scientist at Paradigm4 in Waltham, MA and has a Ph.D. in applied mathematics.
Pizza starts at 6:15, Bryan will go on at 7, and then we'll head to the bar.