Feb'19 - You don't need Spark for medium data (Zhuo Jia Dai)


Details
We look forward to seeing you at our first 2019 meetup, on 13th February:
Arrive from 5:45 pm for a 6 pm talk.
"You don't need Spark for medium data"
TALK OUTLINE:
Medium data is an important segment of datasets sandwiched between small data (datasets that can be manipulated in memory with R or Python/pandas) and big data (datasets that require distributing the data over many computers, e.g. with Hadoop/Spark). This segment matters because it is difficult to analyse without proper tooling, yet it is the predominant form of data in many industries, including banking. The canonical tools for dealing with medium data include Dask, JuliaDB.jl, SAS, and Spark; there are no comparable options in R. In this talk, we will present disk.frame (https://github.com/xiaodaigh/disk.frame), a new R package for medium-data manipulation that is simple, fast, and (hopefully) intuitive to use. We will showcase how to summarise 1.8 billion data points on a laptop within minutes using disk.frame.
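For readers unfamiliar with disk.frame, the following is a minimal sketch of the kind of workflow the talk covers. The file name, column names, and grouping variable are hypothetical, and the exact semantics of the dplyr verbs can vary between disk.frame versions; see the GitHub repository above for the authoritative API.

# Minimal sketch (hypothetical data); not the speaker's exact code.
library(disk.frame)
library(dplyr)

setup_disk.frame()  # start local workers so chunks are processed in parallel

# Convert a large CSV into a disk.frame: the data is split into chunks stored on disk
flights.df <- csv_to_disk.frame("flights.csv", outdir = "flights.df")

# dplyr-style manipulation over the chunks; collect() brings the (small) result into memory.
# Note: depending on the disk.frame version, group_by may aggregate within each chunk,
# in which case a second in-memory aggregation is needed after collect().
result <- flights.df %>%
  group_by(carrier) %>%
  summarise(mean_dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  collect()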
BIO:
ZJ has more than 10 years of experience in credit risk modelling, analytics, and data science, and has recently become an independent consultant. He has a maths background and runs the Sydney competitive programming and Julia (JuliaLang) meetups.
