
Stream processing with R in AWS

Hosted by Gergely D.

Details

Abstract: R is rarely mentioned among big data tools, although it scales fairly well for most data science problems and ETL tasks. This talk presents an open-source R package that interacts with Amazon Kinesis via the MultiLangDaemon bundled with the Amazon KCL, which starts multiple R sessions on a single machine or a cluster of nodes to process data from, in theory, any number of Kinesis shards. Besides the technical background and a quick introduction to how Kinesis works, the talk will feature some stream processing use cases at CARD.com, and will also provide an overview and hands-on demos of the related data infrastructure built on top of Docker, Amazon ECS, ECR, KMS, Redshift and a number of third-party APIs.

Bio: Gergely is an enthusiastic R user and package developer, the founder of rapporter.net (an R-based web application), a Ph.D. candidate in Sociology, and Director of Analytics at CARD.com, with a strong interest in designing a scalable data platform built on top of R, AWS and a dozen APIs. He maintains several CRAN packages, mainly dealing with reporting and API integrations, has co-authored a number of journal articles in the social and medical sciences, and recently published the book "Mastering Data Analysis with R" with Packt. He is one of the founders and organizers of the Hungarian R Meetup.

Note: this talk will be part of Budapest Startup Safary 2017 (http://budapest.startupsafary.com), a low-budget, two-day conference on startup-related topics (including data). We highly recommend checking out the full conference program and registering, although attendance at this specific talk will be free for BURN members who RSVP here.

Budapest Users of R Network