May 2017: Data Science with Time-Series Data

Hosted By
Bostjan K.
Details

Welcome to the fifth event hosted by Data Science Slovenia! The main topic is time-series data.

We'll host Rok Piltaver (http://dis.ijs.si/rokpiltaver/) (Data Engineering @ Celtra), who will share his experience of taming large streams of time-stamped data, and Rudradeb Mitra (https://www.linkedin.com/in/mitrar/?ppe=1) (Evangelist @ TimescaleDB), who will present how TimescaleDB can be leveraged to power applications with high-volume time-series data.

  1. Scalable analytics pipelines

In this talk we will illustrate general data engineering principles using a case study of an analytics pipeline that receives 1 TB of data per day and provides an API that returns aggregated metrics about the observed data in milliseconds. First, we will discuss infrastructure (Amazon Web Services) and software requirements for collecting large quantities of data reliably. Second, we will discuss how to use parallel computing (Apache Spark) to clean and reformat the collected data in a scalable way. We will also show the advantages of modern databases (Snowflake Cloud Data Warehouse) for storing the processed data. Third, we will show how to use the discussed tools to compute and store detailed pre-aggregated analytics data that does not fit into a traditional relational database such as MySQL. Finally, we will show how to quickly answer analytics API requests using the pre-aggregated data.

About Rok Piltaver

Rok Piltaver (http://dis.ijs.si/rokpiltaver/) is a software developer in the analytics team at Celtra Inc., where he works on data engineering and data science tasks. He is also an AI researcher at the Jozef Stefan Institute, focused on data mining and intelligent systems. He received the best Slovenian innovation award for detecting unusual movement of personnel and equipment in high-security buildings and the best collaboration with industry award for predicting failures of lab refrigerator devices. He was also part of the team that won an activity recognition competition and the best paper award for person identification based on door acceleration. He has designed and developed several other systems, ranging from a robot that learns from humans to a smart house that automatically adapts to residents’ needs and an intelligent tourist itinerary planner. In his PhD thesis, he developed a machine-learning algorithm that provides optimal trade-offs between classifier comprehensibility and accuracy.

  2. TimeScale Database for Time series data: Applications from IoT to AI

In the past 24 months, time-series databases have been the fastest-growing database category. This interest stems from their applications in AI, IoT, and many other sectors.

In this talk, the speaker will explain what a time-series database is and why a separate kind of database is needed for such data. He will then explain why it is a good decision to build a time-series database on SQL rather than NoSQL.

About Rudradeb Mitra

Rudradeb (https://www.linkedin.com/in/mitrar/?ppe=1) has a background as an AI researcher and has published 10 research papers on various AI topics, including language processing, semantic web, 5th generation languages, and multi-agent planning. After finishing his Master's at the University of Cambridge, he went on to build 4 startups: two in Silicon Valley, one in the UK, and one in Belgium.

These days he is collaborating with an NYC startup (one of the co-founders is ex-MIT and the other is a professor at Princeton), helping them evangelize their database.

Data Science Slovenia
Hekovnik
Tobačna ulica 5 · 1000 Ljubljana