Have you heard the term "Data River" before? If not, this meetup should be a great introduction to Data Rivers. Apache Druid is the newest tool in your analytics toolkit. It is a distributed, column-oriented OLAP database with native SQL support. It delivers sub-second ad-hoc queries against both streaming and batch (Hadoop) data. Come and learn about Data Rivers and how you can use Apache Druid to build your own. Tijo Thomas, Solutions Architect at Imply, will be speaking about Druid and Imply. We will also have a presentation by our generous hosts at Zeotap, whose Analytics Lead will speak about how they have implemented Druid.
10:30 - 10:45: Networking and socializing over food and drinks
10:45 - 11:00: Welcome, opening remarks and announcements
11:00 - 11:40: Chaitanya Bendre, Analytics Lead at Zeotap
11:40 - 12:00: Prateek Shrivastava, Product Manager at Qubole
12:00 - 12:30: Tijo Thomas, Solutions Architect at Imply
12:30 - 01:00: Networking
Talk 1: Real time aggregations on big data using Druid @Zeotap
Speaker A: Chaitanya Bendre
Chaitanya Bendre is a Lead Engineer at Zeotap. He graduated from BITS Pilani with a major in Computer Science. He has more than 5 years of experience working in big data. Currently, he manages the Insights product and the data science pipelines at Zeotap. He has been working with Druid in production for more than 2 years. He has an active interest in databases, functional programming and data science.
Speaker B: Nipun Jain
Nipun Jain is a Software Engineer at Zeotap. He holds an undergraduate degree in Computer Science from BITS Pilani. He has been working with Druid for a year and has successfully used it to solve an audience estimation use case. He has a keen interest in databases and big data pipelines.
Talk 2: Realtime Ingestion Using Spark Streaming
Speaker: Prateek Shrivastava
Prateek is the product manager for the Spark Streaming product at Qubole.
Talk 3: Setting the stage for fast analytics with Druid
Speaker: Tijo Thomas
Tijo Thomas is a Solutions Architect at Imply. He holds a postgraduate degree in Information Technology from IIT Bombay. He has more than 16 years of experience in the software industry. Before joining Imply, he worked at Cloudera. He has around 7 years of experience in various big data technologies.
Druid is an emerging standard in the data infrastructure world, designed for high-performance slice-and-dice analytics (“OLAP”) on large data sets. This talk is for you if you’re interested in learning more about pushing Druid’s analytical performance to the limit. Perhaps you’re already running Druid and are looking to speed up your deployment, or perhaps you aren’t familiar with Druid and are interested in learning the basics. Some of the tips in this talk are Druid-specific, but many of them will apply to any operational analytics technology stack.
The most important contributor to a fast analytical setup is getting the data model right. The talk will center around various choices you can make to prepare your data to get the best possible query performance.