ADS Drinks & Data: CIDR meets ADS
Mark your calendars for CIDR meets ADS on Wednesday, January 11th, 2023! ADS invites you to this special meetup, organized in collaboration with CIDR, in the Zurich Room of the Mövenpick Hotel in Amsterdam.
The Conference on Innovative Data Systems Research (CIDR) is a systems-oriented conference that emphasizes the systems architecture perspective. Its mission complements that of mainstream database conferences such as SIGMOD and VLDB.
ADS and CIDR are organizing a free meetup at the conference venue to close the final day of the conference. The meetup takes advantage of the presence of prominent data systems researchers attending CIDR, and the speakers include world-class researchers in the field.
*
Chair
Peter Boncz, professor of Large-Scale Analytical Data Management at Vrije Universiteit Amsterdam and senior researcher in the Database Architectures Group at CWI.
*
Talk #1 by Ana Klimovic (ETH Zurich) - Scalable Input Data Processing for Resource-Efficient ML
Abstract: Processing input data plays a vital role in ML training, affecting accuracy, throughput, and cost. This talk will discuss the characteristics of ML input pipelines that motivated the design of a new system architecture, in which we disaggregate input data processing from model training. I will present Cachew, a fully managed service for ML data processing built on top of TensorFlow's data loading framework, tf.data. Cachew's autoscaling and autocaching policies reduce end-to-end training time by up to 4.1x and total cost by up to 3.8x compared to scaling data processing resources with a traditional Kubernetes Horizontal Pod Autoscaler.
*
Talk #2 by Matei Zaharia (Databricks, Stanford University) - Natural Language Meets Query Processing
Abstract: Natural language processing (NLP) is advancing rapidly, creating exciting new tools for building applications. Language models (LMs) that generate text are the best-known such tool, but there has also been great progress in information retrieval (IR) using neural networks. I'll present research from my group that combines these tools to create systems that are more powerful than either LMs or IR alone. One example is systems that combine the "reasoning" power of LMs with the ability to search over a large corpus, so they can answer complex questions using information from multiple sources.
For database system developers, NLP also offers exciting opportunities to improve the user interface (e.g., letting users query data in natural language) and to optimize the infrastructure using techniques similar to those of traditional database systems. I'll show a few examples of these ideas in action in systems from my group. These include ColBERT and PLAID (information retrieval models with state-of-the-art accuracy and computational performance), Baleen (complex queries using multi-hop retrieval), and DSP (a programming framework that combines text-processing operators, including LMs and IR).
*
Talk #3 by Jordan Tigani (MotherDuck) - The end of "Big Data" and what the duck you can do about it
Abstract: The exponential growth of data sizes has been used to justify dramatically different techniques and novel architectures for handling data. However, 15 years into the "Big Data" craze, the vast majority of people don't have giant data sets. The few who do tend not to process more than a modest amount of that data at a time. Moreover, advances in hardware mean that the threshold for what counts as "Big Data" in the first place has been rising steadily. As a result, fewer and fewer workloads will need complex distributed architectures to handle them.
This talk will examine some of the assumptions in the modern data ecosystem and how these may not hold true in a world where data size is not a factor for most users. We will explore ways in which data can be a liability and argue that organizations should consider limiting the amount of data they collect and retain. Finally, the talk will discuss what becomes possible with moderate data sizes, and how to keep data sizes moderate.
