Self-Tuning Data Systems


Details
Overview:
Although we have been analyzing data for as long as we have had it, the problem of consuming data at scale (volume, veracity, velocity, and variety) grows daily. According to Google, every two days we create as much data as we did from the dawn of humanity to 2003. There’s a good chance that we already have the data for the next big breakthrough; we just have to be able to extract the knowledge from it.
The Harvard Data Systems Lab conducts ongoing research in designing, tuning, and using data systems. We’ll talk a bit about what the lab is working on, my journey through grad school, the Harvard program and what it’s like to be in it, and my area of research.
Data systems take many forms and necessarily grow more complex as breakthroughs occur. Think of it like this: the more complex the system, the more knobs there are to tune. The gap in the expertise required to tune data systems is becoming untenable: that expertise is increasingly scarce even as the systems themselves become increasingly complex.
I began my research in the database kernel, optimizing joins in SQL systems. We’ll briefly talk about how a system performs a join at the low level to illustrate the problem. We’ll then move more deeply into the problem that scans and indexes present. While joins are relevant only to some systems, every system (SQL, NoSQL, Spark, Kafka, etc.) has indexes and scans. The margin of advantage between using an index and simply scanning the entire data set is becoming a much more interesting and relevant problem. While my thesis focuses on methods of using AI to get there, the real journey is in the discoveries along the way and the decisions you have to make as a researcher when you gain new knowledge.
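For a rough feel of the index-versus-scan question, here is a minimal Python sketch. It is an illustration only, not code from the talk or the lab: the function names, the toy in-memory "table", and the cost model (including the `random_penalty` factor) are all assumptions made for this example.

```python
import bisect
import math


def full_scan(rows, lo, hi):
    """Check every row; always touches len(rows) values."""
    return [i for i, v in enumerate(rows) if lo <= v < hi]


def build_index(rows):
    """Toy secondary index: (value, row_id) pairs sorted by value."""
    return sorted((v, i) for i, v in enumerate(rows))


def index_probe(index, lo, hi):
    """Binary-search both range bounds, then read only the matching entries."""
    start = bisect.bisect_left(index, (lo,))
    end = bisect.bisect_left(index, (hi,))
    return sorted(row_id for _, row_id in index[start:end])


def estimated_costs(n, selectivity, random_penalty=10.0):
    """Hypothetical cost model, not a measured one.

    A scan reads all n rows sequentially. An index probe pays about log2(n)
    to find the range, then one random access per matching row, which we
    assume costs `random_penalty` times as much as a sequential read.
    """
    scan_cost = float(n)
    probe_cost = math.log2(n) + selectivity * n * random_penalty
    return scan_cost, probe_cost
```

Under these assumed numbers the index wins when a query selects few rows and the scan wins when it selects many; where the crossover falls depends on the hardware and the data, which is the kind of knob a self-tuning system would have to set for itself.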
Bio:
Angelo Kastroulis is a consultant and entrepreneur who focuses on Health IT, AWS cloud computing, Big Data (Spark, Solr, Kafka, and Cassandra), and Data Science (Machine Learning and Neural Networks). He’s helped companies like Disney, Walmart, Optum Health, and McKesson solve some tough problems. As a member of Harvard’s Data Systems Laboratory, his area of focus is self-tuning data systems.
As always, we'll have a great group of people, pizza, and beer.
Tentative Schedule:
5:30-6:00 Refreshments and Socializing
6:00-7:00 Self-Tuning Data Systems with Angelo Kastroulis
7:00-7:30 Closing Remarks and Questions
