Approximation Techniques for Multidimensional Queries Using the DataSketches Library


Details
Data approximations (summaries, or sketches) allow different sets of data to be processed independently. The summaries computed for each set can then be merged to obtain summaries of various combinations of the datasets (union, intersection, etc.). For massive datasets this is valuable, because the data can be partitioned across machines, summarised in parallel, and the partial summaries combined into a single result.
In this talk, we will introduce the DataSketches library, an open-source library for problems such as unique counts, quantiles, frequent items, and sampling. We will then cover a problem in which we leveraged sketches for multidimensional queries to count distinct items in massive datasets.
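To make the merge-then-query idea concrete, here is a minimal Python sketch. It does not use the DataSketches API; to stay self-contained it hand-rolls a simplified K-minimum-values (KMV) sketch, similar in spirit to the library's Theta sketches. Each sketch keeps only hash values below a threshold theta (at most k of them) and estimates the distinct count as retained / theta. Union and intersection combine the retained hashes below the smaller theta. Partitions are sketched independently and then unioned, and a multidimensional filter such as country = US AND device = mobile is answered by intersecting per-dimension sketches. The event schema (user, country, device), k = 4096, and the names KMVSketch, build_sketches, merge_partitions and make_rows are hypothetical, chosen only for illustration.

```python
import hashlib
import heapq


def hash01(item: str) -> float:
    """Map an item to a pseudo-random float in [0, 1)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    # Keep the top 53 bits so the float is exact and strictly below 1.0.
    return (int.from_bytes(digest, "big") >> 11) / 2**53


class KMVSketch:
    """Illustrative theta-style sketch (not the DataSketches API): keeps hashes < theta, at most k."""

    def __init__(self, k: int = 4096):
        self.k = k
        self.theta = 1.0       # 1.0 means exact mode: no hash has been discarded yet
        self._members = set()  # retained hash values, all strictly below theta
        self._heap = []        # the same values negated, so -heap[0] is the largest

    def update(self, item: str) -> None:
        h = hash01(item)
        if h >= self.theta or h in self._members:
            return  # above the threshold, or a repeat of an item already counted
        self._members.add(h)
        heapq.heappush(self._heap, -h)
        if len(self._heap) > self.k:
            # Evict the largest retained hash; it becomes the new threshold.
            evicted = -heapq.heappop(self._heap)
            self._members.discard(evicted)
            self.theta = evicted

    def estimate(self) -> float:
        """Estimated number of distinct items: retained count scaled by 1/theta."""
        return len(self._members) / self.theta

    @classmethod
    def _from_hashes(cls, k, theta, hashes):
        kept = sorted(h for h in hashes if h < theta)
        if len(kept) > k:
            theta, kept = kept[k], kept[:k]  # (k+1)-th smallest becomes the threshold
        sketch = cls(k)
        sketch.theta = theta
        sketch._members = set(kept)
        sketch._heap = [-h for h in kept]
        heapq.heapify(sketch._heap)
        return sketch

    def union(self, other: "KMVSketch") -> "KMVSketch":
        """Sketch of A union B: pool retained hashes below the smaller threshold."""
        return KMVSketch._from_hashes(min(self.k, other.k), min(self.theta, other.theta),
                                      self._members | other._members)

    def intersection(self, other: "KMVSketch") -> "KMVSketch":
        """Sketch of A intersect B: hashes retained by both, below the smaller threshold."""
        return KMVSketch._from_hashes(min(self.k, other.k), min(self.theta, other.theta),
                                      self._members & other._members)


def build_sketches(rows, k=4096):
    """One sketch of distinct user ids per (dimension, value), e.g. ("country", "US")."""
    sketches = {}
    for row in rows:
        for dim in ("country", "device"):
            key = (dim, row[dim])
            if key not in sketches:
                sketches[key] = KMVSketch(k)
            sketches[key].update(row["user"])
    return sketches


def merge_partitions(*partitions):
    """Union per-partition sketches key by key, as a coordinating node might."""
    merged = {}
    for part in partitions:
        for key, sketch in part.items():
            merged[key] = merged[key].union(sketch) if key in merged else sketch
    return merged


def make_rows(start, stop):
    """Hypothetical event log: every user appears twice with a fixed country and device."""
    return [
        {"user": f"u{i}", "country": "US" if i % 2 == 0 else "IN",
         "device": "mobile" if i % 3 == 0 else "desktop"}
        for i in range(start, stop)
        for _ in range(2)
    ]


if __name__ == "__main__":
    # Two "machines" each sketch their own partition; only the sketches are shipped.
    merged = merge_partitions(build_sketches(make_rows(0, 10_000)),
                              build_sketches(make_rows(10_000, 20_000)))
    us = merged[("country", "US")]
    us_mobile = us.intersection(merged[("device", "mobile")])
    print(f"distinct US users: ~{us.estimate():.0f} (true 10000)")
    print(f"distinct US mobile users: ~{us_mobile.estimate():.0f} (true 3334)")
```

In this design only the small per-(dimension, value) sketches travel between machines, never the raw rows, and any AND-combination of dimension values can be answered afterwards by intersection. The trade-off is that intersection estimates get noisier when the intersection is a small fraction of its inputs, because fewer retained hashes survive below the shared threshold.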
Speaker Bio:
Srimathi Harinaryanan, SDE @ Amazon.
Srimathi is a software engineer who has spent the last 10+ years in tech consulting and building scalable systems across domains, delivering tangible business value for customers across the globe. She is passionate about building products and platforms that create impact and user value.
