Cardinality estimation using HyperLogLog... (Online Meetup)


Details
This is an online meetup. The Zoom link is https://zoom.us/j/8057306529
Please register for the event so you can receive a reminder from Meetup.
Topic:
Cardinality estimation using HyperLogLog with intersection support and Dask parallel computation
Abstract:
Cardinality, the number of distinct elements in a set, matters to many business applications (e.g., counting the number of unique visitors to a website over a given amount of time). HyperLogLog is a probabilistic data structure that approximates cardinality by hashing each element and keeping only a small, fixed-size summary, which lets it quickly answer an array of cardinality-related questions.
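To make the idea concrete, here is a minimal sketch of a HyperLogLog counter in plain Python. It is not the implementation discussed in the talk: the class name, the choice of SHA-1 as the hash, and the default precision `p=12` are all assumptions made for illustration. Each item is hashed to 64 bits, the first `p` bits pick a register, and the register remembers the longest run of leading zeros seen in the remaining bits.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical, not the talk's code)."""

    def __init__(self, p=12):
        self.p = p                      # precision: number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Hash the item to a 64-bit integer (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        idx = x >> (64 - self.p)                   # first p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


hll = HyperLogLog()
for visitor_id in range(50_000):
    hll.add(visitor_id)
    hll.add(visitor_id)                 # duplicates do not change the sketch

estimate = hll.count()                  # close to 50,000, within a few percent
```

With 4,096 registers the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%, while the sketch itself stays the same size no matter how many items are added.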
I’ll be speaking about a Python implementation of HyperLogLog that I’m modifying to work with Dask, a parallel computation package for Python. So far, the modifications include serialization support and the ability to estimate cardinality for intersections (HyperLogLog proper estimates cardinality for unions only). I’ll be demonstrating how this could be used to quickly find relationships in large datasets and to power data visualization dashboards.
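As a rough illustration of why HyperLogLog suits parallel computation and how an intersection estimate can be derived, the sketch below builds registers per partition, merges them with an element-wise maximum, and applies inclusion-exclusion. This is one common approach and may differ from the method presented in the talk. All function names, the SHA-1 hash, and the precision `P = 14` are assumptions, and plain Python ranges stand in for Dask partitions.

```python
import hashlib
import math

P = 14            # precision bits (assumed value)
M = 1 << P        # number of registers


def build_sketch(items):
    """Build HyperLogLog registers for one partition of the data."""
    registers = [0] * M
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        idx = x >> (64 - P)
        rest = x & ((1 << (64 - P)) - 1)
        rank = (64 - P) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    return registers


def merge(a, b):
    """Union of two sketches: element-wise maximum of the registers."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(registers):
    """Standard HyperLogLog estimate with the small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


def intersection_estimate(a, b):
    """Inclusion-exclusion: |A and B| is about |A| + |B| - |A or B|."""
    return estimate(a) + estimate(b) - estimate(merge(a, b))


# Stream A arrives in two partitions that are sketched independently.
part1 = build_sketch(range(0, 100_000))
part2 = build_sketch(range(100_000, 200_000))
sketch_a = merge(part1, part2)

sketch_b = build_sketch(range(100_000, 300_000))

overlap = intersection_estimate(sketch_a, sketch_b)   # true overlap is 100,000
```

Because merging is just an element-wise maximum, the merged sketch is identical to one built from the whole stream at once, so partitions can be processed in any order and on any worker. The intersection estimate inherits the error of all three terms, so it becomes unreliable when the true overlap is small relative to the union.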
Speaker Bio:
Scott Little is a data scientist working in digital marketing and has also taught Python and data science classes in Chicago. He has a PhD in Physics from the University of Toledo, where he specialized in thin-film photovoltaic solar cells. For fun he enjoys cycling, electronics, and predicting solar power from satellite imagery and ground photometer sensors.
