Cardinality estimation using HyperLogLog... (Online Meetup)

Hosted By
Ji D. and Sou-Cheng T. C.

Details

This is an online meetup. The Zoom link is https://zoom.us/j/8057306529

Please register for the event so you can receive a reminder from Meetup.

Topic:

Cardinality estimation using HyperLogLog with intersection support and Dask parallel computation

Abstract:

Cardinality, the number of distinct elements in a set, is important to many business applications (e.g., counting the number of unique visitors to a website over a given period of time). HyperLogLog is a probabilistic data structure that approximates cardinality: it hashes each element and keeps only a small, fixed-size summary of the hash values, so it can answer an array of cardinality-related questions quickly and with little memory.
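To make the idea concrete, here is a minimal sketch of a HyperLogLog counter in plain Python. It is not the implementation discussed in the talk: the class name, the default precision `p=12`, and the use of SHA-1 as the hash function are all assumptions chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                      # assumed precision; more bits = lower error
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Take a 64-bit hash of the item's string form (SHA-1 is an assumption).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Hypothetical visitor log with repeats; only distinct IDs affect the estimate.
visitors = [f"user-{i % 5000}" for i in range(20000)]
hll = HyperLogLog()
for v in visitors:
    hll.add(v)
approx_unique = hll.count()
```

With 2**12 registers the typical relative error is about 1.04 / sqrt(4096), or roughly 1.6%, while the memory used stays fixed no matter how many items are added.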

I’ll be speaking about a Python implementation of HyperLogLog that I’m modifying to work with Dask, a parallel computing library for Python. So far, the modifications include serialization and the ability to estimate the cardinality of intersections (standard HyperLogLog supports unions only). I’ll demonstrate how this can be used to quickly find relationships in large datasets and to power data visualization dashboards.
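The abstract does not say how the intersection estimate is computed, so the following is only one common approach, shown as a sketch: merge two sketches by taking the element-wise maximum of their registers (which gives the union), then apply inclusion-exclusion. The function names and the precision `P = 14` are assumptions, and `functools.reduce` stands in for the combine step that a parallel framework such as Dask could distribute across workers.

```python
import hashlib
import math
from functools import reduce

P = 14          # assumed precision: 2**14 registers per sketch
M = 1 << P


def sketch(items):
    """Build HyperLogLog registers for one chunk of data."""
    regs = [0] * M
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx, rest = h >> (64 - P), h & ((1 << (64 - P)) - 1)
        regs[idx] = max(regs[idx], (64 - P) - rest.bit_length() + 1)
    return regs


def merge(a, b):
    """Union of two sketches: element-wise maximum of their registers."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(regs):
    """Cardinality estimate with the small-range (linear counting) correction."""
    raw = (0.7213 / (1 + 1.079 / M)) * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    return M * math.log(M / zeros) if raw <= 2.5 * M and zeros else raw


def intersection_estimate(a, b):
    """Inclusion-exclusion: |A and B| is roughly |A| + |B| - |A or B|."""
    return max(0.0, estimate(a) + estimate(b) - estimate(merge(a, b)))


# Sketch each chunk independently, then combine the partial sketches.
chunks_a = [range(0, 30_000), range(30_000, 60_000)]
chunks_b = [range(40_000, 70_000), range(70_000, 100_000)]
sketch_a = reduce(merge, map(sketch, chunks_a))
sketch_b = reduce(merge, map(sketch, chunks_b))
overlap = intersection_estimate(sketch_a, sketch_b)  # true overlap is 20,000
```

Because merging is associative and commutative, the chunks can be sketched in any order and on any worker. One caveat of inclusion-exclusion is that the errors of the three estimates add up, so the result becomes unreliable when the intersection is small relative to the union.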

Speaker Bio:

Scott Little is a data scientist working in digital marketing and has also taught Python and data science classes in Chicago. He has a PhD in Physics from the University of Toledo, where he specialized in thin-film photovoltaic solar cells. For fun he enjoys cycling, electronics, and predicting solar power from satellite imagery and ground photometer sensors.

PyData Chicago