
Details

This is an online meetup. The Zoom link is https://zoom.us/j/8057306529

Please register for the event so you can receive reminders from Meetup.

Topic:

Cardinality estimation using HyperLogLog with intersection support and Dask parallel computation

Abstract:

Cardinality, the number of distinct items in a collection, is important to many business applications (e.g., counting the number of unique visitors to a website over a given amount of time). HyperLogLog is a probabilistic data structure that estimates cardinality from hashed values using a small, fixed amount of memory, so it can quickly answer an array of cardinality-related questions.
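To make the idea concrete, here is a minimal sketch of a HyperLogLog counter in pure Python. It is a toy written for this abstract, not the implementation discussed in the talk: the class name, the choice of SHA-1 as the hash, and the default of 2**12 registers are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from the item (SHA-1 is an assumption here).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                          # first p bits pick the register
        rest = h & ((1 << width) - 1)             # remaining bits
        rank = width - rest.bit_length() + 1      # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Usage: estimate the number of unique (hypothetical) visitor ids.
visitors = HyperLogLog()
for i in range(50000):
    visitors.add(f"user-{i}")
estimate = visitors.count()
print(f"estimated unique visitors: {estimate:.0f}")
```

With 4,096 registers the typical relative error is roughly 1.04 / sqrt(4096), or about 1.6%, and adding the same item again never changes the estimate.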

I’ll be speaking about a Python implementation of HyperLogLog that I’m modifying to work with Dask, a parallel computation package for Python. So far, the modifications include serialization and the ability to estimate cardinality for intersections (HyperLogLog proper estimates cardinality for unions only). I’ll be demonstrating how this could be used to quickly find relationships in large datasets and to power data visualization dashboards.
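The sketch below illustrates the three ideas in this paragraph under stated assumptions; it is not the speaker's code. Unions come from taking the register-wise maximum of two sketches, serialization is shown with the standard pickle module, and the intersection uses inclusion-exclusion (|A ∩ B| ≈ |A| + |B| - |A ∪ B|), which is one common approach and may differ from the method presented in the talk. The names HLL, sketch, and intersection_estimate are hypothetical, and functools.reduce stands in for the kind of per-partition reduction that a parallel framework such as Dask can distribute.

```python
import hashlib
import math
import pickle
from functools import reduce


class HLL:
    """Toy mergeable HyperLogLog (illustrative only)."""

    def __init__(self, p=12, registers=None):
        self.p = p
        self.m = 1 << p
        self.registers = list(registers) if registers is not None else [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                        # first p bits pick the register
        rest = h & ((1 << width) - 1)
        rank = width - rest.bit_length() + 1    # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)      # linear counting for small sets
        return raw

    def merge(self, other):
        # Union: the register-wise maximum is exactly the sketch of A ∪ B.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return HLL(self.p, merged)


def sketch(items, p=12):
    """Build a sketch for one partition of the data."""
    s = HLL(p)
    for item in items:
        s.add(item)
    return s


def intersection_estimate(a, b):
    # Inclusion-exclusion; errors add up, so small overlaps are unreliable.
    return max(0.0, a.count() + b.count() - a.merge(b).count())


# Each partition is sketched independently and round-tripped through pickle,
# as it would be when partial results are shipped between workers.
partitions = [range(0, 20000), range(20000, 40000), range(40000, 60000)]
partials = [pickle.loads(pickle.dumps(sketch(part))) for part in partitions]

site_a = reduce(HLL.merge, partials)      # ids 0..59999
site_b = sketch(range(20000, 80000))      # ids 20000..79999
overlap = intersection_estimate(site_a, site_b)
print(f"estimated shared ids: {overlap:.0f} (true value: 40000)")
```

Because merging is associative and commutative, the partial sketches can be combined in any order, which is what makes the structure a natural fit for parallel reduction. The intersection estimate inherits error from all three counts, so its relative error grows as the overlap shrinks compared with the union.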

Speaker Bio:

Scott Little is a data scientist working in digital marketing who has also taught Python and data science classes in Chicago. He has a PhD in Physics from the University of Toledo, where he specialized in thin-film photovoltaic solar cells. For fun, he enjoys cycling, electronics, and predicting solar power from satellite imagery and ground photometer sensors.
