Details

This is an online meetup. The Zoom link is https://zoom.us/j/8057306529

Please register for the event so you can receive a reminder from Meetup.

Topic:

Cardinality estimation using HyperLogLog with intersection support and Dask parallel computation

Abstract:

Cardinality, the number of distinct elements in a set, is important to many business applications (e.g., counting the number of unique visitors to a website over a given period). HyperLogLog is a probabilistic data structure that approximates cardinality by hashing each element and keeping only a small, fixed-size summary, so it can quickly answer an array of cardinality-related questions without storing every element.
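To make the idea concrete, here is a minimal sketch of a HyperLogLog counter in pure Python. It is not the implementation discussed in the talk: the class name `TinyHLL`, the choice of SHA-1 as the hash, and the default precision `p=12` are all assumptions made for illustration. It uses the standard raw estimator with the small-range (linear counting) correction only.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog with 2**p registers (illustrative, hypothetical class)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from SHA-1 (assumed hash choice).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


hll = TinyHLL()
for visitor_id in range(100_000):
    hll.add(visitor_id)
print(round(hll.count()))  # an estimate near 100000, not an exact count
```

With `p=12` the counter keeps 4,096 small integers regardless of how many items are added, and the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Adding the same item again never changes the estimate, which is what makes the structure suitable for counting unique visitors.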

I’ll be speaking about a Python implementation of HyperLogLog that I’m modifying to work with Dask, a parallel computation package for Python. So far, the modifications include serialization support and the ability to estimate the cardinality of intersections (HyperLogLog proper supports unions only). I’ll be demonstrating how this could be used to quickly find relationships in large datasets and to power data visualization dashboards.
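As a rough illustration of why HyperLogLog suits parallel computation and how an intersection estimate can be derived from unions, here is a self-contained sketch. It is an assumption-laden example, not the speaker's implementation: the function names (`build`, `merge`, `estimate`, `intersection_estimate`), the SHA-1 hash, the precision `P`, and the sample data are all hypothetical, and the intersection uses the inclusion-exclusion identity |A and B| = |A| + |B| - |A or B|, which is one common approach and may differ from the method presented in the talk. The per-partition `build` step stands in for work that a framework such as Dask would distribute; only the standard library is used here.

```python
import hashlib
import math
from functools import reduce

P = 12          # assumed precision: 2**12 registers per sketch
M = 1 << P


def build(items):
    """Build a register list (a sketch) from one partition of the data."""
    regs = [0] * M
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - P)                       # first P bits select a register
        rest = h & ((1 << (64 - P)) - 1)          # remaining bits
        rank = (64 - P) - rest.bit_length() + 1   # leftmost 1-bit position
        regs[idx] = max(regs[idx], rank)
    return regs


def merge(a, b):
    """Union of two sketches: element-wise maximum of their registers."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(regs):
    """Raw HyperLogLog estimate with the small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


def intersection_estimate(a, b):
    """Inclusion-exclusion: |A and B| is about |A| + |B| - |A or B|."""
    return estimate(a) + estimate(b) - estimate(merge(a, b))


# Hypothetical data: visitor ids for two pages, each split into two partitions.
page_a_parts = [range(0, 30_000), range(30_000, 60_000)]
page_b_parts = [range(40_000, 70_000), range(70_000, 100_000)]

# Each partition is sketched independently, then the sketches are merged.
sketch_a = reduce(merge, map(build, page_a_parts))
sketch_b = reduce(merge, map(build, page_b_parts))

union = estimate(merge(sketch_a, sketch_b))           # true value: 100,000
overlap = intersection_estimate(sketch_a, sketch_b)   # true value: 20,000
print(round(union), round(overlap))
```

Because `merge` is an element-wise maximum, merging per-partition sketches gives exactly the same registers as sketching the whole dataset at once, so partitions can be processed in any order or on different workers. The inclusion-exclusion estimate inherits the absolute error of the larger sets, so it becomes unreliable when the true overlap is small relative to the union.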

Speaker Bio:

Scott Little is a data scientist working in digital marketing and has also taught Python and data science classes in Chicago. He has a PhD in Physics from the University of Toledo, where he specialized in thin-film photovoltaic solar cells. For fun he enjoys cycling, electronics, and predicting solar power from satellite imagery and ground photometer sensors.

Sponsors

Tegus by AlphaSense

Space and Food Sponsorship

W W Grainger Inc

Venue and food sponsor

Illinois Institute of Technology

Venue Sponsor

Adyen

Financial Sponsor