Ben Linsay on HyperLogLog & a PWLMini by Sandy Vanderbleek

Papers We Love

Two Sigma

101 Ave. of the Americas, 23rd Fl. J · New York

How to find us

Cross Streets: Watts and Grand. Note: Please make sure you’re signed up for the meetup, including your first and last name. Without this info you won’t be allowed into the building by security.



We're thrilled to host Ben Linsay, engineer extraordinaire, presenting on HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm by Flajolet et al.

In addition to Ben's talk, Sandy Vanderbleek will be opening the event with a lightning talk on Peter Norvig's Correcting A Widespread Error in Unification Algorithms.


• Ben Linsay on HyperLogLog

This extended abstract describes and analyses a near-optimal probabilistic algorithm, HyperLogLog, dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles. Using an auxiliary memory of m units (typically, "short bytes"), HyperLogLog performs a single pass over the data and produces an estimate of the cardinality such that the relative accuracy (the standard error) is typically about 1.04/√m. This improves on the best previously known cardinality estimator, LogLog, whose accuracy can be matched by consuming only 64% of the original memory. For instance, the new algorithm makes it possible to estimate cardinalities well beyond 10^9 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes. The algorithm parallelizes optimally and adapts to the sliding window model.
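To make the abstract a little more concrete, here is a minimal Python sketch of a HyperLogLog-style estimator. It is an illustration under assumptions, not the paper's reference implementation and not Ben's code: the class name, the use of SHA-1 truncated to 64 bits as the hash function, and the default of m = 2^11 = 2048 registers are all choices made for this example. With m = 2048 the quoted standard error of 1.04/√m works out to roughly 2.3%. The sketch applies the small-range (linear counting) correction and skips the large-range correction, on the assumption that a 64-bit hash makes it unnecessary at these sizes.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical class, not a library API)."""

    def __init__(self, p=11):
        # p index bits give m = 2**p registers; the alpha formula below
        # is the one given for m >= 128, so this sketch assumes p >= 7.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash(self, item):
        # Assumption: SHA-1 truncated to 64 bits stands in for a uniform hash.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash(item)
        width = 64 - self.p
        idx = x >> width                    # first p bits pick a register
        rest = x & ((1 << width) - 1)       # remaining bits
        # Position of the leftmost 1-bit in the remaining bits (1-based);
        # an all-zero remainder gets width + 1.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        if other.p != self.p:
            raise ValueError("sketches must use the same number of registers")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        # Normalized harmonic mean of 2**register over all registers.
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100_000):
        hll.add(f"user-{i}")
    print(round(hll.estimate()))
```

Because each register only ever keeps a maximum, re-adding an element never changes the sketch, and sketches built on separate machines can be combined with `merge`, which is the property behind the abstract's remark that the algorithm parallelizes well.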

• Sandy Vanderbleek on Correcting A Widespread Error in Unification Algorithms

Peter Norvig found an error in the unification algorithm presented in his AI textbooks and several others, and wrote a brief paper about it. While his paper focuses on Lisp implementations of higher-order unification, I will restrict the problem to syntactic propositional unification and present the erroneous and correct algorithms in a pattern and substitution notation.
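For readers who have not seen unification before, the following is a rough Python sketch of syntactic unification with a binding dictionary. It is not taken from the paper or from the talk, and it makes several assumptions for illustration: terms are nested tuples, variables are strings starting with "?", and all function names are hypothetical. The point it tries to show is the kind of care the variable case needs: before binding a variable to another variable, the sketch first checks whether that other variable already has a binding, so that a pair of bindings like ?x to ?y and ?y to ?x is never created.

```python
def is_var(term):
    # Assumption for this sketch: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")


def occurs(var, term, bindings):
    # True if var appears inside term once existing bindings are followed.
    if var == term:
        return True
    if is_var(term) and term in bindings:
        return occurs(var, bindings[term], bindings)
    if isinstance(term, tuple):
        return any(occurs(var, t, bindings) for t in term)
    return False


def unify_var(var, val, bindings):
    if var in bindings:
        # var is already bound: unify its value with val instead.
        return unify(bindings[var], val, bindings)
    if is_var(val) and val in bindings:
        # val is a bound variable: follow its binding before binding var.
        # Skipping this step is what allows cyclic bindings to appear.
        return unify(var, bindings[val], bindings)
    if occurs(var, val, bindings):
        return None
    return {**bindings, var: val}


def unify(x, y, bindings=None):
    # Returns a dict of bindings on success, or None on failure.
    if bindings is None:
        bindings = {}
    if x == y:
        return bindings
    if is_var(x):
        return unify_var(x, y, bindings)
    if is_var(y):
        return unify_var(y, x, bindings)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            bindings = unify(xi, yi, bindings)
            if bindings is None:
                return None
        return bindings
    return None


if __name__ == "__main__":
    # Two patterns whose variables refer to each other.
    print(unify(("p", "?x", "?y", "a"), ("p", "?y", "?x", "?x")))
    # → {'?x': '?y', '?y': 'a'}
```

A version of `unify_var` without the second check would bind ?y to ?x after ?x had already been bound to ?y, and a later lookup of either variable could then chase the two bindings in a loop.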


• Ben Linsay (@blinsay) is somehow still a software engineer. He's worked on distributed data processing pipelines in adtech, built and maintained APIs for small startups, and has accidentally been a DBA twice. Ben has written a couple of HyperLogLog implementations in his spare time and doesn't really want to show them to anyone.

• Sandy Vanderbleek has been a software engineer in industry and academia for 10 years. He is currently a Data Scientist at Publicis Media. His research interests are formal methods and computational logic with applications to industry.


Doors open at 7:00 pm; the presentations will begin right at 7:30 pm; and, yes, there will be refreshments of all kinds and pizza.

You'll have to check in with security with your Name/ID. Definitely sign up if you’re going to attend; unfortunately, people whose names aren’t entered into the security system in advance won’t be allowed in.

After Ben's presentation, we will open up the floor to discussion and questions.

We hope that you'll read some of the papers and references before the meetup, but don't stress if you can't. If you have any questions, thoughts, or related information, please visit #pwlnyc on Slack, our GitHub repository, or add to the discussion on this event's thread.

Additionally, if you have any papers you want to add to the repository above (papers that you love!), please send us a pull request. Also, if you have any ideas/questions about this meetup or the Papers-We-Love org, just open up an issue.

Two Sigma - Platinum Sponsor of the New York chapter