Probabilistic Data Structures


Details
The next Meetup will be about probabilistic data structures, what they are and what they are used for. It is thought more as an overview of the different data structures than of sophisticated use cases.
The following article provides a good overview about the topic: [1]
Wikipedia is also a good starting point [2]
Some (definitely not all) interesting papers are:
• Bloom filter: Space/Time Trade-offs in Hash Coding with Allowable Errors [3]
• Skip list: Skip Lists: A Probabilistic Alternative to Balanced Trees [4]
• Count–min sketch: An Improved Data Stream Summary: The Count-Min Sketch and its Applications [5]
• Hidden Markov Models: [6], [7], [8]
Agenda:
19:00 - 19:30 Socializing
19:30 - 20:30 Talks
Bloom Filters - Stefan Seelmann
Count-Min Sketch - Alex Petrov
Hidden Markov Models - Juan Miguel Cejula
20:30 - 20:45 Break
20:45 - 21:30 Discussion
For the rest of the evening, starting to create something would be nice (for those who don't get enough). Some ideas:
• Implementing one or multiple data structures in a language of choice
• Creating browser based animations how some of the data structures work
• Evaluating existing implementations
• Outline for a (non-scientific) magazine article or conference talk
[1] https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
[2] http://en.wikipedia.org/wiki/Category:Probabilistic_data_structures
[3] http://astrometry.net/svn/trunk/documents/papers/dstn-review/papers/bloom1970.pdf
[4] ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf
[5] http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
[6] http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
[7] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766791/
[8] http://cit.mak.ac.ug/staff/pnabende/papers/cisse08_pnabende.pdf

Probabilistic Data Structures