Past Meetup

Probabilistic Data Structures

This Meetup is past

26 people went

Location image of event venue

Details

The next Meetup will be about probabilistic data structures, what they are and what they are used for. It is thought more as an overview of the different data structures than of sophisticated use cases.

The following article provides a good overview about the topic: [1]

Wikipedia is also a good starting point [2]

Some (definitely not all) interesting papers are:

• Bloom filter: Space/Time Trade-offs in Hash Coding with Allowable Errors [3]

• Skip list: Skip Lists: A Probabilistic Alternative to Balanced Trees [4]

• Count–min sketch: An Improved Data Stream Summary: The Count-Min Sketch and its Applications [5]

• Hidden Markov Models: [6], [7], [8]

Agenda:

19:00 - 19:30 Socializing

19:30 - 20:30 Talks

Bloom Filters - Stefan Seelmann

Count-Min Sketch - Alex Petrov

Hidden Markov Models - Juan Miguel Cejula

20:30 - 20:45 Break

20:45 - 21:30 Discussion

For the rest of the evening, starting to create something would be nice (for those who don't get enough). Some ideas:

• Implementing one or multiple data structures in a language of choice

• Creating browser based animations how some of the data structures work

• Evaluating existing implementations

• Outline for a (non-scientific) magazine article or conference talk

[1] https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/

[2] http://en.wikipedia.org/wiki/Category:Probabilistic_data_structures

[3] http://astrometry.net/svn/trunk/documents/papers/dstn-review/papers/bloom1970.pdf

[4] ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf

[5] http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf

[6] http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf

[7] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766791/

[8] http://cit.mak.ac.ug/staff/pnabende/papers/cisse08_pnabende.pdf