Evening w/ Martin Odersky! (Scala in 2016) +Spark Approximates +Twitter Algebird


Details
We're super excited to announce that Martin Odersky, the father of Scala (and of my co-worker at the IBM Spark Technology Center, Jakob Odersky), has agreed to come speak at our meetup!
Topic: Scala in 2016
Also, come enjoy a code-level deep dive into Spark with approximation algorithms and probabilistic data structures such as Count-Min Sketch, HyperLogLog, Bloom filters, MinHash/Locality Sensitive Hashing (LSH), and DIMSUM sampling.
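As a rough illustration of the first structure on that list, here is a minimal Count-Min Sketch in plain Scala. It is a sketch under stated assumptions, not Algebird's or Spark's implementation: the class name, the use of scala.util.hashing.MurmurHash3 seeded with the row index as the per-row hash, and the example dimensions are all choices made for this example.

```scala
import scala.util.hashing.MurmurHash3

// Minimal Count-Min Sketch (illustrative, not Algebird's CMS): a depth x width
// table of counters. Each row hashes the item to one column with its own seed.
// Estimates never undercount; collisions can only make them too high.
final class CountMinSketch(val depth: Int, val width: Int) {
  require(depth > 0 && width > 0, "dimensions must be positive")
  private val table = Array.ofDim[Long](depth, width)

  // Assumption: seeding MurmurHash3 with the row index stands in for
  // independent hash functions per row.
  private def bucket(item: String, row: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(item, row), width)

  def add(item: String, count: Long = 1L): Unit =
    for (row <- 0 until depth) table(row)(bucket(item, row)) += count

  // Point query: the smallest counter across rows is the least polluted one.
  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min

  // Sketches with the same dimensions and seeds merge by cell-wise addition,
  // which is what makes them easy to combine across Spark partitions.
  def merge(other: CountMinSketch): CountMinSketch = {
    require(depth == other.depth && width == other.width, "dimension mismatch")
    val out = new CountMinSketch(depth, width)
    for (r <- 0 until depth; c <- 0 until width)
      out.table(r)(c) = table(r)(c) + other.table(r)(c)
    out
  }
}

val a = new CountMinSketch(depth = 4, width = 256)
Seq("spark", "scala", "spark", "algebird", "spark").foreach(w => a.add(w))
val b = new CountMinSketch(depth = 4, width = 256)
b.add("spark", 2L)
println(a.estimate("spark"))          // at least 3; higher only if every row collides
println(a.merge(b).estimate("spark")) // at least 5
```

Taking the minimum over rows is the key design choice: every row overestimates independently, so the least-collided row gives the tightest bound.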
Agenda
6:30 - 7:00pm: Arrive and Mingle
7:00 - 8:00pm: Scala in 2016 (Martin Odersky)
"After a fairly quiet 2015, things are heating up this year. There are lots of new developments building, spanning foundations, compilers, libraries, and the organization of the community. In my talk I will give an outline of what's ahead."
8:00 - 8:30pm: Spark & Probabilistic Algorithms: Twitter Algebird, Count-Min Sketch, HyperLogLog, Bloom Filters, Locality Sensitive Hashing, DIMSUM Sampling, and a whole lot of demos! (Chris Fregly, IBM Spark Technology Center)
8:30 - 9:00pm: Demingle and Leave
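For readers new to Bloom filters, one of the structures in the 8:00pm session, here is a minimal plain-Scala sketch. It is not the code from the talk or from Algebird; the class name, the seeded-MurmurHash3 hashing scheme, and the sizes are assumptions chosen for illustration, with the bits stored in a java.util.BitSet.

```scala
import scala.util.hashing.MurmurHash3

// Minimal Bloom filter over strings (illustrative only).
// mightContain never returns false for an inserted item (no false negatives),
// but it can return true for an item that was never inserted (false positive).
final class BloomFilter(numBits: Int, numHashes: Int) {
  require(numBits > 0 && numHashes > 0, "sizes must be positive")
  private val bits = new java.util.BitSet(numBits)

  // Assumption: numHashes differently seeded MurmurHash3 calls act as the k hash functions.
  private def positions(item: String): Seq[Int] =
    (0 until numHashes).map(seed => Math.floorMod(MurmurHash3.stringHash(item, seed), numBits))

  def add(item: String): Unit = positions(item).foreach(i => bits.set(i))

  // An item may be present only if all of its bit positions are set.
  def mightContain(item: String): Boolean = positions(item).forall(i => bits.get(i))
}

val seen = new BloomFilter(numBits = 1 << 12, numHashes = 3)
Seq("user-1", "user-2", "user-3").foreach(seen.add)
println(seen.mightContain("user-2"))   // true: inserted items are always reported
println(seen.mightContain("user-999")) // usually false, occasionally a false positive
```

The trade-off is memory versus false-positive rate: more bits per inserted item lowers the chance that an unseen item finds all of its positions already set.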
Related Links and Notes
http://web.stanford.edu/class/cs345a/slides/05-LSH.pdf
http://web.stanford.edu/class/cs345a/slides/04-highdim.pdf
http://www.infoq.com/presentations/abstract-algebra-analytics
http://eugenezhulenev.com/talks/interactive-audience-analytics/
https://gist.github.com/debasishg/8172796
https://www.mapr.com/blog/some-important-streaming-algorithms-you-should-know-about
Custom HyperLogLog:
Redis HLL:
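Independently of the links above, here is a minimal HyperLogLog written from scratch in plain Scala, as a sketch of what a custom implementation involves. It is not Algebird's or Redis's code (Redis exposes its own HLL through the PFADD, PFCOUNT and PFMERGE commands); the class name, the 32-bit MurmurHash3 hash, and the choice of p are assumptions, and the large-range correction from the original paper is omitted.

```scala
import scala.util.hashing.MurmurHash3

// Minimal HyperLogLog with 2^p registers over a 32-bit hash (illustrative only).
// The top p bits pick a register; each register keeps the maximum "rank"
// (position of the leftmost 1-bit) seen in the remaining 32 - p bits.
final class HyperLogLog(val p: Int) {
  require(p >= 7 && p <= 16, "p must be between 7 and 16")
  private val m = 1 << p
  private val registers = new Array[Int](m)

  def add(item: String): Unit = {
    val h = MurmurHash3.stringHash(item)
    val idx = h >>> (32 - p) // top p bits select the register
    val rest = h << p        // remaining bits, left-aligned
    // rank = leading zeros + 1, capped when all remaining bits are zero
    val rank = math.min(Integer.numberOfLeadingZeros(rest), 32 - p) + 1
    if (rank > registers(idx)) registers(idx) = rank
  }

  def estimate: Double = {
    val alpha = 0.7213 / (1.0 + 1.079 / m) // bias constant, valid for m >= 128
    val raw = alpha * m * m / registers.map(r => math.pow(2.0, -r)).sum
    val zeros = registers.count(_ == 0)
    // Small-range correction: fall back to linear counting over empty registers.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }

  // Union of two HLLs: register-wise maximum, so merging is lossless.
  def merge(other: HyperLogLog): HyperLogLog = {
    require(p == other.p, "precision mismatch")
    val out = new HyperLogLog(p)
    for (i <- 0 until m) out.registers(i) = math.max(registers(i), other.registers(i))
    out
  }
}

val hll = new HyperLogLog(p = 12)
(1 to 100000).foreach(i => hll.add(s"visitor-$i"))
val firstHalf = new HyperLogLog(p = 12)
val secondHalf = new HyperLogLog(p = 12)
(1 to 50000).foreach(i => firstHalf.add(s"visitor-$i"))
(50001 to 100000).foreach(i => secondHalf.add(s"visitor-$i"))
println(f"estimate: ${hll.estimate}%.0f (true count: 100000)")
```

With p = 12 there are 4096 registers, so the expected relative error is about 1.04 / sqrt(4096), roughly 1.6%. Merging two HLLs built over different partitions yields exactly the registers of one HLL built over all the data.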
Custom Aggregations:
- Closed API; may not work in the next version of Spark
- All code must go under the org.apache.spark.sql package
- Use org.apache.spark.sql.catalyst.expressions.Sum as an example
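Whichever API a custom aggregation uses, it needs the same pieces: an empty buffer, a per-row update, an associative merge of partial buffers coming from different partitions, and a final evaluation. The sketch below models that contract in plain Scala so it runs without Spark; MergeableAgg, Average and runPartitioned are hypothetical names for illustration, not Spark or Algebird classes.

```scala
// Hypothetical plain-Scala model of a distributed aggregation (not a Spark API).
// Each partition folds its rows into a buffer starting from `zero`; the partial
// buffers are then combined with `merge`, which must be associative.
trait MergeableAgg[In, Buf, Out] {
  def zero: Buf
  def reduce(buf: Buf, in: In): Buf
  def merge(a: Buf, b: Buf): Buf
  def finish(buf: Buf): Out
}

// Average keeps a (sum, count) buffer: averaging per-partition averages would be
// wrong, but adding sums and counts is associative and merges correctly.
object Average extends MergeableAgg[Double, (Double, Long), Double] {
  def zero: (Double, Long) = (0.0, 0L)
  def reduce(buf: (Double, Long), in: Double): (Double, Long) = (buf._1 + in, buf._2 + 1)
  def merge(a: (Double, Long), b: (Double, Long)): (Double, Long) = (a._1 + b._1, a._2 + b._2)
  def finish(buf: (Double, Long)): Double = if (buf._2 == 0) Double.NaN else buf._1 / buf._2
}

// Simulates a partitioned run: fold each partition locally, then merge the partials.
def runPartitioned[In, Buf, Out](agg: MergeableAgg[In, Buf, Out], partitions: Seq[Seq[In]]): Out = {
  val partials = partitions.map(_.foldLeft(agg.zero)(agg.reduce))
  agg.finish(partials.foldLeft(agg.zero)(agg.merge))
}

val partitions = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0), Seq(5.0, 6.0))
println(runPartitioned(Average, partitions)) // 3.5
```

This zero/merge shape is the monoid idea that Algebird builds on; the probabilistic sketches in this session (Count-Min Sketch, HyperLogLog) fit it because their merges are also associative.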
LSH: http://twitter.github.io/algebird/index.html#com.twitter.algebird.MinHasher32
http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.pdf
http://github.com/twitter/algebird
http://github.com/avibryant/simmer
http://esumitra.github.io/algebird-boston-spark/#
https://github.com/twitter/algebird/wiki/Learning-Algebird-Monoids-with-REPL
http://research.neustar.biz/tag/count-min-sketch/
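To make the MinHash/LSH entry concrete, here is a from-scratch plain-Scala sketch of MinHash signatures with LSH banding. It does not use Algebird's MinHasher32; the function names, the seeded MurmurHash3 hashes, the signature length k = 128 and the 32-band split are assumptions chosen for illustration.

```scala
import scala.util.hashing.MurmurHash3

// MinHash signature (illustrative only): for each of k seeded hash functions,
// keep the minimum hash over the set's elements. The fraction of positions where
// two signatures agree estimates the Jaccard similarity (intersection / union).
def signature(elems: Set[String], k: Int): Array[Int] = {
  require(elems.nonEmpty, "MinHash of an empty set is undefined")
  Array.tabulate(k)(seed => elems.iterator.map(e => MurmurHash3.stringHash(e, seed)).min)
}

def estimatedJaccard(a: Array[Int], b: Array[Int]): Double = {
  require(a.length == b.length, "signatures must have the same length")
  a.indices.count(i => a(i) == b(i)).toDouble / a.length
}

def exactJaccard(a: Set[String], b: Set[String]): Double =
  (a intersect b).size.toDouble / (a union b).size

// LSH banding: split a signature into `bands` equal slices; each (band index, slice)
// pair is a bucket key, and two sets are candidates if they share any key.
def bandKeys(sig: Array[Int], bands: Int): Set[(Int, Vector[Int])] = {
  require(sig.length % bands == 0, "signature length must be a multiple of bands")
  val rows = sig.length / bands
  (0 until bands).map(b => (b, sig.slice(b * rows, (b + 1) * rows).toVector)).toSet
}

def isCandidate(a: Array[Int], b: Array[Int], bands: Int): Boolean =
  (bandKeys(a, bands) intersect bandKeys(b, bands)).nonEmpty

val docA = "the quick brown fox jumps over the lazy dog".split(" ").toSet
val docB = "the quick brown fox jumps over the sleepy cat".split(" ").toSet
val sigA = signature(docA, k = 128)
val sigB = signature(docB, k = 128)
println(f"exact=${exactJaccard(docA, docB)}%.2f estimated=${estimatedJaccard(sigA, sigB)}%.2f")
println(isCandidate(sigA, sigB, bands = 32))
```

In a real pipeline the band keys would be used to group documents (for example with a Spark groupBy), so only sets that share a bucket are ever compared, instead of all pairs.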
Apache NiFi as Streaming Producer:
http://ingest.tips/2014/12/22/getting-started-with-apache-nifi/
https://blogs.apache.org/nifi/entry/integrating_apache_nifi_with_apache
