Evening w/ Martin Odersky! (Scala in 2016) +Spark Approximates +Twitter Algebird
Details
We're super excited to announce that Martin Odersky, the father of Scala (and also the father of my co-worker at the IBM Spark Technology Center, Jakob Odersky), has agreed to come speak at our meetup!
Topic: Scala in 2016
Also, come enjoy a code-level deep dive into Spark with approximation algorithms and probabilistic data structures such as Count-Min Sketch, HyperLogLog, Bloom filters, MinHash/Locality-Sensitive Hashing (LSH), and DIMSUM sampling.
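To give a flavor of these data structures before the talk, here is a minimal Bloom filter sketch in plain Scala (no Spark or Algebird dependency). It is an illustration under our own assumptions, not Algebird's `BloomFilter`: the class name `SimpleBloomFilter` is hypothetical, and deriving the k hash functions by seeding MurmurHash3 with 0..k-1 is just one convenient choice.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helper (not Algebird's BloomFilter): numBits bits and numHashes hash
// functions, where hash function i is MurmurHash3 seeded with i.
final class SimpleBloomFilter(numBits: Int, numHashes: Int) {
  require(numBits > 0 && numHashes > 0)
  private val bits = new java.util.BitSet(numBits)

  // The k bit positions an item maps to.
  private def positions(item: String): Seq[Int] =
    (0 until numHashes).map(seed => Math.floorMod(MurmurHash3.stringHash(item, seed), numBits))

  def add(item: String): Unit = positions(item).foreach(p => bits.set(p))

  // false means "definitely never added"; true means "probably added" (false positives possible).
  def mightContain(item: String): Boolean = positions(item).forall(p => bits.get(p))
}

val bf = new SimpleBloomFilter(numBits = 1 << 10, numHashes = 3)
Seq("scala", "spark", "algebird").foreach(bf.add)
println(bf.mightContain("spark")) // true: a Bloom filter has no false negatives
```

With m bits, k hash functions, and n inserted items, the false-positive rate is approximately (1 - e^(-kn/m))^k, which is why m and k are usually sized from the expected n.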
Agenda
6:30 - 7:00pm: Arrive and Mingle
7:00 - 8:00pm: Scala in 2016 (Martin Odersky)
"After a fairly quiet 2015, things are heating up this year. There are lots of new developments building, spanning foundations, compilers, libraries, and the organization of the community. In my talk I will give an outline of what's ahead."
8:00 - 8:30pm: Spark & Probabilistic Algorithms: Twitter Algebird, Count-Min Sketch, HyperLogLog, Bloom Filters, Locality-Sensitive Hashing, DIMSUM Sampling, and a whole lot of demos!! (Chris Fregly, IBM Spark Technology Center)
8:30 - 9:00pm: Demingle and Leave
Related Links and Notes
http://web.stanford.edu/class/cs345a/slides/05-LSH.pdf
http://web.stanford.edu/class/cs345a/slides/04-highdim.pdf
http://goo.gl/VZf7oK
http://cdn.oreillystatic.com/en/assets/1/event/105/Algebra%20for%20Scalable%20Analytics%20Presentation.pdf
http://www.infoq.com/presentations/abstract-algebra-analytics
http://eugenezhulenev.com/talks/interactive-audience-analytics/
https://databricks.com/blog/2015/10/13/interactive-audience-analytics-with-spark-and-hyperloglog.html
https://gist.github.com/debasishg/8172796
https://www.mapr.com/blog/some-important-streaming-algorithms-you-should-know-about
Custom HyperLogLog:
Redis HLL:
http://antirez.com/news/75
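As a starting point for a custom HyperLogLog, here is a minimal sketch in plain Scala. It is not Redis's or Algebird's implementation: the class `SimpleHLL` is hypothetical, it uses a 32-bit MurmurHash3 from the Scala standard library, and it applies only the small-range (linear counting) correction, so it is meant for cardinalities far below 2^32. The `merge` method shows why HLL suits distributed counting: the union of two sketches is just the element-wise max of their registers.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical HyperLogLog: 2^p registers over a 32-bit hash; each register keeps
// the largest "rank" (leading zeros + 1) observed among the items routed to it.
final class SimpleHLL(val p: Int) {
  require(p >= 7 && p <= 16, "the alpha constant below assumes m = 2^p >= 128")
  val m: Int = 1 << p
  val registers: Array[Int] = new Array[Int](m)

  def add(item: String): Unit = {
    val h = MurmurHash3.stringHash(item)
    val idx = h >>> (32 - p)                 // top p bits pick a register
    val w = h << p                           // remaining 32 - p bits, left-aligned
    val rank = math.min(Integer.numberOfLeadingZeros(w), 32 - p) + 1
    if (rank > registers(idx)) registers(idx) = rank
  }

  // Union: element-wise max, so sketches built on separate partitions can be combined.
  def merge(other: SimpleHLL): SimpleHLL = {
    require(p == other.p)
    val out = new SimpleHLL(p)
    for (i <- 0 until m) out.registers(i) = math.max(registers(i), other.registers(i))
    out
  }

  def estimate: Double = {
    val alpha = 0.7213 / (1 + 1.079 / m)
    val raw = alpha * m * m / registers.map(r => math.pow(2.0, -r)).sum
    val zeros = registers.count(_ == 0)
    // Small-range correction (linear counting) while some registers are still empty.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }
}

val users = new SimpleHLL(p = 12)                    // 4096 registers, ~1.6% standard error
(0 until 10000).foreach(i => users.add(s"user-$i"))
(0 until 10000).foreach(i => users.add(s"user-$i"))  // duplicates leave the registers unchanged
println(users.estimate)                              // approximately 10000
```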
Custom Aggregations:
- Internal (closed) API, so it may break in the next version of Spark
- All code must live in a package under org.apache.spark.sql
- Use org.apache.spark.sql.catalyst.expressions.Sum as an example
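Whatever API a custom aggregation ends up using, Spark computes partial results per partition and then merges them, so the aggregation needs an identity value and an associative merge, i.e. a monoid (the idea behind Algebird). The sketch below illustrates that requirement in plain Scala only; it does not use Spark's aggregation API, and the `Monoid` trait, `Avg`, and `aggregatePartitions` are hypothetical stand-ins, not Algebird's or Spark's types.

```scala
// Hypothetical stand-in for a monoid type class (Algebird ships its own; this is not its API).
trait Monoid[T] {
  def zero: T
  def plus(a: T, b: T): T
}

// Averaging partial averages gives the wrong answer, but (sum, count) pairs combine associatively.
final case class Avg(sum: Double, count: Long) {
  def value: Double = if (count == 0) 0.0 else sum / count
}

implicit val avgMonoid: Monoid[Avg] = new Monoid[Avg] {
  def zero: Avg = Avg(0.0, 0L)
  def plus(a: Avg, b: Avg): Avg = Avg(a.sum + b.sum, a.count + b.count)
}

// Mimics a distributed aggregation: reduce each partition locally, then merge the partials.
// Because plus is associative and zero is its identity, the result is independent of partitioning.
def aggregatePartitions[T](partitions: Seq[Seq[T]])(implicit M: Monoid[T]): T =
  partitions.map(_.foldLeft(M.zero)(M.plus(_, _))).foldLeft(M.zero)(M.plus(_, _))

val partitions = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0), Seq.empty[Double]).map(_.map(x => Avg(x, 1L)))
println(aggregatePartitions(partitions).value) // 2.5
```

The empty third partition is deliberate: it only works because `zero` is a true identity, which is exactly what an initial aggregation buffer has to be.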
http://research.neustar.biz/
LSH: http://twitter.github.io/algebird/index.html#com.twitter.algebird.MinHasher32
http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.pdf
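For the LSH links above, here is a minimal MinHash plus banding sketch in plain Scala. It is not Algebird's `MinHasher32`: the function names are hypothetical, and using MurmurHash3 with seeds 0..k-1 as the k hash functions is our assumption (MinHash theory assumes they behave like independent random permutations). Under that assumption, the probability that two sets share a minimum equals their Jaccard similarity, and with b bands of r rows a pair with similarity s becomes a candidate with probability 1 - (1 - s^r)^b.

```scala
import scala.util.hashing.MurmurHash3

// Signature position i = minimum of hash function i (MurmurHash3 seeded with i) over the set.
def minHashSignature(set: Set[String], numHashes: Int): Array[Int] = {
  require(set.nonEmpty, "MinHash is undefined for an empty set")
  Array.tabulate(numHashes)(seed => set.iterator.map(s => MurmurHash3.stringHash(s, seed)).min)
}

// The fraction of agreeing positions estimates Jaccard(A, B) = |A ∩ B| / |A ∪ B|.
def estimateJaccard(a: Array[Int], b: Array[Int]): Double = {
  require(a.length == b.length)
  a.indices.count(i => a(i) == b(i)).toDouble / a.length
}

// LSH banding: split a signature into bands of `rows` values; two sets become candidate
// pairs if at least one band matches exactly (same band index, same values).
def bandKeys(sig: Array[Int], rows: Int): Set[(Int, Seq[Int])] =
  sig.grouped(rows).zipWithIndex.map { case (band, i) => (i, band.toSeq) }.toSet

def isCandidate(a: Array[Int], b: Array[Int], rows: Int): Boolean =
  (bandKeys(a, rows) intersect bandKeys(b, rows)).nonEmpty

val docA = (0 until 100).map(i => s"w$i").toSet
val docB = (50 until 150).map(i => s"w$i").toSet  // true Jaccard = 50 / 150 ≈ 0.33
val sigA = minHashSignature(docA, 200)
val sigB = minHashSignature(docB, 200)
println(estimateJaccard(sigA, sigB))              // close to 0.33
```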
http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-monoid-monad-for-large-scala-data-analytics/
http://github.com/twitter/algebird
http://github.com/avibryant/simmer
http://esumitra.github.io/algebird-boston-spark/#
https://github.com/twitter/algebird/wiki/Learning-Algebird-Monoids-with-REPL
http://research.neustar.biz/tag/count-min-sketch/
http://research.neustar.biz/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/
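To go with the Count-Min Sketch links above, here is a minimal sketch in plain Scala, assuming non-negative updates. It is not Algebird's CMS: `SimpleCountMin` is hypothetical, and row r hashes with MurmurHash3 seeded by r. In the standard analysis, width w = ⌈e/ε⌉ and depth d = ⌈ln(1/δ)⌉ give estimates that never undercount and exceed the true count by at most ε·N (N = total count added) with probability at least 1 - δ.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical Count-Min Sketch: `depth` rows of `width` counters, one seeded hash per row.
final class SimpleCountMin(val width: Int, val depth: Int) {
  require(width > 0 && depth > 0)
  private val table: Array[Array[Long]] = Array.ofDim[Long](depth, width)

  private def bucket(item: String, row: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(item, row), width)

  // Increment one counter per row.
  def add(item: String, count: Long = 1L): Unit = {
    require(count >= 0, "this sketch assumes non-negative updates")
    for (row <- 0 until depth) table(row)(bucket(item, row)) += count
  }

  // Collisions only ever add, so each row overestimates and the minimum is the tightest estimate.
  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min

  // Sketches with identical dimensions (and hashes) merge by adding counters cell by cell.
  def merge(other: SimpleCountMin): SimpleCountMin = {
    require(width == other.width && depth == other.depth)
    val out = new SimpleCountMin(width, depth)
    for (r <- 0 until depth; c <- 0 until width)
      out.table(r)(c) = table(r)(c) + other.table(r)(c)
    out
  }
}

val cms = new SimpleCountMin(width = 272, depth = 5) // ε ≈ 0.01, δ ≈ 0.01
val events = Seq.fill(5)("spark") ++ Seq.fill(3)("scala") ++ (1 to 1000).map(i => s"noise-$i")
events.foreach(e => cms.add(e))
println(cms.estimate("spark")) // never below 5; the excess comes from hash collisions
```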
Apache NiFi as Streaming Producer:
http://ingest.tips/2014/12/22/getting-started-with-apache-nifi/
https://blogs.apache.org/nifi/entry/integrating_apache_nifi_with_apache
http://donlehmanjr.com/Science/03%20Decay%20Ave/032.htm
