October Presentation Night

Hosted by Matthew Farrellee and 3 others

Details

Schedule

  • 6:00 - 6:30: Mingling. Food and drink are provided.
  • 6:30 - 6:45: Opening remarks and sponsor pitch.
  • 6:45 - 6:50: Lightning Talk
  • 6:50 - 7:30: Feature Talk

Lightning Talk: "Flintrock: A faster, better spark-ec2" - Nicholas Chammas

spark-ec2 is a handy little tool for spinning up Spark clusters on EC2, but it has a few frustrating problems that are difficult to solve within its current architecture. In this lightning talk, Nick will give a very quick overview of a project he is working on that reimagines what spark-ec2 might look like if it were rewritten from scratch.

Feature Talk: "Counting with Apache Spark and Algebird" - Edward Sumitra, Curriculum Associates

Many big data computations come down to counting different types of data. Twitter's open-source Algebird library makes it easy to structure computations as "counting" problems that can be parallelized and run on MapReduce-style frameworks like Spark. The talk will briefly cover abstract algebraic types such as semigroups, monoids, and rings, and show how to leverage their properties with Algebird and Spark in counting problems. It will then demonstrate these types in implementations of standard probabilistic counting algorithms for big data, such as Count-Min Sketch (top-K items) and HyperLogLog (unique items).
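
To give a flavor of the idea before the talk, here is a minimal sketch in plain Python (not Algebird or Spark, and not the speaker's code) of a Count-Min Sketch treated as a monoid. The class name, the `of`/`plus`/`estimate` methods, the SHA-256 based hashing, and the depth/width defaults are all assumptions made for illustration. The point it tries to show is that merging two sketches is an associative operation with an empty sketch as identity, so per-partition results can be combined in any grouping.

```python
import hashlib
from functools import reduce


class CountMinSketch:
    """Illustrative Count-Min Sketch: a depth x width table of counters."""

    def __init__(self, depth=4, width=64, table=None):
        self.depth = depth
        self.width = width
        # An all-zero table acts as the monoid identity.
        self.table = table if table is not None else [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, from a hash salted with the row index (an assumption).
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    @classmethod
    def of(cls, item, depth=4, width=64):
        # Sketch holding a single occurrence of `item`.
        sketch = cls(depth, width)
        for row, col in sketch._columns(item):
            sketch.table[row][col] += 1
        return sketch

    def plus(self, other):
        # Monoid operation: element-wise addition of the counter tables.
        assert (self.depth, self.width) == (other.depth, other.width)
        merged = [[a + b for a, b in zip(r1, r2)]
                  for r1, r2 in zip(self.table, other.table)]
        return CountMinSketch(self.depth, self.width, merged)

    def estimate(self, item):
        # Collisions can only inflate counters, so this never underestimates.
        return min(self.table[row][col] for row, col in self._columns(item))


def sketch_partition(items):
    # Fold one partition's items into a single sketch, starting from the identity.
    return reduce(CountMinSketch.plus, (CountMinSketch.of(i) for i in items), CountMinSketch())


# Three made-up "partitions", standing in for data spread across a cluster.
partitions = [["a", "b", "a"], ["c", "a"], ["b"]]
partials = [sketch_partition(p) for p in partitions]
total = reduce(CountMinSketch.plus, partials, CountMinSketch())

# Sketching all the data in one pass gives the same table as merging partials.
sequential = sketch_partition([x for p in partitions for x in p])
```

Because `plus` is associative and has an identity, a framework is free to sketch each partition independently and merge the partial results in whatever order they arrive. Exact counts are traded for a fixed-size table whose estimates can be too high but never too low.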

Boston Data Technology (Boston Data Group/BDT)