The Hadoop Summit will be in Amsterdam again on April 2-3, 2014. There will be a NL-HUG meetup on the night before (yes, that would be April 1st). This is a placeholder announcement to mark the date. Further information will follow when we finalize the agenda.
17.30: arrive, socialize, eat, drink
In-memory Computation and HDFS: What's Next?, Andrew Wang, Hadoop Committer
With increasing per-node RAM capacity, in-memory computation is becoming more and more attractive. Newer computational frameworks like Apache Spark are even explicitly designed to take advantage of this in-memory paradigm. However, where does this leave HDFS, which was originally designed for spinning disks? In this talk, I'll discuss a number of recent changes in HDFS that help to address this gap, and extrapolate to future roadmap items that would further improve integration with frameworks like Spark.
A Quick Introduction to the Cascading Ecosystem, Chris Wensel, creator of Cascading
Cascading is an application development platform for building data applications on Hadoop. It is also the foundation for projects like Scalding, Lingual, and Cascalog.
Balancing Data Collection and Privacy, Doug Cutting, creator of Hadoop
Data has great potential to improve the world. For example, societies benefit from healthy, educated citizens. Healthcare and education improve when new patterns are discovered. However, some valuable patterns are only discoverable through collection of personal data. Complete anonymization is impossible in many cases. Thus, to obtain benefits, we must collect personalized data. Yet we must also avoid becoming a surveillance society, with reduced privacy. I will suggest some ideas for how we can obtain the benefits of data collection without discarding our privacy.
20.30 - later: drink and socialize more