Past Meetup

Big Data Meetup - ApacheCon Edition


91 people went


Details


ApacheCon (http://events.linuxfoundation.org/events/apachecon-europe) is in town, and we are organizing a special meetup where you can meet other Big Data enthusiasts visiting or living in Budapest!

This event is co-organized with the Data Science meetup group. If this meetup is full, check whether there are still places on the Data Science Meetup (http://www.meetup.com/budapest_data_science/events/212171402/) page. But please don't register at both places!

List of talks:

Colin McCabe: Feeding the Elephant - Optimizing the Read Path of the Hadoop Distributed Filesystem
The Hadoop Distributed Filesystem (HDFS) is a key component of the Hadoop distributed computation framework. I'd like to talk about some important optimizations we made to the read path of HDFS, such as direct reads, short-circuit local reads, zero-copy reads, and HDFS caching. Along the way, I'll talk about lessons that I learned while working on HDFS, and emerging trends in data center hardware. Finally, I'll talk about some interesting ongoing and planned approaches to optimizing Hadoop and HDFS.

Colin McCabe is a Platform Software Engineer at Cloudera, where he works on HDFS and related technologies. He is a committer and PMC member on Hadoop. Prior to joining Cloudera, he worked on the Ceph Distributed Filesystem and the Linux kernel, among other things. He studied Computer Science and Computer Engineering at Carnegie Mellon.

Martin Kleppmann: Lessons learnt from LinkedIn's data infrastructure

LinkedIn, with over 300 million members, has surmounted some interesting scalability challenges. Many core components of LinkedIn's data infrastructure are open source, so you can benefit from them too. This talk will give an overview of the approaches and tools that have proved successful at massive scale.

Martin is a committer on Apache Samza and Apache Avro, and the author of the O'Reilly book "Designing Data-Intensive Applications" (http://dataintensive.net). He previously worked on data infrastructure at LinkedIn, and he co-founded and sold two startups, Rapportive and Go Test It. His technical blog is at http://martin.kleppmann.com, and he's @martinkl on Twitter. He is based in Cambridge, UK.

Schedule:

18:30 Doors open

19:00 Talks start

21:00 Meetup finishes and we head to a nearby pub for some beers

This is an English speaking event. Venue, food and drinks are provided by Prezi.