46th Bay Area Hadoop User Group (HUG) Monthly Meetup

 

 

Detailed agenda and summaries to follow. General agenda:

  • 6:00 - 6:30 - Socialize over food and beer(s)
  • 6:30 - 7:00 - Privilege Isolation in Docker Containers
  • 7:00 - 7:30 - Managing Hadoop Cluster with Apache Ambari
  • 7:30 - 8:00 - Pushing the limits of Realtime Analytics using Druid

 

Session I (6:30 - 7:00 PM) – Privilege Isolation in Docker Containers

Containers are a fundamentally different form of virtualization designed to directly virtualize applications rather than the operating system.  The absence of a guest OS layer makes containers extremely lightweight, leading to almost imperceptible runtime overhead and startup latencies, orders of magnitude higher scalability and simplified management.  Additionally, the container model enables a number of use cases such as online OS upgrades which are only possible through application virtualization.


Docker is an ambitious program with a charter to make the container primitives of the kernel trivially accessible to end users.  Docker achieves the goal in part through a highly intuitive user interface that hides the complexity of kernel configuration by choosing the most appropriate defaults.  It also provides a community-curated repository of self-contained application images that can be portably run on any host, regardless of its underlying state and configuration.


In this talk, Dinesh Subhraveti presents a quick background on containers, followed by Altiscale's recent contribution of user namespace support to make Docker containers secure for use in multitenant environments. User namespaces prevent containerized applications from compromising the security of the host or other containers by isolating the scope of their privilege to the container in which they run (see this blog for details). This feature will be employed in Altiscale's purpose-built Hadoop as a Service to securely isolate Hadoop tasks of different tenant customers.
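As a rough illustration of the privilege isolation described above, the sketch below models how a user namespace maps user IDs inside a container onto different IDs on the host. It follows the three-column layout of the kernel's /proc/<pid>/uid_map file (first ID inside the namespace, first ID outside, range length). The specific range used here (container IDs 0 to 65535 mapped to host IDs starting at 100000) is an assumption chosen for illustration, not a value from the talk, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Each entry mirrors one line of /proc/<pid>/uid_map:
# (first ID inside the namespace, first ID on the host, length of the range).
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse uid_map-style text into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_uid(container_uid: int, uid_map: UidMap) -> Optional[int]:
    """Translate a UID seen inside the container to the UID the host sees.

    Returns None when the container UID falls outside every mapped range.
    """
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: container UIDs 0..65535 -> host UIDs 100000..165535.
example_map = parse_uid_map("0 100000 65536\n")

print(to_host_uid(0, example_map))      # root inside the container
print(to_host_uid(1000, example_map))   # an ordinary container user
print(to_host_uid(70000, example_map))  # outside the mapped range
```

Under this assumed mapping, root inside the container (UID 0) corresponds to an unprivileged host UID, which is the sense in which its privilege stays scoped to the container.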

Speaker: Dinesh Subhraveti, Principal, Altiscale

Bio:

Dinesh Subhraveti is responsible for the multi-tenancy and virtualization infrastructure at Altiscale. He developed the notion of Operating System level virtualization as a part of his Ph.D., which later came to be known in the industry as Containers. His work, published in OSDI 2002, demonstrated for the first time that enterprise applications could be virtualized and live-migrated.

Continuing his work on Containers, Dinesh drove the industry's first Container virtualization product for enterprise Linux applications at Meiosys, the company behind Linux Containers that IBM acquired in 2005.

Dinesh has authored over 35 patents and papers in the areas of virtualization, storage and operating systems, and holds a B.E. degree in computer science from BITS-Pilani, India and M.S., M.Phil., and Ph.D. degrees in computer science from Columbia University, New York.

 

Session II (7:00 - 7:30 PM) – Managing Hadoop Cluster with Apache Ambari

The primary objective of Apache Ambari is to simplify the administration of Hadoop clusters. This entails addressing concerns like fault tolerance, security, availability, replication, and zero-touch automation, and achieving all of this at enterprise scale. This talk focuses on key features that have been introduced in Ambari for cluster administration, and goes on to discuss existing concerns as well as the future roadmap for the project. We will demonstrate new features essential to managing enterprise-scale clusters: heterogeneous cluster configuration using config-groups; maintenance mode for services, components, and hosts; bulk operations like decommission, recommission, and rolling restart; and the ability to visualize Hive on Tez queries in the Ambari Web UI. Additionally, we will show how Ambari facilitates adding new Services, extending existing stacks, and extending the Web UI through the views framework.
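For readers unfamiliar with how such operations are driven programmatically, here is a minimal sketch of what turning on maintenance mode for a service could look like against Ambari's REST API. The endpoint path, the `X-Requested-By` header, and the `maintenance_state` field follow the conventions in Ambari's public API documentation as we understand them; treat them as assumptions to verify against your Ambari version. The host name, cluster name, and helper function are hypothetical, and the request is only constructed here, never sent.

```python
import json
import urllib.request


def maintenance_mode_request(base_url, cluster, service, state="ON"):
    """Build (but do not send) a PUT request toggling maintenance mode.

    Assumed endpoint shape: /api/v1/clusters/<cluster>/services/<service>.
    """
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    payload = {
        # Free-text label for the operation (assumed field).
        "RequestInfo": {
            "context": f"Turn {state} Maintenance Mode for {service}"
        },
        # Assumed field name for the desired maintenance state.
        "Body": {"ServiceInfo": {"maintenance_state": state}},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        # Ambari is assumed to require this header on modifying requests.
        headers={"X-Requested-By": "ambari"},
    )


# Hypothetical server and cluster names, for illustration only.
req = maintenance_mode_request(
    "http://ambari.example.com:8080", "mycluster", "HDFS"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would additionally need credentials and a reachable Ambari server; those are left out so the sketch stays self-contained.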

Speaker: Srimanth Gunturi, Software Developer, Hortonworks

Bio:

Srimanth Gunturi is an Apache Ambari committer and PMC member working at Hortonworks.

Speaker: Sumit Mohanty, Member Technical Staff, Hortonworks

Bio:

Sumit Mohanty works at Hortonworks on the Apache Ambari project. He is an Apache Ambari committer and PMC member.

 

Session III (7:30 - 8:00 PM) – Pushing the limits of Realtime Analytics using Druid

By realtime we mean sub-second response, high concurrency, and realtime ingestion as well. Realtime analytics has several facets, each of which can be equally important if a system is to support many users on huge data sets. Such a system, if self-contained, should support:

- Highly concurrent queries

- Near real-time ingestion

- Sub-second to few seconds response time

- Diverse query needs

- Good caching mechanism

At the Yahoo Media Analytics team, we’re striving to provide real-time analytics for our publishers and internal users. We currently use a traditional grid-to-MySQL pipeline and have been exploring proprietary software, internal tools, and open source options to find the best fit for our end users. One of the areas of focus has been Druid, an open-source analytics data store designed for real-time exploratory queries on large-scale event data. Druid provides a highly concurrent and distributed real-time analytics platform for us, though with its own limitations. In this talk, we will give a brief introduction to Druid, then focus on how we’re using it and how we combine the flexibility of the Grid, a SQL layer on top of Druid with join support, and a flexible API layer to push the boundaries of Druid even further.
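To give a sense of the kind of exploratory query Druid serves, the sketch below builds a native Druid timeseries query as JSON. The overall shape (`queryType`, `dataSource`, `granularity`, `intervals`, `aggregations`) follows Druid's native query format; the data source name, metric names, and time interval are hypothetical and are not taken from the speakers' system. The query is only constructed and printed, not submitted to a cluster.

```python
import json


def timeseries_query(data_source, interval, granularity="minute"):
    """Build a native Druid timeseries query as a plain dict."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": granularity,
        # ISO-8601 interval(s) the query should cover.
        "intervals": [interval],
        "aggregations": [
            # Number of stored rows in each time bucket.
            {"type": "count", "name": "rows"},
            # Sum of a hypothetical "views" metric column.
            {"type": "longSum", "name": "page_views", "fieldName": "views"},
        ],
    }


# Hypothetical data source and interval, for illustration only.
query = timeseries_query(
    "media_events", "2014-07-16T00:00:00/2014-07-17T00:00:00"
)
print(json.dumps(query, indent=2))
```

In a real deployment, a query like this would typically be POSTed as JSON to a Druid broker node; the SQL and API layers mentioned in the abstract would sit above this level.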

Speaker: Reza Iranmanesh, Software Development Engineer, Yahoo

Reza Iranmanesh is a Software Engineer on the Yahoo Media Analytics team. He has a keen interest in Data Mining and Realtime Analytics. Before joining Yahoo, he worked as a Software Engineer in the Search Platform at Chemical Abstracts Service, and he spent a few years at Genseq, Malaysia, crunching medical, insurance, and wellness data for large firms and finding relationships between a population’s daily habits and their medical conditions. He won multiple programming contests during his college years, including second place in the nationwide ACM-ICPC contest in Malaysia.

Speaker: Srikalyan Chandrashekar, Software Development Engineer, Yahoo

Srikalyan Chandrashekar is a software engineer at Yahoo!, working on the media engineering group’s analytics team. He has been writing software since 2005, after graduating in Electronics and Communications Engineering[masked]) and earning a Master's in Computer Science[masked]). His interests are in Big Data, with a focus on data analytics. He has contributed some of his work to the open source community (https://github.com/srikalyc/). Srikalyan lives in Milpitas, CA.

 

Yahoo Campus Map:

Detail map

 

Location on Wikimapia:

http://www.wikimapia.org/#lat=[masked]&lon=[masked]&z=18&l=0&m=b&search=yahoo

 


  • Lee

    Attend the 3-day Big Data Bootcamp starting Monday, September 8, 2014 at the Santa Clara Convention Center

    To attend any One Day: Price $799 ($200 discount, use discount code MEETUP)
    To attend any Two Days: Price $1099 ($200 discount, use discount code MEETUP)
    To attend all Three Days: Price $1499 ($200 discount, use discount code MEETUP)

    Discount expires on August 14th. Register: http://bit.ly/1oE2DoT (September 8th-10th)

    Global Big Data Conference is offering an extensive 3-day bootcamp (September 8th - 10th) on Big Data. It is fast-paced and vendor-agnostic. No prior knowledge of databases or programming is assumed. Big Data Bootcamp is targeted towards both technical and non-technical people who want to understand the emerging world of Big Data, with a specific focus on Hadoop, NoSQL & Machine Learning. Attendees will experience real Hadoop clusters and the latest Hadoop distributions.

    August 11

  • Jim R.

    Do we have a link to the recording on Docker? Can't find it on YouTube

    August 8

  • Vamshi G.

    The link for the talk on Docker is pointing to the Druid recording (https://www.youtube.com/watch?v=73enCUDQ1DA&list=UU4MJvi5SyXYnoorWVBTFJKQ&index=2). Can you please share the recording on Docker?

    July 29

  • Yahoo! HUG O.

    The slides are now available on Slideshare.net/YDN

    July 18

  • Umar Farooq M.

    Have the slides been made available?

    July 18

  • Jacky L.

    Can we have the slides?

    July 17

  • Anoop D.

    Will this event be streamed live for remote attendees?

    July 16

  • Mohammad-Mahdi M.

    Looking forward to this!

    June 16

Our Sponsors

  • Yahoo! Inc.

    Meeting space, pizza and drinks are sponsored by the Yahoo! Hadoop team.
