46th Bay Area Hadoop User Group (HUG) Monthly Meetup


Details
Detailed agenda and summaries to follow. General agenda:
6:00 - 6:30 - Socialize over food and beer(s)
6:30 - 7:00 - Privilege Isolation in Docker Containers
7:00 - 7:30 - Managing Hadoop Cluster with Apache Ambari
7:30 - 8:00 - Pushing the limits of Realtime Analytics using Druid
Session I (6:30 - 7:00 PM) – Privilege Isolation in Docker Containers
Containers are a fundamentally different form of virtualization designed to directly virtualize applications rather than the operating system. The absence of a guest OS layer makes containers extremely lightweight, leading to almost imperceptible runtime overhead and startup latencies, orders of magnitude higher scalability and simplified management. Additionally, the container model enables a number of use cases such as online OS upgrades which are only possible through application virtualization.
Docker is an ambitious program with a charter to make the container primitives of the kernel trivially accessible to end users. Docker achieves the goal in part through a highly intuitive user interface that hides the complexity of kernel configuration by choosing the most appropriate defaults. It also provides a community-curated repository of self-contained application images that can be portably run on any host, regardless of its underlying state and configuration.
In this talk, Dinesh Subhraveti presents a quick background on containers, followed by Altiscale's recent contribution of user namespace support to make Docker containers secure for use in multitenant environments. User namespaces prevent containerized applications from compromising the security of the host or other containers by isolating the scope of their privilege to the container in which they run (see this blog for details). This feature will be employed in Altiscale's purpose-built Hadoop as a Service to securely isolate Hadoop tasks of different tenant customers.
Speaker: Dinesh Subhraveti, Principal, Altiscale
Bio:
Dinesh Subhraveti is responsible for the multi-tenancy and virtualization infrastructure at Altiscale. He developed the notion of Operating System level virtualization as a part of his Ph.D., which later came to be known in the industry as Containers. His work, published in OSDI 2002, demonstrated for the first time that enterprise applications could be virtualized and live-migrated.
Continuing his work on Containers, Dinesh drove the industry's first Container virtualization product for enterprise Linux applications at Meiosys, the company behind Linux Containers that IBM acquired in 2005.
Dinesh has authored over 35 patents and papers in the areas of virtualization, storage and operating systems, and holds a B.E. degree in computer science from BITS-Pilani, India and M.S., M.Phil., and Ph.D. degrees in computer science from Columbia University, New York.
Session II (7:00 - 7:30 PM) – Managing Hadoop Cluster with Apache Ambari
The primary objective of Apache Ambari is to simplify the administration of Hadoop clusters. This entails addressing concerns like fault tolerance, security, availability, replication, and zero-touch automation, and achieving all of this at enterprise scale. This talk focuses on key features that have been introduced in Ambari for cluster administration, and goes on to discuss existing concerns as well as the future roadmap for the project. We will demonstrate new features essential to managing enterprise-scale clusters: heterogeneous cluster configuration using config-groups; maintenance mode for services, components, and hosts; bulk operations like decommission, recommission, and rolling restart; and the ability to visualize Hive on Tez queries in the Ambari Web UI. Additionally, we will show how Ambari facilitates adding new Services, extending existing stacks, and extending the Web UI through the views framework.
Speaker: Srimanth Gunturi, Software Developer, Hortonworks
Bio:
Srimanth Gunturi is an Apache Ambari committer and PMC member working at Hortonworks.
Speaker: Sumit Mohanty, Member Technical Staff, Hortonworks
Bio:
Sumit Mohanty works at Hortonworks on the Apache Ambari project. He is an Apache Ambari committer and PMC member.
Session III (7:30 - 8:00 PM) – Pushing the limits of Realtime Analytics using Druid
By realtime we mean sub-second response times, high query concurrency, and real-time ingestion. Real-time analytics has several facets, each of which can be equally important if a system wants to support a large number of users on huge data sets. Such a system, if self-contained, should support:
- Highly concurrent queries
- Near real-time ingestion
- Sub-second to few seconds response time
- Diverse query needs
- Good caching mechanism
At the Yahoo Media Analytics team, we're striving to provide real-time analytics for our publishers and internal users. We currently use a traditional grid-to-MySQL pipeline and have been exploring proprietary software, internal tools, and open source options to find the best fit for our end users. One of the areas of focus has been Druid, an open-source analytics data store designed for real-time exploratory queries on large-scale event data. Druid provides us with a highly concurrent and distributed real-time analytics platform, though it has its own limitations. In this talk, we will give a brief introduction to Druid and focus on how we're using it: we combine the flexibility of the Grid with a SQL layer on top of Druid with join support and a flexible API layer to push the limits of Druid even further.
Speaker: Reza Iranmanesh, Software Development Engineer, Yahoo
Reza Iranmanesh is a Software Engineer on the Yahoo Media Analytics team. He has a keen interest in Data Mining and Realtime Analytics. Before joining Yahoo, he worked as a Software Engineer in the Search Platform at Chemical Abstracts Service. He also has a few years of experience at Genseq, Malaysia, crunching medical, insurance, and wellness data for large firms and finding relationships between a population's daily habits and their medical conditions. He won multiple programming contests during his college years, including second place in the nation-wide ACM-ICPC contest in Malaysia.
Speaker: Srikalyan Chandrashekar, Software Development Engineer, Yahoo
Srikalyan Chandrashekar is a software engineer at Yahoo! working on the media engineering group's analytics team. He has been writing software since 2005, after graduating in Electronics and Communications Engineering (2001-2005), and later completed a Master's in Computer Science (2008-2009). His interests are in Big Data, with a focus on data analytics. He has contributed some of his work to the open source community (https://github.com/srikalyc/). Srikalyan lives in Milpitas, CA.
Yahoo Campus Map:
Detail map (http://photos4.meetupstatic.com/photos/event/2/8/e/d/600_21370477.jpeg)
Location on Wikimapia:
http://www.wikimapia.org/#lat=37.4181633&lon=-122.0250607&z=18&l=0&m=b&search=yahoo
