Building A Self-Service Analytics Platform on Hadoop


Details
This session takes a deep dive into the architecture and implementation of an advanced analytics platform. The framework lets business users handle data intake, data processing, and report/model generation on a self-service basis, using a distributed hybrid-cloud data ingestor. Storage and compute are decoupled, which encourages BYOC (bring your own compute).
The platform runs a Hadoop distribution on cloud instances and uses Amazon S3 as the data warehouse storage layer (data lake), Spark as the ETL engine, and Spark SQL as the distributed query engine. Data visualization is done in Tableau, which queries the data through Spark SQL. The platform supports both batch and streaming use cases and enables end users to make data-driven decisions with minimal IT engagement.
The session also highlights the lessons learned and the challenges overcome on both the business and technology fronts.
Technology stack: Cloud Technologies, Apache Hadoop Distribution (YARN, Navigator, Sentry), Spark, Kafka, Spark SQL, and Spark Streaming.
Speaker: Avinash Ramineni is a cofounder and the principal architect at Clairvoyant and leads the engineering efforts in the big data space. He is a passionate technologist with a drive to understand the bigger picture and vision and convert them into pragmatic, implementable solutions. Avinash has over 13 years of experience in engineering and architecting systems on a large scale. He specializes in providing solutions in the areas of big data, cloud, NoSQL, SOA, and event-driven architectures. Avinash holds an MS in computer science from Arizona State University.
Notes: IgnitionOne will be our host for this evening's event. There is street parking along the side of the building as well as a paid lot under the Investco building. Pizza, drinks, and social networking start at 6:00pm, with the technical presentation starting at 6:30pm.
This will be a combined meetup with the Atlanta Apache Spark User Group (https://www.meetup.com/Atlanta-Apache-Spark-User-Group/).
