Recent Apache Hive enhancements powering enterprise data analytics

Details

Apache Hive is an established technology in the Hadoop ecosystem, used by hundreds of customers for their data warehousing and business analytics needs. But innovation never rests in this project.

In recent times, two specific areas have been receiving tremendous focus from the Hive community:

• Sub-second interactive analytics is now possible with a new execution model for Hive called LLAP (Low Latency Analytical Processing, a.k.a. Live Long and Process).

• On another front, work is progressing to make Hive datasets highly available for business continuity through the replication options being introduced in the newest versions of Hive.

At Hortonworks Bangalore, we have committers on the Apache Hive project who are experts in these new areas and love to share their knowledge with others.

So come and join us at this first meetup, where we will discuss business use cases, architectural details, and performance statistics related to Hive LLAP and Hive replication.

We will also try to give a glimpse of what lies ahead in the Hadoop and Big Data roadmap for Hortonworks products.

Detailed Agenda

3:30 - 3:50: Registration, Networking

3:50 - 4:20: Hortonworks 3.0 and DPS (Arun Murthy)

4:20 - 4:45: Break, Snacks and networking

4:45 - 5:25: Apache Hive LLAP (Rajesh Balamohan)

5:30 - 6:00: Replication in Apache Hive (Anishek Agarwal)

(Timings are tentative; we'll try to stick to them as much as possible.)

• Talk 1: Hortonworks 3.0 and DataPlane Services

Speaker: Arun C Murthy, Founder & Chief Product Officer, Hortonworks Bangalore

Duration: 30m

Hortonworks recently announced the first version of Hortonworks DataPlane Service, a new suite of data management products. It provides an engine for multi-cluster, multi-data-source management and an extensible framework for data management applications. The first of these applications, Data Lifecycle Manager, handles data replication.

The service and applications are based on open source technologies such as Apache Hive, Apache Atlas, Apache Ranger, and Apache Knox.

In this talk, Arun will cover Hortonworks' vision for how data management services will evolve in the future and how DataPlane Service will power that vision.

• Talk 2: Apache Hive LLAP for Near Realtime Querying

Speaker: Rajesh Balamohan

Duration: 40m

Abstract:

Hive LLAP (Live Long and Process) is Hive’s new architecture that delivers MPP performance at Hadoop scale through a combination of optimized in-memory caching and persistent query executors that scale elastically within YARN clusters. LLAP pushes query latencies into the sub-second range, which allows users to deploy interactive dashboards and exploratory analytics that demand low latency. Since LLAP is an evolution of the Hive architecture, it does all this with the same comprehensive ANSI-standard SQL support and proven scale that Hive has long been known for. In this session, I will discuss the improvements in LLAP that make it run much more efficiently in on-premises and cloud-based environments.

Speaker Intro:

Rajesh Balamohan is a performance engineer who works on improving the performance of Hive, Tez, and Spark. He is a committer on the Apache Hive and Tez projects. Rajesh is a Principal Engineer at Hortonworks, Bangalore.

• Talk 3: Replication in Apache Hive

Speaker: Anishek Agarwal

Duration: 30m

Abstract:

Big Data has become a critical piece of infrastructure for a lot of enterprises. As a result, most of them require DR (Disaster Recovery) capabilities for critical software components. Hive has been an integral part of the Big Data stack, providing SQL on Hadoop, and is a critical component for Big Data processing in enterprises. In this talk, we will see how Hive replication works, its limitations, and significant recent advancements, all of which will be available in the upcoming Apache Hive 3.0.

Speaker Intro:

Anishek Agarwal is a committer on Apache Hive, focusing on replication. He has worked on large-scale streaming platforms, analytics platforms, a few consumer-facing applications, and enterprise software. Anishek works at Hortonworks Bangalore.

Future of Data: Bangalore
Hortonworks Data Platform India Pvt Limited
Salarpuria Infinity, 2nd Floor, Rear Wing, #5 Bannerghatta Road, Bengaluru, 560029