Enriching Data using PySpark and Hive in a Cloud Environment

Hosted By
Future of D. and Nicolas P.

Details

In this meetup, we’ll put ourselves in the shoes of an electric car manufacturer that is rolling out a recently developed electric motor in its new cars. To track and analyze this expensive new motor, we’ll show how to use PySpark to pull data from multiple locations within the company’s data warehouse, stitch it together, and create an enriched dataset that can be used to solve both engineering and business challenges. As a bonus, the data engineering platform we’ll use lets us monitor our data processes from one centralized location, all within a cloud-native environment on the Cloudera Data Platform.
Join us to see how we’ve linked these concepts together. We hope it inspires similar solutions of your own!

For a preview of the content we'll be covering, we've got the following resources:

Video:
https://youtu.be/dXu4hZAeI8E

Blog:
https://blog.cloudera.com/next-stop-building-a-data-pipeline-from-edge-to-insight

Tutorial:
https://www.cloudera.com/tutorials/enrich-data-using-cloudera-data-engineering.html?utm_source=mktg-community&utm_medium=meetup

Cloudera Users Page:
https://www.cloudera.com/users.html

Due to the ongoing coronavirus pandemic, this will be an online event. Use the link provided to participants upon registration to view and interact with the live stream.

Future of Data: Austin