Enriching Data using PySpark and Hive in a Cloud Environment


Details
In this meetup, we’ll put ourselves in the shoes of an electric car manufacturer that is rolling out a recently developed electric motor in its new cars. To track and analyze this expensive new motor, we’ll show how to use PySpark to pull data from multiple locations within the company’s data warehouse, stitch it together, and create an enriched dataset that can be used to solve both engineering and business challenges. As a bonus, the data engineering platform we’ll use lets us monitor our data processes from one centralized location, all within a cloud-native environment on the Cloudera Data Platform.
Come join us to see how we’ve linked all these concepts together, and hopefully you’ll be inspired to build similar solutions of your own!
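To give a flavor of what "stitching data together" can look like, here is a minimal sketch, not the code from the session. It assumes two Hive tables that share a motor_id key; the table names (factory.motor_readings, sales.car_installs, enriched.motor_enriched), the column names, and the helper functions are all hypothetical placeholders, since this announcement does not name any.

```python
# Hypothetical Hive table names; the announcement does not specify any.
MOTOR_TABLE = "factory.motor_readings"
INSTALL_TABLE = "sales.car_installs"
TARGET_TABLE = "enriched.motor_enriched"


def enrichment_query(motors: str = MOTOR_TABLE, installs: str = INSTALL_TABLE) -> str:
    """Build a SQL statement that left-joins motor readings to car installs.

    A left join keeps every motor reading, even for motors that have not
    been matched to a car yet. Column names are assumed for illustration.
    """
    return (
        "SELECT m.*, i.vin, i.model, i.install_date "
        f"FROM {motors} m "
        f"LEFT JOIN {installs} i ON m.motor_id = i.motor_id"
    )


def enrich(spark, target: str = TARGET_TABLE):
    """Run the join on an existing SparkSession and save it as a Hive table."""
    enriched = spark.sql(enrichment_query())
    enriched.write.mode("overwrite").saveAsTable(target)
    return enriched


def main():
    # Imported here so the helpers above can be read and tested without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("motor-enrichment")
        .enableHiveSupport()  # lets spark.sql() see tables in the Hive metastore
        .getOrCreate()
    )
    enrich(spark).show(5)
```

On a cluster where those tables exist, calling main() would write the joined result back to the warehouse as a new table that both engineering and business teams could query.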
For a preview of the content we'll be covering, we've got the following resources:
Video:
https://youtu.be/dXu4hZAeI8E
Blog:
https://blog.cloudera.com/next-stop-building-a-data-pipeline-from-edge-to-insight
Cloudera Users Page:
https://www.cloudera.com/users.html
Due to the ongoing coronavirus pandemic, this will be an online event. Use the link provided to participants upon registration to view and interact with the live stream.
