Capturing Spark Data Pipeline Errors with Monads


Details

Capturing transformation and validation errors and warnings in a data pipeline is an essential feature rather than an exceptional case. As data engineers, we have taken several different approaches to this -

1. Using Accumulators or materialized collection that's collected at the Driver
2. Appending an error column in the source Dataframe and collecting at the end of the transformation chain
3. Making side-effecting IO calls to log the errors to an external datastore
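As a rough illustration of the second approach, here is a minimal plain-Python sketch (a list of dicts standing in for a Spark DataFrame, since running Spark here would be heavyweight). The record shape and validation rule are hypothetical; the point is that each transformation writes into an `error` field that is collected at the end of the chain:

```python
# Hypothetical records standing in for rows of a source DataFrame.
rows = [
    {"id": 1, "age": "29"},
    {"id": 2, "age": "abc"},
]

def with_error_column(row: dict) -> dict:
    """Transform a row, recording any validation failure in an 'error' field."""
    out = dict(row)
    try:
        out["age"] = int(row["age"])
        out["error"] = None
    except ValueError:
        out["age"] = None
        out["error"] = f"row {row['id']}: unparseable age {row['age']!r}"
    return out

transformed = [with_error_column(r) for r in rows]

# Collect the non-empty errors at the end of the transformation chain.
errors = [r["error"] for r in transformed if r["error"] is not None]
print(errors)  # → ["row 2: unparseable age 'abc'"]
```

The drawback this approach shares with accumulators is that error-tracking plumbing leaks into every transformation's schema.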

In this session, Arun introduces an alternative functional approach to solving this problem - using Writer Monads.
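To give a flavour of the idea ahead of the session, here is a minimal Writer-monad sketch in plain Python (not the speaker's implementation; in Scala one would typically reach for Cats' `Writer`). A `Writer` pairs a computed value with an accumulating log, so each step can record warnings without side effects, and `flat_map` threads the log through the chain:

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")

@dataclass(frozen=True)
class Writer(Generic[A]):
    """A value paired with an accumulating log of messages."""
    value: A
    log: tuple = ()

    def flat_map(self, f: "Callable[[A], Writer[B]]") -> "Writer[B]":
        # Run the next step and concatenate its log onto ours.
        nxt = f(self.value)
        return Writer(nxt.value, self.log + nxt.log)

# Hypothetical pipeline steps: parse, then validate.
def parse_age(raw: str) -> Writer:
    try:
        return Writer(int(raw))
    except ValueError:
        return Writer(0, (f"invalid age {raw!r}, defaulted to 0",))

def clamp_age(age: int) -> Writer:
    if age < 0:
        return Writer(0, (f"negative age {age} clamped to 0",))
    return Writer(age)

ok = parse_age("29").flat_map(clamp_age)
bad = parse_age("abc").flat_map(clamp_age)
print(ok.value, ok.log)    # → 29 ()
print(bad.value, bad.log)  # → 0 ("invalid age 'abc', defaulted to 0",)
```

Because the log rides alongside the value, the errors and warnings arrive at the end of the pipeline together with the result, with no accumulators, extra columns, or side-effecting IO.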