Structuring your data is an important part of your data pipeline: it lets analysts and business groups analyze data freely without the burden of coordinating with every single group that generated it. Left unstructured, this coordination becomes an N-squared communication overhead problem: with N producers and N consumers of data, every consumer may need to talk to every producer.
In this session, we'll discuss strategies for structuring your data when it comes from many different sources in a variety of forms. We'll show you how to treat your schemas as first-class citizens by leveraging message envelopes and schema registries as part of your workflow, as sketched below.
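As a rough illustration of the envelope pattern described above, here is a minimal, self-contained Python sketch. The field names and the in-memory registry are assumptions for illustration only, not the API of any particular schema-registry product:

```python
import json
import time
import uuid

# Hypothetical in-memory stand-in for a schema registry: maps a schema id
# to the schema that payloads registered under it must conform to.
SCHEMA_REGISTRY = {
    1: {"name": "PageView", "fields": ["user_id", "url"]},
}

def wrap_in_envelope(payload: dict, schema_id: int) -> str:
    """Wrap a raw payload in a message envelope carrying metadata.

    The envelope separates validation/routing metadata (schema id,
    message id, timestamp) from the business payload itself.
    """
    if schema_id not in SCHEMA_REGISTRY:
        raise ValueError(f"unknown schema id: {schema_id}")
    envelope = {
        "schema_id": schema_id,           # which registered schema the payload follows
        "message_id": str(uuid.uuid4()),  # unique id for deduplication and tracing
        "produced_at": time.time(),       # producer-side timestamp
        "payload": payload,
    }
    return json.dumps(envelope)

def unwrap_envelope(raw: str) -> dict:
    """Parse an envelope and check the payload against its declared schema."""
    envelope = json.loads(raw)
    schema = SCHEMA_REGISTRY[envelope["schema_id"]]
    missing = [f for f in schema["fields"] if f not in envelope["payload"]]
    if missing:
        raise ValueError(f"payload missing fields {missing} for schema {schema['name']}")
    return envelope["payload"]

# A producer wraps an event; any consumer can validate and unwrap it
# without talking to the producing team, because the schema travels by id.
message = wrap_in_envelope({"user_id": "u42", "url": "/home"}, schema_id=1)
print(unwrap_envelope(message))
```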
We'll then discuss how structured data gives rise to rich use cases, such as enrichment and routing, that can provide additional operational and business value to your organization.
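To make enrichment and routing concrete, here is a hedged sketch in the same spirit: once messages carry a known schema, a stream processor can join them against reference data and direct them to downstream topics. The lookup table and topic-naming scheme here are illustrative assumptions, not part of any specific framework:

```python
# Illustrative reference data; in practice this might be a database or cache.
USER_REGIONS = {"u42": "us-west"}

def enrich(event: dict) -> dict:
    """Enrichment: attach reference data the producer didn't include."""
    return {**event, "region": USER_REGIONS.get(event["user_id"], "unknown")}

def route(event: dict) -> str:
    """Routing: pick a downstream topic based on the structured fields."""
    return f"pageviews.{event['region']}"

event = enrich({"user_id": "u42", "url": "/home"})
print(route(event), event)  # -> pageviews.us-west {...}
```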
Presenter Bio: Mike Trienis loves building data products that scale. That means implementing simple solutions that require minimal maintenance, through automation and elegant design. His software experience spans the full stack, from system-level deployment to application implementation. In particular, he has spent quite a bit of time working with streaming technologies such as Apache Kafka and Apache Spark.