December Presentation Night hosted by Quantum Black


Details
Quantum Black has agreed to sponsor our December Meetup. They will provide pizza and beer! Please bring an ID to get into the building.
1st Talk: Evolving Architectures: Monolith & Batch to Microservices & Streaming
Talk Summary:
Benefits and challenges of McGraw-Hill Education's ongoing transition from a monolithic architecture and batch ETL to microservices, Kafka, and Spark Structured Streaming.
Walkthrough and demo of streaming event ingestion into Parquet and Postgres, as well as approaches for handling backfill operations and fixing bad data.
1st Speaker Bio:
Matt stumbled into the EdTech and analytics space as a lowly QA intern and stuck with it. Now leading streaming analytics efforts at McGraw-Hill Education, he spends a lot of time thinking about the magic sauce behind high-performing teams and how to enable continuous delivery in the tricky data/analytics space.
2nd Talk
Abstract:
DFDL (Data Format Description Language) and Apache Daffodil (Incubating): how this new standard and its implementation will solve the data format problem. This talk will cover what DFDL and Daffodil are, why they are important, and how they change the game for systems that ingest or export data in a wide array of complex data formats. A quick demonstration of Daffodil working with Spark will be shown.
Some of this material will be reprised from the ApacheCon NA 2018 talk on DFDL/Daffodil.
Slides (PDF): https://s.apache.org/G5M2, with audio on LinkedIn (https://lnkd.in/e222P-R)
2nd Speaker Bio
Mike Beckerle is a committer on the Apache Daffodil (incubating) project, co-chair of the DFDL working group of the Open Grid Forum, and a primary author of the DFDL standard. He has 25+ years of industry experience, mostly in parallel processing for commercial data applications. He likes programming in Scala and doing "data archeology" (figuring out obscure data formats), and has a love/hate relationship with XML.
(https://www.linkedin.com/in/mbeckerle/)
3rd Talk
Abstract:
Leveraging Spark to find nearest neighbors in geospatial data. The talk will cover different approaches, including using geohashes to optimize the joins.
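The abstract does not describe the speaker's actual method, so the following is only a rough, single-machine illustration of the general geohash idea, written in plain Python rather than Spark. It buckets points by geohash, restricts each query's candidates to its own cell and the eight surrounding cells, and keeps the closest candidate by haversine distance. In a Spark job, the geohash would typically act as an equi-join key in place of the dictionary lookup. All function names, the precision of 5, and the sample coordinates are hypothetical. Because only the 3x3 neighborhood is searched, the result is approximate: a query with no point in that block gets no match, and a closer point just outside the block is not considered.

```python
import math
from collections import defaultdict

# Standard geohash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a point as a geohash by alternately bisecting longitude and latitude."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, ch, nbits, use_lon = [], 0, 0, True  # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch <<= 1
            rng[1] = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[ch])
            ch, nbits = 0, 0
    return "".join(chars)


def cell_size(precision: int) -> tuple[float, float]:
    """Return (lat_height, lon_width) in degrees of a geohash cell."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when the total is odd
    lat_bits = total_bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits


def neighbor_keys(lat: float, lon: float, precision: int) -> set[str]:
    """Geohash of the point's own cell plus its (up to) 8 surrounding cells."""
    dlat, dlon = cell_size(precision)
    keys = set()
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nlat = lat + dy * dlat
            if not -90.0 <= nlat <= 90.0:
                continue  # no cells beyond the poles
            nlon = (lon + dx * dlon + 180.0) % 360.0 - 180.0  # wrap at the antimeridian
            keys.add(geohash_encode(nlat, nlon, precision))
    return keys


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def nearest_neighbors(queries, points, precision: int = 5):
    """Map each query id to (nearest point id, km), or None if no candidate nearby."""
    buckets = defaultdict(list)  # geohash -> points in that cell (the "join key")
    for pid, lat, lon in points:
        buckets[geohash_encode(lat, lon, precision)].append((pid, lat, lon))
    results = {}
    for qid, qlat, qlon in queries:
        best = None
        for key in neighbor_keys(qlat, qlon, precision):
            for pid, plat, plon in buckets.get(key, ()):
                d = haversine_km(qlat, qlon, plat, plon)
                if best is None or d < best[1]:
                    best = (pid, d)
        results[qid] = best
    return results


# Hypothetical sample data: (id, latitude, longitude).
stores = [("s1", 51.5007, -0.1246), ("s2", 51.5033, -0.1195), ("s3", 48.8584, 2.2945)]
customers = [("c1", 51.5010, -0.1240), ("c2", 40.7128, -74.0060)]
matches = nearest_neighbors(customers, stores)
# c1 matches s1; c2 has no store in its 3x3 neighborhood, so it maps to None.
print(matches)
```

The design trade-off is the precision: longer geohashes mean smaller cells and fewer distance computations per query, but a higher chance that the true nearest neighbor falls outside the searched block.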
Bio:
Prashant Yadav is a Senior Data Engineer who works closely with clients and data scientists to curate and transform data into complex features that feed analytics models.