Presentation: Simplified Data Ingestion with Apache Daffodil and DFDL

Hosted By
Eli

Details

/* Tonight's Presentation Summary: */

Apache Daffodil is a project accepted into the Apache Incubator in
August of 2017 with the goal of creating an open source implementation
of the Data Format Description Language (DFDL,
https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl). DFDL is a
specification capable of describing a wide variety of data formats
(binary, text, scientific, military, financial, and more) using a
simplified subset of W3C XML Schema. Daffodil uses these DFDL data
descriptions to "parse" data into XML or JSON, allowing for ingestion
into big data systems such as Apache NiFi, Apache Spark, and Apache
Storm, for analysis, modeling, and manipulation without the need to
create custom, one-off, and error-prone data parsers. Daffodil is also
capable of serializing, or "unparsing", XML/JSON back to the original
data format, useful for tasks such as filtering data to anonymize PII,
data fuzzing, and data normalization. Daffodil is in use in many systems
today, using a diverse set of DFDL data descriptions, including some
that are publicly available on GitHub (https://github.com/DFDLSchemas)
for formats such as NACHA, PNG, PCAP, JPEG, HL7, and others.
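
The parse/unparse round trip described above can be pictured with a toy sketch in Python. This is not Daffodil's API, and the format table is not DFDL syntax: the record layout, the `RECORD_FORMAT` table, and the `parse`/`unparse` helpers are all hypothetical, chosen only to illustrate how one declarative description can drive both directions.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical declarative description of a fixed-layout binary record:
# (field name, struct format code). Illustration only, NOT DFDL syntax.
RECORD_FORMAT = [
    ("id", "H"),           # unsigned 16-bit integer
    ("temperature", "h"),  # signed 16-bit integer
    ("flags", "B"),        # unsigned 8-bit integer
]
BYTE_ORDER = ">"  # big-endian, no padding


def _layout() -> struct.Struct:
    """Build the binary layout from the declarative description."""
    return struct.Struct(BYTE_ORDER + "".join(code for _, code in RECORD_FORMAT))


def parse(data: bytes) -> ET.Element:
    """Turn raw bytes into an XML element tree, one child per field."""
    values = _layout().unpack(data)
    root = ET.Element("record")
    for (name, _), value in zip(RECORD_FORMAT, values):
        ET.SubElement(root, name).text = str(value)
    return root


def unparse(root: ET.Element) -> bytes:
    """Serialize the XML element tree back to the original binary layout."""
    values = [int(root.findtext(name)) for name, _ in RECORD_FORMAT]
    return _layout().pack(*values)


# Made-up sample record: id=42, temperature=-10, flags=1.
raw = bytes([0x00, 0x2A, 0xFF, 0xF6, 0x01])

infoset = parse(raw)
xml_text = ET.tostring(infoset, encoding="unicode")
print(xml_text)

# Parsing then unparsing reproduces the original bytes.
assert unparse(infoset) == raw

# Edit the XML view (here, blanking the id field) and write it back out
# in the original binary format.
infoset.find("id").text = "0"
anonymized = unparse(infoset)
print(anonymized.hex())
```

The point of the sketch is that the parser and the serializer are both derived from the same description, so changing the format means editing the description rather than two hand-written routines. Real DFDL schemas cover far more than fixed-width integers.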

In this talk, you will learn about the DFDL standard, including examples
of how to model various types of data, and how it can simplify and
standardize data ingestion. We will discuss the Apache Daffodil project,
its history, its goals, and where it is headed. Lastly, we will discuss
integration into big data systems and conclude with a demo showing
Apache Daffodil's use in Apache NiFi to parse data to XML, ingest and
manipulate it, and unparse it back to the original file format.

/* Evening Schedule: */

• 6:30pm - Pizza will be served thanks to One for All Events! Low key, come hang out and chat.
• 7:00pm - We will begin the evening presentation.
• 8:30pm - Time to close up and leave.

/* Location Details: */

The meeting will take place at the downtown FITCI location at 118 N Market Street (next to Brewer's Alley). On-street parking is free at that time of the evening, and the large Church Street parking garage directly behind the building is available for a nominal fee ($2).

Frederick Web Technology Group
FITCI (downtown location)
118 N. Market St. · Frederick, MD