Cambridge Semantic Web Monthly Meetup

MIT Stata Center - Star Room

32 Vassar Street · Cambridge, MA

Rm 32-D463, 4th floor, Star Conference Room, Stata Center, MIT

David Booth, Independent Consultant: The RDF Pipeline Framework: Automating Distributed, Dependency-Driven Data Pipelines

Semantic web technology is well suited to large-scale information integration problems involving multiple diverse data sources and sinks, each with its own data format, vocabulary and information requirements. The resulting data production processes often require a number of steps that must be repeated whenever source data changes, which is wasteful if only certain portions of the data have changed. This presentation explains how distributed data production processes can be conveniently described in RDF as executable dependency graphs, using the RDF Pipeline Framework. Nodes in the graph can perform arbitrary processing, and their outputs are cached automatically, thus avoiding unnecessary data regeneration. The framework is loosely coupled, using native protocols for efficient node-to-node communication when possible and falling back to RESTful HTTP when necessary. It is agnostic to data format and programming language, using framework-supplied wrappers to allow pipeline developers to use their favorite languages and tools for node-specific processing.
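For readers new to dependency-driven pipelines, the caching idea in the abstract can be illustrated with a minimal Python sketch. This is not the RDF Pipeline Framework's API, and the framework describes its graph in RDF rather than in Python. The class names (Node, Source), the version counters, and the sample data (patients, labs) are all assumptions made up for illustration.

```python
from typing import Callable, Dict, List, Optional


class Node:
    """Hypothetical pipeline step: reruns its updater only when an input changed."""

    def __init__(self, name: str, updater: Callable[..., str],
                 inputs: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.updater = updater
        self.inputs = inputs or []
        self.cache: Optional[str] = None   # last output produced by the updater
        self.version = 0                   # bumped whenever the cached output changes
        self.seen: Dict[str, int] = {}     # input versions the cache was built from
        self.runs = 0                      # how many times the updater actually ran

    def get(self) -> str:
        # Bring upstream nodes up to date first, then compare their versions
        # with the versions used to build the cached output.
        values = [node.get() for node in self.inputs]
        current = {node.name: node.version for node in self.inputs}
        if self.cache is None or current != self.seen:
            new_output = self.updater(*values)
            self.runs += 1
            if new_output != self.cache:
                self.cache = new_output
                self.version += 1
            self.seen = current
        return self.cache


class Source(Node):
    """Hypothetical source node whose data is set from outside the pipeline."""

    def __init__(self, name: str, data: str) -> None:
        super().__init__(name, updater=lambda: data)
        self.cache = data
        self.version = 1

    def set(self, data: str) -> None:
        # Only a real change in content invalidates downstream caches.
        if data != self.cache:
            self.cache = data
            self.version += 1


# Illustrative graph: patients -> normalized -> merged <- labs
patients = Source("patients", "alice,bob")
labs = Source("labs", "a1c,ldl")
normalized = Node("normalized", lambda p: p.upper(), [patients])
merged = Node("merged", lambda p, l: f"{p}|{l}", [normalized, labs])

first = merged.get()     # first request: normalized and merged both run
labs.set("a1c,hdl")      # only the labs source changes
second = merged.get()    # merged reruns; normalized is served from its cache
```

After the second request, normalized.runs is still 1 while merged.runs is 2: only the node downstream of the changed source was regenerated. A distributed deployment would also need to propagate change information between hosts, which this single-process sketch leaves out.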

A live demo of a simple data pipeline will be included.

The RDF Pipeline Framework is open source software available under an Apache 2.0 license.