Hadoop Can’t Query? (PostgreSQL Features)

Hosted by Joshua D.
Details

With the explosion of data stores and cloud services, data now resides across many disparate systems and in a variety of formats. When datasets live in external systems, getting them into the database often requires a lengthy ETL (extract, transform, load) operation. But what if we only need a small subset of the data? What if we only want to query it to answer a specific question or to build a specific visualization? In that case, it is often more efficient to join the datasets remotely and return only the results, rather than incur the time and storage costs of an expensive full data load.

This talk explores the Platform Extension Framework (PXF), an open-source project that lets users query heterogeneous data sources through pre-built connectors. PXF's architecture allows users to efficiently query large datasets from multiple external sources without requiring that those datasets be loaded into Greenplum, a Postgres-based MPP (massively parallel processing) database.

Houston Postgres