Online: Introducing Datawave - Scalable Data Ingest and Query

Data Works MD

Online event


Big data storage can be challenging: complex data models, scalability issues, and the need to work with both structured and unstructured data all add difficulty. Datawave addresses many of these issues with a flexible, scalable, and robust architecture built on proven technologies such as Accumulo. Join us in July to learn what Datawave is and how it can help solve your big data needs.

12:00 PM -- Greetings

12:05 PM -- Introducing Datawave - Scalable Data Ingest and Query - Hannah Pellón

1:30 PM -- Closing

Zoom and YouTube Streaming
A link will be sent out prior to the event. Please note that Zoom is capped at 100 participants, so if you cannot get into Zoom, you will be able to watch via YouTube.

Introducing Datawave: Scalable Data Ingest and Query on Apache Accumulo
Out of the box, Accumulo's strengths are difficult to appreciate without first building an application that showcases its ability to handle massive amounts of data. Unfortunately, building such an application is non-trivial for many would-be users, which hinders Accumulo's adoption.

In this talk, we introduce Datawave, a complete ingest, query, and analytic framework for Accumulo. Datawave, recently open-sourced by the National Security Agency, capitalizes on Accumulo's capabilities, provides an API for working with structured and unstructured data, and boasts a robust, flexible, and scalable backend.

We'll do a deep dive into Datawave's project layout, table structures, and APIs, and demonstrate the Datawave quickstart, a tool that makes it easy to hit the ground running with Accumulo and Datawave without having to develop a complete application.

Hannah Pellón received her B.S. in Mathematics from the University of Maryland while working as a software engineering intern at Northrop Grumman, conducting RF signal analysis and spectrometry. She then spent 11 years at Northrop Grumman contributing to IR&D efforts and programs centered around Accumulo and Hadoop. She is currently a software developer and lead at Tiber Technologies, focusing on Datawave and distributed computing technologies.

Datawave -