Big Data Week 2014 Meetup

Hosted By
Aaron C.

Details

Come out to the Apache Accumulo Meetup at 1776 DC during Big Data Week!

6:30 Networking, pizza and beer

7:00 Shell Improvements in Accumulo release 1.6 - Mike Drob, Cloudera

Apache Accumulo provides a convenient shell for performing most client operations. For many users, this is the first (and only) interface that they will use for interacting with Accumulo. Mike will give us a tour of improvements to this shell in the upcoming 1.6.0 release and show how its many new features can make your life simpler.

Mike Drob is an Apache Accumulo committer and Software Engineer for Cloudera. He is passionate about robust ops tooling and community involvement in open source. When not coding, Mike enjoys comic books, Ultimate, and exploring the neighborhoods around his home in Baltimore.

7:10 Lambda Architectures with Speed Learning - Ryan McKeown and Bradley Groff, Staff Technologists and Apache Spark users, Booz Allen Hamilton

"Lambda Architecture" is a term recently coined by Nathan Marz for a design that combines batch and real-time analytics by adding a speed layer to the architecture. This talk will give a basic overview of lambda architectures, with details of a specific project that implemented one by combining Hadoop 2.2, Accumulo 1.5, and Spark 0.9.

We will also discuss MLlib, a library for distributed machine learning applications with Spark. As the space of ML libraries becomes saturated, it's natural to wonder why this one deserves any particular attention. This presentation will give a brief overview of the capabilities of MLlib and will discuss its potential and realized advantages.

Ryan McKeown joined Booz Allen in September of last year after finishing his PhD in theoretical physics. Ryan has been doing high-performance and distributed scientific computing for over 10 years. Since joining, Ryan has become a full-stack developer. He has been focusing on NoSQL databases, cloud architectures, machine learning algorithms, and interactive D3 visualizations for applications in the federal health space. He lives in Rockville with his wife and daughter.

Brad Groff joined Booz Allen in August of last year after finishing his PhD in mathematics. Since joining, he has been developing machine learning algorithms and visualizations for applications in the federal health space. He thought moving to DC from Chicago would mean less snow.

7:30 SQL-on-Accumulo - Don Miner, ClearEdge IT

Running SQL queries over data in Accumulo is easier said than done and poses several nuanced design challenges that don't have clear answers. This talk will outline the current state of the art in SQL-on-Accumulo technologies and give a realistic view of what is and is not doable today.

Donald Miner is an avid user of Apache Hadoop and a practitioner of data science. He serves as Chief Technology Officer at ClearEdge IT Solutions, a company that provides Big Data professional services. He is the author of the O’Reilly book MapReduce Design Patterns, which is based on his experiences as a MapReduce developer. Donald has architected and implemented a number of mission-critical and large-scale Hadoop systems within the U.S. Government and Fortune 500 companies. He received his PhD in Computer Science from the University of Maryland, Baltimore County, where he focused on Machine Learning and Multi-Agent Systems. He lives in Maryland with his wife and two young sons.

7:50 Using Accumulo to support Analytical Workflows - Aaron Cordova, Koverse Inc.

Apache Accumulo is designed with features that enable it to be part of an analytical workflow at the center of an organization's data processing pipeline, including security labels and integration with MapReduce. Aaron discusses how to use Accumulo's unique features to best support this workflow, specifically to perform data discovery, apply analytics, and deliver results in real time throughout the organization, while avoiding common pitfalls.

Aaron Cordova founded Apache Accumulo in 2008 and has been using it and other scalable distributed processing technologies to help organizations in several industries ask questions across, and analyze, all of their data. Aaron is the CTO and co-founder of Koverse Inc.

8:10 Ryan Fishel - Apache Spark user, Cloudera

PipelineAI Advanced Spark and TensorFlow Meetup (Arlington)
1776 DC
1133 15TH ST. NW 12TH FLOOR · Washington, DC