LineageDB Architecture for Big Data Analytics & Data Quality

Data Science & Business Analytics


University of Colorado Boulder - Wednesday December 3, 2014 @ 6:00pm MST

NOTE: For folks unable to attend in person, register and we will email you a livestream link 2 hours prior to the event.

Location: ATLAS -[masked]th St Bldg 223, Boulder, CO - Room 100


6:00 - 6:20 Schmooze - Food shall be served in Lobby

6:20 - 6:30 Announcements

6:30 - 7:30 LineageDB Architecture for Big Data Analytics by Charles Clifford

7:30 - 8:30 Top 20 Data Quality Solutions for Data Science by Ken Farmer

8:30 - 9:30 Network at Old Chicago at 1102 Pearl St. (western end of Pearl Street pedestrian mall, directly facing Boulder Bookstore). Please support our sponsor, Old Chicago in Boulder, and make new friends.

LineageDB Architecture for Big Data Analytics - Abstract

Traditional data analytics platforms are:

• tightly coupled to expensive relational data services;
• limited to star and snowflake schemas (notoriously difficult to maintain); and
• heavily dependent on brittle, expensive ETLs.

An RDBMS can be scaled vertically (at a big price point), but eventually you run out of runway because a B-tree does not scale linearly. The morphing of relational services into MPP appliances has resulted in platforms that are not flexible enough to support rapidly changing data analytic needs. These limitations can be overcome by adopting the LineageDB architecture, a polyglot composed of loosely coupled, open-source:

• key-value storage service;
• index service;
• graph service;
• SQL service; and
• in-memory data service.

Charles Clifford - Bio

Charles Clifford has been designing and developing both transactional and analytic business solutions since the early 90s. He has delivered distributed solutions to a variety of industries, from telecom to capital markets to health care to software powerhouses. His current focus is on the design and delivery of DaaS solutions.

Top 20 Data Quality Solutions for Data Science - Abstract

Data quality continues to be one of the chief challenges, costs and reasons for project failure in data science. Problems in this space limit accuracy, destroy credibility and can result in harmful solutions. And unlike challenges such as scalability and cost, data quality has seen no major breakthrough improvements. This presentation will cover the types of problems, as well as their impacts, causes and various solutions.

Ken Farmer - Bio

Ken Farmer is the senior data architect/wrangler/librarian for ProtectWise, where he is developing their analytical data solution. Previously, he developed, maintained, managed and consulted on analytical data architectures for IBM, MapQuest, Verizon, and others.