Streaming Data Pipelines & Data Science in Healthcare


Details
**** PRE-REGISTER HERE: https://goo.gl/forms/3GsNFWl6hNPWN9242 Please complete this simple form to expedite the sign-in process.
We will be on the 5th Floor.
Please join us and MapR for this FREE event to learn more about exciting advancements in healthcare using Apache Spark for streaming data pipelines and data science to improve healthcare outcomes, improve access to appropriate care, better manage cost, and reduce fraud, waste, and abuse.
*** TALK 1
CAROL MCDONALD
Solutions Architect
Carol McDonald is a Solutions Architect at MapR focusing on big data, Apache HBase, Apache Drill, Apache Spark, and machine learning in healthcare, finance, and telecom. Previously, Carol worked as a Technology Evangelist for Sun, an architect/developer on a large health information exchange, a large loan application for a leading bank, pharmaceutical applications for Roche, telecom applications for HP, OSI messaging applications for IBM, and SIGINT applications for the NSA. Carol holds an MS in computer science from the University of Tennessee and a BS in Geology from Vanderbilt University and is an O’Reilly Certified Spark Developer and Sun Certified Java Architect and Java Programmer. Carol is fluent in French and German.
ABSTRACT
In the past, big data was processed in batch, often on a once-a-day basis. Now data is dynamic, and data-driven businesses need instant results from continuously changing data. Data pipelines, which combine real-time stream processing with the collection, analysis, and storage of large amounts of data, enable modern real-time applications, analytics, and reporting. In this talk we will:
- Use Apache Spark Streaming to consume Medicare Open Payments data via the Apache Kafka API
- Transform the streaming data into JSON format and save it to the MapR-DB document database
- Query and analyze the data with Apache Spark SQL
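The steps above can be sketched, absent a live Kafka feed or a MapR-DB cluster, as a dependency-free Python illustration of the transform-and-query logic. The CSV column names, sample rows, and `_id` scheme below are illustrative assumptions, not the actual CMS Open Payments schema or the speaker's code:

```python
import csv
import io
import json
from collections import defaultdict

# Sample rows in the spirit of the Open Payments CSV feed (hypothetical schema).
sample_csv = """physician_id,payer,amount
P100,Acme Pharma,250.00
P100,Beta Devices,75.50
P200,Acme Pharma,1200.00
"""

def to_documents(csv_text):
    """Transform CSV payment records into JSON documents keyed by an _id,
    mirroring the stream-transform step (MapR-DB stores JSON documents
    addressed by an _id field)."""
    docs = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        docs.append({
            "_id": f"{row['physician_id']}_{i}",
            "physician_id": row["physician_id"],
            "payer": row["payer"],
            "amount": float(row["amount"]),
        })
    return docs

def total_by_physician(docs):
    """Aggregate payments per physician -- the kind of query Spark SQL
    would express as SUM(amount) ... GROUP BY physician_id."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["physician_id"]] += doc["amount"]
    return dict(totals)

docs = to_documents(sample_csv)
print(json.dumps(docs[0]))
print(total_by_physician(docs))  # {'P100': 325.5, 'P200': 1200.0}
```

In the real pipeline each of these stages is distributed: Spark Streaming consumes micro-batches from Kafka, the transform runs per partition, and the aggregation is a Spark SQL query over the stored documents.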
*** TALK 2
DAVID BAUER, Ph.D.
Co-Founder & CTO
Dr. Bauer has been a foremost leader in Big Data and Distributed Computing in the U.S. intelligence community since 2005. He has dedicated his efforts to solving hard problems using Distributed Computing. He has pioneered code that has executed across 2 million CPUs and developed the first Cloud and Big Data Platform Certified & Accredited for use in the Federal Government for highly sensitive and classified data. Among the major challenges he has tackled are applying machine learning to Synthetic Biology to create Chemical Warfare sensors; developing the first MULTIINT data fusion platform using Big Data and AI for the U.S. Intelligence Community; developing the first distributed computing platform to demonstrate super-linear scalability up to 2 million CPUs at Lawrence Livermore; and developing a major report for the GAO and the DHS National Communications System analyzing the effects of pandemic influenza. Dr. Bauer has also led development in key study areas for exascale computing at DARPA.
ABSTRACT
Unified Deep Learning in a Secure End-to-End AI Platform - Production systems such as those found in the healthcare space require data-level security for most analyses. In this talk we will outline the areas of the AI platform affected and how we have addressed these issues to construct a robust, large-scale, and highly secure capability for Deep Learning over Unified Data. We will briefly describe how we address security compliance for Certification and Accreditation, and data-level security throughout the data lifecycle: from Extract-Transform-Load (ETL) to Data Fusion to Exploratory Data Analysis (EDA) to Deep Learning.
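As a rough illustration of what per-record (data-level) security means in practice, the toy sketch below tags each record with a sensitivity label and filters every read against the caller's clearances, whether that read feeds ETL, fusion, EDA, or model training. The records, labels, and check are hypothetical simplifications, not the platform's actual mechanism:

```python
# Toy records with a per-record sensitivity label (hypothetical data).
records = [
    {"patient": "A", "diagnosis": "flu",  "label": "GENERAL"},
    {"patient": "B", "diagnosis": "hiv",  "label": "RESTRICTED"},
    {"patient": "C", "diagnosis": "cold", "label": "GENERAL"},
]

def visible(records, clearances):
    """Return only the records whose label falls within the caller's
    clearances -- the filter is applied at every read in the lifecycle."""
    return [r for r in records if r["label"] in clearances]

analyst_view = visible(records, {"GENERAL"})
clinician_view = visible(records, {"GENERAL", "RESTRICTED"})
print(len(analyst_view), len(clinician_view))  # 2 3
```

The point of enforcing the check at the data layer, rather than per application, is that every downstream stage inherits the same policy automatically.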
=====================
Parking Options: http://bit.ly/2jPYKBY
WeWork Directions: https://www.wework.com/buildings/tysons--washington-DC
=====================
"WeWork is the platform for creators. We provide the space, community and services you need to create your life’s work. To learn more send an email to joinus@wework.com.”
