
Fast-Data Meetup event

Hosted By
Harsha B.

Details

Agenda:

6:15 – 6:45 Social hour (Networking and food)

6:45 – 7:00 Welcome & Introductions

7:00 – 7:45 Big Data Streaming Platform Ecosystem

7:45 – 8:15 Cyber-security application on fast-data platform use case

8:15 – 8:30 Closing & Wrap-up.

Presentation topic 1:

The Big Data streaming systems landscape is constantly changing, and many of the competing projects are complementary in nature. The current state of the art is to mix and match multiple systems to arrive at a complete end-to-end solution. In this session, we present one such architecture that is gaining popularity in the community. In this architecture, Apache NiFi serves as a scalable first-stage data gathering system. Once the streaming data is collected from multiple geographical locations, it is staged in Apache Kafka. Then, either a true stream processing engine such as Apache Storm or a micro-batch streaming engine such as Apache Spark Streaming performs real-time processing (filtering, database lookups and joins, projections, etc.) to format the incoming data. The results are then stored in a time-series database such as Druid, which keeps the incoming data in time windows to manage storage requirements. Finally, an analytics framework such as Apache Spark is used to run queries or machine learning tasks on the past window. We present a hands-on demo of what this architecture looks like in action and share some best practices.
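
For readers who want a feel for the data flow before the session, here is a minimal sketch of the processing and storage stages in plain Python. It does not use NiFi, Kafka, Storm, Spark, or Druid; every name in it (RAW_EVENTS, USER_REGION, WINDOW_SECONDS, MAX_WINDOWS, and the functions) is hypothetical and chosen only for illustration. An in-memory list stands in for the staged stream, a dictionary stands in for the database lookup, and a dictionary of time windows stands in for the time-series store.

```python
from collections import defaultdict

# Hypothetical raw events, standing in for records staged by the gathering layer.
RAW_EVENTS = [
    {"ts": 0, "user": "u1", "bytes": 120, "status": "ok"},
    {"ts": 30, "user": "u2", "bytes": 300, "status": "error"},
    {"ts": 65, "user": "u1", "bytes": 80, "status": "ok"},
    {"ts": 70, "user": "u3", "bytes": 50, "status": "ok"},
    {"ts": 130, "user": "u2", "bytes": 10, "status": "ok"},
]

# Hypothetical reference table, standing in for the database lookup.
USER_REGION = {"u1": "east", "u2": "west"}

WINDOW_SECONDS = 60  # width of each time window (assumed value)
MAX_WINDOWS = 2      # number of most recent windows to retain (assumed value)


def process(event):
    """Filter, join, and project one event; return None if it is dropped."""
    if event["status"] != "ok":  # filtering
        return None
    region = USER_REGION.get(event["user"], "unknown")  # lookup and join
    # Projection: keep only the fields the later stages need.
    return {"ts": event["ts"], "region": region, "bytes": event["bytes"]}


def store(windows, record):
    """Place a record in its time window and evict the oldest windows."""
    start = record["ts"] // WINDOW_SECONDS * WINDOW_SECONDS
    windows[start].append(record)
    while len(windows) > MAX_WINDOWS:
        del windows[min(windows)]


def bytes_by_region(windows, window_start):
    """Example analytics query over one retained window."""
    totals = defaultdict(int)
    for record in windows.get(window_start, []):
        totals[record["region"]] += record["bytes"]
    return dict(totals)


def run_pipeline(events):
    """Push every event through the processing and storage stages."""
    windows = defaultdict(list)
    for event in events:
        record = process(event)
        if record is not None:
            store(windows, record)
    return windows


windows = run_pipeline(RAW_EVENTS)
for start in sorted(windows):
    print(start, bytes_by_region(windows, start))
```

In this sketch the oldest window is evicted as soon as a third one opens, which loosely mirrors the idea of keeping data in time windows to bound storage. A real deployment would delegate each stage to the corresponding system rather than to in-process functions.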

Speaker bio:

Reza Farivar is a Data Engineering Manager at Capital One, where he works on Big Data / Fast Data cloud computing platforms. Before joining Capital One, he was a senior software engineer at Yahoo working on Big/Fast Data platforms including Apache Storm and Spark. He completed both his PhD and postdoctoral work at the University of Illinois at Urbana-Champaign, with research focusing on Big Data and cloud platforms, programming models, and the application of these technologies in diverse domains including finance, machine learning, and bioinformatics. He has a special interest in applying specialized hardware accelerators such as GPUs in big data computing platforms. He is also a Research Assistant Professor in the Computer Science department of the University of Illinois, where he has been involved in research and in teaching courses (including on coursera.org) on Cloud Computing, Big Data, and Operating Systems since 2011.

Presentation topic 2:

"Apache Metron is a Cybersecurity, Data-Analytics Platform. It is used for ingesting, parsing, enriching, and monitoring different types of security data in real-time. It also provides real-time alerts and automated responses, generates a 'single pane of glass' to view all of the data, and has machine learning capabilities."

Speaker bio:

Jai Rao is Director, Enterprise Data Services at Capital One. Jai is currently leading the delivery of a cloud-based Big Data cybersecurity solution based on the Apache Metron project. Previously, he led the build-out of Capital One's Digital Analytics technology stack and the delivery of Capital One Bank's first Big Data project. Prior to Capital One, Jai worked in the Internet space, leading development teams at AOL and PayPal.

Fast Data DC (NoVA/MD/DC)
McLean Auditorium
1680 Capital One Tower Dr · McLean, VA