Big Data Science Meetup


11:00 A.M. - 11:30 A.M. Networking

11:30 A.M. - 12:10 P.M. Session 1

Title: Data Science On Apache Spark Made Easy

Abstract: Data Scientists and Data Engineers seeking to solve analytics and data processing challenges in Apache Spark encounter the following difficulties:

· Integrating various technology components (Sources, messaging queues, Spark, Hadoop or NoSQL data stores, Indexing and search, BI / Visualization tools)

· Ingesting various required data sources

· Data cleansing / quality

· ETL (Data transformation and Enrichment)

· Transferring / Migrating predictive models from platforms like R and SAS to Hadoop / Spark

· Easily creating and testing Machine learning models

· Needing the knowledge and skills to code manually in Scala, Java or other languages

· Being able to test and debug easily during the development process

· Run time: Lack of visibility into performance & functional errors

· Moving models / applications across different environments (Development, Test and Production)

· Version Management of Data Science Applications on Spark (Roll back, Roll forward)

· Building Data visualization without coding

We will present and demonstrate an approach and a platform that address these problems and make Apache Spark application development much easier.

Speaker: Anand Venugopal, head of Product Strategy for StreamAnalytix

Speaker Bio: Anand Venugopal heads Product Strategy for StreamAnalytix, an open-source-powered, enterprise-grade streaming analytics platform developed by Impetus Technologies. Prior to this, he built the Big Data Solutions practice at Impetus, delivering high-impact business solutions based on Big Data and Analytics to large enterprises in many industry verticals. He has 20 years of software innovation and business development experience in the Enterprise and Telco space and is especially passionate about Customer experience and Operational intelligence solutions that combine real-time and batch predictive analytics on open source technology stacks.

Impetus is focused on creating big business impact through Big Data Solutions for Fortune 1000 enterprises across multiple verticals. The company brings together a unique mix of software products, consulting services, Data Science capabilities and technology expertise. It offers full life-cycle services for Big Data implementations and real-time streaming analytics, including technology strategy, solution architecture, proof of concept, production implementation and on-going support to its clients.

StreamAnalytix, a product of Impetus Technologies, enables enterprises to analyse and respond to events in real time at Big Data scale, and now features support for Apache Spark Streaming. It is currently the industry's only platform that offers users multi-engine support, which provides the flexibility to match the choice of stream processing engine to the requirements of a particular use case.

12:10 P.M. - 12:15 P.M. Q/A

12:15 P.M. - 12:30 P.M. Break

12:30 P.M. - 1:00 P.M. Session 2

Title: Agile Data Science: Full-Stack Analytics App Dev

Speaker: Russell Jurney

Abstract: Agile Data Science 2.0 (O'Reilly 2017) defines a methodology and a software stack with which to apply the methods. *The methodology* seeks to deliver data products in short sprints by going meta and putting the focus on the applied research process itself. *The stack* is but an example of one meeting the requirements that it be utterly scalable and utterly efficient in use by application developers as well as data engineers. It includes everything needed to build a full-blown predictive system: Apache Spark, Apache Kafka, Apache Airflow (incubating), MongoDB, Elasticsearch, Apache Parquet, Python/Flask and jQuery. This talk will cover the full lifecycle of large data application development and will show how to use lessons from agile software engineering to apply data science using this full stack to build better analytics applications. The talk walks through the entire process: it starts with plumbing, moves on to data tables, charts and search, continues through interactive reports, and builds towards predictions in both batch and real time (defining the role for each), the deployment of predictive systems, and how to iteratively improve predictions that prove valuable.

Speaker Bio: Russell Jurney is a principal consultant at Data Syndrome, a product analytics consultancy dedicated to advancing the adoption of the development methodology Agile Data Science, as outlined in the book Agile Data Science 2.0. He has worked as a data scientist building data products for over a decade, starting in interactive web visualization and then segueing towards data products, machine learning and artificial intelligence at companies such as Ning, LinkedIn and Hortonworks. He is a self-taught visualization software engineer, data engineer, data scientist and writer, and most recently is becoming a teacher. In addition to applied work building analytics products, Data Syndrome offers live and video training courses.

1:00 P.M. - 1:05 P.M. Q/A

1:05 P.M. - 1:30 P.M. Networking

Food and Drinks will be served.

Sponsored By: