Big Data Madison promotes the understanding and adoption of technologies used to acquire, store, and analyze data in all its forms. This spans everything from data engineering to data science.

Everyone is encouraged to attend; no level of experience is too basic to join and learn.

We will focus on some of the technologies used in the Big Data ecosystem (Hadoop, Spark, streaming data, data processing, etc.), as well as topics in Data Science (machine learning, data visualization, analytics, and more). We will try to balance the topics between technology talks, use cases, and demos.

Data Engineering with Airflow, R and Postgres at Education Analytics

Madison Central Public Library

Abstract: Education Analytics (EA) partners with the CORE Districts, a consortium of eight school districts in California that serve more than 1 million students attending around 1,500 schools, to provide actionable metrics to district partners and stakeholders. To deliver timely data, our team at EA has built a data pipeline that uses the Python package Apache Airflow, the statistical programming language R, and PostgreSQL databases. We use Airflow to schedule runs of the system and to determine which new data to process, we use R to process data and calculate metrics, and we use PostgreSQL to store data in a custom longitudinal research data warehouse. This data feeds a custom, user-centered dashboard as well as other analytics and reports oriented around continuous improvement for the CORE districts. This data pipeline has become an integral part of the work that the CORE districts do in their improvement communities. Some of the challenges we faced in building this system include (1) passing information between Python and R for logging, conditional execution, and error handling; (2) automating the processing of complex statistical methods like causal estimates of school effects on student outcomes and long-term predictive models; and (3) designing robust quality control processes for automated systems. In this discussion, we share some lessons learned about the solutions we have arrived at and preview some challenges we continue to work on.

Bio: Jordan Mader is the Director of Analytics Engineering at Education Analytics. Jordan currently manages a team that specializes in building software for complex statistical analyses and automating data processing systems for analytics to help school districts and states use timely data to make better decisions. Jordan holds a B.A. in Economics and History from the University of Wisconsin-Madison.

Sponsors: I would like to thank American Family for the food and Cloudera for an after-meetup round of drinks.
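For readers curious about challenge (1) in the abstract above, passing information between Python and R for logging, conditional execution, and error handling, here is a minimal sketch of one common pattern. It is not EA's implementation, which the abstract does not describe. It assumes a convention in which the child process signals errors through its exit code, writes log messages to stderr, and prints a one-line JSON summary as the last line of stdout. The names `StepResult`, `run_step`, `should_run_downstream`, and the `rows_processed` field are all hypothetical.

```python
import json
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StepResult:
    ok: bool        # True when the child process exited with status 0
    payload: dict   # JSON object parsed from the last line of stdout, else {}
    log: str        # everything the child process wrote to stderr


def run_step(command: Sequence[str], timeout_s: float = 3600.0) -> StepResult:
    """Run an external step and collect its exit status, JSON summary, and log."""
    proc = subprocess.run(
        list(command), capture_output=True, text=True, timeout=timeout_s
    )
    payload: dict = {}
    lines = proc.stdout.strip().splitlines()
    if lines:
        try:
            parsed = json.loads(lines[-1])
            if isinstance(parsed, dict):
                payload = parsed
        except json.JSONDecodeError:
            pass  # No usable summary line; leave the payload empty.
    return StepResult(ok=proc.returncode == 0, payload=payload, log=proc.stderr)


def should_run_downstream(result: StepResult) -> bool:
    """Hypothetical rule: continue only if the step succeeded and saw new rows."""
    return result.ok and result.payload.get("rows_processed", 0) > 0


if __name__ == "__main__":
    # Stand-in child process so the sketch runs without R installed.
    child = (
        "import json, sys; sys.stderr.write('processing'); "
        "print(json.dumps({'rows_processed': 3}))"
    )
    result = run_step([sys.executable, "-c", child])
    print(result.ok, result.payload, should_run_downstream(result))
```

In a real pipeline the command would be something like `["Rscript", "calc_metrics.R"]` (a hypothetical script name), with the R script calling `quit(status = 1)` on failure and printing its JSON summary last. A scheduler task could then log `result.log` and use `should_run_downstream` to decide whether later steps run.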

How Confluent helps customers achieve Streaming ETL

Madison Central Public Library

Abstract: coming soon. Sponsors: I would like to thank Confluent for the food and drinks.

Data Science Careers: A Primer for Academics Looking to Switch to Industry

American Family Insurance - DreamBank

Greetings everyone! Time to announce the next event, which is a special collaboration between this meetup and the Women in Big Data group. Please see below for details on the presentation and the speaker.

PLEASE REGISTER with the Women in Big Data group for this event: https://www.meetup.com/Women-in-Big-Data-Wisconsin-Chapter/events/265858687/

Cheers, Pitt

Speaker Bio: Dr. J. Pocahontas Olson (Pokie) is a data scientist with the Data Science Analytics Lab at American Family Insurance. In her tenure at American Family Insurance, she has worked on a variety of projects, centering on NLP but also including other endeavors such as latent trait modeling for the charitable analysis of financial insecurity in Wisconsin (http://insecurity-survey-wi.amfamlabs.com/). Pokie earned an M.S. and a Ph.D. in theoretical physics from the University of Notre Dame. Her research on numerically simulating the collapse of massive stars on the university’s cloud computing cluster first sparked her interest in the challenges of big data, distributed systems, and making predictions at scale. Intrigued, she decided to pursue a data science career in industry. After an intensive data science boot camp, she became the first member of the data science team at Virtustream, a cloud service provider seeking to use machine learning to improve latency and storage requirements for their petabyte data warehouses. In early 2017 she started at American Family, where she gets to search for insights in data, still drawing on the resources of the cloud.

Event co-organizer: Big Data Madison Meetup (https://www.meetup.com/BigDataMadison/)

Sponsors: We would like to thank American Family Insurance for the food, drink, and venue.

NOTE: There will be non-sponsored drinks afterwards at Tangent with Pokie.

How a Kaggle Grandmaster won their competitions (H2O.ai)

Madison Central Public Library

Abstract: coming soon. Sponsors: I would like to thank H2O.ai for the food and drinks.

UW Data Science Bazaar (Day 2)

Wisconsin Institute for Discovery

