- Machine Learning Design Patterns
I am pleased to announce the following event. Lak Lakshmanan, Director (Global Head), Data Analytics and AI Solutions on Google Cloud, will speak about Machine Learning Design Patterns. Below is the abstract along with a more complete speaker bio.
I would like to thank the sponsor of this talk, O'Reilly Media, which has graciously offered 5 free copies of the book co-authored by Lak (Machine Learning Design Patterns). I will draw 5 names at random from the people who attend the talk to receive this generous gift.
Hope to see everyone there!
Design patterns are formalized best practices for solving common problems when designing a software system. As machine learning moves from being a research discipline to a software one, it is useful to catalog tried-and-proven methods that help engineers tackle problems that frequently crop up during the ML process. In this talk, I will cover a few patterns that are useful for data representation, such as handling incomplete vocabularies and dealing with schema changes. The patterns are drawn from the book "Machine Learning Design Patterns" published by O'Reilly Media.
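To make "incomplete vocabularies" concrete: one data-representation approach (the book calls it Hashed Feature) maps each categorical value to a fixed number of buckets with a deterministic hash. This means a value never seen during training, such as a new airport code, still gets a valid index. The sketch below is a minimal illustration and not the talk's material. It assumes MD5 as the deterministic hash (the book's own examples may use a different function) and an arbitrary bucket count of 10.

```python
import hashlib
from typing import Dict, List


def hashed_bucket(value: str, num_buckets: int) -> int:
    """Map a categorical string to a bucket index in [0, num_buckets).

    MD5 is an assumption chosen for determinism: Python's built-in hash()
    is salted per process, so it would assign different buckets across runs.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


# Hypothetical vocabulary seen at training time, plus one unseen value.
training_airports: List[str] = ["MSN", "ORD", "SEA"]
new_airport = "XYZ"  # never appeared in training data

# Every value, seen or unseen, lands in one of the 10 buckets.
buckets: Dict[str, int] = {
    code: hashed_bucket(code, 10) for code in training_airports + [new_airport]
}
```

The trade-off is that distinct values can collide in the same bucket. The bucket count therefore balances model capacity against robustness to new or rare categories.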
Lak is the Director for Data Analytics and AI Solutions on Google Cloud. His team builds software solutions for business problems using Google Cloud's data analytics and machine learning products. He founded Google's Advanced Solutions Lab ML Immersion program and is the author of three O'Reilly books and several Coursera courses. Before Google, Lak was a Director of Data Science at Climate Corporation and a Research Scientist at NOAA.
Follow him on Twitter at @lak_gcp, read articles by him on Medium, and see more details at www.vlakshman.com
- Data Science Career Panel Discussion
Welcome to 2021! For the second year in a row, UW-Madison is putting on a Data Science Bazaar. This year it is a virtual event, with sessions spread out from Feb 3 to Feb 25. There is a link further below on how to register.
This specific virtual event is for a Data Science Career Panel. The panelists will cover things like what their roles entail, where they see the field going, and advice for folks wanting to get into data science.
Here is the Eventbrite link to register for the entire Data Science Bazaar, of which this panel is just one event. Please note that you need to register to get the Zoom link. https://www.eventbrite.com/e/2021-data-science-research-bazaar-tickets-134421276657
Here are the panelists.
Aravind Moorthy, PhD - Dr. Aravind Moorthy comes to data science with a background in both computer science (BA Computer Science, Rice University) and economics (PhD Economics, UCLA). Before arriving at Valve, where he studies video games and gamers using data from Steam, Dr. Moorthy used data science to study international development programs. Why, when he was your age, he used to trek 10 miles through the Sahara Desert to copy database records by hand in a dusty storeroom using a dull pencil (true story!). He is also a meditation teacher in the Seattle area.
Jacqueline Nolis, PhD - Dr. Jacqueline Nolis is a data science leader with over 15 years of experience in leading data science teams and projects at companies ranging from DSW to Airbnb. She has a PhD in Industrial Engineering, and co-authored the book Build a Career in Data Science. For fun Jacqueline likes to use data science for humor—like using deep learning to generate offensive license plates.
Pitt Fagan - Pitt has MS degrees in Biometry (statistics) and Soil Science from UW-Madison. He held many roles in the tech industry over the past 21 years before focusing on data science. He is a Senior Data Scientist at Zendesk, a customer support tech company headquartered in San Francisco. He also runs this meetup!
Sophia Liu - Sophia graduated from Carnegie Mellon University with a master's degree in Statistical Practice. She worked as a Statistical Analyst for WCER (Wisconsin Center for Educational Research) at UW-Madison and then an R&D Analyst in the auto insurance field, before becoming a Data Scientist for Amazon.
- Lessons Learned Leading Data & Why It’s Not As Different As Many Think
SPECIAL EVENT - THIS IS OUR 100th MEETUP!!
Thank you to all of the presenters, attendees and sponsors over the past 7.5 years that have come together to form a wonderful data community here in Madison.
Computer Engineering has come a long way in the last 30 years. And we’re not just talking about Moore’s law and how much faster, smaller, and storage-laden computers have become, not to mention that whole Cloud thing. Setting all that aside, we’ve learned how to build software and data products better, faster, and with more focus on the customer.
Interestingly, these improvements in how we build have not been applied equally across the computer engineering disciplines. In this talk, Peter will focus on the approaches that have worked (and some that have not) in leading data engineering, data science, and machine learning projects at Amazon, Groupon, and other companies.
Peter Commons is VP Engineering at Zendesk where he is the Madison, WI site leader focused on a wide range of technical & leadership areas within the company.
Before joining Zendesk, Peter drove the design and development of customer-impacting and revenue-driving software for multi-billion-dollar companies. He has led diverse, worldwide product, engineering, and data teams, from back end to front end, for internal and external customers in CTO, CPO, and VP-level roles.
After growing up on the West Coast, Peter moved to the Midwest in early 2018 (first Chicago and now Madison), having realized how amazing a part of the country this is to live, work & play.
I would like to thank Zendesk for the presenter and meetup support.
- Waveform AI in Healthcare: Terabyte scale data, infrastructure, and challenges
We’ll look at how our approaches have continued to evolve as our data has grown from gigabytes to terabytes to hundreds of terabytes. Across both our ML models and the underlying infrastructure, our needs have changed, and so have the out-of-the-box offerings from AWS/GCP. We’ll also discuss how we stay innovative in a fast-paced research environment, along with some of the challenging problems of building algorithms from human-labeled healthcare data.
Sam Rusk is co-founder and Head of AI at EnsoData, a startup that uses AI and machine learning to analyze waveform data. Its technology saves clinicians time on labor-intensive, complex data interpretation and helps them across the care continuum. He co-founded EnsoData after completing his degree in Electrical Engineering at the University of Wisconsin-Madison in 2014. Since then he has led Enso’s research efforts over hundreds of terabytes of data, brought bleeding-edge AI publications and infrastructure into clinical workflows, and built out an incredible team along the way.
I would like to thank EnsoData for the presenter and Zendesk for the meetup support.
- Data Science in the Pandemic Age
Data scientists at the American Family Insurance Data Science Institute quickly pivoted in mid-March to offer their expertise to state and local health professionals on multiple aspects of the COVID-19 pandemic. Early work focused on projecting hospital bed capacity. This work built communication channels that grew into an advising role for the state and UW System emergency planning bodies. Lately, this DSI COVID-19 Research Group has been collaborating with the WI Department of Health Services on visual dashboards that help state and county health workers better diagnose COVID-19 trends around the state. The group’s Thursday seminars bring in speakers on data models and pandemic health challenges, such as campus reopening and the effect of COVID-19 on prison populations and industries.
Dr. Brian Yandell
David R. Anderson Interim Director, American Family Insurance Data Science Institute
I would like to thank the UW Data Science Institute for the presenter and Zendesk for the meetup support.
- How Confluent Helps Customers Achieve Streaming ETL
Data integration in architectures built on static, update-in-place datastores inevitably ends up with pathologically high degrees of coupling and poor scalability. This has been standard practice for decades, as we attempt to build data pipelines on top of databases that do a poor job of modelling the fundamental objects that drive our businesses and systems: events.
Events carry both notification and state, and form a powerful primitive on which to build systems for developers and data engineers alike. Developers benefit from the asynchronous communication that events enable between services, and data engineers benefit from the integration capabilities. Everyone gains from using a standards-based, scalable, and resilient streaming platform.
In this talk, we’ll discuss the concepts of events, their relevance to both software engineers and data engineers and their ability to unify architectures in a powerful way. We’ll see how stream processing makes sense in both a microservices and ETL environment, and why analytics, data integration and ETL fit naturally into a streaming world.
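The idea that "events carry both notification and state" can be shown with a tiny in-memory example. It is only a sketch, not Confluent's or Kafka's API. The `OrderEvent` type, the order IDs, and the statuses are hypothetical, and a plain Python list stands in for a topic. The same ordered log drives a per-event reaction (notification) and, when replayed, rebuilds the current state.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical event: an order moved to a new status."""
    order_id: str
    status: str
    amount: float


def notify(event: OrderEvent) -> str:
    # Notification side: a downstream service reacts to each event as it arrives.
    return f"order {event.order_id} is now {event.status}"


def latest_status(events: Iterable[OrderEvent]) -> Dict[str, str]:
    # State side: replaying the log in order rebuilds the current status per order.
    state: Dict[str, str] = {}
    for event in events:
        state[event.order_id] = event.status
    return state


# A plain list standing in for an ordered event stream (assumption).
log: List[OrderEvent] = [
    OrderEvent("A1", "created", 20.0),
    OrderEvent("B7", "created", 5.0),
    OrderEvent("A1", "shipped", 20.0),
]

messages = [notify(e) for e in log]  # one message per event
state = latest_status(log)           # current view derived from the same log
```

Because the state is derived from the log rather than updated in place, another consumer (for example an ETL job) can replay the same events to build its own view without coupling to the first service's datastore.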
Brian Likosar (“Liko,” pronounced lick-OH) is an open source geek with a passion for working at the intersection of people and technology. He has spoken on security topics at Kafka Summit, and his talk is one of the most-viewed recordings from that event. Prior to joining Confluent, he spent 10 years at Red Hat helping folks with Linux, OpenShift, and Ansible. He’s based in the Chicago area and enjoys sports, live music, and theme parks.
I would like to thank Confluent for the presenter and Zendesk for the meetup support.
- Data Science Careers: A Primer for Academics Looking to Switch to Industry
Time to announce the next event, which is a special collaboration between this meetup and the Women in Big Data group. Please see below for details on the presentation and the speaker.
PLEASE REGISTER with Women in Big Data group for this event: https://www.meetup.com/Women-in-Big-Data-Wisconsin-Chapter/events/265858687/
PLEASE NOTE THAT THIS EVENT WILL HAPPEN ONLINE NOW!
Dr. J. Pocahontas Olson (Pokie) is a data scientist with the Data Science Analytics Lab at American Family Insurance. In her tenure at American Family Insurance, she has worked on a variety of projects, centering on NLP but also other endeavors such as latent trait modeling for the charitable analysis of financial insecurity in Wisconsin (http://insecurity-survey-wi.amfamlabs.com/).
Pokie earned M.S. and Ph.D. degrees in theoretical physics from the University of Notre Dame. Her research on numerically simulating the collapse of massive stars on the university’s cloud computing cluster first sparked her interest in the challenges of big data, distributed systems, and making predictions at scale. Intrigued, she decided to pursue a data science career in industry. After an intensive data science boot camp, she became the first member of the data science team at Virtustream, a cloud service provider seeking to use machine learning to improve latency and storage requirements for its petabyte data warehouses. In early 2017 she started at American Family, where she gets to search for insights in data, still drawing on the resources of the cloud.
Event co-organizer: Big Data Madison Meetup (https://www.meetup.com/BigDataMadison/)
Sponsors: We would like to thank the Data-Driven Wisconsin Conference for the use of their Zoom link.
- Data Engineering with Airflow, R and Postgres at Education Analytics
Education Analytics (EA) partners with the CORE Districts—a consortium of eight school districts in California that serve more than 1 million students attending around 1,500 schools—to provide actionable metrics to district partners and stakeholders. To deliver timely data, our team at EA has built a data pipeline that uses the Python package Apache Airflow, the statistical programming language R, and PostgreSQL databases. We use Airflow to schedule runs of the system and to determine which new data to process, we use R to process data and calculate metrics, and we use PostgreSQL to store data in a custom longitudinal research data warehouse. This data feeds a custom, user-centered dashboard as well as other analytics and reports oriented around continuous improvement for the CORE districts. This data pipeline has become an integral part of the work that the CORE districts do in their improvement communities.
Some of the challenges we faced in building this system include (1) passing information between Python and R for logging, conditional execution, and error handling; (2) automating the processing of complex statistical methods like causal estimates of school effects on student outcomes and long-term predictive models; and (3) designing robust quality control processes for automated systems. In this discussion, we share some lessons learned from the solutions we have arrived at and preview some challenges we are still working to solve.
Jordan Mader is the Director of Analytics Engineering at Education Analytics. Jordan currently manages a team that specializes in building software for complex statistical analyses and automating data processing systems for analytics to help school districts and states use timely data to make better decisions. Jordan holds a B.A. in Economics and History from the University of Wisconsin-Madison.
I would like to thank American Family for the food and Cloudera for an after meetup round of drinks.
- UW Data Science Bazaar (Day 2)
Below is the site for this event. Please note that registration for this event is closed because all open slots have been filled.
- UW Data Science Bazaar (Day 1)
Below is the site for this event. Please note that registration for this event is closed because all open slots have been filled.