
What we’re about
ODSC brings together the open source and data science communities with the goal of helping its members learn, connect and grow.
The focus of this Meetup group is to allow ODSC to work with Meetup groups, non-profits, and other organizations to present informative lectures, workshops, code sprints and networking events to help grow the use of open source languages and tools within the data science and data-centric community. As such, our specific goals are:
1. Build a collaborative group to work with other Meetup groups, non-profits, and other organizations.
2. Promote the use of open source languages and tools amongst data scientists and others.
3. Host educational workshops.
4. Spread awareness of new open source languages and tools that can be used in data science.
5. Contribute back to the open source community.
Who is this meetup for?
• Data engineers, analysts, scientists, and other practitioners
• R, Python and other software engineers who work with data or want to learn
• Data visualization developers and designers
• Non-technical team leads, executives, and other decision makers from data-centric startups and large companies looking to utilize open source tools
Get Involved with our Meetups:
• Meetup/Webinar Speaker Submission Form https://forms.gle/STEDWxgWBMnLnt8F8
• Suggest a Meetup Topic Form
https://forms.gle/FAnBGMnC6puP1zLs6
• Volunteer Form
https://forms.gle/rJB2k8ZvU7mj1R3c8
• Host or Sponsor Form
https://forms.gle/bVdnzttfSuKkWrHq5
• Showcase your Startup Form
https://forms.gle/2Z31dmGPe7RTw28B9
ODSC Links:
• Get free access to more talks/trainings like this at Ai+ Training platform:
https://hubs.li/H0Zycsf0
• ODSC blog: https://opendatascience.com/
• Facebook: https://www.facebook.com/OPENDATASCI
• Twitter: https://twitter.com/_ODSC & @odsc
• LinkedIn: https://www.linkedin.com/company/open-data-science
• Slack Channel: https://hubs.li/Q02zdcSk0
• Code of conduct: https://odsc.com/code-of-conduct/
Upcoming events (1)
WEBINAR "Differentially-Private Synthetic Data for Everyone"
Pre-registration is REQUIRED.
Add to your calendar - https://hubs.li/Q03lF-X-0
In this hands-on session, you'll learn how to generate high-quality synthetic data that preserves privacy using differential privacy techniques. We’ll walk through how to train differentially private generative models with MOSTLY AI’s open-source Synthetic Data SDK and explore how this method compares to traditional anonymization approaches in terms of both utility and risk. You’ll gain practical insights into configuring privacy parameters, understanding the impact of privacy budgets, and evaluating synthetic data output.
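To make the idea of a privacy budget concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a simple count query. It is an illustration only, and it is not the training procedure used by the Synthetic Data SDK; the function names, the toy age data, and the chosen epsilon values are all assumptions made for this example. It shows that a smaller epsilon (a stricter budget) means more noise and therefore a larger error.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count under epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon: float, trials: int = 2000, seed: int = 0) -> float:
    """Average absolute error of the noisy count over many releases."""
    rng = random.Random(seed)
    ages = [rng.randint(18, 90) for _ in range(1000)]  # made-up toy data
    true_count = sum(1 for a in ages if a >= 65)
    errors = [
        abs(private_count(ages, lambda a: a >= 65, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)


if __name__ == "__main__":
    # Hypothetical budgets, chosen only to show the trend.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error = {mean_abs_error(eps):.2f}")
```

The expected absolute error of Laplace noise equals its scale, so the error should be roughly 1 / epsilon. This is the same privacy-utility trade-off the session explores for full synthetic datasets.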
We’ll also cover how to assess the fidelity of synthetic datasets using predictive and discriminative machine learning models, and how to create hybrid datasets that blend real and synthetic data for improved utility. Through live demonstrations and real-world examples, you’ll develop a strong understanding of the privacy-utility trade-offs and how to confidently apply privacy-safe synthetic data in your own data science workflows.
Session Outline:
Lesson 1: Introduction to Differential Privacy
Get familiar with the core concepts of differential privacy and how it differs from traditional anonymization techniques. By the end of this lesson, you’ll be able to explain what differential privacy is, what a privacy budget (epsilon) means, and why it provides stronger privacy guarantees than pseudonymization or masking.
Lesson 2: Setting Up and Using the Synthetic Data SDK
Learn how to install and configure MOSTLY AI’s open-source Synthetic Data SDK to generate synthetic datasets with differential privacy enabled. You’ll run the SDK in LOCAL mode using a prepared dataset, explore the configuration options for privacy settings, and review the structure of the synthetic output.
Lesson 3: Evaluating Utility vs. Privacy Trade-offs
Compare synthetic datasets generated with different privacy settings to understand how utility is impacted by stricter privacy budgets. By the end of this lesson, you’ll be able to evaluate the usefulness of differentially private synthetic data using predictive models and summary statistics.
Lesson 4: Creating Hybrid Datasets with Real and Synthetic Data
Explore how to combine real and synthetic data to create hybrid datasets that retain utility while improving privacy. You’ll walk through a practical example and learn how to use synthetic data to augment or replace sensitive parts of your dataset.
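As a rough illustration of the ideas in Lessons 3 and 4, the sketch below augments a small "real" table with sampled "synthetic" rows and compares summary statistics between the two sources. It uses only the Python standard library and made-up toy data. The helper names `build_hybrid` and `summary_gap` are hypothetical and are not part of the Synthetic Data SDK, whose own workflow will be shown in the session.

```python
import random
import statistics


def build_hybrid(real_rows, synthetic_rows, n_synthetic, seed=0):
    """Augment real rows with a random sample of synthetic rows.

    Every output row gets a `source` field so later analysis can
    still tell the two origins apart.
    """
    rng = random.Random(seed)
    sampled = rng.sample(synthetic_rows, n_synthetic)
    hybrid = [{**row, "source": "real"} for row in real_rows]
    hybrid += [{**row, "source": "synthetic"} for row in sampled]
    return hybrid


def summary_gap(rows_a, rows_b, column):
    """Absolute gap in mean and standard deviation for one numeric column."""
    a = [row[column] for row in rows_a]
    b = [row[column] for row in rows_b]
    return {
        "mean_gap": abs(statistics.fmean(a) - statistics.fmean(b)),
        "stdev_gap": abs(statistics.stdev(a) - statistics.stdev(b)),
    }


# Made-up toy data standing in for a real table and a generated one.
rng = random.Random(42)
real = [{"age": rng.gauss(40, 10), "income": rng.gauss(50_000, 8_000)} for _ in range(200)]
synthetic = [{"age": rng.gauss(41, 11), "income": rng.gauss(49_000, 9_000)} for _ in range(500)]

hybrid = build_hybrid(real, synthetic, n_synthetic=300)
print(len(hybrid), "rows in the hybrid table")
print("age gap, real vs. synthetic:", summary_gap(real, synthetic, "age"))
```

Small gaps suggest the synthetic column tracks the real one on these two statistics. That is only a first check: it says nothing about correlations between columns or about privacy, which is why the session also covers predictive and discriminative models.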
Difficulty: Intermediate
Pre-reqs:
This tutorial is designed for data engineers, data scientists, ML engineers, and analysts with basic Python skills and familiarity with working in Jupyter Notebooks. Attendees should have a general understanding of machine learning workflows and working with tabular datasets (e.g., CSV files or pandas DataFrames). No prior experience with synthetic data is required. To participate fully in the hands-on exercises, attendees should have the following installed before the session: Python 3.11+ and Git. Installation and setup of the Synthetic Data SDK will be covered as part of the tutorial, but feel free to get started beforehand by visiting https://github.com/mostly-ai/mostlyai.
Speaker: Dr. Michael Platzer, Co-Founder and CTO of MOSTLY AI
Dr. Michael Platzer is co-founder and CTO of MOSTLY AI, a leader in privacy-safe synthetic data generation. He earned degrees in mathematics and in business with distinction and led consumer analytics teams at global technology leaders before starting his venture to pioneer the field of synthetic data. His company's mission is to democratize data access and data insights in a safe and responsible way for everyone.
ODSC Links:
• Get free access to more talks/trainings like this at Ai+ Training platform:
https://hubs.li/H0Zycsf0
• ODSC blog: https://opendatascience.com/
• Facebook: https://www.facebook.com/OPENDATASCI
• Twitter: https://twitter.com/_ODSC & @odsc
• LinkedIn: https://www.linkedin.com/company/open-data-science
• Slack Channel: https://hubs.li/Q038cQBy0
• Code of conduct: https://odsc.com/code-of-conduct/