What we're about

ODSC brings together the open-source and data science communities with the goal of helping its members learn, connect and grow.

The focus of this Meetup group is to allow ODSC to work with Meetup groups, non-profits, and other organizations to present informative lectures, workshops, code sprints, and networking events that help grow the use of open-source languages and tools within the data science and data-centric community. As such, our specific goals are:

1. Build a collaborative group to work with other Meetup groups, non-profits, and other organizations.

2. Promote the use of open-source languages and tools among data scientists and others.

3. Host educational workshops.

4. Spread awareness of new open-source languages and tools that can be used in data science.

5. Contribute back to the open-source community.

Who is this meetup for?

• Data engineers, analysts, scientists, and other practitioners

• R, Python, and other software engineers who work with data or want to learn how

• Data visualization developers and designers

• Non-technical team leads, executives, and other decision-makers from data-centric startups and large companies looking to utilize open-source tools

Get Involved with our Meetups:

• Speaker Form ( https://docs.google.com/a/odsc.com/forms/d/1trkCoecAMa8za_ZzfN5bW6ZNBaRlmqJSQvuME_2nbJA/edit?usp=drive_web ) - Submit a talk, tutorial, or panel.

• Suggest a Meetup Topic Form ( https://docs.google.com/forms/d/1rEjO3UMMXRXtY8Yr_J_jj3ebYwsIFqcGA6FZzWK4rd0/edit )

• Volunteer Form ( https://docs.google.com/forms/d/1Vu3B72avz2I1xx618pEFGsuywZE9t4n78br9vSEX9oE/edit )

• Host or Sponsor Form ( https://docs.google.com/forms/d/1eyM9hJ3l8TlNmw35re65mH7mFCmsPoRZ1p5RJQEVhnk/edit )

• Showcase your Startup Form ( https://docs.google.com/forms/d/1oz8A4fbfe6HHs71v4nMpcf9FP_kpS9CcCfd3qIBS5HU/edit )

Get free access to more talks like this at LearnAI

· LearnAI: https://learnai.odsc.com/

· Facebook: https://www.facebook.com/OPENDATASCI/

· Twitter: @odsc (https://twitter.com/odsc)

· LinkedIn: https://www.linkedin.com/company/open-data-science/

· Slack Channel: http://bit.ly/2RkOf9l

Upcoming events

LIVE TRAINING: Advanced Fraud Modeling

Online event

This is a PAID event.

Registration is required: https://aiplus.odsc.com/courses/live-training-april-20-advanced-fraud-modeling

Date: April 20th, 1-5 PM (ET)

Instructor's bio:
A Teaching Associate Professor in the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There, he helps design the innovative curriculum of the nation's first Master of Science in Analytics degree program, which prepares a modern workforce to communicate wisely about, and handle, a data-driven future. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management. Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC, office, he worked closely with clients and partners to solve problems in banking, consumer product goods, healthcare, and government. Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics, all from NC State University.

Abstract:
The Association of Certified Fraud Examiners (ACFE) consistently estimates that organizations lose approximately 5% of their revenues to fraud. Based on world GDP estimates, this would amount to roughly $3-4 trillion annually. Fraud is one of the most interesting problems to try to solve because the people in your data do not want to be found. Data science techniques are now at the forefront of this industry's fight against criminals. This course outlines the typical fraud framework at an organization and where data science can play a role. It also lays out how to build an analytically advanced fraud system. Moving beyond simple rules and anomaly detection, these supervised and unsupervised approaches to fraud modeling will help an organization combat the ever-present problem of fraud. The same approaches can also be used in other industries to help organizations find unique customers or problems that might exist in their current systems.

By the end of the course, participants will be able to:
- Use network analysis to create useful fraud-model features, such as centrality and connectivity
- Properly oversample or undersample a rare-event data set, and apply synthetic sampling techniques such as SMOTE
- Build a supervised fraud classification model using logistic regression, tree-based algorithms, or naive Bayes
- Build a supervised NOT-fraud classification model using one of the above techniques
- Interpret a complex model using LIME
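To make the resampling outcomes above concrete, here is a minimal plain-Python sketch (not course material) of random undersampling and a SMOTE-style interpolation step. The function names `random_undersample` and `smote_like` are hypothetical, and the sketch assumes numeric feature rows with label 1 for fraud and 0 for non-fraud. In practice, a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE` and its `fit_resample` method) would typically be used instead.

```python
import random

# Assumption: rows are lists of floats; label 1 = fraud (minority), 0 = not fraud.


def random_undersample(rows, labels, seed=0):
    """Keep every fraud row plus an equal-sized random sample of non-fraud rows."""
    rng = random.Random(seed)
    minority = [(r, y) for r, y in zip(rows, labels) if y == 1]
    majority = [(r, y) for r, y in zip(rows, labels) if y == 0]
    kept = rng.sample(majority, k=min(len(minority), len(majority)))
    balanced = minority + kept
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]


def smote_like(minority_rows, n_new, k=2, seed=0):
    """SMOTE-style oversampling: each new point lies on the segment between a
    minority row and one of its k nearest minority neighbors."""
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        # k nearest minority neighbors of the base row, excluding the row itself
        others = [r for j, r in enumerate(minority_rows) if j != i]
        neighbors = sorted(others, key=lambda r: sq_dist(base, r))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position in [0, 1) along base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


if __name__ == "__main__":
    X = [[float(i), float(i % 3)] for i in range(10)]
    y = [1 if i < 3 else 0 for i in range(10)]
    Xb, yb = random_undersample(X, y)
    print(len(Xb), sum(yb))  # prints: 6 3
    print(smote_like([x for x, lbl in zip(X, y) if lbl == 1], n_new=2))
```

Resampling is usually applied to the training split only, so that evaluation still reflects the real, heavily imbalanced class balance.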

Course Outline
1. Review of Fraud
2. Data Preparation
3. Supervised Fraud Models
4. Clustering and Implementation

What knowledge and skills should you have?
- Introductory R/Python
- Basic introduction to decision trees (this isn't required, but helpful for understanding)
- Basic introduction to classification models like logistic regression, decision trees, etc. (this isn't required, but helpful for understanding)

What is included in your ticket?
1. Access to the live training and a Q&A session with the instructor
2. Access to the on-demand recording
3. Certificate of completion

Webinar "AI-Based Analytics in the Cloud"

Online event

To access this webinar, please register here: https://app.aiplus.training/courses/ai-based-analytics-in-the-cloud

Topic: AI-Based Analytics in the Cloud

Speaker: Karl Weinmeister, Cloud AI Advocacy Manager at Google
https://www.linkedin.com/in/karlweinmeister/

Karl Weinmeister is a Cloud AI Advocacy Manager at Google, where he leads a team of data science experts who develop content and engage with communities worldwide. Karl has worked extensively in machine learning and cloud technologies. He was a contributor to one of the first AI-based crossword puzzle solvers that is still referenced today.

Abstract:
Even if you have terabytes of business data, applying AI-based analytics to it may not be easy. The bottleneck is often machine learning (ML) expertise and scalable infrastructure.

In this session, we'll start with how a data analyst can access vast amounts of data from the data warehouse directly in a spreadsheet. The analyst can use tools such as charts and pivot tables to discover insights about their data. Because Connected Sheets connects directly to the source, data integrity and security are preserved at all times.

Next, we'll look at how developers can build ML models in the cloud without deep ML expertise. Using SQL syntax, BigQuery ML enables developers to create robust models for regression, classification, time-series forecasting, and more. After the model is built, we'll see how an app developer can integrate the modeling code into the spreadsheet using JavaScript, so that the data analyst can train new models and make predictions right from the spreadsheet.

Finally, we'll look at an end-to-end scenario, solving a business problem with AI analytics. We'll see how a data scientist can go through the steps of training, evaluation, prediction, and even model retraining with BigQuery ML.
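As a rough illustration of the BigQuery ML workflow described above, the sketch below only builds the SQL text for training, evaluating, and predicting with a classification model; it does not connect to Google Cloud. The dataset, table, model, and label names (`mydataset.customers`, `mydataset.churn_model`, `churned`) are hypothetical. With credentials configured, each statement could be submitted through the `google-cloud-bigquery` client, e.g. `bigquery.Client().query(sql).result()`.

```python
# Hypothetical names; replace with a real project/dataset/table.
MODEL = "mydataset.churn_model"
TRAIN_TABLE = "mydataset.customers"
NEW_TABLE = "mydataset.new_customers"


def create_model_sql(model, table, label, model_type="logistic_reg"):
    """CREATE MODEL statement: BigQuery ML trains on the rows returned by the SELECT."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label}']) AS\n"
        f"SELECT * FROM `{table}`"
    )


def evaluate_sql(model):
    """ML.EVALUATE returns evaluation metrics for the trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


def predict_sql(model, table):
    """ML.PREDICT scores new rows with the trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, (SELECT * FROM `{table}`))"


if __name__ == "__main__":
    # Names are interpolated without escaping; only use trusted identifiers.
    for sql in (
        create_model_sql(MODEL, TRAIN_TABLE, label="churned"),
        evaluate_sql(MODEL),
        predict_sql(MODEL, NEW_TABLE),
    ):
        print(sql, end="\n\n")
```

Retraining, as in the end-to-end scenario, amounts to rerunning the `CREATE OR REPLACE MODEL` statement on fresh data.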

In this session, attendees from a variety of backgrounds, including data analysts, developers, data scientists, and managers, will see how to harvest insights from their business data in the cloud.

[June] Get your Virtual ODSC Europe 2021 pass with 40% OFF - http://bit.ly/38z7q84

ODSC Links:
• Get free access to more talks/trainings like this at the AI+ Training platform:
https://app.aiplus.training/
• Facebook: https://www.facebook.com/OPENDATASCI
• Twitter: @odsc (https://twitter.com/odsc)
• LinkedIn: https://www.linkedin.com/company/open-data-science
• Slack Channel: http://bit.ly/2RkOf9l
• Europe Conference June 8th - 10th: https://odsc.com/europe/
• West Conference November 15th - 18th: https://odsc.com/california/
• Code of conduct: https://odsc.com/code-of-conduct/

Deep Learning for Time Series in Industry: The Promise and the Barriers

To access this webinar, please register here: https://app.aiplus.training/courses/deep-learning-for-time-series-in-indusrtry-the-promise-and-the-barriers

Topic: Deep Learning for Time Series in Industry: The Promise and the Barriers

Speaker: Isaac Godfried, Machine Learning Researcher at CoronaWhy

Isaac Godfried is a data scientist and AI researcher focused on applying deep learning to real-world problems. Isaac has extensive experience forecasting time series data in many different industries, such as healthcare (patient vitals, infectious disease spread), retail (store sales, surplus stock), and climate (stream flows, precipitation). Outside of time series, Isaac has trained and deployed models for natural language processing (NLP) and computer vision (CV), and he has a particular interest in models that can leverage textual and image data to improve forecasts.

Abstract:
Time series forecasting and classification remain important yet challenging problems for a variety of businesses. Deep-learning-based methods have recently shattered time series research benchmarks, yet they remain seldom used in industry. In this seminar, we will discuss how to use deep learning to forecast and classify real-world time series datasets. We will walk through the use of several open-source frameworks, such as PyTorch, Flow Forecast (a new open-source deep learning library for time series forecasting), Kubernetes, and Airflow, to train, validate, interpret, and deploy deep time series models at scale. Finally, we will discuss techniques for monitoring existing models in production, retraining models periodically, and checking for distribution drift in changing datasets. Participants will leave with a practical understanding of how to leverage open-source deep learning packages to solve real-world business needs such as sales/revenue forecasting, predictive maintenance, demand prediction, and much more.
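The abstract mentions checking for distribution drift. One common, simple check (not necessarily the one used in the session) is the two-sample Kolmogorov-Smirnov (KS) test: compare a feature's values from the training period with recent production values. The sketch below computes the KS statistic in plain Python and applies the standard large-sample critical value; the function names and the 0.05 significance level are assumptions. `scipy.stats.ks_2samp` provides a tested implementation that also returns p-values.

```python
import math


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    best = 0  # largest |i*m - j*n|, i.e. the CDF gap scaled by n*m (kept as an integer)
    # Walk the distinct values in ascending order; after advancing past x,
    # i / n and j / m are the empirical CDFs of each sample evaluated at x.
    while i < n and j < m:
        x = min(ref[i], cur[j])
        while i < n and ref[i] <= x:
            i += 1
        while j < m and cur[j] <= x:
            j += 1
        best = max(best, abs(i * m - j * n))
    return best / (n * m)


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value
    c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    n, m = len(reference), len(current)
    c = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


if __name__ == "__main__":
    training_values = [float(v) for v in range(100)]
    production_values = [float(v) for v in range(50, 150)]  # shifted upward
    print(ks_statistic(training_values, production_values))  # 0.5
    print(drift_detected(training_values, production_values))  # True
```

Tracking the gap as an integer numerator avoids floating-point rounding in the comparison; only the final division produces a float.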

A Data Scientist’s Rosetta Stone: Reconciling Disparate Data with Ontologies

To access this webinar, please register here: https://app.aiplus.training/courses/a-data-scientists-rosetta-stone-reconciling-disparate-data-with-ontologies

Topic: A Data Scientist’s Rosetta Stone: Reconciling Disparate Data with Ontologies

Speaker: Elizabeth Michel, Senior Analytics Engineer at Tamr
https://www.linkedin.com/in/elizabeth-michel-7944703b/

Elizabeth Michel is a Senior Analytics Engineer at Tamr, a Boston-based enterprise data mastering software company. She graduated from Dartmouth College in 2019 with a degree in engineering modified with economics, and she works to help Tamr’s clients derive analytic value from their mastered data, as well as to integrate that analytic value with Tamr’s core products.

Abstract:
Reconciling data from disparate datasets can be a tricky and time-consuming process. Even when the data points refer to the same real-world entities, different data sources may use different conventions for describing their properties. For example, maintaining a global, up-to-date, and accurate dataset of infections and tests related to the COVID-19 pandemic is a challenging task, in part due to the different taxonomies that distinct nations and municipalities used to classify outcomes.

Ontologies, which are collections of class and relationship definitions, offer a simple solution to this problem. Data scientists can align disparate taxonomies with a centralized ontology (a “source of truth” for data classification) and unify their datasets in a consolidated hierarchy. As new datasets are added, they can be easily matched to the same ontology and reconciled with the existing data. Automating this process ensures that datasets are unified consistently and reduces the possibility of discrepancies arising from manual data curation. While manual taxonomy alignment may be easier in the short term, maintaining a process for ongoing taxonomy reconciliation is the only effective long-term solution.
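The alignment idea above can be sketched in a few lines of Python. Everything here is hypothetical: a tiny central ontology stored as child-to-parent links, and hand-written mappings from two sources' category labels to ontology classes. Records whose labels have no mapping are set aside for review rather than guessed, and each matched record carries its full path up the consolidated hierarchy. This is an illustration only, not Tamr's product or the session's pipeline.

```python
# Hypothetical central ontology: each class points to its parent class (None = root).
ONTOLOGY = {
    "Product": None,
    "Apparel": "Product",
    "Footwear": "Apparel",
    "Electronics": "Product",
    "Phones": "Electronics",
}

# Hypothetical alignments from each source's own labels to ontology classes.
ALIGNMENTS = {
    "store_a": {"Shoes": "Footwear", "Clothing": "Apparel", "Mobile Phones": "Phones"},
    "store_b": {"Sneakers & Boots": "Footwear", "Cell Phones": "Phones"},
}


def ancestors(cls):
    """Return the path from an ontology class up to the root, e.g. Footwear -> Product."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = ONTOLOGY[cls]
    return path


def unify(source, records):
    """Map each record's source label onto the shared ontology."""
    mapping = ALIGNMENTS[source]
    unified, unmatched = [], []
    for rec in records:
        cls = mapping.get(rec["category"])
        if cls is None:
            unmatched.append(rec)  # no alignment yet: flag for review instead of guessing
        else:
            unified.append({**rec, "source": source, "class": cls, "path": ancestors(cls)})
    return unified, unmatched


if __name__ == "__main__":
    a, _ = unify("store_a", [{"id": 1, "category": "Shoes"}])
    b, todo = unify("store_b", [{"id": 7, "category": "Sneakers & Boots"},
                                {"id": 8, "category": "Toys"}])
    print(a[0]["path"], b[0]["path"])  # both: ['Footwear', 'Apparel', 'Product']
    print(todo)  # [{'id': 8, 'category': 'Toys'}]
```

Onboarding a new dataset then means adding one more entry to `ALIGNMENTS`; the ontology and the downstream code stay unchanged.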

In this session, we demonstrate how taxonomies from distinct datasets can be quickly reconciled and unified using a centralized ontology. As an example, we extract the taxonomies used in two open-source retail product datasets and align them with a common retail ontology. We also demonstrate the use of knowledge graph visualizations to showcase the impact of cross-dataset standardization. Finally, we discuss how this unification pipeline can be deployed at scale, using either open-source Python libraries or proprietary solutions like Neo4j.

Main learning points:
1. The importance of having a unified taxonomy across data sources and the difficulties involved in building that universal taxonomy
2. How to use ontologies to find common ground between disparate taxonomies to align them in a systematic and sustainable way

[November] Get your ODSC West 2021 pass with 75% OFF - https://bit.ly/2Rc9nRB
