
IWDS 26: Evaluating and improving our Data Science models by Anastasia Gordeeva

Hosted By
Laura da S.

Details

#########
To avoid disappointment, please remember to register at: https://skillsmatter.com/meetups/13007-evaluating-and-improving-our-data-science-models
#########

Hi everyone,

We continue our journey through the data science life cycle by learning how to evaluate and improve the performance of our ML models.

In this session, Anastasia will introduce IWDS attendees to the basics of data science and research design with R, one of the most powerful languages for statistical computing. Her session will be very hands-on, focusing on exploring a single dataset through a range of methods including descriptive statistics, plotting, regression and decision trees, as well as (optionally) a Naïve Bayes classifier for those who feel more confident or have prior R experience.

About Anastasia Gordeeva:

Anastasia is a graduate student on the MSc Social Research Methods programme at LSE, with an undergraduate degree in Anthropology. Her introduction to Data Science and programming happened relatively late: she only started to learn coding and data science methods during her second year at university. Since then, she has created an agent-based model as the final project for her Bachelor's degree, and practised a wide range of data science methods in her Master's degree, including data pre-processing, NLP, network theory and the foundations of Machine Learning. Her Master's research project is dedicated to decision trees for text classification. She has been working as a Data Consultant for the LSE Finance Division, helping with ad-hoc data validation processes.

What will we do in this session?

Anastasia aims to show Data Science research as a holistic process: starting with data cleaning, going through feature selection, model tuning and running, and model evaluation, and ending, of course, with interpretation of the results. This session will be most useful for beginner and lower-intermediate Data Scientists, although all levels of experience are welcome to attend.

A Python version of the workshop code will be provided for those who prefer to work with Python, but all demonstrations will be done in R.
Our dataset will be a classic Kaggle dataset on the passengers of the Titanic. It has just under 1,000 entries and should not take long to process, even on older machines.
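
For those following along in Python, the model evaluation step for a binary classifier can be sketched roughly as below. This is only an illustrative sketch, not the workshop code: the labels and predictions are made-up values rather than Titanic data, and the helper names `confusion_counts` and `evaluate` are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tp": counts[(1, 1)],  # predicted 1, actually 1
        "tn": counts[(0, 0)],  # predicted 0, actually 0
        "fp": counts[(0, 1)],  # predicted 1, actually 0
        "fn": counts[(1, 0)],  # predicted 0, actually 1
    }


def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall for a binary classifier."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        # Guard against division by zero when a class is never predicted/present.
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Made-up example: 1 = survived, 0 = did not survive (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Looking at precision and recall alongside accuracy helps show which kind of mistake a model makes, which is a useful starting point when deciding how to improve it.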

Pre-requisites for this session:

  • Basic knowledge of R/Python and Machine Learning binary classification.
  • You will need your laptop for this session, so please bring it with you.
  • Before attending, please make sure to install R and RStudio on your computer. Both are free, and instructions are available online. Please also install the following libraries in advance, as this can take some time if you do it during the workshop:
  • dplyr
  • readr
  • ggplot2
  • glmnet
  • caret
  • For this session, we may use Azure Notebooks to avoid having to install Python/R environments.

Thank you for taking the time to prepare.
Please come to this event and have fun exploring Data Science methods!

Inspiring Women in Data Science
Skills Matter
CodeNode, 10 South Place, London, EC2M 7EB · London