
Structured Dataset: Credit Card Fraud Detection using ML and Deep Learning

Photo of George Zoto
Hosted By
George Z.
Details

Join us for our 4th community coding adventure in Deep Learning! Just bring your curiosity and get ready to meet our growing community 😀 We will detect fraud in an ocean of transactions using ML and Deep Learning!

Join Zoom Meeting:
https://us02web.zoom.us/j/84402592502?pwd=d1lVSkxQZE1sSGljR3dXaEZwYmNEdz09

Phone: +1 929 205 6099 US
Meeting ID: 844 0259 2502

Agenda:

Basic regression: Predict fuel efficiency: https://www.tensorflow.org/tutorials/keras/regression

Petfinder and classifying structured data with feature columns
https://www.tensorflow.org/tutorials/structured_data/feature_columns

Classification on imbalanced data on this same credit card dataset
https://www.tensorflow.org/tutorials/structured_data/imbalanced_data

  • Step 4 😀
    The goal is to identify fraudulent credit card transactions using ML or Deep Learning models. Because fraud cases make up only a tiny fraction of all transactions, we recommend measuring performance with the Area Under the Precision-Recall Curve (AUPRC). Plain accuracy computed from the confusion matrix is not meaningful for such imbalanced classification: a model that labels every transaction as legitimate would still score very high. We recommend TensorFlow's tf.keras.metrics.AUC(curve='PR') as our performance and scoring metric.

  • Step 5 😀
    To get ready for this week's coding challenge, check out this blog post by Rachel Draelos introducing our performance metric for the week: AUPRC (area under the precision-recall curve). She provides a nice description of the AUPRC metric, explains why it is especially useful for imbalanced classification problems, and discusses what you can expect when applying it to your dataset.
    https://glassboxmedicine.com/2019/03/02/measuring-performance-auprc/

  • Step 6 😀
    Along with having a common metric, AUPRC, for comparison, it is important to have the same test data to run the metric against. So let's all use the following approach:
    import pandas as pd
    from sklearn.model_selection import train_test_split
    df = pd.read_csv('creditcard.csv')
    train_df, test_df = train_test_split(df, stratify=df['Class'], test_size=0.2, random_state=51)

It is not necessary for comparison, but many people will further split the training data into training and validation sets via something like:
train_df, val_df = train_test_split(train_df, stratify=train_df['Class'], test_size=0.2, random_state=51)
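
To see what stratification buys us on data this imbalanced, here is a minimal pure-Python sketch. It is only an illustration, not how scikit-learn implements train_test_split: the helper names `stratified_split` and `fraud_rate` and the toy label list are made up for this example. Note that the second split takes the labels of the training portion only, since the labels used for stratifying must line up with the rows being split.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size, seed):
    """Split row indices so each class keeps roughly the same share in both parts.

    Hypothetical helper for illustration; use sklearn's train_test_split
    for the shared benchmark split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

def fraud_rate(labels, indices):
    """Fraction of the selected rows labelled 1 (fraud)."""
    return sum(labels[i] for i in indices) / len(indices)

# Made-up stand-in for the 'Class' column: 20 frauds among 10,000 rows.
labels = [1] * 20 + [0] * 9980

# First split: train vs. test.
train_idx, test_idx = stratified_split(labels, test_size=0.2, seed=51)

# Second split: train vs. validation, using the training labels only.
train_labels = [labels[i] for i in train_idx]
sub_train_idx, val_idx = stratified_split(train_labels, test_size=0.2, seed=51)

print("fraud rate, all rows:  ", fraud_rate(labels, range(len(labels))))
print("fraud rate, test:      ", fraud_rate(labels, test_idx))
print("fraud rate, validation:", fraud_rate(train_labels, val_idx))
```

Every part ends up with close to the same fraud share, so a score measured on the test rows is not distorted by an unlucky draw with too few or too many frauds.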

  • Step 7 😀
    Have fun 🎉 and share your journey, findings, lessons learned, successes or failures with us, and be ready to take a deeper dive into our code. For us, it's the effort that counts and not the final result. Most importantly, you should enjoy exploring this interesting dataset and learn something new 😀
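
As a warm-up for the metric from Step 4, the sketch below shows why accuracy can mislead on imbalanced data while AUPRC does not. It is a pure-Python illustration on made-up toy labels and scores. The `average_precision` function is a hypothetical helper that computes the step-wise area under the precision-recall curve, which is one common way to estimate AUPRC. tf.keras.metrics.AUC(curve='PR') approximates the area over a fixed set of thresholds, so its value may differ slightly from the number computed this way.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Hypothetical helper for illustration; assumes no tied scores.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            area += hits / rank  # precision at the point where recall steps up
    return area / total_pos

# Made-up toy data: 1,000 transactions, the last 2 are fraud.
n = 1000
y_true = [0] * 998 + [1] * 2

# A "model" that calls everything legitimate still looks great on accuracy.
all_legit = [0] * n
print("accuracy of all-legit model:", accuracy(y_true, all_legit))

# Scores that rank both frauds last: a useless ranking.
weak_scores = [(n - i) / n for i in range(n)]

# Scores that rank the frauds 1st and 3rd: a useful ranking.
good_scores = [i / (10 * n) for i in range(n)]
good_scores[998] = 0.99  # fraud
good_scores[0] = 0.98    # legitimate transaction ranked in between
good_scores[999] = 0.97  # fraud

print("AUPRC estimate, weak scores:", average_precision(y_true, weak_scores))
print("AUPRC estimate, good scores:", average_precision(y_true, good_scores))
```

The all-legit model catches no fraud yet scores very high on accuracy, whereas the AUPRC estimate clearly separates the useless ranking from the useful one.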

Source:
https://www.kaggle.com/mlg-ulb/creditcardfraud

Deep Learning Adventures