42nd Meetup: Identifying Fake Audio Using Social Media Metadata & Audio Features
Details
With the rise of AI-generated audio, commonly referred to as audio deepfakes, it has become increasingly important to detect fake audio content, especially as it spreads across social media. Most detection methods rely only on analyzing the sound itself, so they often miss important clues found in social media engagement data. In this work, we introduce a new approach that combines audio features (such as spectrograms, energy levels, tempo, and pitch-related measurements) with social media metadata (such as likes, dislikes, and comment counts) to determine whether an audio clip is real or AI-generated. We developed an ensemble model that brings together five machine learning methods, Logistic Regression, Naive Bayes, Gradient Boosting, AdaBoost, and a Neural Network, using a stacking approach to improve overall prediction. We compared the results of this model to those reported in well-known work, including CFAD, Singfake, and ASVspoof 2019. Our model achieved an AUC of 0.847, an accuracy of 84.6%, and an Equal Error Rate (EER) of 20.4%, performing comparably to, and in some cases better than, existing audio-only methods. These results suggest that combining sound with social context offers a promising and practical way to detect deepfake audio, especially in real-world social media settings. While the model's performance is moderate (AUC of 0.847), it highlights both the potential and the limitations of combining social and audio features for deepfake audio detection, and it opens avenues for future tuning and dataset expansion.
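The stacking ensemble described above can be sketched with scikit-learn. This is an illustrative sketch, not the authors' actual pipeline: the synthetic dataset, feature counts, and hyperparameters are all assumptions standing in for the real audio features and social metadata.

```python
# Sketch of a stacking ensemble like the one described: five base learners
# (Logistic Regression, Naive Bayes, Gradient Boosting, AdaBoost, Neural
# Network) whose predictions are combined by a meta-learner.
# The synthetic features below are placeholders for the talk's audio features
# (spectrograms, energy, tempo, pitch) and social metadata (likes, dislikes,
# comment counts); the real feature extraction is not shown.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder dataset: rows are audio clips, columns mix "audio" and
# "metadata" features; label 1 = AI-generated, 0 = real.
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

base_learners = [
    ("logreg", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
    ("nb", GaussianNB()),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("mlp", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(32,),
                                        max_iter=500, random_state=0))),
]
# Stacking: out-of-fold predictions from the base learners become the
# input features for a logistic-regression meta-learner.
ensemble = StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(), cv=3)
ensemble.fit(X_train, y_train)

auc = roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1])
acc = accuracy_score(y_test, ensemble.predict(X_test))
print(f"AUC={auc:.3f}  accuracy={acc:.3f}")
```

On real data, the evaluation would also report the Equal Error Rate (the point where false-acceptance and false-rejection rates meet), the standard metric in the ASVspoof line of work.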
