Decoding cybersecurity data with LLMs
Details
The next generation of cybersecurity engineers will be data engineers specializing in cybersecurity, leveraging modern technology to interpret the vast amounts of data they collect. We are constantly inundated with information about GPT, ML, AI, and various other acronyms. The critical question is: how can cybersecurity engineers use these tools for more efficient analysis? Approaching ML and AI from a cybersecurity researcher's perspective, we will discuss concrete examples that demonstrate how to uncover insights and make discoveries. A key highlight will be a discussion of benchmarks: evaluating the most promising language models for ML applications in cybersecurity.
To set the stage, we will examine the types of data commonly encountered in a cybersecurity ecosystem. By combining the traditional tabular data structure, the pandas DataFrame, with the capabilities of LLMs, we will explore methods for extracting key patterns and features. The end goal of this exploration is to classify behaviors as either malicious or benign.
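As a rough illustration of the kind of feature extraction described above, the sketch below aggregates a handful of network flow records into per-source features with pandas. The column names (`src_ip`, `dst_port`, `bytes_sent`) and all values are invented for this example and are not taken from the talk's datasets or demo repository.

```python
import pandas as pd

# Hypothetical flow records; columns and values are assumptions for illustration only.
flows = pd.DataFrame(
    {
        "src_ip": ["10.0.0.5", "10.0.0.5", "10.0.0.7", "10.0.0.9", "10.0.0.9", "10.0.0.9"],
        "dst_port": [443, 443, 22, 80, 8080, 3389],
        "bytes_sent": [1200, 900, 300, 50, 60, 40],
    }
)

# Collapse individual flows into one feature row per source address.
features = flows.groupby("src_ip").agg(
    flow_count=("dst_port", "count"),        # how many flows the source opened
    distinct_ports=("dst_port", "nunique"),  # how many different ports it touched
    mean_bytes=("bytes_sent", "mean"),       # average payload size per flow
)

print(features)
```

A source that touches many distinct ports with small payloads, like `10.0.0.9` here, is the sort of pattern a later classification step might weigh as scan-like behavior.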
Next, we will delve into the framework of Exploratory Data Analysis (EDA), which employs statistical methods and visualizations to make sense of opaque datasets. We will demonstrate how AI can assist in “questioning” data, drawing actionable conclusions, and building anomaly detection models.
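To make the anomaly detection idea concrete, here is a minimal sketch using a simple z-score rule, one of many possible statistical approaches and not necessarily the one used in the talk. The function name `zscore_outliers`, the threshold of 2.0, and the request counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical requests-per-minute counts from a single host (invented data).
requests_per_minute = [12, 15, 11, 14, 13, 12, 16, 240, 13, 14]

print(zscore_outliers(requests_per_minute))
```

On this toy series only the burst of 240 requests is flagged. Real traffic is rarely this clean, so a threshold like this would need tuning against the dataset at hand.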
Finally, we will showcase how practitioners can use and tune language models to classify data as malicious or benign, offering insights into the effectiveness of different LLMs for tackling cybersecurity challenges. This presentation includes open-source demonstrations using Jupyter notebooks and public datasets featuring known network attacks, allowing participants to reproduce and adapt these demonstrations for their own projects.
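One simple way to compare how well different models label traffic is to score their predictions against known ground truth. The sketch below computes precision, recall, and F1 for the "malicious" class. The helper name `precision_recall_f1` and the label lists are hypothetical, and the predictions stand in for whatever a language model would return; no model is called here.

```python
def precision_recall_f1(true_labels, predicted_labels, positive="malicious"):
    """Score predictions for the positive class against ground-truth labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented ground truth and invented predictions from a hypothetical model.
truth = ["malicious", "malicious", "benign", "benign", "malicious", "benign"]
model_a = ["malicious", "benign", "benign", "benign", "malicious", "malicious"]

precision, recall, f1 = precision_recall_f1(truth, model_a)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function over each candidate model's predictions gives a like-for-like basis for comparison on a labeled dataset.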
The ultimate goal of this talk is to demonstrate how defenders can use data to uncover malicious activity using traditional ML techniques, such as EDA and classification, implemented with the help of various language models. By comparing language models’ effectiveness in traditional ML tasks for cybersecurity applications, we aim to inspire creativity and resourcefulness among cybersecurity engineers. Through this journey from raw data to actionable models, we will highlight the vast potential ML and AI offer to redefine the field of cybersecurity.
Demo repository (subject to changes): https://github.com/mundruid/cyberdata-mlai
Agenda:
- Join us at 5:30 for snacks, drinks, conversation and socializing.
- 6:00 - Main talk(s)
- 7:00 - 7:30 - Wrap up and hang out some more
- 7:30 - ? - After Party*
======================================
"Call for Papers"! .. if you have some knowledge to share with the community, please talk to Dave at the next meetup. We'd love to have 1 or 2 talks per meetup lined up for the coming months.
*After Party? .. The meetup is right next to Revelry Brewing. If folks are interested, we can clean up at the CLC and keep the conversation going over a nice local brew.
