Securing Language Models Against Prompt Injection
Details
In this talk, we will explore ways to build trust in AI systems by identifying risks and implementing effective mitigation strategies.
We’ll discuss key challenges in LLM-based applications, with a focus on vulnerabilities such as prompt injection.
Practical solutions for risk reduction will be presented, including input/output validation and sanitization, adversarial testing, and tools like Rebuff and Llama Guard to enhance model security.
In addition, the importance of AI red teaming and the use of the AttaQ dataset for evaluating harmful responses will be highlighted.
Artificial Intelligence