๐ Breaking and Securing LLMs๐


Details
๐ First 2025 Meetup - 3 Special lectures ๐
๐ 18:00 - Pizza and mingling
๐ฎ๐ฑ 18:15 - Main event
๐๏ธ Raffles - ReactNext and NodeTLV tickes!!! ๐๏ธ
*** To get all the events that happen in Gav-Yam - register here: ***
๐ฃ๏ธ Ran Bar-zik
Senior Software architect @ Cyberark
Dancer | Poet | Artist | Seรฑor Senior soup maker
๐คฉ Practical Attacks on Artificial Intelligence ๐คฉ
ืืชืงืคืืช ืืขืฉืืืช ืขื ืืื ื ืืืืืืชืืช
ืืขืืื ืฉืื ืืื ื ืืืืืืชืืช ื ืื ืกืช ืืืืชืจ ืืืืชืจ ืืืฆืจืื, ืืฉ ืื ื-ืจ-ื-ื ืืืชืจ ืืชืงืคืืช ืืคืฉืจืืืช. ืืกืฉื ืืื ืืจืื ืืชืงืคืืช ืืืขืืื ืืืืืชื ืฉืขืืื ืขื ืืืฆืจืื ืืืืชืืื ืื ืืื ืืื ืืืงืจืื ืขืืืืื ืืขืืื ืืืืฉ ืฉื ื LLM
๐ฃ๏ธ Niv Rabin
Principal Software Architect @ Cyberark
Niv Rabin is a Principal Software Architect at CyberArk with over 15 years of experience in software development and architecture. In recent years, he has focused on AI security, specializing in LLM attack methodologies and detection techniques. His work combines hands-on research and engineering expertise to mitigate risks in AI-driven security.
๐คฉ Evolving Jailbreaks and Mitigation Strategies ๐คฉ
As large language models (LLMs) become more integrated into applications, understanding and preventing jailbreak attacks is critical. This talk explores cutting-edge techniques for bypassing LLM safeguards and the strategies to defend against them. Weโll start with semantic fuzzing, showcasing how category-based and language-disruptive paraphrasing can evolve to defeat alignment. Then, weโll delve into iterative refinement mechanisms, where multiple LLMs collaborate to create increasingly effective jailbreak prompts.
The session will also cover evaluation methods, including how to numerically distinguish compliance from rejection in LLM outputs. Finally, weโll present mitigation strategies, highlighting the strengths and limitations of model alignment, external safeguards, LLMs as judges, and hybrid defenses.
Attendees will gain practical insights into both attacking and securing LLMs, leaving equipped to build safer, more resilient AI systems.
Key Takeaways:
- Learn how semantic fuzzing generates prompt variations to bypass LLM defenses.
- Understand the role of iterative feedback loops in evolving jailbreak prompts.
- Discover effective methods for evaluating LLM responses numerically.
- Explore multi-layered mitigation strategies to prevent harmful content generation.

๐ Breaking and Securing LLMs๐