This tutorial offers a comprehensive overview of vulnerabilities in Large Language Models (LLMs) that are exposed by adversarial attacks. The study of these attacks is an emerging interdisciplinary field in trustworthy ML that combines perspectives from Natural Language Processing (NLP) and Cybersecurity. We emphasize the existing vulnerabilities of unimodal LLMs, multi-modal LLMs, and systems that integrate LLMs, focusing on adversarial attacks designed to exploit weaknesses and mislead AI systems.
Researchers have been addressing safety concerns by aligning models with desired principles, using techniques such as instruction tuning and reinforcement learning from human feedback. Ideally, these aligned LLMs should be helpful and harmless. However, past work has shown that even models trained for safety can be susceptible to adversarial attacks, as evidenced by the prevalence of ‘jailbreak’ attacks on models like ChatGPT or Bard.
This tutorial provides an overview of large language models and describes how they are aligned for safety. We then organize existing research according to different learning structures, covering text-only attacks, multi-modal attacks, and additional attack methods. Finally, we share insights into the potential causes of vulnerability and suggest possible defense strategies.
Our tutorial will be held on TBD. Slides may be subject to updates.
|Time|Session|Presenter|
|---|---|---|
|10:00-10:15|Section 1: Introduction - Why explore vulnerability in LLMs? [Slides]|Yue|
|10:15-10:50|Section 2: Preliminaries [Slides]|Nael, Yu|
|10:50-11:25|Section 3: Text-only Attacks [Slides]|Yu|
|11:25-11:30|Q & A Session I| |
|11:30-11:55|Section 4: Multi-modal Attacks [Slides]|Yue|
|11:55-12:10|Section 5: Additional Attacks [Slides]|Nael|
|12:10-12:50|Section 6: Causes and Defenses [Slides]|Nael|
|12:50-12:55|Section 7: Conclusion & Future Directions [Slides] [References]|Yue|
|12:55-13:00|Q & A Session II| |
The full list of papers can be found in our recent survey, Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks. Below we highlight a small subset of the papers; those in bold will be discussed in detail during our tutorial. The list is incomplete and will be updated closer to the conference. See you at ACL24!