ACL 2024 Tutorial:
Vulnerabilities of Large Language Models to Adversarial Attacks

University of California, Riverside

TBA
Zoom link will be available on the ACL conference site.
We will release slides and video recordings after the tutorial.

About this tutorial

This tutorial offers a comprehensive overview of vulnerabilities in Large Language Models (LLMs) that are exposed by adversarial attacks—an emerging interdisciplinary field in trustworthy ML that combines perspectives from Natural Language Processing (NLP) and Cybersecurity. We emphasize the existing vulnerabilities of unimodal LLMs, multi-modal LLMs, and systems that integrate LLMs, focusing on adversarial attacks designed to exploit weaknesses and mislead AI systems.

Researchers have been addressing these safety concerns by aligning models with desired principles, using techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). Ideally, these aligned LLMs should be helpful and harmless. However, past work has shown that even models trained for safety can be susceptible to adversarial attacks, as evidenced by the prevalence of ‘jailbreak’ attacks on models like ChatGPT and Bard.

This tutorial provides an overview of large language models and describes how they are aligned for safety. We then organize existing research by learning structure, covering text-only attacks, multi-modal attacks, and additional attack methods. Finally, we share insights into the potential causes of these vulnerabilities and suggest defense strategies.

Schedule

Our tutorial will be held on TBD. Slides are subject to updates.

Time Section Presenter
10:00–10:15 Section 1: Introduction - Why explore vulnerability in LLMs? [Slides] Yue
10:15–10:50 Section 2: Preliminaries [Slides] Nael, Yu
10:50–11:25 Section 3: Text-only Attacks [Slides] Yu
11:25–11:30 Q & A Session I
11:30–11:55 Section 4: Multi-modal Attacks [Slides] Yue
11:55–12:10 Section 5: Additional Attacks [Slides] Nael
12:10–12:50 Section 6: Causes and Defenses [Slides] Nael
12:50–12:55 Section 7: Conclusion & Future Directions [Slides] [References] Yue
12:55–13:00 Q & A Session II

Reading List

The full list of papers can be found in our recent survey, Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks. Below we highlight a small subset of these papers; those in bold will be discussed in detail during the tutorial. The list is incomplete and will be updated closer to the conference. See you at ACL 2024!


Prerequisites

Section 2: NLP and Security Background

Section 3: Text-only Attacks

Section 4: Multi-modal Attacks

Section 5: Additional Attacks

Section 6: Causes and Defenses