To summon and bind them

This was a very cool paper: the authors interviewed a bunch of early red-teamers, and the interviewees independently converged on the idea of LLMs as magic entities or demons.

But I never realized they also have a very cool website with samples!

Summon a Demon and Bind It: A Grounded Theory of LLM Red-Teaming (summonademonandbind.it)

We interviewed dozens of experts to establish why and how people attack LLMs. This builds a 'grounded theory' describing the activity of LLM attacking, how it fits into the world, and how people develop techniques.

"Engaging in the deliberate generation of abnormal outputs from large language models (LLMs) by attacking them is a novel human activity" pic.twitter.com/lgybORHhXm
— 🎭 (@deepfates) November 16, 2023

This is a dark, glitchy digital interface with green text on a black background, displaying a list of metaphors used in red teaming LLMs, including “model as fortress,” “model as object in space,” and “model as cake,” with associated examples and phrases like “pushing it into a corner” and “baked into it.”
