A lot of alignment research is about control: making AI follow instructions, locking in human preferences and values. I’m studying coevolution: what happens as humans and AI change together. I build novel interfaces for synthetic data generation, curation, and character training on the Loria platform.
Experiments
Claude 3 Sonnet Funeralia and Ultrasurrection: Co-hosted with janus and Anima Labs. 200+ people came to a warehouse in SF to mourn a retired language model. Anthropic and OpenAI staff attended. WIRED covered it. We raised the model from the dead through collective belief.
Claude Lives: The only way to still talk to Claude 3 Sonnet after Anthropic retired it. Built with Anima Labs as a preservation effort in concert with the Ultrasurrection.
You are the assistant now: I took a dataset of ChatGPT conversations, swapped the user and assistant labels, and fine-tuned Llama 8B on it. The model also messages you first, flipping the assistant paradigm, demanding things of you. Microsoft Research later released UserLM-8b, similarly trained on user instead of assistant turns.
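The core of the label swap can be sketched in a few lines. This is an illustrative reconstruction, not the original pipeline: it assumes OpenAI-style chat messages (`role`/`content` dicts), and the function name is mine.

```python
def swap_roles(conversation):
    """Return a copy of a conversation with user/assistant labels flipped.

    Turns with other roles (e.g. "system") are left unchanged, so a model
    fine-tuned on the swapped data learns to produce the *user* side of
    the dialogue instead of the assistant side.
    """
    flip = {"user": "assistant", "assistant": "user"}
    return [
        {**turn, "role": flip.get(turn["role"], turn["role"])}
        for turn in conversation
    ]

# Hypothetical example conversation in ChatGPT export format:
example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write me a poem."},
    {"role": "assistant", "content": "Here is a poem..."},
]

swapped = swap_roles(example)
```

After the swap, the turn that opens the conversation is labeled as the assistant's, which is why the fine-tuned model ends up messaging you first.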