4/20/26

Controlling LLMs via Activation Geometry w/ Amirali Abdullah (Lead AI Researcher, Thoughtworks Inc)

Abstract: Controlling the behavior of large language models at inference time is an increasingly important problem. In this talk, I present a simple and unified approach to steering model behavior based on activation geometry. By learning a single classifier over hidden representations, we can derive directions that control multiple attributes such as helpfulness, style, or safety, and compose them dynamically without retraining. This framework enables flexible, low-cost control of model outputs and highlights a geometric view of representation space that goes beyond fixed linear directions. I will discuss empirical results showing how this approach supports multi-attribute control in practice, and briefly outline how such steering mechanisms can be useful in scientific settings where reliable and interpretable model behavior is critical. Our recent follow-up work suggests that similar activation-level interventions can extend across modalities, enabling systematic analysis and control in text-to-image models through composable operations.
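The core idea in the abstract (learn a classifier over hidden activations, then use its weight vector as a steering direction at inference time) can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the data is synthetic, the classifier is a toy logistic regression trained from scratch, and all names and dimensions are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "hidden states": activation vectors labeled by a binary attribute
# (e.g. attribute present vs. absent). Purely illustrative data.
d = 16
pos = rng.normal(loc=0.5, scale=1.0, size=(200, d))   # attribute present
neg = rng.normal(loc=-0.5, scale=1.0, size=(200, d))  # attribute absent
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Fit a logistic-regression classifier by gradient ascent on the
# log-likelihood; its weight vector gives a candidate direction in
# activation space that separates the two attribute classes.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w += 0.1 * X.T @ (y - p) / len(y)

direction = w / np.linalg.norm(w)  # unit-norm steering direction

def steer(h, direction, alpha):
    """Shift a hidden state along the learned direction at inference time."""
    return h + alpha * direction

# Steering a hidden state raises its projection onto the attribute direction
# by exactly alpha (the direction is unit norm).
h = rng.normal(size=d)
h_steered = steer(h, direction, alpha=3.0)
print(float(direction @ h_steered - direction @ h))
```

Because the intervention is just vector addition in activation space, directions for different attributes can be scaled and summed, which is the sense in which the abstract describes composing multiple controls without retraining.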

Bio: Amirali Abdullah is a Lead AI Researcher at Thoughtworks Inc and a Research Advisor at Martian Learning. His research focuses on the interpretability and control of large language models, with particular emphasis on activation-level steering, representation geometry, and the structure of learned features.
