How close are language models to becoming autonomous and trustworthy scientists?
Abstract: This talk examines how close language models are to becoming autonomous, general-purpose scientists. We will consider this question through three recent projects, each addressing a different aspect of the scientific discovery process. The first, DiscoveryWorld, studies whether AI scientists can complete the full cycle of scientific discovery (generating hypotheses, designing and conducting experiments, analyzing results, and iterating toward novel discoveries) within a virtual game world set on a hypothetical "Planet X". The second, CodeScientist, focuses on code-based discovery, investigating how well AI scientists can autonomously propose research hypotheses, write Python code to test them, and draft scientific papers when they identify potentially meaningful findings. The third, Theorizer, asks whether AI scientists can operate at a higher level of abstraction by inferring scientific theories from large collections of papers reporting experimental results. Taken together, these projects provide a lens on both the emerging strengths of current models and the substantial distance that still separates AI scientists from human domain experts.
Bio: Peter Jansen is an interdisciplinary AI researcher specializing in natural language processing, automated inference, and virtual world simulators, with a particular focus on automated scientific discovery. He holds a joint appointment as Associate Professor in the College of Information Science at the University of Arizona and Visiting Research Scientist at the Allen Institute for Artificial Intelligence (Ai2). His recent work on automated scientific discovery includes generating code-based experiments, synthesizing scientific theories from literature, benchmarking scientific reasoning, and assessing scientific feasibility, through projects such as CodeScientist, Theorizer, AstaBench, and Matter-of-Fact. He has also developed virtual environments for studying scientific reasoning, including ScienceWorld and DiscoveryWorld.