
Spring 2026 CosmicAI Seminar Series Talk #1

From Astronomical Data to Knowledge: A Few Unexpected Insights from Machine Learning

Presenter: Stéphanie Juneau

Abstract: Modern astronomy is increasingly defined by the scale and complexity of its data, with large surveys producing measurements for millions to billions of objects across space, wavelength, and time. In this talk, I will provide a broad overview of contemporary astronomical data, with an emphasis on large datasets that are particularly well-suited for machine learning and AI-driven analysis. I will briefly outline emerging ideas for the CosmicAI Data Platform, focusing on data curation, access patterns, and data services designed to support modern ML workflows. I will then highlight examples from optical spectroscopic surveys such as SDSS and DESI to illustrate how unsupervised approaches can uncover structure in complex spectroscopic data beyond what is typically accessed via traditional analyses. I will conclude by briefly discussing how these results motivate future work on scalable, interpretable, and multimodal approaches to astronomical data analysis.

Speaker Bio: Dr. Stéphanie Juneau is an Associate Astronomer at NSF NOIRLab whose work focuses on galaxy evolution, supermassive black holes, and the role of feedback in regulating star formation. She obtained a PhD in Astronomy from the University of Arizona and has extensive experience working with large spectroscopic and multi-wavelength surveys. Her recent efforts explore the use of machine learning and representation learning to extract physical insight from complex astronomical datasets. She is actively involved in the NSF–Simons CosmicAI Institute as a science lead, contributing to research case studies and the development of scalable, AI-enabled data platforms for astronomy.


Foundation Models for Survey Astronomy

Presenter: Francois Lanusse

Abstract: Deep Learning has seen a recent shift in paradigm, from training specialized models on dedicated datasets to so-called Foundation Models, which are trained in a self-supervised manner on vast amounts of data and then adapted to solve specific tasks with state-of-the-art performance. This new paradigm has been exceptionally successful not only for large language models (LLMs) but also in other domains such as vision models. However, applications of this approach in scientific domains are still very scarce, for reasons ranging from the need for new architectures to the (surprising) lack of suitable large-scale datasets.

In this talk, I will present the efforts of the Polymathic AI initiative to bring this new foundation model paradigm to survey astronomy. I will discuss our work to bring together domain scientists to assemble large-scale and wide-ranging training datasets of scientific data, and to build models and training strategies that can benefit from large-scale training compute.

I will specifically cover applications to survey astronomy, where we have deployed multimodal self-supervised generative pretraining techniques at scale. I will show how these approaches can be used to build models flexible enough to handle very diverse and inhomogeneous observations (e.g. different types of measurements such as spectra, time-series, or images, but also different instruments) and how they can then be leveraged by the end-user for a variety of downstream applications (e.g. redshift estimation, morphology classification, physical parameter inference, searching for rare objects) with very simple machine learning methods while still reaching near-optimal performance.

Speaker Bio: Dr. Lanusse is a permanent CNRS researcher at the Astrophysics Department of CEA Paris-Saclay (France), and a founding member of the Polymathic AI team. He received his PhD in cosmology and inverse problems in 2015 in Paris, and further developed an interdisciplinary expertise in Deep Learning for cosmology as a postdoctoral researcher at Carnegie Mellon University (2015-2018) and UC Berkeley (2018-2019) through multiple collaborations with their respective Machine Learning and Statistics Departments. He is now broadly interested in developing scientific applications of state-of-the-art Deep Learning techniques by combining concepts of Bayesian inference, deep neural networks, and physical forward modeling.



Zoom: https://utexas.zoom.us/j/84742203545?pwd=lUosaf3T6bkIS1QAaIQYHYiiClE2ZP.1
