Machine learning-based surrogate models give researchers powerful tools to accelerate simulation-based workflows, enabling faster predictions and reducing computational costs.
However, a challenge arises because standard datasets in this area typically represent only small subsets of physical behaviors. This limited scope makes it difficult to assess the effectiveness of new approaches, as they may not be tested across the full range of real-world scenarios or complex behaviors that the models need to handle.
Neutron star merger simulations developed at Los Alamos National Laboratory play a key role in the Polymathic AI initiative, which aims to train AI models to accelerate scientific discoveries across diverse fields. By accurately tracking the aftermath of some of the universe’s most energetic events, these simulations provide valuable data that can contribute to a foundation model dataset.
This dataset is used to train AI models that make predictions in areas as varied as astrophysics, biology, acoustics, chemistry, and fluid dynamics. It helps bridge seemingly unrelated disciplines and enables new insights across science and engineering.
Jonah Miller, an astrophysicist at Los Alamos, said, “The Polymathic AI project is focused on foundation models, where you take an artificial intelligence model and train it on as much information as possible in some space. Training the network on as much information as possible from physics simulations leads to it picking up on underlying trends that can be useful in other applications.”
Miller contributed his neutron star merger simulations to The Well, one of the two datasets released by Polymathic AI. This dataset includes numerical simulations of complex phenomena, such as biological systems, fluid dynamics, acoustic scattering, supernova explosions, and, notably, neutron star mergers—the focus of Miller’s work.
These mergers occur when two neutron stars that have orbited each other in a binary system for billions of years finally collide, forming a black hole surrounded by hot, neutron-rich material. This collision triggers a gamma-ray burst, an intense release of high-energy photons.
The violent process of a neutron star merger also plays a crucial role in creating heavy elements in the universe. Some of these elements undergo radioactive decay, producing an optical-to-infrared afterglow known as a kilonova, which can be observed from Earth.
The equations governing neutron star mergers are incredibly complex and difficult to solve, even with the power of supercomputers. However, AI can help by identifying general patterns—such as the conservation of mass and energy—within large datasets. Once these trends are detected, AI models can use the raw data to predict specific instances, bypassing expensive and time-consuming simulations.
For instance, each of Miller’s neutron star merger simulations required three weeks to run on 300 cores of a Los Alamos supercomputer. With a trained foundation model or neural network, these expensive computations could be supplemented or even replaced, allowing quicker and more efficient predictions without compromising the accuracy needed for astrophysical research.
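To put that cost in perspective, the figures quoted above can be converted into core-hours, the usual unit for budgeting supercomputer time. The short Python sketch below does only that arithmetic. It assumes, for illustration, that all 300 cores were occupied continuously for the full three weeks; the function name `core_hours` is hypothetical and is not part of any Los Alamos or Polymathic AI tool.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def core_hours(weeks: float, cores: int) -> float:
    """Core-hours for a job that keeps `cores` cores busy for `weeks` weeks.

    Assumption: the cores are occupied continuously for the whole period.
    """
    return weeks * DAYS_PER_WEEK * HOURS_PER_DAY * cores


# Figures taken from the article: three weeks on 300 cores per simulation.
per_simulation = core_hours(weeks=3, cores=300)
print(f"{per_simulation:,.0f} core-hours per simulation")
```

Under that assumption, a single simulation comes to roughly 151,200 core-hours, which is the scale of computation a trained surrogate model would aim to supplement.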
“The benefit of using AI in this way is that the approach picks up things we might not know ourselves,” Miller said. “A foundation model could offer predictions that help save simulations and also help inform better simulations going forward. After all, the laws of physics are universal, and how we write our computer codes relies on certain rules of mathematics. Foundation models can likely pick up on those laws and rules.”
‘The Well’ is one of two open-source training datasets that Polymathic AI has released publicly. It is available for free download from the Flatiron Institute and on HuggingFace. The dataset is detailed in a paper accepted at the NeurIPS conference and includes simulations across various scientific domains, such as neutron star mergers, fluid dynamics, and biological systems.
The second dataset, ‘Multimodal Universe,’ contains 115 terabytes of data from hundreds of millions of astronomical observations, including images of galaxies from NASA’s James Webb Space Telescope and star measurements from the European Space Agency’s Gaia spacecraft. Both datasets are publicly available to aid in training AI models for diverse scientific applications.
Journal Reference:
- Ruben Ohana, Michael McCabe, Lucas Meyer, et al. The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning.