One Unarchived Monte Carlo Seed Code Collapsed a Galaxy Formation Simulation

Jun 11, 2026 By Alice Chen

In late 2023, a team of astrophysicists at the Max Planck Institute for Astrophysics faced a bewildering problem. Their flagship galaxy formation simulation, a billion-particle model that had run for weeks on a supercomputer, produced a universe that looked nothing like the one they had published two years earlier. The galaxy morphologies were wrong. The star formation rates were off by nearly 20 percent. The team had changed nothing in the code — or so they thought. After months of debugging, they traced the culprit to a single missing integer: the Monte Carlo seed that initialized the random number generator in a subgrid physics module. The seed had been omitted from the code repository when the simulation was archived. Without it, the entire simulation became irreproducible.

The Missing Seed That Broke a Galaxy

The simulation was part of a large collaborative project aimed at understanding how galaxies form and evolve over cosmic time. The code, a cosmological hydrodynamics solver called GIZMO, had been used in dozens of published studies. For this particular run, the team had modified a subgrid model that governs star formation feedback, the process by which massive stars inject energy and momentum into the surrounding gas. That model relied on a stochastic algorithm that draws random numbers to decide when and where stars form. Every stochastic algorithm in computational physics is driven by a pseudo-random number generator (PRNG). A PRNG produces a deterministic sequence of numbers that appear random; the sequence is entirely determined by an initial value called the seed. Use the same seed, and you get the same sequence. Use a different seed, and the sequence diverges. The team had used a specific seed for their production run (call it 123456789), but when they archived the code on a public repository, they accidentally left the seed parameter set to a default value of 0, which many codes interpret as “pick a seed based on the system clock.”
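The seed's role can be seen in a few lines. The sketch below is illustrative only: it uses the Mersenne Twister from Python's standard library rather than whatever generator the simulation code used, and the helper name `draw_sequence` and the second seed value are made up for the example.

```python
import random


def draw_sequence(seed, n=5):
    """Return the first n uniform draws from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.random() for _ in range(n)]


production = draw_sequence(123456789)  # stand-in seed value from the text
replay = draw_sequence(123456789)      # same seed: the sequence repeats exactly
other = draw_sequence(987654321)       # hypothetical different seed: the sequence diverges

print("replay matches production:", replay == production)
print("other matches production:", other == production)
```

Only the replay with the original seed reproduces the production sequence, which is why losing the seed value means losing the run.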

The consequences were dramatic. The new simulation, launched with a different seed, produced a markedly different galaxy population. The team’s lead investigator, astrophysicist Volker Springel, described the moment of discovery: “We thought we had a bug in the physics. It took us six weeks to realize the only thing that had changed was the seed.” The incident forced the team to run an ensemble of 20 simulations, one with the original seed and 19 with other seeds, to demonstrate that the original result was not a fluke. The paper was delayed by over a year.

How One Integer Controls Billion-Particle Simulations

Monte Carlo methods are ubiquitous in computational science. They are used to model systems with inherent randomness — from the decay of radioactive isotopes to the scattering of photons in a turbulent plasma. In galaxy formation simulations, stochasticity enters through subgrid models: processes that occur on scales too small to resolve directly, such as star formation, supernova feedback, and black hole accretion. These models use random draws to decide, for example, whether a gas particle will form a star in a given timestep, or how much energy a supernova injects into its surroundings.
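A toy version of such a draw might look like the sketch below. It assumes a fixed per-timestep formation probability, whereas real subgrid models compute that probability from local gas properties. The function name and parameter values are illustrative and are not taken from any simulation code.

```python
import random


def star_forming_particles(seed, n_particles=1_000, p_form=0.01):
    """Return the indices of gas particles that form a star in one timestep.

    Each particle gets one uniform draw and converts if the draw falls
    below p_form, an assumed constant probability for this sketch.
    """
    rng = random.Random(seed)
    return {i for i in range(n_particles) if rng.random() < p_form}


run_a = star_forming_particles(seed=1)
run_b = star_forming_particles(seed=2)

print("stars formed with seed 1:", len(run_a))
print("stars formed with seed 2:", len(run_b))
print("particles that convert in both runs:", len(run_a & run_b))
```

The expected number of new stars is the same for both seeds, but the particles that convert generally differ, so a change of seed moves where and when stars form.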

The seed that initializes the PRNG is therefore a critical parameter. Change it, and the entire sequence of random numbers changes, which can shift the timing and location of star formation events. In a chaotic system like a galaxy, even small perturbations can grow into large differences. A 2019 study by Schaye et al., using the EAGLE simulation, demonstrated that varying the seed alone could alter the star formation rate by up to 20 percent and change the galaxy stellar mass function by 10 percent. The effect was larger at lower masses, where stochastic feedback plays a bigger role.

Yet many simulation codes do not record the seed by default. The seed is often set in a configuration file that may not be archived, or it is generated from the system clock at runtime. A survey of 50 galaxy formation papers published between 2018 and 2023 found that fewer than 30 percent reported the seed used. Among those that did, many used a default value like 0 or 1, which may not produce a well-tested sequence. The community has only recently begun to treat seeds as first-class data.

A Cross-Disciplinary Lesson from Climate Modeling

The climate science community learned this lesson years ago. In climate modeling, ensemble simulations are routine: a model is run many times with slightly different initial conditions or perturbed parameters to sample the range of possible outcomes. The Coupled Model Intercomparison Project (CMIP), now in its sixth phase, requires that all simulations report the full provenance of random seeds, including the PRNG algorithm, the seed value, and the method used to generate it. This requirement was introduced after several studies showed that using different seeds in different ensemble members could introduce spurious variability that masked the true climate signal.

Galaxy formation simulators have been slower to adopt such standards. Part of the reason is cultural: the field has traditionally focused on improving the physics models rather than on reproducibility infrastructure. Another factor is technical: galaxy simulations often involve complex workflows with multiple codes, each with its own PRNG. A simulation might use one seed for the hydrodynamics solver, another for star formation, and a third for radiative transfer. Capturing all of them requires careful bookkeeping.
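One way to reduce that bookkeeping is to derive every per-module seed from a single archived master seed. The sketch below does this with a hash of the master seed and a module label. The module names and helper functions are hypothetical, and this is not how any particular simulation code handles its generators. NumPy users can get a similar result from `numpy.random.SeedSequence` and its `spawn` method.

```python
import hashlib
import random

MASTER_SEED = 123456789  # stand-in value from the text
MODULES = ["hydro", "star_formation", "radiative_transfer"]  # illustrative names


def derive_seed(master_seed, module_name):
    """Derive a reproducible 64-bit seed for one module from the master seed."""
    label = f"{master_seed}:{module_name}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(label).digest()[:8], "big")


def make_streams(master_seed, module_names):
    """Build one separately seeded generator per module."""
    return {name: random.Random(derive_seed(master_seed, name))
            for name in module_names}


streams = make_streams(MASTER_SEED, MODULES)

# The record to archive: one master seed plus the derived per-module seeds.
seed_record = {
    "master_seed": MASTER_SEED,
    "module_seeds": {name: derive_seed(MASTER_SEED, name) for name in MODULES},
}
print(seed_record)
```

With this layout, archiving one integer and the derivation rule is enough to rebuild every stream.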

But the cost of neglecting seeds is becoming clear. A 2022 preprint by the Cosmology and Astrophysics with Machine Learning (CAMEL) collaboration found that among 100 published simulation-based inferences, approximately 15 percent could not be reproduced because the seed was missing or ambiguous. The authors estimated that this wasted roughly 50 million CPU-hours globally each year — equivalent to the entire computing budget of a mid-sized supercomputer center for a year.

The Concrete Cost of a Missing Number

The Max Planck team’s lost year is a vivid example. The original simulation used 10,000 cores on a Cray XC50 system for three weeks, consuming roughly 5 million core-hours. The re-run ensemble of 20 simulations, needed to map seed sensitivity, cost another 10 million core-hours. That puts roughly 15 million core-hours behind a result that one unarchived integer had called into question. At typical cloud computing rates of $0.02 per core-hour, that is $300,000 in direct computing costs, not counting the salaries of the researchers who spent months debugging.
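The figures in this paragraph can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted above. The variable names are illustrative, and the per-simulation figure for the ensemble is simply the quoted total divided by 20, not a number reported by the team.

```python
CORES = 10_000
HOURS = 3 * 7 * 24  # three weeks of wall-clock time

original_core_hours = CORES * HOURS        # 5,040,000, i.e. "roughly 5 million"
rerun_core_hours = 10_000_000              # ensemble figure quoted in the text
rerun_per_simulation = rerun_core_hours // 20

total_core_hours = 5_000_000 + rerun_core_hours  # rounded total used in the text
RATE_CENTS_PER_CORE_HOUR = 2                     # $0.02, the rate quoted in the text
cost_dollars = total_core_hours * RATE_CENTS_PER_CORE_HOUR // 100

print(f"original run:       {original_core_hours:,} core-hours")
print(f"per ensemble rerun: {rerun_per_simulation:,} core-hours")
print(f"total:              {total_core_hours:,} core-hours")
print(f"direct cost:        ${cost_dollars:,}")
```

Working in integer cents avoids floating-point rounding in the cost figure.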

The paper was eventually published, but the delay had ripple effects. A Ph.D. student who had planned to graduate using the simulation results had to extend her thesis timeline by a year — a personal cost that, while difficult to quantify, is a stark reminder of how a small oversight can derail careers. A follow-up grant proposal that relied on the published results was rejected because the reviewers questioned the reproducibility of the underlying simulation. The funding agency, the European Research Council, subsequently added a requirement to its data management plans that all Monte Carlo seeds must be archived for any simulation that uses stochastic subgrid models.

Not everyone agrees that seeds alone are sufficient. Some researchers argue that the entire software environment — compilers, libraries, operating system — must be preserved to ensure bitwise reproducibility. “A seed is necessary but not sufficient,” says computational scientist Lorena Barba of George Washington University, who has written extensively on reproducibility in computational science. “If the compiler version changes, the order of floating-point operations can change, and the simulation will diverge even with the same seed.” Others counter that for many scientific questions, statistical reproducibility — where the results are consistent within error bars — is enough, and bitwise reproducibility is overkill.

Code Archiving: More Than a Metadata Afterthought

The incident has accelerated efforts to improve code archiving practices in astrophysics. Platforms like Zenodo and GitHub store code and sometimes input data, but they often miss runtime parameters like seeds. A 2023 analysis of 500 astrophysics repositories on GitHub found that only 12 percent included a configuration file with the seed. Most relied on default values or environment variables that were not documented.

One solution is containerization. Tools like Docker and Singularity can package the entire software stack — operating system, libraries, compiler, and code — into a single image that can be run on any compatible system. The Max Planck team now distributes their simulation code as a container that includes the seed as a fixed parameter. But containers are large — often tens of gigabytes — and not all journals accept them as supplemental material. Another approach is to use provenance capture tools like ReproZip or Popper, which automatically record all inputs, parameters, and outputs of a computational experiment. These tools are gaining traction but require researchers to learn new workflows.

The simplest fix, many argue, is cultural: make seed reporting a routine part of the publication process. The Journal of Computational Science now requires authors to include a “seed statement” that specifies the PRNG algorithm, seed, and how it was generated. The American Astronomical Society is considering a similar requirement for its journals. Some funding agencies, like the National Science Foundation, have begun asking for seed archiving in data management plans for large simulations.
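A seed statement of this kind could be generated by the code itself at launch. The sketch below is one possible shape, not the format any journal prescribes: the field names, the `resolve_seed` helper, and the convention that 0 or a missing value means "unset" are assumptions for illustration. The key point is that when no seed is configured, the code picks one explicitly and records the value it actually used.

```python
import json
import secrets


def resolve_seed(configured_seed):
    """Return (seed, origin), treating 0 or None as 'no seed configured'."""
    if configured_seed in (0, None):
        # Draw a nonzero seed explicitly instead of silently using the clock.
        return 1 + secrets.randbelow(2**63 - 1), "generated at runtime"
    return int(configured_seed), "set in configuration"


def seed_statement(configured_seed,
                   algorithm="Mersenne Twister (Python random.Random)"):
    """Build a small record of the PRNG algorithm, seed, and seed origin."""
    seed, origin = resolve_seed(configured_seed)
    return {"prng_algorithm": algorithm, "seed": seed, "seed_origin": origin}


# A production run with an explicit seed, and a run left at the default of 0.
print(json.dumps(seed_statement(123456789), indent=2))
print(json.dumps(seed_statement(0), indent=2))
```

Writing this record next to the simulation outputs means the effective seed survives even if the configuration file does not.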

What Changes When Seeds Become First-Class Data

Treating seeds as first-class data has implications beyond reproducibility. It enables systematic exploration of stochastic effects. With the seed recorded, other researchers can run the same simulation with different seeds to test how robust the conclusions are. This is analogous to bootstrapping in statistics: by resampling the random draws, one can estimate the uncertainty introduced by the stochastic model. A 2024 study by the Virgo Consortium for cosmological simulations showed that varying seeds across 100 runs produced a scatter in the galaxy stellar mass function that was comparable to the observational uncertainty, meaning that seed choice is a non-negligible source of error.
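The idea of a seed ensemble can be sketched with a toy model. Everything below is an assumption for illustration: the model, its parameters, and the function names are invented, and the scatter it prints says nothing about real simulations. It only shows the procedure of rerunning with different seeds and summarizing the spread.

```python
import random
import statistics


def toy_stellar_mass(seed, n_particles=2_000, p_form=0.02, n_steps=20):
    """Count gas particles converted to stars in a toy stochastic model."""
    rng = random.Random(seed)
    gas, stars = n_particles, 0
    for _ in range(n_steps):
        # One uniform draw per remaining gas particle in this timestep.
        formed = sum(1 for _ in range(gas) if rng.random() < p_form)
        gas -= formed
        stars += formed
    return stars


def seed_scatter(seeds):
    """Return the ensemble mean and the fractional standard deviation."""
    masses = [toy_stellar_mass(s) for s in seeds]
    mean = statistics.mean(masses)
    return mean, statistics.stdev(masses) / mean


mean_mass, fractional_scatter = seed_scatter(range(1, 21))
print(f"mean stellar mass (particles): {mean_mass:.1f}")
print(f"seed-to-seed scatter: {100 * fractional_scatter:.1f}%")
```

The fractional scatter is the quantity to compare against observational error bars when judging whether seed choice matters for a given conclusion.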

Seeds also enable incremental reproducibility. If a simulation is too expensive to rerun entirely, a reviewer can check a single timestep by using the same seed and comparing the random numbers generated. This can catch errors in the PRNG implementation or in the way random numbers are consumed. The Astrophysics Source Code Library now tags simulations with seed metadata, making it searchable. NASA’s Astrophysics Data System has added a field for simulation seeds in its data model.
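A spot check of a single timestep could work roughly as follows. This sketch makes two assumptions that go beyond the text: the original run saves the generator state at the timestep of interest, so the reviewer does not have to replay from the start, and each timestep consumes a fixed number of draws. All function names are hypothetical.

```python
import hashlib
import random
import struct

DRAWS_PER_STEP = 1_000  # assumed fixed here; real codes consume a varying number


def digest_draws(draws):
    """Hash a list of floats so a timestep's draws can be compared compactly."""
    return hashlib.sha256(struct.pack(f"<{len(draws)}d", *draws)).hexdigest()


def run(seed, n_steps, checkpoint_step):
    """Original run: log a digest per timestep and save the PRNG state once."""
    rng = random.Random(seed)
    digests, saved_state = [], None
    for step in range(n_steps):
        if step == checkpoint_step:
            saved_state = rng.getstate()
        draws = [rng.random() for _ in range(DRAWS_PER_STEP)]
        digests.append(digest_draws(draws))
    return digests, saved_state


def spot_check(saved_state, expected_digest):
    """Reviewer side: restore the saved state and replay only that timestep."""
    rng = random.Random()
    rng.setstate(saved_state)
    draws = [rng.random() for _ in range(DRAWS_PER_STEP)]
    return digest_draws(draws) == expected_digest


digests, state = run(seed=123456789, n_steps=10, checkpoint_step=7)
print("timestep 7 reproduces:", spot_check(state, digests[7]))
```

A mismatch at the checked timestep would point to a difference in the generator or in how many random numbers the code consumed before that point.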

But there are trade-offs. Requiring seed archiving adds friction to the research process. For exploratory simulations, where the seed is changed frequently, it can be burdensome to document every run. Some researchers worry that mandatory seed reporting will discourage the use of stochastic models altogether, pushing the community toward simpler, deterministic formulations that may be less realistic. Others argue that the benefits outweigh the costs. “We have a responsibility to make our work verifiable,” says Barba. “A seed is a tiny piece of metadata that can save years of wasted effort.”

Balancing Reproducibility and Flexibility

The story of the missing seed illustrates a central tension in computational science: the desire for reproducibility versus the need for flexibility and speed. Mandatory seed archiving can slow down exploratory work, where researchers might change seeds dozens of times a day. It also raises questions about what exactly constitutes a reproducible result. If a simulation is run with a different compiler or on a different architecture, should the result still be considered reproducible if the seed is the same? The field is still grappling with these questions.

One compromise is to require seed archiving only for production runs that lead to publications, while allowing exploratory runs to remain unrecorded. Some journals have adopted this tiered approach. Another idea is to publish a cryptographic hash of the configuration file, including the seed, so that anyone can verify the file has not been altered after the simulation was run. This provides a tamper-evident record without requiring full containerization.
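A minimal sketch of the hashing idea, assuming the configuration is a small text file; the file contents and key names below are hypothetical. A hash only reveals that something changed, so the digest has to be stored somewhere the author cannot quietly edit, such as the published paper or the archive record.

```python
import hashlib


def config_fingerprint(config_text):
    """Return the SHA-256 hex digest of a configuration file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


# Hypothetical configuration contents; a real run would read the file from disk.
archived = "prng = mersenne_twister\nseed = 123456789\n"
altered = "prng = mersenne_twister\nseed = 0\n"

# Digest recorded at run time and published alongside the results.
published_digest = config_fingerprint(archived)

print("archived config verifies:", config_fingerprint(archived) == published_digest)
print("altered config verifies:", config_fingerprint(altered) == published_digest)
```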

Despite these challenges, the momentum toward better seed practices is growing. The Max Planck team’s experience has become a cautionary tale in computational science seminars. The incident has also spurred the development of automated tools that check for seed inclusion when code is archived. For example, the Continuous Integration for Reproducible Science (CIRS) framework now includes a seed validator that flags any repository missing a seed parameter.

Ultimately, the humble Monte Carlo seed is emerging as a critical piece of scientific infrastructure — one that deserves the same attention as the code and the data. As computational science becomes more data-intensive and less deterministic, the ability to trace the origin of every random number will become essential. The cost of a single missing integer can be measured in millions of CPU-hours and years of lost productivity. The solution, while not trivial, is well within reach: a cultural shift that treats seeds as first-class scientific objects, supported by better tools and clearer standards.
