One Unarchived Monte Carlo Seed Haunts a Computational Ecology Paper

Jun 11, 2026 By Renu Shah

In 2018, a team of ecologists published a paper in a high-impact journal arguing that seed-dispersal limitation—the failure of seeds to reach suitable sites—was a primary driver of tree diversity in tropical forests. The claim rested on an agent-based simulation that ran tens of thousands of stochastic iterations. But when another group tried to reproduce the ensemble a year later, they hit a wall. The authors had deposited their R scripts on a public repository, but the Monte Carlo seed—the single integer that initializes the pseudorandom number generator—was missing. Without it, no one could replicate the exact sequence of random draws that produced the published results. The seed, it turned out, existed only in an email attachment sent from a former postdoc’s university account, and that account had been deactivated. The paper’s central conclusion became, in effect, unverifiable.

This incident is not an outlier. It is a symptom of a systemic weakness in how computational ecology—and much of computational science—handles the evidence it produces. The missing seed is a concrete, tractable problem, but it points to deeper questions about funding, incentives, infrastructure, and training. This article follows one seed’s trajectory through a research ecosystem that rarely rewards the work of archiving it, and asks what would need to change so that future seeds are not lost.

The Seed That Should Not Have Been Lost

The Monte Carlo seed for the 2018 population model was a 9-digit integer, chosen arbitrarily by the lead author when they set up the simulation. In any pseudorandom number generator, that seed determines the entire stream of random values that follow. Change the seed, and every subsequent random draw changes. In a simulation that relies on thousands of stochastic events—whether a seed lands in a gap, whether a seedling survives herbivory, whether a tree falls—different seeds produce different outcomes. The ensemble of runs, typically 1,000 or more, is supposed to average over this variability. But if the seed is not recorded, no one can confirm that the ensemble was generated as described.
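The determinism described above is easy to see in a few lines. The following is a minimal sketch in Python's standard library, not the authors' code; the paper's scripts were in R, where set.seed() plays the same role. The 9-digit seed values and the helper name draw_stream are made up for illustration.

```python
import random


def draw_stream(seed, n=5):
    """Return the first n uniform draws from a generator initialized with seed."""
    rng = random.Random(seed)  # private generator; leaves global state untouched
    return [rng.random() for _ in range(n)]


# Hypothetical 9-digit seeds; the real value from the paper is not known.
same_a = draw_stream(123456789)
same_b = draw_stream(123456789)
other = draw_stream(123456790)

print(same_a == same_b)  # same seed: the streams match draw for draw
print(same_a == other)   # a seed that differs by one gives a different stream
```

Anyone holding the seed can regenerate the stream exactly; anyone without it can only generate a different one.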

The authors did share their R scripts, and the supplementary PDF included a table of parameter values. But the seed was not in the table. It was not in the script comments. It was not in the data management plan. The funding agency, a national science foundation, had required a data management plan but not a code archiving plan, and never checked whether simulation metadata were deposited. When the reanalysis team contacted the lead author, they learned that the seed had been stored in a local text file on a lab laptop that had since been wiped. The only copy was in an email attachment sent to a co-author, who had left academia and no longer had access to the account. The seed was effectively gone.

The cost of this loss is not just the frustration of one reanalysis. It is the erosion of trust in the paper’s findings. Without the seed, any attempt to reproduce the ensemble is necessarily approximate. One can try a range of plausible seeds, but that introduces the very uncertainty the original ensemble was designed to control. The paper’s effect sizes—the magnitude of the seed-dispersal limitation effect—cannot be verified. The research community is left with a claim that may be true, but that cannot be independently confirmed.

Computational Ecology's Reproducibility Gap

The missing seed is part of a larger pattern. A 2022 survey of 500 papers published in Methods in Ecology and Evolution found that fewer than 30% archived the simulation code needed to reproduce their results. Among those that did archive code, fewer than half included the seed value or the random-number generator type and settings. A similar review of 200 papers in Ecological Modelling in 2023 found that only 12% reported the seed. The numbers are not much better in other computational fields: a meta-analysis of reproducibility in computational biology, published in 2024, estimated that roughly 25% of simulation studies provide enough information to fully rerun the analyses.

Why so low? One reason is that seeds are often treated as trivial, a technical detail not worth reporting. Another is that journals rarely require their inclusion. The Methods in Ecology and Evolution survey also reported that only 5% of ecology journals had explicit policies mandating code archiving, and even fewer required simulation metadata. Without a policy, authors have little incentive to spend the extra time documenting seeds, generator algorithms, and initialization procedures. The result is a literature in which many computational findings are, in practice, irreproducible.

The consequences can be substantial. In agent-based models, which are common in ecology, the seed can shift confidence intervals by 10–40% depending on the model’s sensitivity to initial conditions. Heavy-tailed distributions—common in ecological data, such as seed dispersal distances or mortality rates—amplify this sensitivity. A single seed that happens to produce an extreme outlier in the ensemble can bias the average. Without archiving, the reader cannot tell whether the published result is robust or an artifact of one particular random stream.
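A toy calculation shows why heavy tails make the choice of seed matter. This sketch assumes Pareto-distributed "dispersal distances" with a shape parameter of 1.5, an ensemble of 200 draws, and 50 arbitrary seeds. None of these values come from the studies cited here, and the spread it prints says nothing about the 10–40% figure above.

```python
import random
import statistics


def ensemble_mean(seed, n_runs=200, alpha=1.5):
    """Mean of n_runs heavy-tailed draws produced from a single seed."""
    rng = random.Random(seed)
    # paretovariate(alpha) has a heavy right tail for small alpha, so a
    # handful of extreme draws can dominate the ensemble average.
    return statistics.fmean(rng.paretovariate(alpha) for _ in range(n_runs))


seeds = range(1, 51)  # 50 arbitrary seeds, chosen only for illustration
means = [ensemble_mean(s) for s in seeds]
spread = (max(means) - min(means)) / statistics.median(means)

print(f"ensemble means range from {min(means):.2f} to {max(means):.2f}")
print(f"relative spread across seeds: {spread:.0%}")
```

Each mean is computed from the same distribution with the same parameters; only the seed differs. A reader shown one of these numbers without its seed cannot tell where in the range it falls.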

Why Seeds Matter in Agent-Based Models

Agent-based models simulate the behavior of individual entities (trees, animals, cells) and their interactions. In ecology, these models are used to predict population dynamics, species distributions, and ecosystem responses to climate change. They rely heavily on pseudorandom number generators to introduce stochasticity: where a seed falls, whether a predator finds prey, whether a fire ignites. Every draw is fully determined by the seed, even though the sequence is designed to look random. The seed is the key that fixes the sequence.

Consider a forest-gap model, a type of agent-based model used to simulate tree regeneration in tropical forests. The model initializes a grid of patches, each representing a potential tree location. At each time step, a random subset of patches becomes gaps—openings in the canopy where light reaches the forest floor. Seeds from surrounding trees disperse into these gaps, and the probability of establishment depends on gap size, light availability, and competition. The model runs for hundreds of years, and the output is the species composition and diversity at the end of the simulation.
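A stripped-down version of such a model fits in a short function. The sketch below is a deliberately crude stand-in, not the published model: patches sit on a ring, gaps open with a fixed probability, and an empty patch can only be recolonized from an adjacent patch. Every name and parameter value (run_gap_model, gap_prob, arrival_prob, the patch and species counts) is an assumption made for illustration.

```python
import random


def run_gap_model(seed, n_patches=400, n_species=5, n_steps=200,
                  gap_prob=0.05, arrival_prob=0.6):
    """Toy forest-gap model; returns the fraction of patches held by each species."""
    rng = random.Random(seed)
    # Every patch starts out occupied by a randomly chosen species.
    patches = [rng.randrange(n_species) for _ in range(n_patches)]
    for _ in range(n_steps):
        for i in range(n_patches):
            if patches[i] is None:
                # Empty gap: with probability arrival_prob a seed arrives
                # from the left or right neighbour on the ring.
                if rng.random() < arrival_prob:
                    neighbour = patches[(i + rng.choice((-1, 1))) % n_patches]
                    patches[i] = neighbour  # stays empty if the neighbour is a gap
            elif rng.random() < gap_prob:
                patches[i] = None  # a treefall opens a gap
    return [patches.count(s) / n_patches for s in range(n_species)]


# Same parameters, two arbitrary seeds: the final composition differs.
for seed in (1, 2):
    occupancy = run_gap_model(seed)
    print(seed, [f"{frac:.2f}" for frac in occupancy])
```

Even in this reduced form, the final species composition is a function of the parameters and the seed together. Reporting the parameters alone leaves the output underdetermined.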

In a 2019 study, researchers ran a forest-gap model 1,000 times with the same parameters but different seeds. They measured the variance in patch occupancy—the proportion of patches occupied by a given species—across runs. The variance due solely to seed choice was as large as the variance due to a 20% change in seed dispersal distance, a key ecological parameter. In other words, the random seed had as much influence on the model’s output as a biologically meaningful parameter. Without archiving the seed, the model’s predictions are not fully specified. A result that appears significant in one run may vanish in another.

This is not a theoretical concern. In the 2018 forest-gap paper, the reanalysis team ran the model with 100 different seeds, using the same R scripts and parameter values, and found that the main result—that seed-dispersal limitation drives diversity—held in only about 60% of the runs. In the other 40%, the effect reversed or disappeared. The original paper had reported a strong, consistent effect. But without the original seed, the reanalysis could not determine whether the original result was a statistical fluke or a robust finding. The paper’s conclusion remains in limbo.

The Economics of Code Archiving

If archiving seeds and code is so important, why is it not standard practice? The answer lies partly in economics. Depositing code on a platform like Zenodo or Figshare costs nothing in terms of storage fees, but the time required to prepare a clean, documented repository is real. Lab PIs estimate that it takes 2–5 hours to organize scripts, write a README, remove hard-coded file paths, and add comments. For a postdoc or graduate student on a short-term contract, those hours are often not budgeted. No funding line in most US federal grants explicitly covers archival labor. The National Science Foundation and National Institutes of Health require data management plans, but those plans rarely include code or simulation metadata.

Reviewers and editors also contribute to the problem. Most journals in ecology do not require code archiving as a condition of publication. A 2023 survey of 200 ecology editors found that fewer than 20% said they routinely check whether code is deposited. Reviewers, who volunteer their time, rarely request seeds or random-number generator details. The incentives point away from archival: authors are rewarded for publishing new results, not for documenting old ones. The time spent preparing a repository is time not spent on the next paper.

The result is that code and seeds rot on personal hard drives, lab laptops, and university servers that are decommissioned when students graduate or PIs move institutions. The 2018 forest-gap paper is not unusual. A 2024 study of 100 randomly selected ecology papers with simulation components found that 45% of the authors could not locate the original seed or random-number generator settings when contacted. The seeds were on machines that had been recycled, in email accounts that had been closed, or in file formats that were no longer readable. The digital decay happened within five years of publication.

Infrastructure exists to prevent this. Platforms like Code Ocean and WholeTale allow researchers to package code, data, and runtime environments into containers that can be rerun years later. These platforms capture the full computational provenance, including the seed, the generator algorithm, and the software versions. But they cost money—roughly $50 per month per project for cloud compute and storage, depending on the size of the simulation. For a lab running multiple projects, that adds up. Compare that to journal page charges, which often run $1,000–3,000 per article. The cost of archiving is small relative to publication fees, but it is not covered by the same funding streams.

A Worked Example: The 2018 Forest-Gap Paper

The 2018 forest-gap paper is worth examining in detail because it illustrates how a single missing seed can destabilize an entire research program. The paper, published in Ecology Letters, used an agent-based model to simulate tree recruitment in a 50-hectare plot in Panama. The model incorporated seed production, dispersal, germination, and seedling survival, all driven by stochastic processes. The authors reported that seed-dispersal limitation—the failure of seeds to reach gaps—explained 70% of the variance in species diversity, a striking result that challenged the prevailing view that competition for light was the dominant driver.

The paper was cited over 200 times in the first five years, and several follow-up studies built on its findings. But when a group at a different university tried to extend the model, they could not reproduce the baseline results. They contacted the lead author, who shared the R scripts but could not provide the seed. The lead author later acknowledged that the seed was “somewhere on a laptop” that had been donated to a university surplus program. The scripts, it turned out, contained a comment that read “set.seed(12345)”, but that seed did not produce the published figures. The actual seed had been changed during debugging and never updated in the script.
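A stale comment is a predictable failure when the seed is written down by hand. One way to avoid it, sketched here under assumptions, is to have the run itself write the seed it actually used into a small metadata file stored next to the output. The file name run_metadata.json, the function run_and_record, the metadata fields, and the seed value are all hypothetical, and the "simulation" is a three-number placeholder.

```python
import json
import platform
import random
import tempfile
from pathlib import Path


def run_and_record(seed, out_dir):
    """Run a placeholder simulation and save the seed alongside its output."""
    rng = random.Random(seed)
    result = [rng.random() for _ in range(3)]  # stand-in for the real model
    metadata = {
        "seed": seed,  # the value actually passed to the generator
        "generator": "Mersenne Twister (Python random.Random)",
        "python_version": platform.python_version(),
        "result": result,
    }
    path = Path(out_dir) / "run_metadata.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path


out_path = run_and_record(seed=987654321, out_dir=tempfile.mkdtemp())
saved = json.loads(out_path.read_text())

# Replaying from the archived seed should regenerate the archived result.
replay = random.Random(saved["seed"])
print([replay.random() for _ in range(3)] == saved["result"])
```

Because the recorded seed is read from the same variable that initializes the generator, it cannot drift out of step with the code during debugging the way a comment can.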

The reanalysis team then ran the model with 1,000 different seeds, using the same parameters and scripts. They found that the effect of seed-dispersal limitation varied from 30% to 85% of variance explained, depending on the seed. The original result of 70% was near the upper end of this range. The paper’s conclusion, which had been treated as robust, was actually highly sensitive to the random seed. The lead author, in a later email, said that the seed had been chosen because it “looked nice” and that they had not realized it would matter. This is a common attitude: seeds are seen as arbitrary, not as a critical component of the evidence.

The case is now often used in graduate seminars on reproducibility as a cautionary tale. But it has not led to widespread changes in practice. The journal did not issue a correction or a retraction; the paper remains in the literature as if the result were solid. The follow-up studies that relied on the original finding may themselves be compromised. The missing seed has created a small but persistent doubt that cannot be resolved without a time machine.

Infrastructure That Could Fix the Haunt

The technical solutions to the seed problem are straightforward. Containerized environments like Docker or Code Ocean lock the entire software stack—operating system, libraries, scripts, and seed—into a single unit that can be rerun identically years later. Continuous integration services can automatically rerun simulations on each code commit, ensuring that changes do not break reproducibility. Platforms like WholeTale capture the full computational provenance, including every input and output, so that the seed is never lost. These tools exist and are used by a small fraction of the research community.
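The continuous-integration idea can be reduced to a single check: rerun the simulation with the archived seed and compare a fingerprint of the output against one committed with the code. The sketch below assumes a placeholder simulation and an arbitrary seed of 2018. The function names are illustrative and do not belong to any CI service or to the platforms named above.

```python
import hashlib
import random


def simulate(seed, n=1000):
    """Placeholder simulation: n Gaussian draws from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def fingerprint(values):
    """SHA-256 digest of the output, formatted to 10 decimal places."""
    text = ",".join(f"{v:.10f}" for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_reproducible(seed, expected_digest):
    """Rerun with the archived seed and compare against the stored digest."""
    return fingerprint(simulate(seed)) == expected_digest


# In practice the reference digest would be committed alongside the code.
reference = fingerprint(simulate(2018))

print(check_reproducible(2018, reference))  # rerun with the archived seed
print(check_reproducible(2019, reference))  # rerun with a different seed
```

A check like this only works if the seed is archived in the first place, which is why the seed belongs in the repository and not in someone's inbox.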

The barrier is not technical but cultural and economic. Adopting these tools requires upfront investment in learning and setup. For a lab that already has a workflow, the transition can feel like a distraction. The cost of cloud compute and storage for a typical ecology lab running multiple models might be $50–100 per month, a non-trivial expense for a lab with limited discretionary funds. But compare that to the cost of a single irreproducible paper: the wasted effort of failed reanalyses, the lost trust, the opportunity cost of building on a shaky foundation. The economics favor archiving, but the incentives do not.

Some funders are beginning to act. The European Research Council now requires that simulation code be deposited in a recognized repository at the time of publication. The National Science Foundation’s Office of Advanced Cyberinfrastructure has pilot programs that include code archiving in data management plans. But these are exceptions. Most funding agencies still treat code as an afterthought, and most journals still do not enforce existing policies. The result is a patchwork: some labs archive diligently, most do not, and the literature accumulates seeds that may or may not be recoverable.

Graduate curricula are also starting to include reproducibility checklists. A few universities now require students in computational ecology courses to document seeds and random-number generators as part of their assignments. But these programs are rare. The majority of graduate students in ecology receive no formal training in computational reproducibility. They learn from their advisors, who learned from theirs, and the cycle of neglect continues.

Small Changes, Large Returns

Fixing the seed problem does not require a massive overhaul. Small changes in policy and practice could yield large returns. Journals could mandate seed archiving as a condition of publication, and enforce it by requiring that code and metadata be deposited before final acceptance. Funders could require simulation metadata alongside data management plans, and include a line item for archival labor in grant budgets. Graduate programs could add a one-hour module on reproducibility to existing methods courses. Peer reviewers could ask, as a standard question: “Where is the seed and the random-number generator?”

These changes would not eliminate all reproducibility problems. Seeds are only one part of the puzzle; software versions, compiler flags, and hardware differences can also affect results. But seeds are the easiest part to fix. They are a single integer. They take up negligible storage. They are trivial to include in a README file. The fact that they are routinely lost is a sign of how little the system values the long-term usability of computational evidence.

The 2018 forest-gap paper is not an isolated incident. Similar stories play out in fields from epidemiology to climate science. A missing seed here, an unrecorded parameter there—each one small, but cumulatively they erode the foundation of computational science. The cost of archiving is modest; the cost of not archiving is measured in lost time, lost trust, and lost knowledge. The next time a researcher sets a seed, they might pause and ask: will anyone be able to find this in five years? The answer, right now, is too often no.
