One Untracked Housekeeping Gene Threshold Invalidated Fourteen Cancer Biomarker Studies

Jun 12, 2026 By Karim Osman

In the spring of 2023, a team of reanalysts at a mid-sized university hospital in Europe did something that most labs do not have the time, funding, or incentive to do: they went back to the raw data of fourteen published cancer biomarker studies and asked whether the numbers still held up. The answer, after months of computational work, was a quiet but devastating no. The culprit was not a fraudulent dataset or a statistical error in the usual sense. It was a single, untracked threshold for a gene that nearly every molecular biologist uses as a reference — a housekeeping gene whose expression level was assumed to be stable across all conditions. That assumption, it turned out, was wrong. The threshold had never been validated across different tumor types, and once corrected, the biomarkers that the studies had claimed to identify simply vanished.

A Single Threshold Derails Fourteen Studies

The gene in question is glyceraldehyde-3-phosphate dehydrogenase, or GAPDH, a workhorse of molecular biology. For decades, researchers have used GAPDH as a normalization control in quantitative PCR experiments, assuming its expression remains constant regardless of tissue type, disease state, or treatment. In the fourteen cancer biomarker studies — spanning ovarian, breast, lung, and colorectal cancers — the authors had used a cutoff value for GAPDH expression to classify samples as “valid” or “invalid.” That cutoff, roughly 2.0 in relative units, had been borrowed from a single 2008 paper that examined GAPDH stability in a small panel of cell lines.

What the reanalysis team found was that in real tumor biopsies, GAPDH expression varies by as much as fivefold across samples. When they applied a tissue-specific correction derived from publicly available RNA-seq data, the classification of roughly 40% of samples changed. Biomarker candidates that had shown strong statistical significance under the original threshold became non-significant after adjustment. Fourteen studies, each representing years of work and hundreds of thousands of dollars in grant money, effectively collapsed.
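To make the mechanism concrete, here is a minimal sketch of how a drifting reference gene changes a normalized qPCR result. It uses the standard 2^-ΔCt relative quantification and assumes roughly 100% PCR efficiency. The Ct values and the function name are invented for illustration; they are not taken from any of the fourteen studies or from the reanalysis.

```python
import math


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Relative expression by the 2^-dCt method (assumes ~100% PCR efficiency)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values. The target gene is measured identically in both
# samples; only the reference gene (GAPDH in the article) differs.
ct_target = 25.0
ct_ref_sample_a = 18.0
# A fivefold more abundant reference crosses the threshold log2(5) cycles earlier.
ct_ref_sample_b = ct_ref_sample_a - math.log2(5)

expr_a = relative_expression(ct_target, ct_ref_sample_a)
expr_b = relative_expression(ct_target, ct_ref_sample_b)

# Apparent change in the target, produced entirely by the reference gene.
fold_change = expr_b / expr_a
print(f"apparent fold change in target: {fold_change:.2f}")
```

With these toy numbers the target appears to fall to about a fifth of its level in sample B, even though its own Ct never changed. A fixed cutoff applied to values like these will sort samples by their reference-gene level as much as by anything else.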

The waste is staggering. Each of those studies likely cost between $200,000 and $500,000, putting the direct financial loss across all fourteen somewhere between roughly $2.8 million and $7 million. That figure does not include the opportunity cost: the genuine biomarkers for ovarian cancer that might have been discovered if the field had not been chasing false positives. One pharmaceutical company, after seeing the reanalysis, quietly withdrew funding for a clinical trial that had been built on one of the invalidated findings.

The reanalysis itself was not published in a high-profile journal. It appeared on a preprint server in late 2023 and has since been cited fewer than a dozen times. The fourteen original studies, meanwhile, continue to accumulate citations, because the correction has not been formally adopted by any major journal or funding agency.

To appreciate the scale of the problem, consider the specific case of one of the fourteen studies, a lung cancer biomarker paper from 2016. That study had reported a five-gene signature predictive of survival, with a hazard ratio of roughly 3.0 in the training cohort. The reanalysis showed that when GAPDH expression was corrected using a tissue-specific reference range derived from The Cancer Genome Atlas, the hazard ratio dropped to around 1.2 and lost statistical significance. The signature had been licensed to a diagnostic company, which had invested approximately $2 million in assay development and a prospective validation trial. That trial was halted after the reanalysis, and the company wrote off the investment. Similar stories played out for a breast cancer recurrence score and a colorectal cancer microsatellite-instability predictor.

The Forgotten Evolutionary Conservation Check

The choice of GAPDH as a housekeeping gene was not arbitrary. It is involved in glycolysis, a core metabolic pathway, and its expression was long thought to be stable across cell types. But stability in a handful of cell lines does not guarantee stability across the heterogeneous microenvironments of solid tumors. A simple evolutionary conservation check — comparing the promoter regions of GAPDH across mammals — would have revealed that the gene's regulatory elements contain binding sites for transcription factors that are themselves dysregulated in cancer.

Standard bioinformatics pipelines routinely skip this step. Researchers often select housekeeping genes based on precedent rather than evidence, because the validation experiments are time-consuming and offer little novelty. A 2019 study by Lee and colleagues did perform such a check and identified several alternative normalization genes with more stable expression across cancer types. But by then, the GAPDH threshold was already embedded in the literature.

The irony is that the conservation check is cheap. A graduate student with access to UCSC Genome Browser could run it in an afternoon. But the incentive structure of academic publishing rewards speed and novelty over methodological depth. A lab that spends three months validating its normalization controls risks falling behind competitors who skip that step and publish first.

As a result, the field has accumulated a body of findings that are, at best, noisy and, at worst, systematically biased. The reanalysis of the fourteen studies is not an isolated case. Similar problems have been identified in other areas of molecular biology, from microRNA profiling to methylation assays. For instance, a 2021 reanalysis of microRNA biomarker studies in pancreatic cancer found that over half of the reported candidates lost significance when normalization was switched from the commonly used small nuclear RNA U6 to a panel of stably expressed microRNAs identified by a conservation check. The pattern is consistent: a single unvalidated reference gene can propagate errors across an entire subfield.

How Publication Pressure Cemented a Flawed Norm

The original 2008 paper that introduced the GAPDH cutoff was not a large study. It involved fewer than fifty samples, all from a single lab's convenience collection of cell lines. But it was published in a well-known journal and quickly became a citation magnet. Subsequent authors, under pressure to publish, adopted the cutoff without independent verification. The method sections of the fourteen studies often copied the wording verbatim from the original paper, sometimes even reproducing the same typographical error in the formula.

Reviewers did not catch the problem. Peer review, as it is currently practiced, rarely requires authors to justify their choice of housekeeping gene or to demonstrate that the normalization control is valid for their specific tissue. Funding agencies, eager to see translational progress, rewarded fast replication over deep validation. A lab that could show a biomarker candidate in two years was more likely to receive continued funding than one that spent five years building a robust normalization framework.

The result is a classic example of what sociologists of science call “methodological path dependence.” Once a threshold is established, the cost of questioning it becomes higher than the cost of using it, because questioning requires redoing experiments and potentially invalidating one's own prior work. Until recently, few journals required raw data deposition, so even a skeptical reviewer who wanted to check often found the data unavailable.

This pattern is not unique to cancer biomarker research. A similar story unfolded in the field of climate reanalysis, where a single untracked drifter buoy skewed global ocean temperature records for years. The underlying dynamic is the same: a small, unexamined assumption, amplified by publication pressure, can distort an entire field. In the social sciences, a comparable case occurred with the “p-curve” method for detecting p-hacking, where an initial coding error in the algorithm led to widespread misclassification of studies until a correction was published years later.

One might argue that the GAPDH threshold, even if imperfect, still provided a useful relative benchmark across studies. But this defense collapses when the variability is systematic rather than random. Because GAPDH expression correlates with tumor hypoxia and metabolic reprogramming, the bias is not noise but a confound: normalizing to GAPDH systematically skews the apparent expression of target genes in hypoxic tumors. This means that any biomarker discovered using the flawed threshold is likely to be a surrogate for hypoxia, not a genuine marker of cancer subtype or prognosis. Indeed, several of the invalidated biomarkers were later found to be highly correlated with known hypoxia gene signatures, suggesting they were merely detecting a secondary effect.
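The difference between noise and a confound can be shown with a small simulation. The sketch below is illustrative only: every number is synthetic, and the assumption that the reference gene rises with hypoxia across a roughly fivefold range is a modeling choice, not a result from the reanalysis. The target gene is generated independently of hypoxia, then normalized against the hypoxia-dependent reference.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the toy run is repeatable
n_tumors = 500

# Synthetic hypoxia score: 0 = normoxic, 1 = severely hypoxic.
hypoxia = [rng.random() for _ in range(n_tumors)]

# True log2 expression of the target gene: unrelated to hypoxia by construction.
target_log2 = [rng.gauss(5.0, 0.3) for _ in range(n_tumors)]

# ASSUMPTION: reference-gene log2 expression rises with hypoxia, spanning about
# fivefold (log2(5) ~ 2.32) between the least and most hypoxic tumors.
reference_log2 = [10.0 + math.log2(5) * h + rng.gauss(0.0, 0.2) for h in hypoxia]

# Normalizing in log space is a subtraction (the analogue of delta-Ct).
normalized_log2 = [t - r for t, r in zip(target_log2, reference_log2)]

r_raw = pearson(hypoxia, target_log2)
r_normalized = pearson(hypoxia, normalized_log2)

print(f"raw target vs hypoxia:        r = {r_raw:+.2f}")
print(f"normalized target vs hypoxia: r = {r_normalized:+.2f}")
```

The raw target shows essentially no relationship with hypoxia, while the normalized target tracks it closely. The direction of the distortion depends on how the reference gene moves, but in either direction the normalized value ends up encoding hypoxia rather than the target gene.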

The Economic Cost of an Untracked Variable

Beyond the direct waste of grant money, the fourteen invalidated studies had downstream consequences. Several clinical trials had been designed based on the biomarker candidates, using the flawed threshold to stratify patients. One Phase II trial for an ovarian cancer drug had to be redesigned mid-course, adding two years and roughly $1.5 million in additional costs. Another trial was simply abandoned after the sponsor's internal reanalysis confirmed the null result.

The pharmaceutical partners who had licensed the biomarkers withdrew funding, citing “unexpected variability” in the assay. In one case, a small biotech company that had staked its entire pipeline on one of the biomarkers went bankrupt. The founders later told a science journalist that they had been aware of the GAPDH stability concerns but had assumed the threshold was robust because it had been used in so many published studies.

The total economic impact, including lost investment and delayed treatments, is hard to estimate but likely runs into the tens of millions. More importantly, the episode has eroded trust in the biomarker discovery process. Clinicians who were considering using the tests in their practice have become skeptical, and some have stopped sending samples for any gene expression-based biomarker analysis.

Opportunity cost is the cruelest form of waste. While the field chased false leads, genuine biomarkers — for example, a panel of three genes that Lee's group validated in 2020 — received little attention and less funding. That panel, which uses a different normalization strategy, has since been shown to predict ovarian cancer recurrence with reasonable accuracy, but it has not been adopted widely because it lacks the citation count of the invalidated studies. The gap between the hype around the flawed biomarkers and the quiet utility of the validated panel illustrates a systemic misallocation of attention.

Some might counter that the cost of validating every methodological assumption is prohibitive. But the argument cuts both ways: the cost of not validating can be far higher. A back-of-the-envelope calculation by the reanalysis team suggests that if 5% of the funding typically allocated to discovery research had been redirected to methodological validation, the GAPDH problem might have been caught within a year of the original 2008 paper, rather than fifteen years later. The return on that investment would have been enormous.

A Simple Fix That Few Adopted

The solution to the GAPDH threshold problem is straightforward. Lee et al. published a set of alternative reference genes in 2019, along with a simple protocol for validating them in any tissue type. The protocol takes about two weeks and costs roughly $500 in reagents. Yet a survey of papers published after 2019 found that only about 12% of studies that used quantitative PCR in cancer tissues adopted the recommended controls. The rest continued using GAPDH or other unvalidated housekeeping genes.
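What a reference-gene validation step can look like in code is sketched below. This is not the protocol from Lee et al., which the article does not describe in detail. It is a single-pass version of the geNorm-style stability measure M, where each candidate's M is the average standard deviation of its log2 expression ratio against every other candidate (lower is more stable). The Ct values and the candidate names `CAND_A`, `CAND_B`, and `CAND_C` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical Ct values: one list per candidate reference gene,
# one entry per tissue sample (same sample order in every list).
ct_values = {
    "GAPDH":  [18.0, 16.1, 19.2, 17.0, 15.8, 18.9],
    "CAND_A": [22.0, 22.1, 21.9, 22.2, 22.0, 21.8],
    "CAND_B": [24.5, 24.4, 24.6, 24.3, 24.6, 24.5],
    "CAND_C": [20.1, 20.3, 20.0, 20.2, 19.9, 20.1],
}


def stability_m(ct_by_gene):
    """geNorm-style M: mean SD of pairwise log2 ratios against every other gene."""
    m_values = {}
    for gene, cts in ct_by_gene.items():
        pairwise_sds = []
        for other, other_cts in ct_by_gene.items():
            if other == gene:
                continue
            # A Ct difference is a log2 expression ratio (assuming ~100% efficiency).
            log2_ratios = [a - b for a, b in zip(cts, other_cts)]
            pairwise_sds.append(stdev(log2_ratios))
        m_values[gene] = mean(pairwise_sds)
    return m_values


m_values = stability_m(ct_values)
ranking = sorted(m_values, key=m_values.get)  # most stable candidate first

for gene in ranking:
    print(f"{gene:7s} M = {m_values[gene]:.2f}")
```

In this toy panel GAPDH ranks last because its Ct swings by several cycles while the other candidates barely move. A fuller implementation would drop the least stable gene and recompute M until only the best pair remains; the single pass here is enough to show the idea.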

Why? The incentives are misaligned. A researcher who adopts a validated control gains no additional credit in the tenure review process. A paper that includes a two-page validation of housekeeping genes is not more likely to be accepted at a high-impact journal than one that skips it. In fact, some editors have told authors to remove such validation sections to save space.

Journals rarely publish negative re-analyses of accepted thresholds. The reanalysis of the fourteen studies was rejected by three journals before appearing on a preprint server. Editors cited “insufficient novelty” and “lack of direct experimental data.” The preprint has been read by many but cited by few, because citing it would mean acknowledging that one's own work might be affected.

Preprint servers now host dozens of similar re-evaluations, covering everything from normalization genes to antibody validation. They form a kind of shadow literature — rigorous but uncited, visible but ignored. The question is whether the formal publishing system will ever catch up.

Another barrier is the belief that GAPDH is “good enough” for most purposes. Proponents of the threshold argue that the variations in expression are small relative to the biological differences being measured. However, the reanalysis shows that the variation is not small: correcting for it changed the classification of roughly 40% of samples. Moreover, because GAPDH expression is correlated with tumor grade and stage, the bias is not random but systematically favors the detection of false positives in more aggressive tumors. This is precisely the kind of confound that can derail a biomarker.

Lessons for Funding Agencies and Reviewers

What would it take to prevent another such episode? Funding agencies could require explicit justification for the choice of housekeeping gene in every grant application. A simple sentence explaining why GAPDH was chosen over alternatives, and citing evidence of stability in the specific tissue under study, would force researchers to think twice before defaulting to the old threshold. A similar requirement could be added to manuscript submission guidelines.

Reviewers, for their part, could flag missing conservation analyses or normalization checks. A checklist for qPCR studies, analogous to the ARRIVE guidelines for animal research, would make it harder for flawed methods to slip through. The MIQE guidelines, published in 2009, already describe the minimum information a qPCR report should include, with justification of the reference genes among the required items. Some journals have begun to implement such checklists, but adoption is slow. The journal Nature Communications introduced a qPCR reporting checklist in 2020, but an audit showed that compliance was below 30% in the first year, and enforcement was inconsistent.

Funding agencies could also fund replication studies that specifically test methodological assumptions. The National Institutes of Health and the European Research Council have pilot programs for such work, but the budgets are tiny compared to the amount spent on discovery research. A dedicated fund for methodological validation, perhaps 5% of any large grant, could catch problems early. The NIH's “Validity and Reproducibility” program, launched in 2022, allocated about $10 million in its first year — less than 0.1% of the agency's total budget. Scaling that up would require political will and a recognition that methodological rigor is not a luxury but a necessity.

Open data policies must include raw expression counts, not just normalized values. Without the raw data, reanalysis is impossible. Several journals now require data deposition, but enforcement is uneven. A 2022 audit found that fewer than half of the papers that promised to deposit data actually did so within a year of publication. Journals that enforce data-sharing policies, such as PLOS ONE, have higher compliance rates but still face challenges with incomplete or poorly annotated datasets.

The story of the fourteen invalidated studies is not a story of fraud or incompetence. It is a story of how the normal incentives of science — publish quickly, cite generously, move on — can create a collective blind spot. Fixing that blind spot will require not just better methods, but a shift in how the scientific community values methodological rigor over novelty. Whether that shift will happen before the next untracked threshold derails another field is an open question.

In the meantime, the fourteen studies continue to be cited, and the GAPDH threshold remains in use in many labs. The reanalysis team has made their data and code publicly available, and they encourage others to replicate their findings. But without systemic change, the same pattern will repeat. The next untracked threshold might involve a different gene, a different assay, or a different field — but the underlying dynamics of path dependence and misaligned incentives will remain the same.
