Cells routinely make ‘wrong’ proteins that are more abundant than the right ones

The central dogma of molecular biology, that DNA is transcribed into RNA, which is translated into protein according to the genetic code, carries a quiet assumption: that the ribosome reads the code correctly, producing exactly the protein the gene specifies. A study published June 24 in Nature by researchers at Northeastern University, Alnylam Pharmaceuticals, and the Broad Institute shows that this assumption is wrong in thousands of cases, and that the “wrong” proteins are often more stable and more abundant than the canonical ones.

Shira Tsour, Rainer Machne, Andrew Leduc, and colleagues analyzed over 1,000 human samples spanning six cancer types from the Clinical Proteomic Tumor Analysis Consortium and 26 healthy tissues from public proteomic databases. They searched for peptides that deviated from the genetic code: instances where the ribosome had incorporated a different amino acid than the one specified by the codon. The pipeline combined deep RNA sequencing (to build patient-specific protein databases), high-resolution mass spectrometry (to detect mass shifts), and a multi-layer validation cascade that included deep-learning re-scoring and stringent false-discovery rate filtering.
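
To make the idea of a "mass shift" concrete, here is a minimal Python sketch of the arithmetic involved. It is not the authors' pipeline. It assumes only standard monoisotopic residue masses, and the function names and the 0.005 Da tolerance are hypothetical choices made for illustration.

```python
# Monoisotopic residue masses in daltons (standard reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_mass_shift(encoded: str, observed: str) -> float:
    """Expected change in peptide mass when `observed` replaces `encoded`."""
    return RESIDUE_MASS[observed] - RESIDUE_MASS[encoded]


def candidate_substitutions(measured_shift: float, tolerance: float = 0.005):
    """List (encoded, observed) pairs whose expected shift matches the
    measured one within `tolerance` daltons (an assumed value)."""
    return [
        (enc, obs)
        for enc in RESIDUE_MASS
        for obs in RESIDUE_MASS
        if enc != obs
        and abs(substitution_mass_shift(enc, obs) - measured_shift) <= tolerance
    ]


# Two substitutions mentioned in this article:
# asparagine-to-glycine and glycine-to-serine.
n_to_g = substitution_mass_shift("N", "G")
g_to_s = substitution_mass_shift("G", "S")
print(f"N->G shift: {n_to_g:+.5f} Da")
print(f"G->S shift: {g_to_s:+.5f} Da")
print(candidate_substitutions(g_to_s))
```

A single mass shift can fit more than one substitution (glycine-to-serine and alanine-to-threonine differ by the same amount), and leucine and isoleucine have identical masses. That is one reason a mass shift alone is not enough, and why fragmentation spectra and site localization matter.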

They found 60,803 fragmentation spectra supporting 8,746 unique amino acid substitutions in 1,767 genes. Of those, 1,955 sites could be confidently localized to a specific position in the protein.

Not errors, regulation

The key finding that rules out simple translation errors is abundance. For roughly 10 percent of the detected substitutions, affecting about 360 proteins, the alternately translated form was more abundant than the canonical, genetically encoded version. Random translation mistakes would produce vanishingly small amounts of variant protein. The fact that these alternate forms are the dominant version of the protein in the cell points to a regulated, programmed process that the authors call sense-codon recoding.
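
The abundance argument comes down to a simple comparison, sketched below in Python. This article does not specify the study's exact metric, so the sketch assumes a plain ratio of measured intensities for the alternate and canonical peptide forms. The function names and the intensity values are hypothetical.

```python
import math


def log10_substitution_ratio(alt_intensity: float, canonical_intensity: float) -> float:
    """log10 of alternate-form intensity over canonical-form intensity.

    Assumed metric for illustration only. Both intensities must be positive.
    """
    if alt_intensity <= 0 or canonical_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(alt_intensity / canonical_intensity)


def alternate_is_dominant(alt_intensity: float, canonical_intensity: float) -> bool:
    """True when the alternate form is more abundant than the canonical one."""
    return log10_substitution_ratio(alt_intensity, canonical_intensity) > 0


# Hypothetical intensities, not data from the study.
rare_error = log10_substitution_ratio(1.0e4, 1.0e7)  # alternate form is a trace
dominant = log10_substitution_ratio(3.0e7, 1.0e7)    # alternate form outnumbers canonical
print(f"trace variant:    log10 ratio = {rare_error:.2f}")
print(f"dominant variant: log10 ratio = {dominant:.2f}")
```

On this scale, a random decoding mistake would sit far below zero, while the cases highlighted in the study would sit above it.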

Several factors are associated with the phenomenon. Rare codons, those with a smaller pool of matching transfer RNAs, show higher substitution rates. Protein stability plays a role: the alternately translated forms degrade more slowly (measured by SILAC pulse-chase experiments, with a p-value below 10⁻¹⁰). RNA modifications, particularly uracil modifications such as pseudouridylation, overlap significantly with substitution sites (p < 10⁻¹⁰). Intrinsically disordered regions of proteins are also associated with higher substitution ratios.

Which proteins are affected

The set of recoded proteins is not random. Functional enrichment analysis identifies proteasome subunits (PSMA5, PSMB6, PSMB7), Ras-family signaling proteins, RNA-recognition-motif proteins, and thioredoxin-domain proteins as significantly over-represented. Lamin A shows a substitution that is elevated in tumors compared to matched normal tissue across three cancer types. A specific asparagine-to-glycine substitution in the PP1-beta catalytic subunit (PPP1CB) is consistently higher in lung cancer patients. Pancreatic ductal adenocarcinoma shows a strong tendency toward glycine-to-serine substitutions.

Importantly, the alternately translated forms are conserved between humans and mice. Fifty-five identical substitutions are detectable in mouse tissues, and their relative abundance correlates significantly across species (p < 10⁻⁸), as does their tissue specificity. This conservation across approximately 90 million years of evolution strongly suggests the phenomenon is functional rather than pathological.

What this means

Only 110 of the approximately 9,000 detected substitution sequences match any known Ensembl or UniProt translation product. The vast majority are invisible to DNA and RNA sequencing; they can only be detected by directly observing the protein. This implies that the mammalian proteome is substantially larger and more diverse than the genome alone predicts.

The clinical implications are broad. Many drug targets, including proteasome subunits and signaling proteins, may exist in alternate forms with different stability, activity, or drug-binding properties. The link to neurodegeneration (proteasome subunits, which clear misfolded proteins, are among the most heavily recoded) is particularly suggestive. If cancer-specific substitution profiles can be validated as biomarkers, they could open a new axis for both diagnosis and therapy.

Source: Tsour, S., Machne, R., Leduc, A. et al. Alternate RNA decoding results in stable and abundant proteins in mammals. Nature (2026). DOI: 10.1038/s41586-026-10678-2