AI-driven optimization opens the door to genome-wide transcriptomics in cell-free systems

For years, cell-free systems have been a powerful but limited tool in synthetic biology. They allow researchers to carry out transcription and translation outside a living cell, making them ideal for prototyping genetic circuits, producing difficult proteins, and building biosensors. But there has been a trade-off: the yield of messenger RNA (mRNA) from these systems has generally been too low for genome-wide transcriptomic profiling.

A team from INRAE and Université Paris-Saclay has now broken through that barrier. Using Bayesian optimization to search a combinatorial space of more than 1.6 million possible buffer compositions, they increased mRNA yield 20-fold in a standard E. coli cell-free system. That is enough, for the first time, to perform full transcriptome sequencing and reveal progressive layers of gene regulation. The work was published June 27 in Nature Communications.

Teaching an old system new tricks

The starting point was an E. coli BL21(DE3) lysate system driven by T7 RNA polymerase, one of the most widely used platforms in cell-free synthetic biology. The team, led by Matthieu Jules and Olivier Borkowski at the Micalis Institute, needed to overcome a fundamental problem: the standard buffer formulation was optimized decades ago for protein production, not for mRNA yield.

They identified eight buffer components, including magnesium, potassium, amino acids, NTPs, and PEG-8000, and varied each across six concentration levels. The full combinatorial space was 1,679,616 possible compositions. Testing even a fraction by brute force would be prohibitive.

So the team turned to active learning. A Bayesian optimization algorithm, starting from 100 compositions chosen by Latin Hypercube Sampling, explored the landscape of more than 1.6 million possibilities by testing just 653 compositions in the lab. After ten active-learning cycles, it identified a formulation that increased mRNA yield 20-fold relative to the reference buffer.
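The shape of such a loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the article does not specify their surrogate model or acquisition function, so a nearest-neighbour average with an exploration bonus stands in for the probabilistic model a real Bayesian optimizer would use. The yield function is synthetic, and BATCH_SIZE, POOL_SIZE, and every function name are hypothetical.

```python
import random

N_COMPONENTS, N_LEVELS = 8, 6    # from the article: 8 components, 6 levels each
N_INITIAL, N_CYCLES = 100, 10    # from the article: 100 initial points, 10 cycles
BATCH_SIZE, POOL_SIZE = 20, 500  # assumptions chosen to keep this toy run fast

rng = random.Random(0)


def random_composition():
    """One grid point: a concentration-level index for each component."""
    return tuple(rng.randrange(N_LEVELS) for _ in range(N_COMPONENTS))


HIDDEN_OPTIMUM = random_composition()  # unknown to the optimizer


def measure_yield(comp):
    """Synthetic stand-in for a wet-lab mRNA measurement (not real data)."""
    penalty = sum((a - b) ** 2 for a, b in zip(comp, HIDDEN_OPTIMUM))
    return 100.0 - penalty + rng.gauss(0.0, 1.0)


def latin_hypercube(n):
    """Discrete Latin hypercube: each level is used about equally often per component."""
    columns = []
    for _ in range(N_COMPONENTS):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([s * N_LEVELS // n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def acquisition(candidate, tested, kappa=2.0, k=5):
    """UCB-style score: neighbour-averaged yield plus a bonus for unexplored regions."""
    nearest = sorted((distance(candidate, c), y) for c, y in tested.items())[:k]
    mean = sum(y for _, y in nearest) / len(nearest)
    return mean + kappa * nearest[0][0]  # nearest[0][0] is the distance to the closest tested point


def run():
    # Initial design: measure a space-filling sample of the grid.
    tested = {c: measure_yield(c) for c in latin_hypercube(N_INITIAL)}
    for _ in range(N_CYCLES):
        # Score a random pool of untested candidates rather than the full grid.
        pool = {c for c in (random_composition() for _ in range(POOL_SIZE))
                if c not in tested}
        batch = sorted(pool, key=lambda c: acquisition(c, tested), reverse=True)[:BATCH_SIZE]
        tested.update((c, measure_yield(c)) for c in batch)
    return tested


tested = run()
best = max(tested, key=tested.get)
print(len(tested), "compositions tested; best level indices:", best)
```

The design choice worth noting is the acquisition score: it trades off compositions predicted to perform well against compositions far from anything already measured, which is what lets a loop of this kind cover a large grid with only a few hundred experiments.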

“Active learning guided us to a region of the composition space that would have been extremely unlikely to find through trial and error,” the authors note. The key adjustments were higher concentrations of magnesium and NTPs and lower concentrations of potassium, amino acids, and PEG-8000.

From optimization to transcriptome

The 20-fold yield improvement made something possible that had eluded cell-free systems: genome-wide direct RNA sequencing. The team turned to bacteriophage T7, a well-characterized virus with a compact genome, and performed direct RNA-seq using Oxford Nanopore’s MinION platform across three systems of increasing biological complexity.

The first system used only purified T7 RNA polymerase, DNA template, and nucleotides, the minimalist configuration. It captured promoter-strength hierarchies: which T7 promoters are strong or weak in their native genomic context. But with no RNA degradation machinery present, coverage was heavily skewed toward the 5′ ends of transcripts.

The second system used the optimized cell-free extract with its full complement of E. coli proteins. This restored RNase III activity, evidenced by mRNA maturation sites in the T7 transcript, and produced uniform coverage across transcripts, a true steady-state snapshot. It provided an accurate estimation of in vivo expression levels.

The third system was the full cellular context, E. coli undergoing T7 infection. This added a layer of regulation absent in cell-free lysates: 3′-end-biased coverage caused by membrane-associated RNase E.

The comparison revealed what the authors call “progressive layers of regulation”: promoter strength, mRNA degradation, mRNA maturation via RNase III, and 3′-end-specific degradation via RNase E. Each system added one or more layers, creating a gradient of biological complexity that allowed the team to dissect each process individually.

The broader significance

The study demonstrates that cell-free systems, long considered unsuitable for transcriptomics, can now support genome-wide transcriptomic profiling. “Cell-free transcriptomics could enable the exploration of transcriptional landscapes of non-cultivable bacteria,” the researchers note. Such organisms remain poorly characterized simply because their RNA is inaccessible under laboratory conditions.

The active-learning pipeline itself is generalizable beyond buffer optimization. Any multi-parameter biological optimization problem, such as media formulation, protein purification conditions, or metabolic engineering, could benefit from the same approach of probing roughly 0.04% of a combinatorial space to find near-optimal conditions.
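The 0.04% figure follows directly from the numbers reported earlier: eight components at six levels each, and 653 compositions tested. A quick check in Python (variable names are illustrative):

```python
N_COMPONENTS = 8   # buffer components varied
N_LEVELS = 6       # concentration levels per component
N_TESTED = 653     # compositions measured in the lab

space_size = N_LEVELS ** N_COMPONENTS
fraction = N_TESTED / space_size

print(f"{space_size:,} compositions; {fraction:.2%} tested")
# prints "1,679,616 compositions; 0.04% tested"
```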

Limitations remain. The study was performed exclusively in E. coli BL21(DE3) lysate with T7 RNA polymerase. The optimized buffer has not been validated for endogenous E. coli RNA polymerase or for other organisms. The cell-free system captures transcription and degradation but not the 3′-end-specific degradation mediated by membrane-associated RNase E, which is lost during lysate preparation. And the paper is published as an advance version that has not yet undergone editorial refinement.

Still, the work marks a turning point for cell-free biology. By adding transcriptomics to the capabilities of cell-free systems, it opens a door to studying gene regulation in organisms that cannot be cultured, prototyping synthetic circuits at the RNA level, and accelerating the design-build-test cycle in synthetic biology.

The paper, “Active-learning-guided optimization of cell-free systems for genome-wide transcriptomic profiling reveals progressive layers of regulation,” is published in Nature Communications (DOI: 10.1038/s41467-026-74559-y) by Lea Wagner, An Hoang, Olivier Rue, and colleagues at INRAE, Université Paris-Saclay, and AgroParisTech.