Automated NMR Gets its High-Throughput Moment with Sub $25 Analysis
A new pipeline marries de novo protein design with robotic sample prep and NMR spectroscopy — producing atomic-resolution data at a pace that was unthinkable just a few years ago.
AlphaFold and other protein and drug discovery tools are predicting thousands of potential proteins each day with high probability of them being the next big cure. However, keeping aside the digital accuracy rates, real world understanding about how a protein moves — the flickering conformational states, the millisecond-scale flexibility, the subtle disorder that often defines biological function — remains stubbornly hard to study at scale. NMR spectroscopy is the gold-standard tool for that job. The catch? It’s slow, expensive, and typically limited to one carefully chosen protein at a time and a dedicated researcher .
Researchers at the University of Basel and ETH Zürich describes a platform called NMR-APP (NMR-Automated Protein Production) that sets out to fix exactly that bottleneck — and the numbers are striking.
The Pipeline at a Glance
Before diving into the mechanics, consider what this platform actually costs: potentially under $25 per protein confirmation, with the dominant expense being the synthetic DNA. That price tag buys you not just a folding confirmation, but atomic-resolution data on both structure and backbone dynamics — information that previously required months of dedicated NMR time and reagent costs that could run into the tens of thousands of dollars per target. At that price point, running hundreds of proteins is no longer a heroic resource commitment; it’s a routine experimental design choice.
The approach combines three technologies that have each matured independently, but have never been systematically integrated at this scale:

What They Found
Of 379 expressed proteins, 239 (62%) produced high-quality 2D NMR fingerprints suitable for detailed analysis. Another 79 gave detectable but low-quality spectra, while 61 produced no interpretable signal — likely due to aggregation or multimerization.
The success rate was strongly tied to concentration: samples above 10 µM achieved an 87% pass rate. The authors note that simply doubling expression volumes in future runs would deliver a roughly four-fold improvement in acquisition speed, pushing throughput close to 1,000 spectra per week on a single instrument.
Beyond yield statistics, the large dataset revealed something biologically interesting. Unsupervised clustering of the 239 high-quality spectra using Chamfer distance — measuring pairwise similarity between 2D peak lists — produced a dendrogram that cleanly separated predominantly helical folds from mixed α/β architectures. No sequence labels, no structure annotations needed: the NMR fingerprint alone encodes secondary structure class.
The Dynamics Gap — and Why It Matters
For nine proteins selected for deeper characterization, the team recorded full backbone relaxation experiments (¹⁵N R₁, R₂, and {¹H}–¹⁵N NOE) to map residue-level motion — on the very same samples that came off the high-throughput production line, at concentrations of 10–50 µM. This is worth pausing on: dynamics data of this quality, on proteins that cost less than a lunch to produce, represents a genuine step-change in what a single lab can realistically characterize. The results were notable for what they didn’t find: essentially no correlation (R² < 0.12 across all comparisons) between computational flexibility predictions — pLDDT, RMSF from structural ensembles — and experimental relaxation parameters.
Put more plainly: the generative models that designed these proteins had no reliable idea which loops would be mobile and which would be rigid. Several proteins showed clear local dynamics in regions that had been designed as static. Some of these dynamic signatures were concentrated at loop segments; others cropped up within secondary structure elements.
This finding points to a real gap in the current generation of protein design tools. These models are trained on static crystal structures and inherit biases toward rigid, well-packed folds. The physical energy landscape of a designed protein — including frustrated states, local minima, and partially disordered segments that have been documented in previous work on de novo proteins — is largely invisible to them.
The authors are direct about what this means: generating large, standardized NMR datasets is precisely what is needed to eventually train models capable of predicting dynamics from sequence without relying on evolutionary information.
Structure Validation Holds Up
For one representative protein (p2 A4), the team went all the way to a full NMR solution structure, recording NOESY spectra and calculating an ensemble with CYANA. The backbone RMSD between the NMR structure and the designed model was 1.3 Å — excellent agreement. The structure also revealed a minor second conformational state, traced to cis/trans proline isomerization at residue 82. Minor states were detected in several other proteins at populations as low as 5%, demonstrating the platform’s sensitivity to low-abundance conformers.
Secondary chemical shift analysis confirmed correct folding for all nine characterized proteins.
What Comes Next
The authors outline several obvious extensions. Side-chain-specific methyl labeling on deuterated backgrounds would push the approach toward larger proteins and multiprotein complexes. The pipeline is not inherently limited to designed sequences — natural proteins and complexes can be processed with appropriate labeling strategies.
The remaining bottleneck is sequence-specific resonance assignment, which currently still requires additional spectroscopy and semi-manual curation. The team anticipates that accumulating larger NMR datasets will eventually make machine learning-based automated assignment tractable, potentially allowing full assignment to be inferred from the 2D HMQC fingerprint alone — given that the protein structure is known by design.
Bottom Line
NMR-APP is a compelling demonstration that high-throughput structural biology isn’t just an X-ray crystallography or cryo-EM story. By combining the manufacturing advantages of de novo designed proteins with robotics and automated NMR, the Basel/ETH team has compressed what previously took years into a matter of weeks — and exposed a clear blind spot in current generative models along the way.
The cost angle is hard to overstate. For under $25 per protein — less than most lab consumable line items — researchers can now obtain atomic-resolution data on both fold and backbone flexibility. That’s a price point at which studying protein dynamics statistically, across large designed libraries rather than one carefully chosen target at a time, stops being a dream and becomes an experimental routine. For researchers building pipelines around protein dynamics, multi-state behavior, or intrinsic disorder, this is a platform worth watching closely. If the cost of synthetic DNA comes down, the overall costs can substantially come down.
Müntener et al., “Large-scale exploration of protein space by automated NMR,” bioRxiv preprint, February 2026. DOI: 10.64898/2026.02.16.706194
