Key Takeaways

Enginoma Structure predicts stable low-energy conformations—not functional properties. Catalytic efficiency, substrate specificity, and thermal stability at process temperatures are beyond its current scope.
Approximately 20% of AI-predicted side chains show significant deviations from experimental electron density, concentrated in active sites and allosteric networks.
AI-generated enzyme variants fail for concrete, predictable reasons: expression failure (30–40%), solubility errors (20–30%), activity loss despite correct fold (15–25%), thermal instability (10–20%), and others.
The Design–Build–Test–Learn (DBTL) cycle is the most effective integration strategy: each round of wet lab data improves the next generation of AI models, creating compounding returns over 3–4 cycles.

In 2021, DeepMind's Enginoma Structure was celebrated as a solution to the protein folding problem—one of biology's most enduring grand challenges. By 2022, the Enginoma Structure Protein Structure Database contained predicted structures for over 200 million proteins. By 2024, Enginoma Complex had extended this capability to predict protein–ligand, protein–DNA, and multi-chain assemblies with unprecedented accuracy. For many in the field, the message seemed clear: structural biology is largely a solved problem, and the next frontier lies in function.

Except that the next frontier has proven harder than expected. A growing body of evidence—some published, much still circulating in preprints and internal datasets—shows that AI structure predictions, while remarkable for their stated purpose, are imperfect proxies for the biochemical behaviors that actually determine whether a protein will work in an industrial or therapeutic context. Structure prediction accuracy does not equal function prediction accuracy, and conflating the two has real costs.

AlphaFold's Transformative Impact

Enginoma Structure was trained on the Protein Data Bank (PDB), a curated collection of structures determined experimentally by X-ray crystallography, cryo-EM, and NMR. Its objective is to predict the physically stable, low-energy conformation that a protein adopts under the conditions of the experiments in the PDB, the vast majority of which use near-physiological buffers at moderate temperatures. Enginoma Structure excels at this.

What it does not predict are functional properties: catalytic efficiency (k_cat/K_M), substrate specificity, thermal stability at process temperatures, tolerance for organic solvents, expression yield in a heterologous host, proteolytic susceptibility, or aggregation propensity under process conditions. These are the properties that matter for industrial enzyme engineering, and they depend on factors that go well beyond the static 3D coordinates of a folded structure.

The distinction matters in practical terms every day at the bench. A protein can adopt its predicted folded structure and still fail spectacularly as a biocatalyst—because the active site geometry is sensitive to local dynamics, water molecules, or substrate-induced conformational changes that static structure prediction cannot capture.

Structure ≠ Function: The Prediction Gap

A rigorous validation study examined Enginoma Structure predictions against high-quality experimental structural data across a curated benchmark set. The findings were sobering: approximately 20% of AI-predicted side-chain conformations showed significant deviations from experimental electron density, and roughly 7% of predictions were effectively incompatible with the underlying experimental data. These discrepancies were not random noise. They clustered in active-site regions, flexible loops, and allosteric networks, precisely the zones most relevant to enzyme function.

For industrial biocatalysis, these error rates matter enormously. An active-site side chain positioned 2–3 Å off from its experimentally determined location can mean the difference between a catalytically competent geometry and one that barely supports turnover. A flexible loop that structure predictors mark as ordered may, in reality, be disordered under process conditions—causing expression failure or aggregation when the protein is produced at high titre.
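To make the 2–3 Å concern concrete, a per-residue comparison of predicted and experimental side-chain coordinates can flag the positions worth a closer look. The following is a minimal sketch, not a validated pipeline: it assumes both models are already superposed on their backbone atoms, and the residue labels, coordinates, and the 2.0 Å cutoff are hypothetical values chosen for illustration.

```python
import math

def side_chain_rmsd(pred_atoms, expt_atoms):
    """RMSD in angstroms over atoms present in both models.

    Each argument maps an atom name to an (x, y, z) tuple.
    """
    shared = sorted(set(pred_atoms) & set(expt_atoms))
    if not shared:
        raise ValueError("no atoms in common")
    sq_sum = sum(math.dist(pred_atoms[a], expt_atoms[a]) ** 2 for a in shared)
    return math.sqrt(sq_sum / len(shared))

def flag_residues(pred_model, expt_model, cutoff=2.0):
    """Return {residue: rmsd} for residues whose side-chain RMSD exceeds cutoff."""
    flagged = {}
    for res in pred_model.keys() & expt_model.keys():
        rmsd = side_chain_rmsd(pred_model[res], expt_model[res])
        if rmsd > cutoff:
            flagged[res] = round(rmsd, 2)
    return flagged

# Hypothetical coordinates for two active-site residues (illustration only).
predicted = {
    "HIS57": {"CG": (0.0, 0.0, 0.0), "ND1": (1.0, 0.0, 0.0)},
    "SER195": {"CB": (5.0, 5.0, 5.0), "OG": (6.0, 5.0, 5.0)},
}
experimental = {
    "HIS57": {"CG": (0.0, 0.0, 0.0), "ND1": (1.0, 0.3, 0.0)},
    "SER195": {"CB": (5.0, 5.0, 5.0), "OG": (5.0, 5.0, 8.0)},
}

print(flag_residues(predicted, experimental))  # {'SER195': 2.24}
```

In this toy input only the serine is flagged, because its hydroxyl oxygen sits about 3.2 Å from the experimental position while the histidine differs by 0.3 Å on a single atom.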

"Leading structure predictors are trained on curated structure repositories. Their objective is to predict physically stable, low-energy conformations—not catalytic efficiency, substrate specificity, thermal stability, or expression yield in a heterologous host."

Consensus Filtering: Traditional Metrics Fall Short, and So Does Structure Prediction Alone

Research into consensus multi-model scoring extended this critique to functional prediction. A central hypothesis was tested: are conventional metrics like sequence similarity (BLAST e-value) and BLOSUM62 scores the best available predictors of enzyme function? The answer was no—and neither, surprisingly, was structure prediction alone.

When Enginoma Structure predictions were benchmarked alongside sequence-based metrics for predicting functional similarity across enzyme families, structure prediction improved functional prediction modestly but was far from definitive. The best predictors combined structural information (Enginoma Structure), evolutionary context (embeddings from Enginoma sequence models), and direct experimental activity measurements in a unified scoring framework. The experimental data acted as a regularizer, correcting cases where structural or sequence similarity was misleading: paralogs with nearly identical structures but divergent functions, or convergent enzymes with different folds that catalyze the same reaction.
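The idea of experimental data acting as a regularizer can be illustrated with a small scoring function. This is a sketch under stated assumptions rather than the scoring framework used in the research described above: the function name, the equal weighting of structure and embedding similarity, and the 0.7 weight on measured activity are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def consensus_score(struct_sim, emb_query, emb_ref, measured=None,
                    w_struct=0.5, w_seq=0.5, w_expt=0.7):
    """Blend computational similarity with an optional measured activity.

    struct_sim and measured are assumed to be scaled to the range 0..1.
    """
    computational = w_struct * struct_sim + w_seq * cosine(emb_query, emb_ref)
    if measured is None:
        return computational
    # The measurement pulls the score toward what was actually observed.
    return (1.0 - w_expt) * computational + w_expt * measured

# A paralog that looks nearly identical in silico but is almost inactive.
in_silico = consensus_score(0.95, [1.0, 0.0], [1.0, 0.0])
with_assay = consensus_score(0.95, [1.0, 0.0], [1.0, 0.0], measured=0.05)
print(round(in_silico, 4), round(with_assay, 4))  # 0.975 0.3275
```

Without a measurement the paralog scores 0.975; once the assay result is included, the score drops to about 0.33, which is the kind of correction the paragraph above describes.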

Specific Failure Modes of AI-Designed Proteins

At CD Biosynsis, we have characterized hundreds of computationally designed enzyme variants in our wet lab. The most common failure modes for AI-generated candidates are not exotic—they are mundane and instructive. Understanding these patterns is essential for designing effective hybrid AI-wet lab programs.

Observed Failure Rates for AI-Generated Enzyme Variants

Expression Failure (30–40%): Low or undetectable expression in E. coli or the chosen host, caused by codon usage incompatibility, mRNA stability issues, or toxicity from misfolded intermediates. Structure predictors model the mature folded state; they cannot predict the folding trajectory or the cellular folding environment.
Solubility and Folding Errors (20–30%): The protein accumulates in inclusion bodies or fails to refold correctly. Structural predictions assume a well-behaved folding pathway that may not occur under authentic cellular conditions, particularly at high expression levels.
Activity Loss Despite Correct Fold (15–25%): The predicted structure appears accurate and the protein expresses and purifies, but catalytic activity is absent or dramatically reduced. Active-site geometry may be correct in isolation, but context-dependent factors (substrate positioning, protonation states, water molecules) are not captured.
Thermal Instability (10–20%): The variant loses activity rapidly at process temperatures (e.g., 50–60°C for industrial enzymes), even when room-temperature assays suggest good activity. Thermal stability is a multifactorial trait that depends on surface charge, packing density, and dynamic flexibility, properties that structural models only partially capture.
Substrate Specificity Drift (5–15%): The variant shows measurable activity on the target substrate but also unwanted activity on off-target substrates, which is problematic for pharmaceutical and food applications requiring high specificity. Specificity is notoriously difficult to predict from static structural information alone.
Aggregation and Precipitation (5–10%): The protein is soluble at low concentration but aggregates at the high concentrations needed for process applications, particularly in the presence of substrates, cofactors, or organic solvents. Aggregation propensity depends on surface properties that are challenging to predict computationally.
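The failure modes above map naturally onto an ordered triage of assay results, because a variant that does not express never reaches the activity assay. The sketch below is illustrative only: the field names and every cutoff value are hypothetical, and a real program would set them per project and per host.

```python
from dataclasses import dataclass

@dataclass
class AssayPanel:
    expression_mg_per_l: float        # titre in the chosen host
    soluble_fraction: float           # 0..1, soluble protein / total protein
    relative_activity: float          # activity relative to the starting template
    residual_activity_hot: float      # 0..1, retained after incubation at process temperature
    off_target_ratio: float           # off-target activity / on-target activity
    aggregates_at_process_conc: bool  # observed at process-relevant concentration

# Hypothetical pass/fail cutoffs, for illustration only.
CUTOFFS = {
    "min_expression": 1.0,
    "min_soluble": 0.5,
    "min_activity": 0.2,
    "min_residual": 0.5,
    "max_off_target": 0.1,
}

def first_failure(p, c=CUTOFFS):
    """Return the first failure mode hit, in the order listed above, or None."""
    checks = [
        ("expression failure", p.expression_mg_per_l < c["min_expression"]),
        ("solubility and folding error", p.soluble_fraction < c["min_soluble"]),
        ("activity loss despite correct fold", p.relative_activity < c["min_activity"]),
        ("thermal instability", p.residual_activity_hot < c["min_residual"]),
        ("substrate specificity drift", p.off_target_ratio > c["max_off_target"]),
        ("aggregation and precipitation", p.aggregates_at_process_conc),
    ]
    for label, failed in checks:
        if failed:
            return label
    return None

# A variant that expresses and folds but has lost most of its activity.
variant = AssayPanel(25.0, 0.9, 0.05, 0.8, 0.02, False)
print(first_failure(variant))  # activity loss despite correct fold
```

Recording only the first failure keeps the categories mutually exclusive, which is one way to tabulate rates like those listed above; a variant may of course have more than one defect.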

What Experimental Validation Actually Tests

There is no computational shortcut for actually measuring enzyme kinetics. k_cat, K_M, k_cat/K_M, IC₅₀ values, substrate specificity ratios, pH optima, temperature optima, solvent tolerance curves—these are empirical measurements that require real biochemistry. No current AI model predicts these values with the accuracy needed for regulatory submissions, process design, or intellectual property claims.
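As an illustration of what determining these parameters involves, initial-rate data can be fitted to the Michaelis-Menten equation. The sketch below uses the Hanes-Woolf linearization, [S]/v = [S]/V_max + K_M/V_max, with an ordinary least-squares line. It is a simplified example on noise-free synthetic data; the concentrations, rates, and enzyme concentration are assumed values, and with real, noisy measurements a nonlinear fit with replicate-based error estimates is generally preferable.

```python
def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_michaelis_menten(substrate, rates):
    """Estimate (V_max, K_M) from the Hanes-Woolf plot of [S]/v against [S]."""
    s_over_v = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = least_squares_line(substrate, s_over_v)
    v_max = 1.0 / slope        # slope = 1 / V_max
    k_m = intercept * v_max    # intercept = K_M / V_max
    return v_max, k_m

# Synthetic data generated with V_max = 10 uM/s and K_M = 50 uM (assumed values).
substrate_um = [10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
rates_um_s = [10.0 * s / (50.0 + s) for s in substrate_um]

v_max, k_m = fit_michaelis_menten(substrate_um, rates_um_s)
enzyme_um = 0.1                # assumed total enzyme concentration
k_cat = v_max / enzyme_um      # turnover number, 1/s
print(round(v_max, 3), round(k_m, 3), round(k_cat, 3), round(k_cat / k_m, 3))
```

On this synthetic input the fit recovers the generating parameters, giving k_cat of about 100 per second and k_cat/K_M of about 2 per micromolar per second.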

For pharmaceutical applications, regulatory agencies (FDA, EMA) require experimentally determined activity data for any enzyme used in a manufacturing process. A structure prediction, even a perfect one, is not sufficient for a Drug Master File. The experimental characterization must be done rigorously, with appropriate controls and sound statistical analysis. This regulatory reality ensures that wet lab work will remain indispensable for commercial enzyme applications regardless of how powerful computational methods become.

The DBTL Cycle: Design-Build-Test-Learn

The most effective strategy we've found at CD Biosynsis is the Design–Build–Test–Learn (DBTL) cycle, applied with particular intensity at the Test and Learn stages. In practice, this means:

Design: AI models generate and prioritize variant candidates. Computational filters remove structurally implausible candidates and select for predicted stability or activity improvements.
Build: A curated set of prioritized variants (typically 20–100 per round, depending on throughput) is gene-synthesized, cloned, and expressed in the chosen host organism.
Test: Purified variants are characterized through a standardized assay panel: activity screening, kinetic parameter determination, thermal stability profiling, solubility assessment, and substrate specificity screens.
Learn: Experimental results are used to update surrogate models, refine activity predictors, and adjust the generative model priors for the next design cycle.
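The four stages above can be sketched as a loop in which each round's measurements update a surrogate that ranks the next round's candidates. This is a toy illustration, not the CD Biosynsis pipeline: the additive per-mutation surrogate, the simulated assay with hidden effects, and the batch size of 4 are all assumptions made to keep the example small.

```python
import itertools
import random

def run_dbtl(assay, candidates, rounds=3, batch=4, seed=0):
    """Toy Design-Build-Test-Learn loop over a fixed pool of candidate variants."""
    rng = random.Random(seed)
    measured = {}      # variant -> measured activity (the Test record)
    effect = {}        # mutation -> estimated contribution (the surrogate)
    best_so_far = []
    for _ in range(rounds):
        # Design: rank untested variants by the surrogate. Unseen mutations
        # count as neutral, and the shuffle breaks ties at random.
        untested = [v for v in candidates if v not in measured]
        rng.shuffle(untested)
        untested.sort(key=lambda v: sum(effect.get(m, 0.0) for m in v), reverse=True)
        # Build and Test: "express" and assay the top-ranked batch.
        for variant in untested[:batch]:
            measured[variant] = assay(variant)
        # Learn: re-estimate each mutation's effect as the mean activity of the
        # variants carrying it, relative to the mean over all measured variants.
        mean_all = sum(measured.values()) / len(measured)
        carriers = {}
        for variant, activity in measured.items():
            for m in variant:
                carriers.setdefault(m, []).append(activity)
        effect = {m: sum(a) / len(a) - mean_all for m, a in carriers.items()}
        best_so_far.append(max(measured.values()))
    return measured, best_so_far

# Simulated wet lab: hidden additive effects of six hypothetical mutations.
HIDDEN = [0.4, -0.3, 0.9, 0.1, -0.6, 0.5]

def toy_assay(variant):
    return 1.0 + sum(HIDDEN[m] for m in variant)

pool = list(itertools.combinations(range(6), 2))  # all 15 double mutants
measured, best = run_dbtl(toy_assay, pool)
print(len(measured), best)
```

With the defaults, 12 of the 15 double mutants are assayed across three rounds. The first round is effectively a random sample because the surrogate starts empty; later rounds are steered by what was measured.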

Critically, the experimental data is not merely used to validate or invalidate individual variants—it is used to improve the next round of design. Each cycle produces a better model, which produces better candidates, which produce better experimental results. This is where the compounding advantage of the AI-wet lab integration truly lies: not in any single round, but in the trajectory of successive rounds.

Over 3–4 DBTL cycles, programs at CD Biosynsis routinely achieve activity improvements of 5–50× relative to starting templates, with the rate of improvement accelerating as proprietary experimental data accumulates. This would be impractical with traditional directed evolution alone—the screening burden would be prohibitive. It would be impossible with AI design alone—the accuracy gap is too large without experimental feedback.
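For a sense of scale, the overall range quoted above can be converted to an average per-cycle gain by taking the geometric mean. This is simple arithmetic on the stated figures; it assumes a constant fold gain per cycle, whereas the text notes that the rate of improvement tends to accelerate, so early cycles would fall below this average and later cycles above it.

```python
def per_cycle_fold(total_fold, cycles):
    """Geometric-mean fold improvement per cycle for a given overall improvement."""
    return total_fold ** (1.0 / cycles)

print(round(per_cycle_fold(5, 3), 2))   # 1.71
print(round(per_cycle_fold(50, 4), 2))  # 2.66
```

So a 5× gain over three cycles corresponds to roughly 1.7× per cycle on average, and a 50× gain over four cycles to roughly 2.7× per cycle.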

Building the Data Flywheel at CD Biosynsis

The convergence of these insights shapes our integrated approach: we use AI to generate and prioritize candidate variants with unprecedented efficiency, and we use rigorous wet lab validation to close the gap between computational prediction and functional reality. Every round of data improves the models; every improvement in the models reduces the experimental burden of the next round.

This data flywheel is our core competitive advantage. As we accumulate proprietary experimental datasets—thousands of kinetic measurements, stability profiles, and expression outcomes across diverse enzyme families—our AI models become increasingly accurate predictors of real-world functional performance. Clients who engage CD Biosynsis early in their enzyme programs benefit most from this compounding effect.

References

Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with Enginoma Structure. Nature. 2021;596(7873):583-589. doi:10.1038/s41586-021-03819-2. PMID: 34265844
Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with Enginoma Structure 3. Nature. 2024;630:493-500. doi:10.1038/s41586-024-07487-w. PMID: 38718835
Varadi M, Tsenkov M, Velankar S. Challenges in bridging the gap between protein structure prediction and functional interpretation. Proteins. 2025;93(1):400-410. doi:10.1002/prot.26614. PMID: 37850517

Why Wet Lab Validation Remains Irreplaceable in the Age of AI Structure Prediction
