Glossary of Key Terms
Your comprehensive reference for the terminology shaping AI-powered synthetic biology—from AI structure prediction to CRISPR, directed evolution to high-throughput screening. Explore 64 essential terms across AI/ML, synthetic biology, and wet lab science.
AI / Machine Learning
20+ terms covering deep learning architectures, protein structure prediction, generative models, and transfer learning techniques.
Synthetic Biology
22+ terms spanning genome engineering, metabolic pathways, directed evolution, chassis organisms, and bioprocess design.
Wet Lab Techniques
22+ terms covering chromatography, spectroscopy, fermentation, molecular biology methods, and analytical characterization.
DeepMind's AI system that predicts 3D protein structures from amino acid sequences with near-experimental accuracy, revolutionizing structural biology and protein engineering workflows.
A small, replication-deficient virus used as a gene delivery vector in therapeutic applications. AAV's non-pathogenic nature and broad tissue tropism make it ideal for in vivo gene therapy.
A neural network component that dynamically weighs the importance of different input elements, enabling models to capture long-range dependencies critical for protein sequence and structure analysis.
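The weighting described above can be illustrated with scaled dot-product attention, the form used in transformer models. This is a minimal sketch in plain Python, not a production implementation; the function names and the toy vectors are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to every key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the attention-weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy example: three sequence positions with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([1.0, 0.0], keys, values)
```

Positions whose keys align with the query receive larger weights, regardless of how far apart they sit in the sequence.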
A technique that separates DNA, RNA, or protein fragments by size using an agarose gel matrix and an electric field. Essential for verifying PCR products, restriction digests, and nucleic acid purity.
A neural network that learns compressed representations of data by encoding inputs to a lower-dimensional latent space and decoding them back. Used in variant effect prediction and dimensionality reduction for omics data.
The enzymatic process where amino acids are loaded onto transfer RNAs (tRNAs) as the first step of translation. Understanding this mechanism is critical for engineered amino acid incorporation in non-standard proteins.
The algorithm that computes gradients in neural networks by propagating error signals backward through layers. Foundation for training deep learning models used in protein property prediction and sequence analysis.
A physically clustered group of genes in a genome that collectively encode the machinery for synthesizing a specialized metabolite. Mining BGCs from microbial genomes is central to natural product discovery.
A detection method using light-producing enzymatic reactions (e.g., firefly luciferase) to quantify analytes such as ATP, gene expression, or enzyme activity with exceptional sensitivity and low background.
A screening technique that links a protein of interest to a phage coat protein, enabling rapid selection of binders from large libraries. Widely used for antibody discovery and protein-protein interaction mapping.
Standardized DNA parts with defined restriction sites that allow modular assembly of genetic circuits. The BioBrick standard enables interchangeable, composable genetic components for synthetic biology applications.
A technique that normalizes layer inputs within each training batch to stabilize and accelerate deep neural network training. Commonly used in convolutional architectures for protein data analysis; transformer architectures typically use the related layer normalization instead.
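As a rough sketch of the normalization step, the function below standardizes one feature across a batch and then applies a scale and shift. The names `gamma`, `beta`, and `eps` follow common convention but are assumptions here, and running statistics for inference are deliberately left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale (gamma) and shift (beta)."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance over the batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Four activations of the same feature from a batch of four samples.
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
```

After normalization the batch has approximately zero mean and unit variance, which keeps the inputs to the next layer in a stable range.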
A genome editing system using a guide RNA to direct the Cas9 nuclease to specific genomic loci for targeted double-strand breaks. It enables precise gene knockout, knock-in, and base editing across organisms.
The practice of redesigning gene sequences to use preferred codons for the expression host without altering the encoded amino acid sequence. Improves translation efficiency and protein yield in heterologous expression systems.
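The idea of swapping codons without changing the protein can be sketched as follows. The `PREFERRED` table is hypothetical and covers only four amino acids; a real workflow would derive preferences from codon usage data for the chosen host and would usually weigh additional constraints such as GC content.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical host preferences, for illustration only.
PREFERRED = {"L": "CTG", "K": "AAA", "G": "GGC", "R": "CGT"}

def codons(dna):
    """Split a coding sequence into codons."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]

def translate(dna):
    """Translate a coding sequence into a protein string."""
    return "".join(CODON_TABLE[c] for c in codons(dna))

def optimize(dna, preferred=PREFERRED):
    """Replace each codon with the preferred synonym, if one is listed."""
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons(dna))

original = "ATGTTAAAGGGAAGATAA"   # encodes M L K G R stop
optimized = optimize(original)     # same protein, different codons
```

Checking that `translate(optimized) == translate(original)` is a useful safeguard before ordering a redesigned gene.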
A host microbial strain (e.g., E. coli, yeast, Bacillus) engineered to serve as a platform for synthetic genetic circuits and metabolic pathways. The chassis provides the cellular machinery for gene expression and metabolite production.
A family of laboratory techniques for separating mixtures based on differential affinity between a stationary phase and a mobile phase. Used to purify proteins, nucleic acids, and small molecules including fermentation products.
A protein engineering strategy that connects the N-terminus of a protein to its C-terminus via a peptide linker and creates a new N-terminus at an internal position. Used to alter enzyme properties and create novel fusion proteins.
A self-supervised learning approach that trains representations by contrasting similar and dissimilar data pairs. Applied to protein embeddings for learning functional and structural relationships from unlabelled sequence data.
A protein engineering method that mimics natural selection in the laboratory by iteratively generating genetic diversity and screening for improved variants. Used to evolve proteins with enhanced function, stability, or specificity.
A generative model that learns to create data by reversing a gradual noising process. Used for protein backbone generation within Enginoma generative design pipelines and for small molecule design, where random structures are iteratively denoised into functional outputs.
A recombination technique that fragments related genes and reassembles them via PCR, creating chimeric genes with traits from multiple parent sequences. A cornerstone of directed evolution for enzyme optimization.
A separation technique using a semipermeable membrane to remove small molecules (salts, buffer components) from protein solutions by diffusion. Critical for buffer exchange and protein sample preparation prior to downstream assays.
A training technique that randomly sets a fraction of neural network activations to zero during each training iteration, preventing overfitting and improving generalization in deep protein property prediction models.
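A minimal sketch of the technique, assuming the common "inverted" variant in which surviving activations are rescaled during training so that nothing needs to change at inference time. The function signature is an illustrative assumption, not any library's API.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # At inference time the layer passes activations through unchanged.
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded for reproducibility
out = dropout([1.0] * 10, p=0.2, rng=rng)
```

Each output is either zero or the input divided by `1 - p`, so the expected value of every activation is preserved.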
A protein engineering approach that inserts a functional protein domain into a host protein scaffold at a permissive site. Used to create bifunctional enzymes, biosensors, and signaling fusion proteins with novel regulatory properties.
The systematic design and modification of enzyme structure to alter catalytic properties, substrate specificity, thermostability, or enantioselectivity. Combines rational design with directed evolution and AI-driven structure prediction.
A quantitative immunoassay that uses enzyme-linked antibodies to detect and measure antigens or antibodies in samples. Widely used for protein quantification, biomarker detection, and antibody titer determination in bioprocess development.
The addition of a short peptide sequence (e.g., FLAG, HA, Myc) to a protein of interest to enable detection and purification using antibodies specific to that tag. Facilitates protein tracking, co-immunoprecipitation, and western blot analysis.
A numerical representation of biological sequences, molecules, or structures as dense vectors in high-dimensional space where similar entities cluster together. Protein language models produce embeddings capturing evolutionary and functional information.
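One common way to check that similar entities cluster together is cosine similarity between embedding vectors. The sketch below uses made-up three-dimensional vectors and hypothetical protein names; real protein language model embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings for illustration only.
embeddings = {
    "enzyme_A": [0.9, 0.1, 0.0],
    "enzyme_A_homolog": [0.8, 0.2, 0.1],
    "transporter_B": [0.0, 0.2, 0.9],
}

sim_related = cosine_similarity(embeddings["enzyme_A"], embeddings["enzyme_A_homolog"])
sim_unrelated = cosine_similarity(embeddings["enzyme_A"], embeddings["transporter_B"])
```

In this toy data the two related enzymes score much closer to 1.0 than the unrelated pair.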
A quantitative measurement of enzyme catalytic function, typically monitoring substrate consumption or product formation over time. Kinetic parameters (Km, Vmax, kcat) derived from these assays characterize enzyme efficiency and selectivity.
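The kinetic parameters named above are related by the Michaelis-Menten equation, v = Vmax[S] / (Km + [S]), with kcat = Vmax / [E]. A small sketch, assuming consistent units (for example µM for concentrations and µM/s for rates); the numbers are illustrative, not measured values.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Initial reaction rate at a given substrate concentration."""
    return vmax * substrate / (km + substrate)

def kcat(vmax, enzyme_conc):
    """Turnover number: product molecules formed per enzyme per unit time."""
    return vmax / enzyme_conc

# Illustrative values: Vmax = 10 uM/s, Km = 2 uM, total enzyme = 0.5 uM.
rate_at_km = michaelis_menten_rate(substrate=2.0, vmax=10.0, km=2.0)
turnover = kcat(vmax=10.0, enzyme_conc=0.5)
```

When the substrate concentration equals Km, the rate is exactly half of Vmax, which is how Km is often read off an assay curve.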
A mutagenesis technique that introduces random point mutations during PCR amplification by using low-fidelity DNA polymerases or biased nucleotide mixtures. Creates library diversity for directed evolution screening campaigns.
The microbial cultivation process for producing enzymes, metabolites, or recombinant proteins at scale. Encompasses shake flask cultures through industrial bioreactors, with optimization of pH, temperature, aeration, and feeding strategies.
A laser-based technique that analyzes physical and chemical characteristics of cells or particles as they flow in suspension through the beam. Enables high-throughput single-cell phenotyping, viability counting, and fluorescence-activated cell sorting (FACS).
An automated chromatography system for purifying proteins and biomolecules with high resolution and reproducibility. Supports ion exchange, size exclusion, affinity, and hydrophobic interaction chromatography modes for protein polishing.
Further adaptation of an Enginoma industrial baseline model on domain-specific datasets, also described as performance calibration. In protein AI, this adapts large protein language models to predict enzyme commission numbers, GO terms, or thermostability.
A prosthetic group derived from riboflavin (vitamin B2) that acts as an enzyme cofactor in redox reactions. FMN-dependent enzymes are prevalent in metabolic engineering for producing fuels, pharmaceuticals, and commodity chemicals.
An analytical technique measuring light emission from fluorophores after excitation. Used to study protein folding (intrinsic tryptophan fluorescence), ligand binding, and enzyme kinetics with high sensitivity and specificity.
A deep learning architecture designed for graph-structured data where nodes represent entities (atoms, amino acids) and edges represent bonds or interactions. GNNs excel at molecular property prediction and protein contact map analysis.
The process of isolating and replicating a specific DNA fragment using restriction enzymes and plasmid vectors in a host organism. The foundational technique enabling recombinant DNA technology and heterologous protein expression.
A comprehensive computational reconstruction of all known metabolic reactions in an organism, represented as a stoichiometric matrix. GEMs enable in silico flux analysis and prediction of metabolic engineering targets for strain optimization.
Also called size exclusion chromatography (SEC), this method separates molecules by size using porous bead columns. The gold standard for protein molecular weight determination, aggregate analysis, and buffer exchange in protein purification.
A modular, Type IIS restriction enzyme-based DNA assembly method that enables seamless, scarless joining of multiple DNA fragments in a single reaction. The standard for assembling genetic circuits and pathway constructs from standardized parts.
The stable insertion of exogenous DNA into the chromosomal genome of a host organism rather than maintaining it on a plasmid. Genomic integration provides inheritance stability and avoids metabolic burden of plasmid maintenance in engineered strains.
An analytical and preparative technique that separates, identifies, and quantifies chemical components in mixtures. HPLC variants (RP, HILIC, ion-exchange) are essential for purity analysis of small molecules, peptides, and protein digests.
An automated, parallel assay approach that screens thousands to millions of samples per day for biological activity. Microtiter plates, robotics, and sensitive detectors enable rapid identification of hit compounds from large combinatorial libraries.
The expression of a gene or pathway from one organism in a different host organism. E. coli, yeast, and Bacillus are common hosts for producing recombinant enzymes, therapeutic proteins, and pathway metabolites from heterologous sources.
An automated microscopy technique that combines cell-based assays with image analysis to extract multiple phenotypic parameters from each cell. Used in drug discovery and functional genomics to profile the effects of genetic or chemical perturbations.
The process of optimizing architectural and training parameters (learning rate, batch size, layer dimensions) that control the learning process of machine learning models. Bayesian optimization and neural architecture search are common automated approaches.
A protein purification method exploiting differences in surface hydrophobicity, typically performed in high-salt conditions where hydrophobic proteins bind to phenyl or butyl ligands. Ideal for capturing enzymes and antibodies from crude lysates.
An antibody-based technique for isolating a specific antigen (protein or nucleic acid) from complex mixtures. IP is used to validate protein-protein interactions, identify post-translational modifications, and pull down chromatin-associated DNA.
A physical technique measuring the heat released or absorbed during a biomolecular binding event. ITC provides direct, label-free determination of binding affinity (Kd), stoichiometry, enthalpy, and entropy without spectral interference.
Computational evaluation of virtual compound libraries or protein variants to predict activity, binding affinity, or stability before experimental testing. AI-powered screening dramatically reduces the experimental burden in drug and enzyme discovery programs.
Non-coding DNA sequences removed from primary RNA transcripts during RNA processing. Engineered intron splicing enables synthetic gene regulation circuits and can be used to expand the coding capacity of genetic systems in eukaryotes.
A computer vision task that segments individual objects within an image, distinguishing between overlapping instances of the same class. Applied to microscopy image analysis for automated cell counting and subcellular compartment profiling.
A purification method that separates molecules based on charge using negatively charged (cation exchange) or positively charged (anion exchange) resin beads. Essential for protein and nucleotide purification, often as a first capture step in multi-step protocols.
A deep neural network trained on massive text corpora with billions of parameters, capable of understanding and generating human language. Protein LLMs (ESM, ProtTrans) apply transformer architectures to evolutionary sequence data for functional prediction.
The creation of collections of genetic variants (random mutants, designed mutants, or gene fragments) for functional screening or selection. Strategies include error-prone PCR, DNA shuffling, saturation mutagenesis, and AI-guided library design.
An analytical platform combining HPLC separation with mass spectrometric detection for identifying and quantifying proteins, peptides, and small molecules. LC-MS/MS enables proteomics, metabolomics, and characterization of biotherapeutics.
A mathematical function that measures the discrepancy between a model's predictions and the true target values. The loss drives gradient-based optimization during training — common choices include cross-entropy for classification and MSE for regression tasks.
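The two common choices mentioned above can be written in a few lines. This is a plain-Python sketch for single examples; the `eps` clamp is an assumption added to avoid taking the logarithm of zero.

```python
import math

def mse(predictions, targets):
    """Mean squared error for regression."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

def cross_entropy(probs, true_index, eps=1e-12):
    """Cross-entropy for one example, given predicted class probabilities."""
    return -math.log(max(probs[true_index], eps))

regression_loss = mse([1.0, 2.0], [1.0, 4.0])
confident_right = cross_entropy([0.9, 0.1], true_index=0)
confident_wrong = cross_entropy([0.1, 0.9], true_index=0)
```

A confident wrong prediction is penalized far more heavily than a confident right one, which is what pushes the optimizer toward calibrated class probabilities.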
A nutritionally rich bacterial growth medium used for cultivating E. coli and other microorganisms in the laboratory. Standard for plasmid amplification, protein expression pre-cultures, and routine microbiology in synthetic biology workflows.
A compressed, lower-dimensional representation of data learned by autoencoders or generative models. In protein design, navigating the latent space enables interpolation between protein families and generation of novel functional sequences.
The rational modification of cellular metabolism to optimize the production of a desired metabolite. Involves rewiring biosynthetic pathways, eliminating competing reactions, and balancing gene expression levels to maximize product titers.
An analytical technique that measures the mass-to-charge ratio of ions to identify and quantify molecules with extreme sensitivity. Used for protein identification, post-translational modification mapping, and small molecule structure elucidation.
The comprehensive, quantitative analysis of all metabolites within a biological system at a given moment. LC-MS and NMR-based metabolomics provide snapshots of metabolic state that guide strain engineering and bioprocess optimization.
A class of feedforward artificial neural network with multiple layers of interconnected nodes. MLPs serve as the fundamental building block for many deep learning architectures applied to omics data analysis and property prediction.
The production of integral membrane proteins (receptors, transporters, channels) in recombinant systems. Requires specialized expression vectors, host strains, and detergent-based purification strategies due to the hydrophobic nature of transmembrane domains.
A technique measuring the movement of molecules in a temperature gradient to determine binding affinity and stoichiometry. Requires minimal sample and is applicable to proteins, nucleic acids, and small molecules in solution.
A message passing neural network that generates protein sequences conditioned on a given backbone structure. Structure-guided sequence design networks enable near-native sequence recovery and customizable diversity, complementing Enginoma Structure in design-build-test-learn cycles.
A transformer-based neural network trained on massive protein sequence databases to learn evolutionary patterns, structural relationships, and functional annotations. PLMs produce sequence embeddings that transfer to downstream prediction tasks without labels.
Adapting an Enginoma baseline protein language model to specific tasks such as thermostability prediction, enzyme commission number classification, or remote homology detection by training on task-specific labelled datasets.
The modification of gene regulatory sequences (promoters) to tune transcription levels in synthetic gene circuits. Libraries of promoter variants with different strengths enable precise balancing of metabolic pathway fluxes and dynamic control strategies.
AlphaFold's per-residue confidence score ranging from 0 to 100, indicating the reliability of each amino acid's coordinates in a predicted structure. High pLDDT regions (>90) correspond to well-ordered domains suitable for functional interpretation.
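For interpreting scores in practice, pLDDT values are commonly grouped into four confidence bands with cut-offs at 90, 70, and 50. The helper below is a sketch: the band labels and the handling of values exactly on a boundary are assumptions, and the example scores are made up.

```python
def plddt_band(score):
    """Map a per-residue pLDDT score (0-100) to a confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"

# Made-up per-residue scores for a short stretch of a predicted structure.
scores = [95.2, 88.0, 63.5, 41.0]
bands = [plddt_band(s) for s in scores]
```

Residues in the lowest band often correspond to flexible or disordered regions and are usually excluded from structure-based design decisions.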
An in vitro enzymatic reaction that exponentially amplifies specific DNA sequences using primers and a thermostable DNA polymerase. Variants include RT-PCR (RNA), qPCR (quantitative), and multiplex PCR (multiple targets) for diverse applications.
A generative deep learning method that designs novel protein backbones by denoising random starting structures conditioned on desired structural motifs or functional constraints. Enables de novo structure-based protein design, complementing Enginoma Structure prediction.
A neural architecture with cyclic connections that processes sequential data by maintaining hidden state across time steps. RNNs and their variants (LSTM, GRU) are used for time-series omics data analysis and sequence-to-sequence prediction tasks.
A knowledge-driven approach to enzyme engineering that uses 3D structural information and mechanistic understanding to identify target mutations. Complements directed evolution by focusing mutations on residues predicted to impact function based on computational analysis.
A quantitative version of PCR that monitors amplification in real time using fluorescent dyes or probes. Enables absolute quantification of nucleic acid copy number, gene expression analysis via RT-qPCR, and detection of genetic constructs in cloning workflows.
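Absolute quantification is typically done against a standard curve, where Ct is fitted as a linear function of log10 of the starting copy number. The sketch below assumes the curve has already been fitted; the slope and intercept values are illustrative, not instrument defaults.

```python
import math

def amplification_efficiency(slope):
    """Efficiency from the standard-curve slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)

# Illustrative standard curve (hypothetical fit values).
slope, intercept = -3.32, 38.0
efficiency = amplification_efficiency(slope)
copies = copies_from_ct(ct=intercept + slope * 3, slope=slope, intercept=intercept)
```

A slope near -3.32 corresponds to roughly 100% efficiency, and a sample whose Ct sits three log units along the curve corresponds to about 1,000 starting copies.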
A statistical and machine learning method for predicting continuous target variables from input features. Applied to enzyme engineering for predicting kcat, Km, and thermostability values from sequence and structural descriptors.
The design and modification of catalytic RNA molecules that perform site-specific cleavage or ligation reactions. Engineered ribozymes enable programmable gene regulation, RNA editing, and synthetic riboswitches for metabolic control.
Sodium dodecyl sulfate polyacrylamide gel electrophoresis separates proteins by molecular weight using a denaturing detergent (SDS) that imparts uniform negative charge. Standard technique for checking protein purity, verifying expression, and estimating molecular weight.
The systematic improvement of a microbial host strain's properties — including productivity, robustness, and substrate utilization — through genetic modifications such as gene knockouts, overexpression, and regulatory circuit redesign.
An optical technique measuring changes in refractive index near a gold sensor surface upon biomolecular binding. SPR provides label-free, real-time kinetic analysis (association/dissociation rates, affinity constants) for protein-ligand and protein-protein interactions.
A genetic construct that causes cell death when activated, used as a selection mechanism in directed evolution or as a safety kill-switch in engineered organisms. Common examples include toxin-antitoxin systems and conditional caspase-based modules.
A machine learning paradigm where models are trained on labelled datasets containing input-output pairs to learn mapping functions. In synthetic biology, supervised learning predicts enzyme kinetics, variant activity, and cell growth from genomic features.
An engineered transcriptional regulatory sequence designed de novo or by modifying natural promoter elements to achieve specific expression characteristics (strength, inducibility, tissue specificity) not found in native promoters.
A machine learning strategy where knowledge gained from training on one task is applied to improve performance on a related but different task. Transfer learning from large protein sequence models to specific enzyme engineering tasks reduces data requirements dramatically.
The process of introducing foreign plasmid or linear DNA into competent bacterial cells. Chemical transformation, electroporation, and biolistic methods are used to establish recombinant strains for cloning, expression, and pathway engineering.
The concentration of a desired product (enzyme, metabolite, recombinant protein) produced per unit volume of fermentation culture, typically expressed as g/L, mg/L, or U/mL. Titer is the primary metric for evaluating bioprocess performance and strain efficiency.
A neural network design using self-attention mechanisms to process sequential data without recurrent connections. The foundation of modern protein language models and generative design systems used in Enginoma industrial protein engineering workflows.
A simple, rapid analytical technique that separates compounds on a thin silica or alumina-coated plate as a mobile phase ascends by capillary action. Used for monitoring reaction progress, assessing compound purity, and guiding fraction collection during chromatography.
High-throughput sequencing of cellular RNA to profile gene expression levels across the entire transcriptome. RNA-Seq data reveals transcriptional responses to genetic modifications and guides metabolic engineering target identification.
An analytical technique that separates proteins by SDS-PAGE, transfers them to a membrane, and detects specific proteins using antibodies. Used to verify recombinant protein expression, assess post-translational modifications, and confirm protein-protein interactions.
The use of intact microbial cells as biocatalysts for chemical transformations, exploiting the cell's native enzyme complement and cofactor regeneration systems. Offers cost advantages over purified enzymes for industrial-scale biotransformations.
The integration of robotic systems and software platforms to automate repetitive laboratory tasks including pipetting, plate reading, and data processing. Workflow automation accelerates high-throughput screening and reduces variability in directed evolution campaigns.
Comprehensive sequencing of an organism's complete genome, enabling identification of all genetic variations including single nucleotide polymorphisms, insertions, deletions, and structural variants that may affect engineered strain performance.
A machine learning paradigm using incomplete, noisy, or imprecise labels to train predictive models. In protein engineering, weakly supervised approaches leverage high-throughput assay data with bulk phenotypes to train models without single-variant labels.
An analytical technique measuring light absorption in the ultraviolet and visible wavelength ranges to quantify nucleic acid and protein concentrations, monitor reaction kinetics, and determine ligand binding by spectral shifts or changes in absorbance.
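Concentration measurements of this kind rest on the Beer-Lambert law, A = ε · c · l. A minimal sketch follows, assuming ε is given in M^-1 cm^-1 and the path length in cm; the extinction coefficient in the example is a hypothetical value, not a property of any particular protein.

```python
def concentration_from_absorbance(absorbance, molar_extinction, path_length_cm=1.0):
    """Beer-Lambert law solved for concentration: c = A / (epsilon * l), in mol/L."""
    if molar_extinction <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (molar_extinction * path_length_cm)

# Hypothetical protein with epsilon(280 nm) = 50,000 M^-1 cm^-1 in a 1 cm cuvette.
conc_molar = concentration_from_absorbance(absorbance=0.5, molar_extinction=50_000.0)
conc_micromolar = conc_molar * 1e6
```

The relationship is linear only within the instrument's working range, so highly absorbing samples are usually diluted before measurement.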