Deep learning allows genome-scale prediction of Michaelis constants from structural features

Alexander Kroll; Martin K. M. Engqvist; David Heckmann; Martin J. Lercher

doi:10.1371/journal.pbio.3001402

PLoS Biology ◽

10.1371/journal.pbio.3001402 ◽

2021 ◽

Vol 19 (10) ◽

pp. e3001402

Author(s):

Alexander Kroll ◽

Martin K. M. Engqvist ◽

David Heckmann ◽

Martin J. Lercher

Keyword(s):

Deep Learning ◽

Structural Features ◽

Model Organisms ◽

Specific Substrate ◽

Numerical Representation ◽

Molecular Fingerprint ◽

Cellular Physiology ◽

Enzyme Substrate ◽

Natural Enzyme ◽

Genome Scale

The Michaelis constant KM describes the affinity of an enzyme for a specific substrate and is a central parameter in studies of enzyme kinetics and cellular physiology. As measurements of KM are often difficult and time-consuming, experimental estimates exist for only a minority of enzyme–substrate combinations even in model organisms. Here, we build and train an organism-independent model that successfully predicts KM values for natural enzyme–substrate combinations using machine and deep learning methods. Predictions are based on a task-specific molecular fingerprint of the substrate, generated using a graph neural network, and on a deep numerical representation of the enzyme’s amino acid sequence. We provide genome-scale KM predictions for 47 model organisms, which can be used to approximately relate metabolite concentrations to cellular physiology and to aid in the parameterization of kinetic models of cellular metabolism.

Prediction of Michaelis constants from structural features using deep learning

10.1101/2020.12.01.405928 ◽

2020 ◽

Author(s):

Alexander Kroll ◽

David Heckmann ◽

Martin J. Lercher

Keyword(s):

Deep Learning ◽

Structural Features ◽

Model Organisms ◽

Specific Substrate ◽

Molecular Fingerprint ◽

Cellular Physiology ◽

Enzyme Substrate ◽

Enzyme Model ◽

Natural Enzyme ◽

Michaelis Constants

ABSTRACT The Michaelis constant KM describes the affinity of an enzyme for a specific substrate, and is a central parameter in studies of enzyme kinetics and cellular physiology. As measurements of KM are often difficult and time-consuming, experimental estimates exist for only a minority of enzyme-substrate combinations even in model organisms. Here, we build and train an organism-independent model that successfully predicts KM values for natural enzyme-substrate combinations using machine and deep learning methods. Predictions are based on a task-specific molecular fingerprint of the substrate, generated using a graph neural network, and the domain structure of the enzyme. Model predictions can be used to estimate enzyme efficiencies, to relate metabolite concentrations to cellular physiology, and to fill gaps in the parameterization of kinetic models of cellular metabolism.

Faculty Opinions recommendation of Deep learning allows genome-scale prediction of Michaelis constants from structural features.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.741015770.793590453 ◽

2021 ◽

Author(s):

Qian-Nan Hu

Keyword(s):

Deep Learning ◽

Structural Features ◽

Michaelis Constants ◽

Genome Scale

Myosinome: A Database of Myosins from Select Eukaryotic Genomes to Facilitate Analysis of Sequence-Structure-Function Relationships

Bioinformatics and Biology Insights ◽

10.4137/bbi.s9902 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S9902 ◽

Cited By ~ 3

Author(s):

Divya P. Syamaladevi ◽

Margaret S Sunitha ◽

S. Kalaimathy ◽

Chandrashekar C. Reddy ◽

Mohammed Iftekhar ◽

...

Keyword(s):

Conformational Changes ◽

Atp Hydrolysis ◽

Homo Sapiens ◽

Relevant Literature ◽

Myosin Ii ◽

Coiled Coil ◽

Structural Features ◽

Model Organisms ◽

Congenital Diseases ◽

C Elegans

Myosins are one of the largest protein superfamilies with 24 classes. They have conserved structural features and catalytic domains yet show huge variation at different domains resulting in a variety of functions. Myosins are molecules driving various kinds of cellular processes and motility up to the level of organisms. These are ATPases that utilize the chemical energy released by ATP hydrolysis to bring about conformational changes leading to a motor function. Myosins are important as they are involved in almost all cellular activities ranging from cell division to transcriptional regulation. They are crucial due to their involvement in many congenital diseases symptomatized by muscular malfunctions, cardiac diseases, deafness, neural and immunological dysfunction, and so on, many of which lead to death at an early age. We present Myosinome, a database of selected myosin classes (myosin II, V, and VI) from five model organisms. This knowledge base provides the sequences, phylogenetic clustering, domain architectures of myosins and molecular models, structural analyses, and relevant literature of their coiled-coil domains. In the current version of Myosinome, information about 71 myosin sequences belonging to three myosin classes (myosin II, V, and VI) in five model organisms (Homo sapiens, Mus musculus, D. melanogaster, C. elegans and S. cerevisiae) identified using bioinformatics surveys is presented, and several of them are yet to be functionally characterized. As these proteins are involved in congenital diseases, such a database would be useful in short-listing candidates for gene therapy and drug development. The database can be accessed from http://caps.ncbs.res.in/myosinome.

MIC-Drop: A platform for large-scale in vivo CRISPR screens

Science ◽

10.1126/science.abi8870 ◽

2021 ◽

pp. eabi8870

Author(s):

Saba Parvez ◽

Chelsea Herdman ◽

Manu Beerens ◽

Korak Chakraborti ◽

Zachary P. Harmer ◽

...

Keyword(s):

Large Scale ◽

Cultured Cells ◽

Cardiac Development ◽

Droplet Microfluidics ◽

Model Organisms ◽

Genetic Screens ◽

Large Numbers ◽

And Function ◽

Genome Scale

CRISPR-Cas9 can be scaled up for large-scale screens in cultured cells, but CRISPR screens in animals have been challenging because generating, validating, and keeping track of large numbers of mutant animals is prohibitive. Here, we report Multiplexed Intermixed CRISPR Droplets (MIC-Drop), a platform combining droplet microfluidics, single-needle en masse CRISPR ribonucleoprotein injections, and DNA barcoding to enable large-scale functional genetic screens in zebrafish. The platform can efficiently identify genes responsible for morphological or behavioral phenotypes. In one application, we show MIC-Drop can identify small molecule targets. Furthermore, in a MIC-Drop screen of 188 poorly characterized genes, we discover several genes important for cardiac development and function. With the potential to scale to thousands of genes, MIC-Drop enables genome-scale reverse-genetic screens in model organisms.

Mapping the glycosyltransferase fold landscape using interpretable deep learning

Nature Communications ◽

10.1038/s41467-021-25975-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Rahil Taujale ◽

Zhongliang Zhou ◽

Wayland Yeung ◽

Kelley W. Moremen ◽

Sheng Li ◽

...

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structural Features ◽

Functional Diversification ◽

Sequence Structure ◽

Cellular Processes ◽

And Function ◽

Deep Learning Model ◽

Fold Prediction ◽

Primary Sequence Alignment

Abstract Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.

Genome-Scale Screening and Combinatorial Optimization of Gene Overexpression Targets to Improve Cadmium Tolerance in Saccharomyces cerevisiae

Frontiers in Microbiology ◽

10.3389/fmicb.2021.662512 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yongcan Chen ◽

Jun Liang ◽

Zhicong Chen ◽

Bo Wang ◽

Tong Si

Keyword(s):

Saccharomyces Cerevisiae ◽

Genetic Engineering ◽

Heavy Metal Contamination ◽

Cadmium Toxicity ◽

Biomass Accumulation ◽

Model Organisms ◽

Cadmium Tolerance ◽

Wild Type ◽

Cadmium Resistance ◽

Genome Scale

Heavy metal contamination is an environmental issue on a global scale. Particularly, cadmium poses substantial threats to crop and human health. Saccharomyces cerevisiae is one of the model organisms to study cadmium toxicity and was recently engineered as a cadmium hyperaccumulator. Therefore, it is desirable to overcome the cadmium sensitivity of S. cerevisiae via genetic engineering for bioremediation applications. Here we performed genome-scale overexpression screening for gene targets conferring cadmium resistance in CEN.PK2-1c, an industrial S. cerevisiae strain. Seven targets were identified, including CAD1 and CUP1 that are known to improve cadmium tolerance, as well as CRS5, NRG1, PPH21, BMH1, and QCR6 that are less studied. In the wild-type strain, cadmium exposure activated gene transcription of CAD1, CRS5, CUP1, and NRG1 and repressed PPH21, as revealed by real-time quantitative PCR analyses. Furthermore, yeast strains that contained two overexpression mutations out of the seven gene targets were constructed. Synergistic improvement in cadmium tolerance was observed with episomal co-expression of CRS5 and CUP1. In the presence of 200 μM cadmium, the most resistant strain overexpressing both CAD1 and NRG1 exhibited a 3.6-fold improvement in biomass accumulation relative to wild type. This work provided a new approach to discover and optimize genetic engineering targets for increasing cadmium resistance in yeast.

Genome-scale metabolic network reconstruction of model animals as a platform for translational research

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2102344118 ◽

2021 ◽

Vol 118 (30) ◽

pp. e2102344118

Author(s):

Hao Wang ◽

Jonathan L. Robinson ◽

Pinar Kocabas ◽

Johan Gustafsson ◽

Mihail Anton ◽

...

Keyword(s):

Transgenic Mice ◽

Metabolic Network ◽

Model Organisms ◽

Protein Overexpression ◽

Sequencing Data ◽

Proteomics Data ◽

Gm2 Ganglioside ◽

Species Specific ◽

Specific Reactions ◽

Genome Scale

Genome-scale metabolic models (GEMs) are used extensively for analysis of mechanisms underlying human diseases and metabolic malfunctions. However, the lack of comprehensive and high-quality GEMs for model organisms restricts translational utilization of omics data accumulating from the use of various disease models. Here we present a unified platform of GEMs that covers five major model animals, including Mouse1 (Mus musculus), Rat1 (Rattus norvegicus), Zebrafish1 (Danio rerio), Fruitfly1 (Drosophila melanogaster), and Worm1 (Caenorhabditis elegans). These GEMs represent the most comprehensive coverage of the metabolic network by considering both orthology-based pathways and species-specific reactions. All GEMs can be interactively queried via the accompanying web portal Metabolic Atlas. Specifically, through integrative analysis of Mouse1 with RNA-sequencing data from brain tissues of transgenic mice we identified a coordinated up-regulation of lysosomal GM2 ganglioside and peptide degradation pathways which appears to be a signature metabolic alteration in Alzheimer’s disease (AD) mouse models with a phenotype of amyloid precursor protein overexpression. This metabolic shift was further validated with proteomics data from transgenic mice and cerebrospinal fluid samples from human patients. The elevated lysosomal enzymes thus hold potential to be used as a biomarker for early diagnosis of AD. Taken together, we foresee that this evolving open-source platform will serve as an important resource to facilitate the development of systems medicines and translational biomedical applications.

On the objectivity, reliability, and validity of deep learning enabled bioimage analyses

eLife ◽

10.7554/elife.59780 ◽

2020 ◽

Vol 9 ◽

Cited By ~ 1

Author(s):

Dennis Segebarth ◽

Matthias Griebel ◽

Nikolai Stein ◽

Cora R von Collenberg ◽

Corinna Martin ◽

...

Keyword(s):

Deep Learning ◽

Signal To Noise Ratio ◽

Biological Effects ◽

Reliability And Validity ◽

Ground Truth ◽

Training Data ◽

Model Organisms ◽

Data Annotation ◽

Bioimage Analysis ◽

Model Training

Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow automating time-consuming manual image analysis processes based on annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.

Mouse Gut Microbiome-Encoded β-Glucuronidases Identified Using Metagenome Analysis Guided by Protein Structure

mSystems ◽

10.1128/msystems.00452-19 ◽

2019 ◽

Vol 4 (4) ◽

Cited By ~ 5

Author(s):

Benjamin C. Creekmore ◽

Josh H. Gray ◽

William G. Walton ◽

Kristen A. Biernat ◽

Michael S. Little ◽

...

Keyword(s):

Protein Structure ◽

Active Site ◽

Human Microbiome ◽

Drug Efficacy ◽

Human Microbiome Project ◽

Structural Features ◽

Model Organisms ◽

Mouse Strains ◽

Sequencing Data ◽

Metagenome Analysis

ABSTRACT Gut microbial β-glucuronidase (GUS) enzymes play important roles in drug efficacy and toxicity, intestinal carcinogenesis, and mammalian-microbial symbiosis. Recently, the first catalog of human gut GUS proteins was provided for the Human Microbiome Project stool sample database and revealed 279 unique GUS enzymes organized into six categories based on active-site structural features. Because mice represent a model biomedical research organism, here we provide an analogous catalog of mouse intestinal microbial GUS proteins—a mouse gut GUSome. Using metagenome analysis guided by protein structure, we examined 2.5 million unique proteins from a comprehensive mouse gut metagenome created from several mouse strains, providers, housing conditions, and diets. We identified 444 unique GUS proteins and organized them into six categories based on active-site features, similarly to the human GUSome analysis. GUS enzymes were encoded by the major gut microbial phyla, including Firmicutes (60%) and Bacteroidetes (21%), and there were nearly 20% for which taxonomy could not be assigned. No differences in gut microbial gus gene composition were observed for mice based on sex. However, mice exhibited gus differences based on active-site features associated with provider, location, strain, and diet. Furthermore, diet yielded the largest differences in gus composition. Biochemical analysis of two low-fat-associated GUS enzymes revealed that they are variable with respect to their efficacy of processing both sulfated and nonsulfated heparan nonasaccharides containing terminal glucuronides.

IMPORTANCE Mice are commonly employed as model organisms of mammalian disease; as such, our understanding of the compositions of their gut microbiomes is critical to appreciating how the mouse and human gastrointestinal tracts mirror one another. GUS enzymes, with importance in normal physiology and disease, are an attractive set of proteins to use for such analyses. Here we show that while the specific GUS enzymes differ at the sequence level, a core GUSome functionality appears conserved between mouse and human gastrointestinal bacteria. Mouse strain, provider, housing location, and diet exhibit distinct GUSomes and gus gene compositions, but sex seems not to affect the GUSome. These data provide a basis for understanding the gut microbial GUS enzymes present in commonly used laboratory mice. Further, they demonstrate the utility of metagenome analysis guided by protein structure to provide specific sets of functionally related proteins from whole-genome metagenome sequencing data.

Deep learning model for unstructured knowledge classification using structural features

Personal and Ubiquitous Computing ◽

10.1007/s00779-019-01244-x ◽

2019 ◽

Author(s):

Wonkyun Joo ◽

KiSeok Choi ◽

Young-Kuk Kim

Keyword(s):

Deep Learning ◽

Learning Model ◽

Structural Features ◽

Deep Learning Model ◽

Knowledge Classification