Data-driven rational biosynthesis design: from molecules to cell factories

Fu Chen; Le Yuan; Shaozhen Ding; Yu Tian; Qian-Nan Hu

doi:10.1093/bib/bbz065

Briefings in Bioinformatics ◽

10.1093/bib/bbz065 ◽

2019 ◽

Vol 21 (4) ◽

pp. 1238-1248

Keyword(s):

Large Scale ◽

Protein Domain ◽

Data Driven ◽

Cell Factory ◽

Dna Assembly ◽

Metabolic Reaction ◽

Cell Factories ◽

Metabolic Systems ◽

One Stop ◽

Target Molecules

Abstract A proliferation of chemical, reaction and enzyme databases, new computational methods and software tools for data-driven rational biosynthesis design have emerged in recent years. With the coming of the era of big data, particularly in the bio-medical field, data-driven rational biosynthesis design could potentially be useful to construct target-oriented chassis organisms. Engineering the complicated metabolic systems of chassis organisms to biosynthesize target molecules from inexpensive biomass is the main goal of cell factory design. The process of data-driven cell factory design could be divided into several parts: (1) target molecule selection; (2) metabolic reaction and pathway design; (3) prediction of novel enzymes based on protein domain and structure transformation of biosynthetic reactions; (4) construction of large-scale DNA for metabolic pathways; and (5) DNA assembly methods and visualization tools. The construction of a one-stop cell factory system could achieve automated design from the molecule level to the chassis level. In this article, we outline data-driven rational biosynthesis design steps and provide an overview of related tools in individual steps.

GEDpm-cg: Genome Editing Automated Design Platform for Point Mutation Construction in Corynebacterium glutamicum

Frontiers in Bioengineering and Biotechnology ◽

10.3389/fbioe.2021.768289 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yi Yang ◽

Yufeng Mao ◽

Ye Liu ◽

Ruoyu Wang ◽

Hui Lu ◽

...

Keyword(s):

Genome Editing ◽

Point Mutation ◽

In Silico ◽

Computer Aided Design ◽

Large Scale ◽

Point Mutations ◽

Dna Assembly ◽

Cell Factories ◽

Assembly Method ◽

Counter Selection

Advances in robotic system-assisted genome editing techniques and computer-aided design tools have significantly facilitated the development of microbial cell factories. Although multiple separate software solutions are available for vector DNA assembly, genome editing, and verification, there is still no complete tool that provides a one-stop service for the entire genome modification process. This makes the design of numerous genetic modifications, especially the construction of mutations that require strictly precise genetic manipulation, a laborious, time-consuming and error-prone process. Here, we developed a free online tool called GEDpm-cg for the design of genomic point mutations in C. glutamicum. The suicide plasmid-mediated counter-selection point mutation editing method and the overlap-based DNA assembly method were selected to ensure the editability of any single nucleotide at any locus in the C. glutamicum chromosome. Primers required for both DNA assembly of the vector for genetic modification and sequencing verification were provided as design results to meet all the experimental needs. An in-silico design task of over 10,000 single point mutations can be completed in 5 min. Finally, three independent point mutations were successfully constructed in C. glutamicum guided by GEDpm-cg, which confirms that the in-silico design results could accurately and seamlessly be bridged with in vivo or in vitro experiments. We believe this platform will provide a user-friendly, powerful and flexible tool for large-scale mutation analysis in the industrial workhorse C. glutamicum via robotic/software-assisted systems.
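The abstract above does not describe GEDpm-cg's internals, so the following is only a minimal sketch of the general idea behind a point-mutation design step: take a chromosome sequence, swap one nucleotide, and return the edited fragment together with its flanking homology arms. The function name `design_point_mutation`, the 0-based `position` convention and the default `arm_length` are all hypothetical choices made for illustration; they are not taken from the tool.

```python
def design_point_mutation(genome, position, new_base, arm_length=500):
    """Build an edited fragment for a single-nucleotide change.

    Hypothetical helper, not GEDpm-cg's API. `position` is 0-based.
    Returns the original base, both flanking arms and the edited insert
    (upstream arm + new base + downstream arm).
    """
    genome = genome.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if not 0 <= position < len(genome):
        raise IndexError("position is outside the genome")
    if genome[position] == new_base:
        raise ValueError("new_base equals the existing base; nothing to edit")

    # Clip the arms at the sequence ends so short inputs still work.
    start = max(0, position - arm_length)
    end = min(len(genome), position + arm_length + 1)
    upstream = genome[start:position]
    downstream = genome[position + 1:end]

    return {
        "original_base": genome[position],
        "upstream_arm": upstream,
        "downstream_arm": downstream,
        "edited_insert": upstream + new_base + downstream,
    }


if __name__ == "__main__":
    design = design_point_mutation("AAACCCGGGTTT", position=4, new_base="T", arm_length=3)
    print(design["edited_insert"])
```

A real design tool would additionally pick assembly and sequencing primers and handle a circular chromosome; those steps are deliberately left out here because the abstract gives no detail on how they are done.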

Modular, synthetic chromosomes as new tools for large scale engineering of metabolism

10.1101/2021.10.04.462994 ◽

2021 ◽

Author(s):

Eline Postma ◽

Else-Jasmijn Hassing ◽

Venda Mangkusaputra ◽

Jordi Geelhoed ◽

Pilar de la Torre ◽

...

Keyword(s):

Copy Number ◽

Large Scale ◽

Metabolic Networks ◽

De Novo ◽

Microbial Cell ◽

Cell Factory ◽

Microbial Cell Factory ◽

Microbial Cell Factories ◽

Cell Factories

The construction of powerful cell factories requires intensive genetic engineering for the addition of new functionalities and the remodeling of native pathways and processes. The present study demonstrates the feasibility of extensive genome reprogramming using modular, specialized de novo-assembled neochromosomes in yeast. The in vivo assembly of linear and circular neochromosomes, carrying 20 native and 21 heterologous genes, enabled the first de novo production in a microbial cell factory of anthocyanins, plant compounds with a broad range of pharmacological properties. Turned into exclusive expression platforms for heterologous and essential metabolic routes, the neochromosomes mimic native chromosomes regarding mitotic and genetic stability, copy number, harmlessness for the host and editability by CRISPR/Cas9. This study paves the way for future microbial cell factories with modular genomes in which core metabolic networks, localized on satellite, specialized neochromosomes, can be swapped for alternative configurations and serve as landing pads for the addition of functionalities.

Early response of methanogenic archaea to H2 as evaluated by metagenomics and metatranscriptomics

10.21203/rs.3.rs-368581/v1 ◽

2021 ◽

Author(s):

Balázs Kakuk ◽

Roland Wirth ◽

Gergely Maróti ◽

Szuhaj Márk ◽

Gábor Rakhely ◽

...

Keyword(s):

Microbial Community ◽

Large Scale ◽

Methanogenic Archaea ◽

Redox Balance ◽

Early Response ◽

Limiting Factor ◽

Cell Factory ◽

Microbial Composition ◽

Cell Factories ◽

Power To Gas

Abstract Background. The detailed molecular machinery of the complex microbiological cell factory of biogas/biomethane production is not fully understood. One of the main puzzling process control elements is the formation, consumption and regulatory role of hydrogen (H2). Reduction of carbon dioxide (CO2) by H2 is a rate-limiting factor in methanogenesis, but the community tends to keep the H2 concentration low in order to maintain the redox balance of the overall system. H2 metabolism in methanogens becomes increasingly important in the Power-to-Gas renewable energy conversion and storage technologies. Results. The early response of the mixed mesophilic microbial community to H2 gas injection was investigated with the goal of uncovering the first responses of the microbial community in the CH4 formation and CO2 mitigation Power-to-Gas process. The overall microbial composition changes following a 10 min H2 injection by excessive bubbling of H2 through the reactor were investigated via metagenome and metatranscriptome sequencing. The overall composition and taxonomic abundance of the biogas-producing anaerobic community did not change appreciably two hours after the H2 treatment, indicating that this time period was too short to display differences in the proliferation of the members of the microbial community. There was, however, a substantial increase in the expression of genes related to hydrogenotrophic methanogenesis of certain groups of Archaea. H2 injection also altered the metabolism of a number of microbes belonging to the kingdom Bacteria. The importance of syntrophic cross-kingdom interactions in H2 metabolism and the effects on the related Power-to-Gas process are discussed. Conclusions. External H2 regulates the functional activity of certain Bacteria and Archaea. Mixed communities are recommended for the large-scale Power-to-Gas process rather than single hydrogenotrophic methanogen strains. The fast and reproducible response of the microbial community can be exploited in the turn-off and turn-on of the Power-to-Gas microbial cell factories.

Harnessing the yeast Saccharomyces cerevisiae for the production of fungal secondary metabolites

Essays in Biochemistry ◽

10.1042/ebc20200137 ◽

2021 ◽

Author(s):

Guokun Wang ◽

Douglas B. Kell ◽

Irina Borodina

Keyword(s):

Saccharomyces Cerevisiae ◽

Secondary Metabolites ◽

High Throughput Screening ◽

Large Scale ◽

Strain Improvement ◽

Cell Factory ◽

Yeast Saccharomyces Cerevisiae ◽

Cell Factories ◽

Genetically Engineer ◽

Fungal Secondary Metabolites

Abstract Fungal secondary metabolites (FSMs) represent a remarkable array of bioactive compounds, with potential applications as pharmaceuticals, nutraceuticals, and agrochemicals. However, these molecules are typically produced only in limited amounts by their native hosts. The native organisms may also be difficult to cultivate and genetically engineer, and some can produce undesirable toxic side-products. Alternatively, recombinant production of fungal bioactives can be engineered into industrial cell factories, such as aspergilli or yeasts, which are well suited to large-scale manufacturing in submerged fermentations. In this review, we summarize the development of baker’s yeast Saccharomyces cerevisiae to produce compounds derived from filamentous fungi and mushrooms. These compounds mainly include polyketides, terpenoids, and amino acid derivatives. We also describe how native biosynthetic pathways can be combined or expanded to produce novel derivatives and new-to-nature compounds. We describe some new approaches for cell factory engineering, such as genome-scale engineering, biosensor-based high-throughput screening, and machine learning, and how these tools have been applied for S. cerevisiae strain improvement. Finally, we discuss the challenges and prospective solutions in the further development of yeast cell factories to produce FSMs more efficiently.

The future of self-selecting and stable fermentations

Journal of Industrial Microbiology & Biotechnology ◽

10.1007/s10295-020-02325-0 ◽

2020 ◽

Vol 47 (11) ◽

pp. 993-1004 ◽

Cited By ~ 2

Author(s):

Peter Rugbjerg ◽

Lisbeth Olsson

Keyword(s):

Growth Rate ◽

Large Scale ◽

Scale Up ◽

Serial Passage ◽

Cell Factory ◽

Sequencing Data ◽

High Performing ◽

Cell Factories ◽

Strain Design ◽

High Production

Abstract Unfavorable cell heterogeneity is a frequent risk during bioprocess scale-up and is characterized by rising frequencies of low-producing cells. Low-producing cells emerge by both non-genetic and genetic variation and will enrich due to their higher specific growth rate during the extended number of cell divisions of large-scale bioproduction. Here, we discuss recent strategies for synthetic stabilization of fermentation populations and argue for their application to make cell factory designs that better suit industrial needs. Genotype-directed strategies leverage DNA-sequencing data to inform strain design. Self-selecting phenotype-directed strategies couple high production with cell proliferation, either by redirected metabolic pathways or synthetic product biosensing to enrich for high-performing cell variants. Evaluating production stability early in new cell factory projects will guide heterogeneity-reducing design choices. As good initial metrics, we propose production half-life from standardized serial-passage stability screens and production load, quantified as production-associated percent-wise growth rate reduction. Incorporating more stable genetic designs will greatly increase scalability of future cell factories through sustaining a high-production phenotype and enabling stable long-term production.
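The two metrics named in this abstract can be illustrated with a short calculation. The abstract defines production load as the percent-wise growth rate reduction associated with production, but it gives no formula for production half-life, so the sketch below assumes that production decays roughly exponentially over serial-passage generations and fits a log-linear line to estimate it. The function names and the exponential-decay model are assumptions made for illustration, not the authors' protocol.

```python
import math


def production_load(mu_producer, mu_reference):
    """Percent growth-rate reduction of a producer relative to a
    non-producing reference strain (both specific growth rates in 1/h)."""
    if mu_reference <= 0:
        raise ValueError("reference growth rate must be positive")
    return 100.0 * (mu_reference - mu_producer) / mu_reference


def production_half_life(generations, titers):
    """Generations until production halves.

    Assumes exponential decay (an illustrative model) and fits
    ln(titer) against generations by ordinary least squares.
    Returns math.inf if production does not decline.
    """
    if len(generations) != len(titers) or len(generations) < 2:
        raise ValueError("need at least two paired measurements")
    if any(t <= 0 for t in titers):
        raise ValueError("titers must be positive to take logarithms")

    logs = [math.log(t) for t in titers]
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    if sxx == 0:
        raise ValueError("generations must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(generations, logs))
    slope = sxy / sxx

    if slope >= 0:
        return math.inf
    return math.log(2) / -slope


if __name__ == "__main__":
    print(production_load(mu_producer=0.45, mu_reference=0.50))
    print(production_half_life([0, 10, 20, 30], [8.0, 4.0, 2.0, 1.0]))
```

With made-up numbers like these, a strain growing at 0.45 1/h against a 0.50 1/h reference carries a load of about 10%, and a titer that halves every ten generations yields a half-life of about ten generations.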

JA signal-mediated immunity of Dendrobium catenatum to necrotrophic Southern Blight pathogen

BMC Plant Biology ◽

10.1186/s12870-021-03134-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Cong Li ◽

Qiuyi Shen ◽

Xiang Cai ◽

Danni Lai ◽

Lingshang Wu ◽

...

Keyword(s):

Large Scale ◽

Plant Immunity ◽

Expression Patterns ◽

Protein Domain ◽

Functional Identification ◽

Necrotrophic Pathogen ◽

Southern Blight ◽

Meja Treatment ◽

Sclerotium Delphinii ◽

Large Scale Cultivation

Abstract Background Dendrobium catenatum belongs to the Orchidaceae, and is a precious Chinese herbal medicine. In the past 20 years, D. catenatum has developed from an endangered medicinal plant into the basis of a multi-billion-dollar industry. The necrotrophic pathogen Sclerotium delphinii has a devastating effect on over 500 plant species, in particular causing widespread infection and severe yield loss in the large-scale cultivation of D. catenatum. It has been widely reported that Jasmonate (JA) is involved in plant immunity to pathogens, but the mechanisms of JA-induced plant resistance to S. delphinii are unclear. Results In the present study, the role of JA in enhancing D. catenatum resistance to S. delphinii was investigated. We identified 2 COI1, 13 JAZ, and 12 MYC proteins in the D. catenatum genome. Subsequently, systematic analyses covering phylogenetic relationships, gene structure, protein domains, and motif architecture of core JA pathway proteins were conducted in D. catenatum and the newly characterized homologs from its closely related orchid species Phalaenopsis equestris and Apostasia shenzhenica, along with the well-investigated homologs from Arabidopsis thaliana and Oryza sativa. Public RNA-seq data were investigated to analyze the expression patterns of D. catenatum core JA pathway genes in various tissues and organs. Transcriptome analysis of MeJA and S. delphinii treatment showed that exogenous MeJA changed the expression of most of the above genes, and several key members, including DcJAZ1/2/5 and DcMYC2b, are involved in enhancing defense ability to S. delphinii in D. catenatum. Conclusions The findings indicate that exogenous MeJA treatment affects the expression level of DcJAZ1/2/5 and DcMYC2b, thereby enhancing D. catenatum resistance to S. delphinii. This research would be helpful for future functional identification of core JA pathway genes involved in breeding for disease resistance in D. catenatum.

Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis

Algorithms ◽

10.3390/a14050154 ◽

2021 ◽

Vol 14 (5) ◽

pp. 154

Author(s):

Marcus Walldén ◽

Masao Okita ◽

Fumihiko Ino ◽

Dimitris Drikakis ◽

Ioannis Kokkinakis

Keyword(s):

Large Scale ◽

Data Driven ◽

Data Sets ◽

Output Constraints ◽

Data Driven Approach ◽

Scientific Simulations ◽

Multiple Metrics ◽

In Transit ◽

Multiple Compression ◽

Large Scale Simulations

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can quickly identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.
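The core idea in this abstract, scoring data regions by importance and choosing a compression method per region, can be sketched in a few lines. The abstract does not name the importance metrics or the compressors, so everything concrete below is an assumption chosen for illustration: population variance stands in as the importance metric, important regions are stored as float64 (lossless) and unimportant ones as float32 (lossy), and both are passed through `zlib`. This is not the authors' implementation.

```python
import struct
import zlib
from statistics import pvariance


def split_regions(values, region_size):
    """Cut a flat list of floats into consecutive fixed-size regions."""
    return [values[i:i + region_size] for i in range(0, len(values), region_size)]


def compress_region(region, important):
    """Compress one region; a 1-byte tag records which method was used.

    b"D": float64 + zlib (lossless), for important regions.
    b"F": float32 + zlib (lossy),    for unimportant regions.
    """
    if important:
        tag, payload = b"D", struct.pack(f"<{len(region)}d", *region)
    else:
        tag, payload = b"F", struct.pack(f"<{len(region)}f", *region)
    return tag + zlib.compress(payload)


def decompress_region(blob):
    """Invert compress_region using the leading tag byte."""
    tag, payload = blob[:1], zlib.decompress(blob[1:])
    if tag == b"D":
        return list(struct.unpack(f"<{len(payload) // 8}d", payload))
    return list(struct.unpack(f"<{len(payload) // 4}f", payload))


def compress_field(values, region_size, threshold):
    """Score each region by variance (assumed metric) and compress it."""
    blobs = []
    for region in split_regions(values, region_size):
        important = pvariance(region) > threshold
        blobs.append(compress_region(region, important))
    return blobs


if __name__ == "__main__":
    # A flat region followed by a rapidly varying one.
    field = [0.0] * 8 + [0.1 * i * i for i in range(8)]
    blobs = compress_field(field, region_size=8, threshold=0.01)
    print([blob[:1] for blob in blobs])
```

Tagging each blob with its method keeps decompression independent of the importance metric, so the threshold can change between time steps without affecting readers of older output.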

Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

International Journal of Artificial Intelligence in Education ◽

10.1007/s40593-021-00267-x ◽

2021 ◽

Author(s):

Ekaterina Kochmar ◽

Dung Do Vu ◽

Robert Belfer ◽

Varun Gupta ◽

Iulian Vlad Serban ◽

...

Keyword(s):

Machine Learning ◽

Student Performance ◽

Language Processing ◽

Intelligent Tutoring Systems ◽

Large Scale ◽

Intelligent Tutoring ◽

Performance Outcomes ◽

Data Driven ◽

Personalized Feedback ◽

Tutoring Systems

Abstract Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes individual needs of students into account, while alleviating the need for expert intervention and the design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.

Data-Driven Energy Use Estimation in Large Scale Transportation Networks

Proceedings of the 2nd ACM/EIGSCC Symposium on Smart Cities and Communities - SCC '19 ◽

10.1145/3357492.3358632 ◽

2019 ◽

Author(s):

Bin Wang ◽

Cy Chan ◽

Divya Somasi ◽

Jane Macfarlane ◽

Eric Rask

Keyword(s):

Large Scale ◽

Energy Use ◽

Transportation Networks ◽

Data Driven

Improving the management of type 2 diabetes through large-scale general practice: the role of a data-driven and technology-enabled education programme

BMJ Open Quality ◽

10.1136/bmjoq-2020-001087 ◽

2021 ◽

Vol 10 (1) ◽

pp. e001087

Author(s):

Tarek F Radwan ◽

Yvette Agyako ◽

Alireza Ettefaghian ◽

Tahira Kamran ◽

Omar Din ◽

...

Keyword(s):

Type 2 Diabetes ◽

Primary Care ◽

Large Scale ◽

Education Programme ◽

Educational Programme ◽

Data Driven ◽

Treatment Targets ◽

Care Processes ◽

Data Driven Approach

A quality improvement (QI) scheme was launched in 2017, covering a large group of 25 general practices working with a deprived registered population. The aim was to improve the measurable quality of care in a population where type 2 diabetes (T2D) care had previously proved challenging. A complex set of QI interventions was co-designed by a team of primary care clinicians, educationalists and managers. These interventions included organisation-wide goal setting, using a data-driven approach, ensuring staff engagement, implementing an educational programme for pharmacists, facilitating web-based QI learning at scale and using methods which ensured sustainability. This programme was used to optimise the management of T2D by improving the eight care processes and three treatment targets which form part of the annual national diabetes audit for patients with T2D. With the implemented improvement interventions, there was significant improvement in all care processes and all treatment targets for patients with diabetes. Achievement of all eight care processes improved by 46.0% (p<0.001) while achievement of all three treatment targets improved by 13.5% (p<0.001). The QI programme provides an example of a data-driven, large-scale, multicomponent intervention delivered in primary care in ethnically diverse and socially deprived areas.
