Sequencing Enabling Design and Learning in Synthetic Biology

Author(s):  
Pierre-Aurelien Gilliot ◽  
Thomas E. Gorochowski

The ability to read and quantify nucleic acids such as DNA and RNA using sequencing technologies has revolutionized our understanding of life. With the emergence of synthetic biology, these tools are now being put to work in new ways - enabling de novo biological design. Here, we show how sequencing is supporting the creation of a new wave of biological parts and systems, as well as providing the vast data sets needed for the machine learning of design rules for predictive bioengineering. However, we believe this is only the tip of the iceberg and end by providing an outlook on recent advances that will likely broaden the role of sequencing in synthetic biology and its deployment in real-world environments.

2018 ◽  
Author(s):  
Adrian Fritz ◽  
Peter Hofmann ◽  
Stephan Majda ◽  
Eik Dahms ◽  
Johannes Dröge ◽  
...  

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. Here, we describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, using several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with truth standards for method evaluation. All data sets and the software are freely available at: https://github.com/CAMI-challenge/CAMISIM
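
The kind of taxonomic profile CAMISIM turns into benchmark metagenomes can be illustrated with a short, self-contained sketch. The snippet below is not the CAMISIM API; it simply draws a log-normal community abundance profile and allocates a hypothetical read budget across genomes, the sort of abundance modelling the abstract refers to.

```python
# Illustrative only: log-normal abundance profile plus per-genome read counts.
# This is a conceptual sketch, not CAMISIM's actual interface.
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_abundance_profile(n_genomes, total_reads, mu=1.0, sigma=2.0):
    """Return relative abundances and simulated read counts per genome."""
    raw = rng.lognormal(mean=mu, sigma=sigma, size=n_genomes)
    abundances = raw / raw.sum()                      # normalize to sum to 1
    reads = rng.multinomial(total_reads, abundances)  # allocate the read budget
    return abundances, reads

abundances, reads = simulate_abundance_profile(n_genomes=20, total_reads=1_000_000)
for i, (a, r) in enumerate(zip(abundances, reads)):
    print(f"genome_{i:02d}\t{a:.4f}\t{r}")
```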


2013 ◽  
Vol 5 ◽  
pp. GEG.S12143 ◽  
Author(s):  
Cong-jun Li

DNA methylation is a major epigenetic regulatory mechanism for gene expression and cell differentiation. Until recently, it was unclear how unmethylated regions in mammalian genomes are protected from de novo methylation and whether active demethylating activity is involved. Even the molecules and mechanisms underlying active demethylation itself remain poorly defined. Emerging sequencing technologies have led to recent insights into the dynamic distribution of DNA methylation during development and the role of this epigenetic mark within distinct genomic contexts, such as promoters, exons, or imprinting control regions. This review summarizes recent insights into the dynamic nature of DNA methylation and demethylation, as well as the mechanisms regulating active DNA demethylation in mammalian cells, which have become fundamental research interests in the field of epigenomics.


AI & Society ◽  
2020 ◽  
Author(s):  
Nicolas Malevé

Abstract Computer vision aims to produce an understanding of digital images' content and the generation or transformation of images through software. Today, a significant number of computer vision algorithms rely on machine learning techniques, which require large amounts of data assembled in collections, or data sets. To build these data sets, a large population of precarious workers label and classify photographs around the clock at high speed. For computers to learn how to see, a scale articulates macro and micro dimensions: the millions of images culled from the internet with the few milliseconds given to the workers to perform a task for which they are paid a few cents. This paper engages in detail with the production of this scale and the labour it relies on: its elaboration. This elaboration does not only require hands and retinas; it also crucially mobilises the photographic apparatus. To understand the specific character of the scale created by computer vision scientists, the paper compares it with a previous enterprise of scaling, Malraux's Le Musée Imaginaire, where photography was used as a device to undo the boundaries of the museum's collection and open it to unlimited access to the world's visual production. Drawing on Douglas Crimp's argument that the "musée imaginaire", a hyperbole of the museum, relied simultaneously on the active role of the photographic apparatus for its existence and on its negation, the paper identifies a similar problem in computer vision's understanding of photography. The double dismissal of the role played by the workers and of the agency of the photographic apparatus in the elaboration of computer vision foregrounds the inherent fragility of the edifice of machine vision and the need to rethink its scale.


2021 ◽  
Vol 1 (3) ◽  
pp. 138-165
Author(s):  
Thomas Krause ◽  
Jyotsna Talreja Wassan ◽  
Paul Mc Kevitt ◽  
Haiying Wang ◽  
Huiru Zheng ◽  
...  

Metagenomics promises to provide valuable new insights into the role of microbiomes in eukaryotic hosts such as humans. Due to the decreasing costs of sequencing, public and private repositories of human metagenomic data sets are growing fast. Metagenomic data sets can contain terabytes of raw data, which is a challenge for data processing but also an opportunity for advanced machine learning methods such as deep learning that require large data sets. However, in contrast to classical machine learning algorithms, the use of deep learning in metagenomics is still the exception. Regardless of the algorithms used, they are usually not applied to raw data but require several preprocessing steps. Performing this preprocessing and the actual analysis in an automated, reproducible, and scalable way is another challenge. These and other challenges can be addressed by adapting known big data methods and architectures to the needs of microbiome analysis and DNA sequence processing. A conceptual architecture for the use of machine learning and big data on metagenomic data sets was recently presented and initially validated for analysis of the rumen microbiome. The same architecture can be used for clinical purposes, as discussed in this paper.
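
As a concrete illustration of the preprocessing mentioned above, the sketch below converts raw reads into fixed-length k-mer frequency vectors, one common way to turn DNA sequences into features that classical machine learning or deep learning models can consume. The read strings and parameters are hypothetical placeholders, not part of the architecture described in the paper.

```python
# Minimal sketch: k-mer frequency features from DNA reads (illustrative only).
from collections import Counter
from itertools import product

def kmer_vector(read, k=4):
    """Return a normalized k-mer frequency vector for one DNA read."""
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]   # all 4**k k-mers
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = max(sum(counts[m] for m in kmers), 1)
    return [counts[m] / total for m in kmers]

reads = ["ACGTACGTGGCCAATT", "TTGGCCAACGTACGTA"]   # placeholder reads
features = [kmer_vector(r) for r in reads]          # n_reads x 4**k matrix
print(len(features), len(features[0]))              # 2 256
```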


2010 ◽  
Vol 2010 ◽  
pp. 1-10 ◽  
Author(s):  
George H. McArthur ◽  
Stephen S. Fong

The generation of well-characterized parts and the formulation of biological design principles in synthetic biology are laying the foundation for more complex and advanced microbial metabolic engineering. Improvements in de novo DNA synthesis and codon optimization alone are already contributing to the manufacturing of pathway enzymes with improved or novel function. Further development of analytical and computer-aided design tools should accelerate the forward engineering of precisely regulated synthetic pathways by providing a standard framework for the predictable design of biological systems from well-characterized parts. In this review we discuss the current state of synthetic biology within a four-stage framework (design, modeling, synthesis, analysis) and highlight areas requiring further advancement to facilitate true engineering of synthetic microbial metabolism.
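
To make the codon-optimization idea concrete, the toy sketch below back-translates a protein using one preferred codon per amino acid from a small, illustrative usage table; real codon-optimization tools weigh many additional constraints such as GC content, mRNA secondary structure, and restriction sites. The table entries are an assumed, truncated example rather than data from the review.

```python
# Toy codon optimization: pick one preferred codon per amino acid.
# The usage table below is an illustrative, truncated assumption.
PREFERRED_CODON = {
    "M": "ATG", "A": "GCG", "K": "AAA", "L": "CTG",
    "S": "AGC", "T": "ACC", "G": "GGC", "*": "TAA",
}

def naive_codon_optimize(protein):
    """Back-translate a protein string into a codon-optimized DNA string."""
    return "".join(PREFERRED_CODON[aa] for aa in protein)

print(naive_codon_optimize("MAKLST*"))  # ATGGCGAAACTGAGCACCTAA
```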


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Tijana Radivojević ◽  
Zak Costello ◽  
Kenneth Workman ◽  
Hector Garcia Martin

Abstract Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as experimental data from real metabolic engineering projects producing renewable biofuels, hoppy flavored beer without hops, fatty acids, and tryptophan. Finally, we discuss the limitations of this approach, and the practical consequences of the underlying assumptions failing.
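
The sampling-based, probabilistic recommendation loop described above can be sketched in a few lines. The snippet below is a conceptual stand-in rather than ART's implementation: it fits a Gaussian-process surrogate to hypothetical (design, production) data, samples candidate designs, and ranks them by predicted production together with the predictive uncertainty.

```python
# Conceptual sketch of probabilistic, sampling-based strain recommendation.
# Not ART's code: the data, model choice, and ranking rule are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Hypothetical training data: two pathway-expression "dials" vs. measured titer.
X_train = rng.uniform(0, 1, size=(30, 2))
y_train = np.sin(3 * X_train[:, 0]) + X_train[:, 1] + 0.05 * rng.normal(size=30)

gp = GaussianProcessRegressor(normalize_y=True).fit(X_train, y_train)

# Sample candidate designs and recommend the top few by predicted mean titer.
candidates = rng.uniform(0, 1, size=(500, 2))
mean, std = gp.predict(candidates, return_std=True)
for i in np.argsort(mean)[::-1][:5]:
    print(f"design={np.round(candidates[i], 2)}  predicted={mean[i]:.2f} +/- {std[i]:.2f}")
```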


2018 ◽  
Vol 3 (1) ◽  
pp. 19-37 ◽  
Author(s):  
Kristin V. Presnell ◽  
Hal S. Alper

A review of recent advances in in silico technology toward de novo synthetic biological design.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jonathan R. Belyeu ◽  
Murad Chowdhury ◽  
Joseph Brown ◽  
Brent S. Pedersen ◽  
Michael J. Cormier ◽  
...  

Abstract Visual validation is an important step to minimize false-positive predictions from structural variant (SV) detection. We present Samplot, a tool for creating images that display the read depth and sequence alignments necessary to adjudicate purported SVs across samples and sequencing technologies. These images can be rapidly reviewed to curate large SV call sets. Samplot is applicable to many biological problems, such as SV prioritization in disease studies, analysis of inherited variation, or de novo SV review. Samplot includes a machine learning package that dramatically decreases the number of false positives without human review. Samplot is available at https://github.com/ryanlayer/samplot.
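
The read-depth evidence that Samplot renders for a purported SV can be approximated with a small pysam sketch. This is not Samplot's own API; the BAM path, contig, and coordinates below are hypothetical, and it assumes an indexed BAM file.

```python
# Rough sketch: per-base read depth over a candidate SV region (illustrative).
# Assumes an indexed BAM; not part of Samplot itself.
import pysam

def region_depth(bam_path, chrom, start, end):
    """Return total per-base depth over [start, end) from an indexed BAM."""
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        per_base = bam.count_coverage(chrom, start, end)   # (A, C, G, T) arrays
        return [sum(col) for col in zip(*per_base)]

depth = region_depth("sample.bam", "chr1", 1_000_000, 1_002_000)
print(f"mean depth in region: {sum(depth) / len(depth):.1f}x")
```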


2018 ◽  
Author(s):  
Grzegorz M Boratyn ◽  
Jean Thierry-Mieg ◽  
Danielle Thierry-Mieg ◽  
Ben Busby ◽  
Thomas L Madden

Abstract Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline. It uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate the performance of Magic-BLAST for accurately mapping short or long sequences and its ability to discover introns on real RNA-seq data sets from PacBio, Roche, and Illumina runs, and on six benchmarks, and compare it to other popular aligners. Additionally, we look at alignments of human idealized RefSeq mRNA sequences perfectly matching the genome. We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.
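
The intron-discovery benchmarking described above boils down to comparing predicted splice junctions against a truth set. The sketch below shows one plausible scoring of that comparison with precision and recall; the junction coordinates are made-up placeholders, not taken from the paper's benchmarks.

```python
# Illustrative scoring of intron discovery: exact-match precision and recall.
# The intron sets below are hypothetical placeholders.
def intron_precision_recall(predicted, truth):
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)                       # exactly matching junctions
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

truth = {("chr1", 14829, 14970), ("chr1", 15038, 15796), ("chr2", 5000, 5600)}
predicted = {("chr1", 14829, 14970), ("chr2", 5000, 5600), ("chr2", 9000, 9500)}
print(intron_precision_recall(predicted, truth))      # (0.666..., 0.666...)
```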


eLife ◽  
2014 ◽  
Vol 3 ◽  
Author(s):  
Qi Zhang ◽  
Xiang Zhou ◽  
RuiZhi Wu ◽  
Amber Mosley ◽  
Shelya X Zeng ◽  
...  

The ‘ribosomal stress (RS)-p53 pathway’ is triggered by any stressor or genetic alteration that disrupts ribosomal biogenesis and is mediated by several ribosomal proteins (RPs), such as RPL11 and RPL5, which inhibit MDM2 and activate p53. Inosine monophosphate (IMP) dehydrogenase 2 (IMPDH2) is a rate-limiting enzyme in de novo guanine nucleotide biosynthesis and is crucial for maintaining the cellular guanine deoxy- and ribonucleotide pools needed for DNA and RNA synthesis. It is highly expressed in many malignancies. We previously showed that inhibition of IMPDH2 leads to p53 activation by causing RS. Surprisingly, our current study reveals that Inauzhin (INZ), a novel non-genotoxic p53 activator that works by inhibiting SIRT1, can also inhibit cellular IMPDH2 activity and reduce the levels of cellular GTP and of GTP-binding nucleostemin, which is essential for rRNA processing. Consequently, INZ induces RS and the RPL11/RPL5-MDM2 interaction, activating p53. These results support the new notion that INZ suppresses cancer cell growth by dually targeting SIRT1 and IMPDH2.

