sequence signatures
Recently Published Documents



2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Hongchen Ji ◽  
Junjie Li ◽  
Qiong Zhang ◽  
Jingyue Yang ◽  
Juanli Duan ◽  
...  

Abstract Background: Mutation processes leave different signatures in genes. For single-base substitutions, previous studies have suggested that mutation signatures are reflected not only in mutation bases but also in neighboring bases. However, because of the lack of a method to identify features of long sequences next to mutation bases, the understanding of how flanking sequences influence mutation signatures is limited. Methods: We constructed a long short-term memory self-organizing map (LSTM-SOM) unsupervised neural network. By extracting mutated sequence features via LSTM and clustering similar features with the SOM, single-base substitutions in The Cancer Genome Atlas database were clustered according to both their mutation site and flanking sequences. The relationship between mutation sequence signatures and clinical features was then analyzed. Finally, we clustered patients into different classes according to the composition of the mutation sequence signatures by the K-means method and then studied the differences in clinical features and survival between classes. Results: Ten classes of mutant sequence signatures (mutation blots, MBs) were obtained from 2,141,527 single-base substitutions via the LSTM-SOM machine learning approach. Different features in mutation bases and flanking sequences were revealed among MBs. MBs reflect both the site and pathological features of cancers. MBs were related to clinical features, including age, sex, and cancer stage. The class of an MB in a given gene was associated with survival. Finally, patients were clustered into 7 classes according to the MB composition. Significant differences in survival and clinical features were observed among different patient classes. Conclusions: We provided a method for analyzing the characteristics of mutant sequences. The results of this study showed that flanking sequences, together with mutation bases, shape the signatures of single-base substitutions (SBSs). MBs were shown to be related to clinical features and survival of cancer patients. The composition of MBs is a feasible predictive factor of clinical prognosis. Further study of the mechanisms relating MBs to cancer characteristics is suggested.
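
The two data-preparation steps implied by this abstract (taking the flanking sequence around each substitution, and summarizing each patient by MB composition before K-means) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the 0-based coordinates, and the flank length are assumptions, and the LSTM-SOM model itself is not reproduced.

```python
from collections import Counter


def flanking_window(reference, pos, flank):
    """Split a reference sequence into (left flank, base, right flank).

    `pos` is a 0-based index of the substituted base; `flank` is the number
    of bases kept on each side (both are assumptions of this sketch).
    """
    if pos - flank < 0 or pos + flank >= len(reference):
        raise ValueError("window runs off the end of the sequence")
    left = reference[pos - flank:pos]
    right = reference[pos + 1:pos + flank + 1]
    return left, reference[pos], right


def mb_composition(mb_labels, n_classes=10):
    """Fraction of one patient's substitutions assigned to each MB class.

    `mb_labels` holds integer class labels in [0, n_classes); the resulting
    vector is the kind of per-patient feature a K-means step could cluster.
    """
    counts = Counter(mb_labels)
    total = len(mb_labels)
    if total == 0:
        raise ValueError("patient has no substitutions")
    return [counts.get(c, 0) / total for c in range(n_classes)]
```

For example, `flanking_window("ACGTACGTAC", 4, 2)` splits the sequence around the base at index 4, and `mb_composition([0, 0, 3, 9])` yields a ten-element vector that sums to 1.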


2021 ◽  
Author(s):  
Ji Hongchen ◽  
Li Junjie ◽  
Zhang Qiong ◽  
Yang Jingyue ◽  
Duan Juanli ◽  
...  

Abstract Background: Mutation processes leave different signatures in genes. For single-base substitutions, previous studies have suggested that mutation signatures are reflected not only in mutation bases but also in neighboring bases. However, because of the lack of a method to identify features of long sequences next to mutation bases, the understanding of how flanking sequences influence mutation signatures is limited. Methods: We constructed a long short-term memory self-organizing map (LSTM-SOM) unsupervised neural network. By extracting mutated sequence features via LSTM and clustering similar features with the SOM, single-base substitutions in The Cancer Genome Atlas database were clustered according to both their mutation site and flanking sequences. The relationship between mutation sequence signatures and clinical features was then analyzed. Finally, we clustered patients into different classes according to the composition of the mutation sequence signatures by the K-means method and then studied the differences in clinical features and survival between classes. Results: Ten classes of mutant sequence signatures (mutation blots, MBs) were obtained from 2,141,527 single-base substitutions via the LSTM-SOM machine learning approach. Different features in mutation bases and flanking sequences were revealed among MBs. MBs reflect both the site and pathological features of cancers. MBs were related to clinical features, including age, gender, and cancer stage. The class of an MB in a given gene was associated with survival. Finally, patients were clustered into 7 classes according to the MB composition. Significant differences in survival and clinical features were observed among different patient classes. Conclusions: We provided a method for analyzing the characteristics of mutant sequences. The results of this study showed that flanking sequences, together with mutation bases, shape the signatures of single-base substitutions (SBSs). MBs were shown to be related to clinical features and survival of cancer patients. The composition of MBs is a feasible predictive factor of clinical prognosis. Further study of the mechanisms relating MBs to cancer characteristics is suggested.


2021 ◽  
Vol 118 (5) ◽  
pp. e2010758118
Author(s):  
Shohei Kojima ◽  
Kohei Yoshikawa ◽  
Jumpei Ito ◽  
So Nakagawa ◽  
Nicholas F. Parrish ◽  
...  

Understanding the genetics and taxonomy of ancient viruses will give us great insights into not only the origin and evolution of viruses but also how viral infections played roles in our evolution. Endogenous viruses are remnants of ancient viral infections and are thought to retain the genetic characteristics of viruses from ancient times. In this study, we used machine learning of endogenous RNA virus sequence signatures to identify viruses in the human genome that have not been detected or are already extinct. Here, we show that the k-mer occurrence of ancient RNA viral sequences remains similar to that of extant RNA viral sequences and can be differentiated from that of other human genome sequences. Furthermore, using this characteristic, we screened RNA viral insertions in the human reference genome and found virus-like insertions with phylogenetic and evolutionary features indicative of an exogenous origin but lacking homology to previously identified sequences. Our analysis indicates that animal genomes still contain unknown virus-derived sequences and provides a glimpse into the diversity of the ancient virosphere.
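
The k-mer occurrence mentioned above can be illustrated with a short sketch. It only shows how a normalized k-mer profile might be computed for a nucleotide sequence; the function name, the choice of overlapping k-mers, and the handling of ambiguous bases are assumptions, and the abstract does not describe the authors' feature extraction or classifier in this detail.

```python
from collections import Counter


def kmer_frequencies(sequence, k):
    """Return relative frequencies of overlapping k-mers made of A/C/G/T.

    k-mers containing any other character (for example N) are skipped,
    which is a simplifying assumption of this sketch.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

Profiles of this kind, computed for candidate insertions and for background genomic sequence, are the sort of input a machine learning classifier could compare.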


Author(s):  
Isuru Karunatillaka ◽  
Lukasz Jaroszewski ◽  
Adam Godzik

Several plastic-degrading enzymes have been described in the literature, most notably PETases, which are capable of hydrolyzing polyethylene terephthalate (PET) plastic. One of them, the PETase from Ideonella sakaiensis, a bacterium isolated from environmental samples within a PET bottle recycling site, has been the subject of extensive studies. To test how widespread PETase functionality is in other bacterial communities, we used a cascade of BLAST searches in the JGI metagenomic datasets and showed that PETases can also be found in other metagenomic environmental samples from both human-affected and relatively pristine sites. To confirm their classification as PETases, we verified that the newly identified proteins have the sequence signatures common to all PETases and that phylogenetic analyses group them with the experimentally characterized PETases. Additionally, docking analysis was performed to further confirm the functional assignment of the putative environmental PETases.


2020 ◽  
Vol 6 (11) ◽  
Author(s):  
Zhencheng Fang ◽  
Hongwei Zhou

Plasmids are the key element in horizontal gene transfer in the microbial community. Recently, a large number of experimental and computational methods have been developed to obtain the plasmidomes of microbial communities. Distinguishing transmissible plasmid sequences, which are derived from conjugative or at least mobilizable plasmids, from non-transmissible plasmid sequences in the plasmidome is essential for understanding the diversity of plasmids and how they regulate the microbial community. Unfortunately, due to the highly fragmented characteristics of DNA sequences in the plasmidome, effective identification methods are lacking. In this work, we used information entropy from information theory to assess the randomness of synonymous codon usage over 4424 plasmid genomes. The results showed that for all amino acids, the choice of a synonymous codon in conjugative and mobilizable plasmids is more random than that in non-transmissible plasmids, indicating that transmissible plasmids have different sequence signatures from non-transmissible plasmids. Inspired by this phenomenon, we further developed a novel algorithm named PlasTrans. PlasTrans takes the triplet code sequences and base sequences of plasmid DNA fragments as input and uses a convolutional neural network to extract the more complex signatures of the plasmid sequences and identify the conjugative and mobilizable DNA fragments. Tests showed that PlasTrans could achieve an AUC as high as 84–91%, even though the fragments only contained hundreds of base pairs. To the best of our knowledge, this is the first quantitative analysis of the difference in sequence signatures between transmissible and non-transmissible plasmids, and we developed the first tool to perform transferability annotation for DNA fragments in the plasmidome. We expect that PlasTrans will be a useful tool for researchers who analyse the properties of novel plasmids in the microbial community and horizontal gene transfer, especially the spread of resistance genes and virulence factors associated with plasmids. PlasTrans is freely available via https://github.com/zhenchengfang/PlasTrans
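
The entropy measure of synonymous codon usage described above can be sketched as follows. This is an illustration under stated assumptions, not the PlasTrans code: it assumes Shannon entropy in bits, the standard genetic code, and an in-frame coding sequence, and the function name is hypothetical. The abstract does not give the exact formula the authors used.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def synonymous_codon_entropy(cds):
    """Shannon entropy (bits) of codon choice for each amino acid in `cds`.

    `cds` is read in frame from position 0; stop codons and codons with
    ambiguous bases are ignored. Higher values mean more even (more
    "random") use of the synonymous codons for that amino acid.
    """
    cds = cds.upper()
    usage = defaultdict(Counter)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue
        usage[aa][codon] += 1
    entropies = {}
    for aa, counts in usage.items():
        total = sum(counts.values())
        entropies[aa] = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
    return entropies
```

Under this definition, an amino acid encoded by four codons used equally often has an entropy of 2 bits, while one always encoded by the same codon has an entropy of 0.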


2020 ◽  
Author(s):  
Dadi Gao ◽  
Elisabetta Morini ◽  
Monica Salani ◽  
Aram J. Krauson ◽  
Ashok Ragavendran ◽  
...  

Abstract Pre-mRNA splicing is a key control point in human gene expression. Disturbances in splicing due to mutation or aberrant splicing regulatory networks lead to dysregulated protein expression and contribute to a substantial fraction of human disease. Several classes of active and selective splicing modulator compounds (SMCs) have been recently identified, thus proving that pre-mRNA splicing is a viable target for therapy. We describe herein the identification of BPN-15477, a novel splicing modulator compound that restores correct splicing of exon 20 in the Elongator complex protein 1 (ELP1) gene carrying the major IVS20+6T>C mutation responsible for familial dysautonomia. We then developed a machine learning approach to evaluate the therapeutic potential of BPN-15477 to correct splicing in other human genetic diseases. Using transcriptome sequencing from compound-treated fibroblast cells, we identified treatment-responsive sequence signatures, the majority of which center at the 5' splice site of exons whose inclusion or exclusion is modulated by SMC treatment. We then leveraged this model to identify 155 human disease genes that harbor ClinVar mutations predicted to alter pre-mRNA splicing as potential targets for BPN-15477 treatment. Using in vitro splicing assays, we validated representative predictions by demonstrating successful correction of splicing defects caused by mutations in genes responsible for cystic fibrosis (CFTR), cholesterol ester storage disease (LIPA), Lynch syndrome (MLH1) and familial frontotemporal dementia (MAPT). Our study shows that deep learning techniques can identify a complex set of sequence signatures and predict response to pharmacological modulation, strongly supporting the use of in silico approaches to expand the therapeutic potential of drugs that modulate splicing.


2019 ◽  
Vol 42 (1) ◽  
pp. 115-124
Author(s):  
Jiangtao Xu ◽  
Xiaoqing Liu ◽  
Xiaoxia Yu ◽  
Xiaoyu Chu ◽  
Jian Tian ◽  
...  

Abstract Objective: To thoroughly characterize the Pylb promoter and identify the elements that affect the promoter activity. Results: The sequences flanking the −35 and −10 boxes of the Pylb promoter were divided into six segments, and six random-scanning mutant promoter libraries fused to an enhanced green fluorescent protein (EGFP) were made and analyzed by flow cytometry. Our results showed that the four nucleotides flanking the −35 box had the greatest influence on promoter activity, and this influence was related to the GC content. The promoters mutated in these regions were successfully used for expressing the gene ophc2 encoding organophosphorus hydrolase (OPHC2) and the gene katA encoding catalase (KatA). Conclusion: Our work identified and characterized the sequence signatures of the Pylb promoter that could tune the promoter strength, providing further information for the potential application of this promoter. Meanwhile, the sequence signatures have the potential to be used for tuning gene expression in enzyme production, metabolic engineering, and synthetic biology.
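
The GC content of the nucleotides flanking a promoter box, which the abstract relates to promoter activity, can be computed with a few lines. This is a sketch under assumptions: the helper names, the 6-nt box length, and the reading of "four nucleotides flanking" as four bases on each side are illustrative choices, and any sequence passed in is hypothetical rather than the real Pylb promoter.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in sequence) / len(sequence)


def flank_gc(promoter, box_start, box_length=6, flank=4):
    """GC content of the bases immediately upstream and downstream of a box.

    `box_start` is the 0-based index of the first base of the box; the
    box itself is excluded from the calculation.
    """
    upstream = promoter[max(0, box_start - flank):box_start]
    box_end = box_start + box_length
    downstream = promoter[box_end:box_end + flank]
    return gc_content(upstream + downstream)
```

For a made-up sequence such as `"GGCCTTGACAATAT"` with the box starting at index 4, the flanks are `GGCC` and `ATAT`, so `flank_gc` returns 0.5.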


2019 ◽  
Vol 29 (5) ◽  
Author(s):  
Jeremy C. Andersen ◽  
Peter Oboyski ◽  
Neil Davies ◽  
Sylvain Charlat ◽  
Curtis Ewing ◽  
...  

2019 ◽  
Vol 39 (6) ◽  
Author(s):  
Nishad Matange

Abstract An explosion of sequence information in the genomics era has thrown up thousands of protein sequences without functional assignment. Though our ability to predict function based on sequence alone is improving steadily, we still have a long way to go. Proteins with common evolutionary origins carry telling sequence signatures, which ought to reveal their biological roles. These sequence signatures have allowed us to classify proteins into families with similar structures and, possibly, functions. Yet evolution is a perpetual tinkerer, and hence sequence signatures alone have proved inadequate in understanding the physiological activities of proteins. One such enigmatic family of enzymes is the NUDIX (nucleoside diphosphate linked to a moiety X) hydrolase family, which has over 80,000 members from all branches of the tree of life. Though MutT, the founding member of this family, was identified in 1954, we are only now beginning to understand the diversity of substrates and biological roles that these enzymes demonstrate. In a recent article by Cordeiro et al. in Bioscience Reports [Biosci. Rep. (2019)], two members of this protein family from the human pathogen Trypanosoma brucei were deorphanized as being polyphosphate hydrolases. The authors show that, of the five NUDIX hydrolases encoded by the T. brucei genomes, TbNH2 and TbNH4 show in vitro hydrolytic activity against inorganic polyphosphate. Through classical biochemistry and immunostaining microscopy, differences in their substrate specificities and sub-cellular localization were revealed. These new data provide a compelling direction for the study of trypanosome stress biology as well as our understanding of the NUDIX enzyme family.

