TopoPhy-CNN: Integrating Topological Information of Phylogenetic Tree for Host Phenotype Prediction From Metagenomic Data

Author(s):  
Bojing Li ◽  
Duo Zhong ◽  
Xingpeng Jiang ◽  
Tingting He
2018 ◽  
Author(s):  
Derek Reiman ◽  
Ahmed A. Metwally ◽  
Yang Dai

Abstract. Motivation: Accurate prediction of the host phenotype from a metagenomic sample and identification of the associated bacterial markers are important in metagenomic studies. We introduce PopPhy-CNN, a novel convolutional neural network (CNN) learning architecture that effectively exploits phylogenetic structure in microbial taxa. PopPhy-CNN takes as input a 2D matrix created by embedding a phylogenetic tree populated with the relative abundance of microbial taxa in a metagenomic sample. This conversion enables CNNs to explore the spatial relationship of the taxonomic annotations on the tree and their quantitative characteristics in metagenomic data. Results: PopPhy-CNN is evaluated using three metagenomic datasets of moderate size. We show the superior performance of PopPhy-CNN compared to random forest, support vector machines, LASSO and a baseline 1D-CNN model constructed with relative abundance microbial feature vectors. In addition, we design a novel scheme of feature extraction from the learned CNN models and demonstrate improved performance when the extracted features are used to train support vector machines. Conclusion: PopPhy-CNN is a novel deep learning framework for the prediction of host phenotype from metagenomic samples. PopPhy-CNN can train models efficiently and does not require an excessive amount of data. PopPhy-CNN facilitates not only retrieval of informative microbial taxa from the trained CNN models but also visualization of the taxa on the phylogenetic tree. Availability: Source code is publicly available at https://github.com/derekreiman/PopPhy-CNN. Supplementary information: Supplementary data are available at Bioinformatics online.
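The idea of populating a tree with relative abundances and laying it out as a 2D matrix can be illustrated with a minimal sketch. It assumes a simplified layout (one matrix row per tree level, nodes placed left to right, unused cells set to zero), which is not necessarily the exact embedding used by the authors; the toy tree, the taxon abundances, and the names `populate` and `tree_to_matrix` are all hypothetical.

```python
# Hypothetical toy tree: each internal node maps to its children.
# Nodes that do not appear as keys are treated as leaves.
TOY_TREE = {
    "root": ["Firmicutes", "Bacteroidetes"],
    "Firmicutes": ["Lactobacillus", "Clostridium"],
    "Bacteroidetes": ["Bacteroides"],
}


def populate(tree, node, leaf_abundance, totals):
    """Store in totals[node] the summed relative abundance of all leaves below node."""
    children = tree.get(node, [])
    if not children:
        totals[node] = leaf_abundance.get(node, 0.0)
    else:
        totals[node] = sum(
            populate(tree, child, leaf_abundance, totals) for child in children
        )
    return totals[node]


def tree_to_matrix(tree, root, leaf_abundance):
    """Lay the populated tree out as a zero-padded matrix, one row per tree level."""
    totals = {}
    populate(tree, root, leaf_abundance, totals)

    # Collect nodes level by level (breadth-first), keeping left-to-right order.
    levels = [[root]]
    while True:
        next_level = [c for node in levels[-1] for c in tree.get(node, [])]
        if not next_level:
            break
        levels.append(next_level)

    width = max(len(level) for level in levels)
    matrix = [[0.0] * width for _ in levels]
    for row, level in enumerate(levels):
        for col, node in enumerate(level):
            matrix[row][col] = totals[node]
    return matrix


# Hypothetical relative abundances for one sample (they sum to 1).
sample = {"Lactobacillus": 0.2, "Clostridium": 0.3, "Bacteroides": 0.5}
matrix = tree_to_matrix(TOY_TREE, "root", sample)
for row in matrix:
    print(row)
```

A matrix built this way can be fed to a standard 2D convolutional layer, so that filters see a taxon together with its neighbours on the tree rather than an unordered abundance vector.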


2020 ◽  
pp. 37-40

Analysis of genetic diversity has proved fundamental to understanding the epidemiological and evolutionary history of human papillomavirus (HPV), to developing accurate diagnostic tests, and to designing efficient vaccines. HPV nucleotide diversity has been investigated widely among high-risk HPV types. This study was carried out to sequence HPV isolates and establish a virus database in Thi-Qar province, to compare the sequences of our isolates with previously described isolates from around the world, and to draw their phylogenetic tree. A total of six formalin-fixed paraffin-embedded (FFPE) breast tissue samples from female patients were included, comprising four malignant tumors and two benign tumors. PCR was used to detect the presence of HPV in breast tissue, and real-time PCR was used to determine the HPV genotypes; the complete nucleotide sequence of the HPV L1 capsid gene was then determined and its phylogenetic tree drawn. Sequencing detected a number of substitution mutations (SNPs) in the L1 gene that had not been described before and were identified for the first time in this study population. It also revealed that the HPV16 strains are evolutionarily related to South African strains, whereas the HPV33 and HPV6 strains are related to North American and East Asian strains, respectively.


2017 ◽  
Vol 9 (4) ◽  
pp. 59-66
Author(s):  
M. Forghani ◽  
P. Vasev ◽  
V. Averbukh

2009 ◽  
Vol 29 (3) ◽  
pp. 836-838
Author(s):  
Gang-cheng LI ◽  
Zan-bo LIU ◽  
Qing-guang ZENG

2020 ◽  
Vol 17 (1) ◽  
pp. 40-50
Author(s):  
Farzane Kargar ◽  
Amir Savardashtaki ◽  
Mojtaba Mortazavi ◽  
Masoud Torkzadeh Mahani ◽  
Ali Mohammad Amani ◽  
...  

Background: The 1,4-alpha-glucan branching protein (GlgB) plays an important role in glycogen biosynthesis; deficiency of this enzyme results in glycogen storage disease and the accumulation of an amylopectin-like polysaccharide. Consequently, this enzyme has become a topic of particular interest in clinical and biotechnological research. One newly introduced GlgB belongs to Neisseria sp. HMSC071A01 (RefSeq WP_049335546). For in silico analysis, 3D molecular modeling of this enzyme was conducted on the I-TASSER web server. Methods: For a better evaluation, important characteristics of this enzyme, such as functional properties, metabolic pathway and activity, were investigated with the TargetP software. Additionally, the phylogenetic tree and secondary structure of this enzyme were studied with the MAFFT and PRABI software, respectively. Finally, the binding site properties (with maltoheptaose as substrate) were studied using AutoDock Vina. Results: In the phylogenetic tree, the closest species belonged to the taxonomic group Betaproteobacteria. The results showed that the structure of this enzyme consists of 34.45% alpha helix and 45.45% random coil. Our analysis predicted that this enzyme has a potential signal peptide in its protein sequence. Conclusion: These analyses provide a new understanding of the sequence and structure of this enzyme. Our findings can further be used in some fields of clinical and industrial biotechnology.


2018 ◽  
Vol 15 (1) ◽  
pp. 67-81 ◽  
Author(s):  
Chandan Raychaudhury ◽  
Md. Imbesat Hassan Rizvi ◽  
Debnath Pal

Background: Generating a large number of compounds using combinatorial methods increases the possibility of finding novel bioactive compounds. Although some combinatorial structure generation algorithms are available, no method for generating structures from activity-linked substructural topological information has yet been reported. Objective: To develop a method using graph-theoretical techniques for generating structures of antitubercular compounds combinatorially from activity-linked substructural topological information, to predict activity, and to prioritize and screen potential drug candidates. Methods: Activity-related vertices are identified from datasets composed of both active and inactive, or differently active, compounds, and structures are generated combinatorially using the topological distance distribution associated with those vertices. Biological activities are predicted using topological-distance-based vertex indices and a rule-based method. Generated structures are prioritized using a newly defined Molecular Priority Score (MPS). Results: Studies considering a series of Acid Alkyl Ester (AAE) compounds and three known antitubercular drugs show that active compounds can be generated from substructural information of other active compounds for all these classes of compounds. Activity predictions show a high success rate, and a number of highly active AAE compounds produced high MPS values, indicating that the MPS may help prioritize and screen potential drug molecules. A possible relation of this work to scaffold hopping and the inverse Quantitative Structure-Activity Relationship (iQSAR) problem is also discussed. The proposed method seems to hold promise for discovering novel therapeutic candidates for combating tuberculosis and may be useful for discovering novel drug molecules for the treatment of other diseases as well.
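The topological distance distribution of a vertex can be illustrated with a short sketch. It assumes the usual graph-theoretical definition (the topological distance between two atoms is the number of bonds on the shortest path between them in the hydrogen-suppressed molecular graph) and computes it with breadth-first search. The abstract does not define the vertex indices or the MPS, so neither is reproduced here; the example graph and the function names are hypothetical.

```python
from collections import Counter, deque


def topological_distances(adjacency, source):
    """Shortest-path length (in bonds) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency[vertex]:
            if neighbour not in dist:
                dist[neighbour] = dist[vertex] + 1
                queue.append(neighbour)
    return dist


def distance_distribution(adjacency, source):
    """Count how many other vertices lie at each topological distance from source."""
    dist = topological_distances(adjacency, source)
    return Counter(d for vertex, d in dist.items() if vertex != source)


# Hypothetical example: carbon skeleton of 2-methylbutane,
# with bonds 0-1, 1-2, 2-3 and 1-4 (hydrogens suppressed).
GRAPH = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}

dist_from_0 = distance_distribution(GRAPH, 0)
dist_from_1 = distance_distribution(GRAPH, 1)
print(dict(dist_from_0))
print(dict(dist_from_1))
```

The terminal atom 0 and the branching atom 1 produce different distributions, which is the kind of per-vertex signature that a distance-based descriptor can build on.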

