scholarly journals Comparative Species Divergence across Eight Triplets of Spiny Lizards (Sceloporus) Using Genomic Sequence Data

2013 ◽  
Vol 5 (12) ◽  
pp. 2410-2419 ◽  
Author(s):  
Adam D. Leaché ◽  
Rebecca B. Harris ◽  
Max E. Maliska ◽  
Charles W. Linkem
2020 ◽  
Vol 15 ◽  
Author(s):  
Affan Alim ◽  
Abdul Rafay ◽  
Imran Naseem

Background: Proteins contribute significantly in every task of cellular life. Their functions encompass the building and repairing of tissues in human bodies and other organisms. Hence they are the building blocks of bones, muscles, cartilage, skin, and blood. Similarly, antifreeze proteins are of prime significance for organisms that live in very cold areas. With the help of these proteins, the cold water organisms can survive below zero temperature and resist the water crystallization process which may cause the rupture in the internal cells and tissues. AFP’s have attracted attention and interest in food industries and cryopreservation. Objective: With the increase in the availability of genomic sequence data of protein, an automated and sophisticated tool for AFP recognition and identification is in dire need. The sequence and structures of AFP are highly distinct, therefore, most of the proposed methods fail to show promising results on different structures. A consolidated method is proposed to produce the competitive performance on highly distinct AFP structure. Methods: In this study, we propose to use machine learning-based algorithms Principal Component Analysis (PCA) followed by Gradient Boosting (GB) for antifreeze protein identification. To analyze the performance and validation of the proposed model, various combinations of two segments composition of amino acid and dipeptide are used. PCA, in particular, is proposed to dimension reduction and high variance retaining of data which is followed by an ensemble method named gradient boosting for modelling and classification. Results: The proposed method obtained the superfluous performance on PDB, Pfam and Uniprot dataset as compared with the RAFP-Pred method. In experiment-3, by utilizing only 150 PCA components a high accuracy of 89.63 was achieved which is superior to the 87.41 utilizing 300 significant features reported for the RAFP-Pred method. Experiment-2 is conducted using two different dataset such that non-AFP from the PISCES server and AFPs from Protein data bank. In this experiment-2, our proposed method attained high sensitivity of 79.16 which is 12.50 better than state-of-the-art the RAFP-pred method. Conclusion: AFPs have a common function with distinct structure. Therefore, the development of a single model for different sequences often fails to AFPs. A robust results have been shown by our proposed model on the diversity of training and testing dataset. The results of the proposed model outperformed compared to the previous AFPs prediction method such as RAFP-Pred. Our model consists of PCA for dimension reduction followed by gradient boosting for classification. Due to simplicity, scalability properties and high performance result our model can be easily extended for analyzing the proteomic and genomic dataset.


Genes ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 1212
Author(s):  
J. Spencer Johnston ◽  
Carl E. Hjelmen

Next-generation sequencing provides a nearly complete genomic sequence for model and non-model species alike; however, this wealth of sequence data includes no road map [...]


2014 ◽  
Vol 64 (Pt_2) ◽  
pp. 316-324 ◽  
Author(s):  
Jongsik Chun ◽  
Fred A. Rainey

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.


2021 ◽  
Vol 16 ◽  
Author(s):  
Jinghao Peng ◽  
Jiajie Peng ◽  
Haiyin Piao ◽  
Zhang Luo ◽  
Kelin Xia ◽  
...  

Background: The open and accessible regions of the chromosome are more likely to be bound by transcription factors which are important for nuclear processes and biological functions. Studying the change of chromosome flexibility can help to discover and analyze disease markers and improve the efficiency of clinical diagnosis. Current methods for predicting chromosome flexibility based on Hi-C data include the flexibility-rigidity index (FRI) and the Gaussian network model (GNM), which have been proposed to characterize chromosome flexibility. However, these methods require the chromosome structure data based on 3D biological experiments, which is time-consuming and expensive. Objective: Generally, the folding and curling of the double helix sequence of DNA have a great impact on chromosome flexibility and function. Motivated by the success of genomic sequence analysis in biomolecular function analysis, we hope to propose a method to predict chromosome flexibility only based on genomic sequence data. Method: We propose a new method (named "DeepCFP") using deep learning models to predict chromosome flexibility based on only genomic sequence features. The model has been tested in the GM12878 cell line. Results: The maximum accuracy of our model has reached 91%. The performance of DeepCFP is close to FRI and GNM. Conclusion: The DeepCFP can achieve high performance only based on genomic sequence.


2019 ◽  
Author(s):  
◽  
Sarah Unruh

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Phylogenetic trees show us how organisms are related and provide frameworks for studying and testing evolutionary hypotheses. To better understand the evolution of orchids and their mycorrhizal fungi, I used high-throughput sequencing data and bioinformatic analyses, to build phylogenetic hypotheses. In Chapter 2, I used transcriptome sequences to both build a phylogeny of the slipper orchid genera and to confirm the placement of a polyploidy event at the base of the orchid family. Polyploidy is hypothesized to be a strong driver of evolution and a source of unique traits so confirming this event leads us closer to explaining extant orchid diversity. The list of orthologous genes generated from this study will provide a less expensive and more powerful method for researchers examining the evolutionary relationships in Orchidaceae. In Chapter 3, I generated genomic sequence data for 32 fungal isolates that were collected from orchids across North America. I inferred the first multi-locus nuclear phylogenetic tree for these fungal clades. The phylogenetic structure of these fungi will improve the taxonomy of these clades by providing evidence for new species and for revising problematic species designations. A robust taxonomy is necessary for studying the role of fungi in the orchid mycorrhizal symbiosis. In chapter 4 I summarize my work and outline the future directions of my lab at Illinois College including addressing the remaining aims of my Community Sequencing Proposal with the Joint Genome Institute by analyzing the 15 fungal reference genomes I generated during my PhD. Together these chapters are the start of a life-long research project into the evolution and function of the orchid/fungal symbiosis.


Author(s):  
Yoshihiro Yamanishi ◽  
Hisashi Kashima

In silico prediction of compound-protein interactions from heterogeneous biological data is critical in the process of drug development. In this chapter the authors review several supervised machine learning methods to predict unknown compound-protein interactions from chemical structure and genomic sequence information simultaneously. The authors review several kernel-based algorithms from two different viewpoints: binary classification and dimension reduction. In the results, they demonstrate the usefulness of the methods on the prediction of drug-target interactions and ligand-protein interactions from chemical structure data and genomic sequence data.


2005 ◽  
Vol 44 (05) ◽  
pp. 687-692 ◽  
Author(s):  
B. A. Malin

Summary Objectives: Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual’s identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. Methods: The technique is termed DNA lattice an-onymization (DNALA), and is based upon the formal privacy protection schema of k-anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). Results: The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequences privacy. Conclusions: The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific data-sharing scenarios.


2019 ◽  
Vol 225 (3) ◽  
pp. 1355-1369 ◽  
Author(s):  
Erik J. M. Koenen ◽  
Dario I. Ojeda ◽  
Royce Steeves ◽  
Jérémy Migliore ◽  
Freek T. Bakker ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document