KIS: An automated attribute induction method for classification of DNA sequences

Author(s):
Rafał Biedrzycki
Jarosław Arabas

Abstract This paper presents an application of machine learning methods to the task of DNA sequence recognition. We present an algorithm that learns to recognize groups of DNA sequences sharing common features, such as sequence functionality. We demonstrate the application of the algorithm to finding splice sites, i.e., to properly detecting donor and acceptor sequences. We compare the results with those of reference methods that have been designed and tuned to detect splice sites. We also show how to use the algorithm to find a human-readable model of the IRE (Iron-Responsive Element) and to find IRE sequences. The method, although universal, yields results of a quality comparable to that of the reference methods. In contrast to the reference methods, this approach uses models that operate on sequence patterns, which makes the results easier for humans to interpret.
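To make the idea of attributes that "operate on sequence patterns" concrete, here is a minimal sketch under stated assumptions; it is not the KIS algorithm. It assumes fixed-length windows aligned on a candidate site, treats each (offset, k-mer) pair as a candidate human-readable attribute, ranks attributes by how much more often they occur in positive than in negative windows, and classifies a window by counting matching attributes. The function names, the scoring rule and the toy donor-like windows (with the canonical intron-initial GT at offset 3) are all hypothetical.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple

Attribute = Tuple[int, str]  # (offset in the window, k-mer expected there)


def positional_kmers(seq: str, k: int) -> Iterator[Attribute]:
    """Yield every (offset, k-mer) pair; each pair is one candidate pattern attribute."""
    for i in range(len(seq) - k + 1):
        yield (i, seq[i:i + k])


def induce_attributes(positives: Sequence[str], negatives: Sequence[str],
                      k: int = 2, top: int = 3) -> List[Attribute]:
    """Rank positional k-mers by (frequency in positives) - (frequency in negatives)."""
    # Each attribute is counted at most once per sequence.
    pos = Counter(a for s in positives for a in set(positional_kmers(s, k)))
    neg = Counter(a for s in negatives for a in set(positional_kmers(s, k)))

    def score(attr: Attribute) -> float:
        return pos[attr] / len(positives) - neg[attr] / len(negatives)

    # Ties are broken by (offset, k-mer) so the ranking is deterministic.
    ranked = sorted(set(pos) | set(neg), key=lambda a: (-score(a), a))
    return ranked[:top]


def classify(seq: str, attributes: Sequence[Attribute], k: int = 2,
             threshold: int = 1) -> bool:
    """Predict 'positive' when at least `threshold` induced attributes match."""
    present = set(positional_kmers(seq, k))
    return sum(a in present for a in attributes) >= threshold


if __name__ == "__main__":
    positives = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA"]  # toy donor-like windows
    negatives = ["CAGCTAAGT", "ATGCAGAGT", "CCGATAAGA"]
    attrs = induce_attributes(positives, negatives, k=2, top=2)
    print(attrs)  # readable rules such as "GT at offset 3"
    print(classify("TTGGTCCAT", attrs, threshold=2))
```

Because each attribute is a literal pattern at a fixed offset, the induced model can be read directly as a short list of rules, which is the kind of interpretability the abstract emphasizes.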

2018
Author(s):
Javier Pérez-Rodríguez
Aida de Haro-García
Nicolás García-Pedrajas

Abstract Recognition of the functional sites of genes, such as translation initiation sites, donor and acceptor splice sites and stop codons, is a relevant part of many current problems in bioinformatics. It is also a fundamental step in gene structure prediction in the most powerful programs. The best approaches to this type of recognition use sophisticated classifiers, such as support vector machines. However, with the rapid accumulation of sequence data, methods for combining many sources of evidence are necessary, as it is unlikely that a single classifier can solve this type of problem with the best possible performance. A major issue is that the number of possible models to combine is large, and using all of them is impractical. In this paper, we present a framework based on floating search for combining as many classifiers as needed for the recognition of any functional site of a gene. The methodology can be used for the recognition of translation initiation sites, donor and acceptor splice sites and stop codons. Furthermore, it can combine any number of classifiers trained on any species. The method is also scalable to large datasets, as shown in experiments that use the whole human genome, and it is applicable to other recognition tasks. We present experiments on the recognition of these four functional sites in the human genome, which is used as the target genome, with another 20 species as sources of evidence. In a thorough evaluation, the proposed methodology shows significant improvement over state-of-the-art methods. It is also able to improve on heuristic selection of the species used as sources of evidence, as the search finds the most useful datasets.

Author summary In this paper we present a methodology for combining many sources of information to recognize some of the most important functional sites in a genomic sequence. Functional sites, such as translation initiation sites, acceptor and donor splice sites and stop codons, play a very relevant role in many bioinformatics tasks. Their accurate recognition is an important task in itself and also as part of gene structure prediction programs. Our approach uses a methodology usually termed "floating search" in computer science. This is a powerful heuristic applicable when the cost of evaluating each possible solution is high. The methodology is applied to the recognition of four different functional sites in the human genome, using the annotated genomes of twenty other species as additional sources of evidence. The results show an advantage of the proposed method and also challenge the standard assumption that only genomes neither very close to nor very far from the human genome should be used to improve the recognition of functional sites in the human genome.
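The floating-search idea can be illustrated with a minimal sketch of sequential floating forward selection (SFFS), the textbook form of floating search; the paper's exact variant, stopping rule and evaluation function are not given here, so treat this as an assumption. Candidate items stand in for per-species classifiers or datasets, and `score` is a hypothetical, possibly expensive function that evaluates a subset (for example, validation accuracy of the combined classifiers). After each forward step, items are dropped again as long as that beats the best subset previously seen at the smaller size.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Optional, Tuple


def sffs(
    items: Iterable[Hashable],
    score: Callable[[FrozenSet[Hashable]], float],
    max_size: Optional[int] = None,
) -> Tuple[float, FrozenSet[Hashable]]:
    """Sequential floating forward selection over a non-empty pool of items.

    Returns (best score, best subset) over all subset sizes visited.
    """
    pool = list(items)
    limit = len(pool) if max_size is None else min(max_size, len(pool))
    best = {}  # subset size -> (best score seen, subset achieving it)
    current: FrozenSet[Hashable] = frozenset()

    while len(current) < limit:
        # Inclusion: add the single item giving the best-scoring larger subset.
        candidates = [x for x in pool if x not in current]
        added = max(candidates, key=lambda x: score(current | {x}))
        current = current | {added}
        s = score(current)
        if len(current) not in best or s > best[len(current)][0]:
            best[len(current)] = (s, current)

        # Conditional exclusion: keep dropping the least useful item while the
        # reduced subset beats the best one recorded at that smaller size.
        while len(current) > 2:
            removed = max(current, key=lambda x: score(current - {x}))
            reduced = current - {removed}
            s_reduced = score(reduced)
            if s_reduced > best[len(reduced)][0]:
                current = reduced
                best[len(reduced)] = (s_reduced, reduced)
            else:
                break

    return max(best.values(), key=lambda entry: entry[0])


if __name__ == "__main__":
    # Hypothetical per-species contributions; "c" hurts the combination.
    weights = {"a": 3, "b": 2, "c": -1, "d": 1}
    print(sffs(weights, lambda subset: sum(weights[x] for x in subset)))
```

Exclusion is accepted only when it strictly improves the best score for that size, so the search cannot cycle, and it can reach subsets that a purely greedy forward search would skip when scores interact.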


1993
Vol 268 (36)
pp. 27363-27370
Author(s):
R S Eisenstein
P T Tuazon
K L Schalinske
S A Anderson
J A Traugh

2021
Vol 5 (2)
Author(s):
Olivia M Gearner
Marcin J Kamiński
Kojun Kanda
Kali Swichtenberg
Aaron D Smith

Abstract Sepidiini is a speciose tribe of desert-inhabiting darkling beetles that contains a number of poorly defined taxonomic groups and is in need of revision at all taxonomic levels. In this study, two previously unrecognized lineages, the Psammodes spinosus species-group and the Ocnodes humeralis species-group, were discovered on the basis of morphological traits among the extremely speciose genera Psammodes Kirby, 1819 (164 species and subspecies) and Ocnodes Fåhraeus, 1870 (144 species and subspecies). To test their phylogenetic placement, a phylogeny of the tribe was reconstructed from analyses of DNA sequences from six nonoverlapping genetic loci (CAD, wg, COI JP, COI BC, COII, and 28S) using Bayesian and maximum likelihood inference methods. These morphologically defined species-groups were recovered as distinct, well-supported lineages within Molurina + Phanerotomeina and are interpreted as independent genera, Tibiocnodes Gearner & Kamiński gen. nov. and Tuberocnodes Gearner & Kamiński gen. nov., respectively. A new species, Tuberocnodes synhimboides Gearner & Kamiński sp. nov., is also described. Furthermore, because the recovered phylogenetic placement of Tibiocnodes and Tuberocnodes undermines the monophyly of Molurina and Phanerotomeina, an analysis of the available diagnostic characters for those subtribes is also performed. As a consequence, Phanerotomeina is considered a synonym of the newly redefined Molurina sens. nov. Finally, spectrograms of vibrations produced by substrate tapping in two Molurina species, Toktokkus vialis (Burchell, 1822) and Tuberocnodes synhimboides, are presented.


2021
Vol 14 (1)
Author(s):
Hui Yung Chin
Michael Lardelli
Lyndsey Collins-Praino
Karissa Barthelson

Abstract Mutation of the gene PARK7 (DJ1) causes monogenic autosomal recessive Parkinson’s disease (PD) in humans. Subsequent alterations of PARK7 protein function lead to mitochondrial dysfunction, a major element in PD pathology. Zebrafish homozygous for mutations in park7, the PARK7-orthologous gene, show changes to gene expression in the oxidative phosphorylation pathway, supporting the view that disruption of energy production is a key feature of neurodegeneration in PD. Iron is critical for normal mitochondrial function, and we have previously used bioinformatic analysis of IRE-bearing transcripts in brain transcriptomes to find evidence supporting the existence of iron dyshomeostasis in Alzheimer’s disease. Here, we analysed IRE-bearing transcripts in the transcriptome data from homozygous park7−/− mutant zebrafish brains. We found that the set of genes with “high quality” IREs in their 5′ untranslated regions (UTRs; the HQ5′IRE gene set) was significantly altered in these 4-month-old park7−/− brains. However, sets of genes with IREs in their 3′ UTRs appeared unaffected. The effects on HQ5′IRE genes are possibly driven by iron dyshomeostasis and/or oxidative stress, but they also point to the existence of currently unknown mechanisms with differential overall effects on 5′ and 3′ IREs.
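One simple way to ask whether a gene set, such as a set of genes with predicted 5′ UTR IREs, is over-represented among differentially expressed genes is a one-sided hypergeometric test. The sketch below is an illustration only; the abstract does not say which gene set test was used, and rank-based or self-contained tests may differ substantially. The function name and the framing of N, K, n and k are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Hypothetical framing: N genes were measured, K of them belong to a gene
    set (e.g. genes with predicted 5' UTR IREs), and n genes were called
    differentially expressed; k is the observed overlap.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # math.comb(a, b) returns 0 when b > a, so impossible overlaps add nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), min(K, n) + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Toy numbers only: 10 genes, 5 in the set, 5 called DE, all 5 overlap.
    print(hypergeom_upper_tail(5, N=10, K=5, n=5))
```

Exact integer arithmetic avoids underflow in the binomial coefficients; a test of this kind only looks at a thresholded gene list, whereas the "significantly altered" result above concerns the gene set as a whole.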


Blood
2001
Vol 98 (8)
pp. 2555-2562
Author(s):
Mark Loyevsky
Timothy LaVaute
Charles R. Allerson
Robert Stearman
Olakunle O. Kassim
...

Abstract This study cloned and sequenced the complementary DNA (cDNA) encoding a putative malarial iron-responsive element-binding protein (PfIRPa) and confirmed its identity to the previously identified iron-regulatory protein (IRP)–like cDNA from Plasmodium falciparum. Sequence alignment showed that the plasmodial sequence has 47% identity with human IRP1. Hemoglobin-free lysates obtained from erythrocyte-stage P falciparum contain a protein that binds a consensus mammalian iron-responsive element (IRE), indicating that one or more proteins with iron-regulatory activity were present in the lysates. IRE-binding activity was found to be iron regulated in electrophoretic mobility shift assays. Western blot analysis showed a 2-fold increase in the level of PfIRPa in desferrioxamine-treated cultures compared with control or iron-supplemented cells. Malarial IRP was detected by anti-PfIRPa antibody in the IRE-protein complex from P falciparum lysates. Immunofluorescence studies confirmed the presence of PfIRPa in infected red blood cells. These findings demonstrate that erythrocyte-stage P falciparum contains an iron-regulated IRP that binds a mammalian consensus IRE sequence, raising the possibility that the malaria parasite expresses transcripts that contain IREs and are regulated in an iron-dependent manner.
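Searching transcripts for candidate IREs usually starts from the consensus stem-loop, commonly described as a CAGUGN hexanucleotide loop closed by a base-paired stem that carries an unpaired C on its 5′ side. The sketch below is a deliberately coarse pre-filter under that assumption, not the method of any study listed here: it finds CAGUGN loops and checks that a contiguous stem of `stem_len` pairs (Watson-Crick or G·U) closes them, ignoring the C bulge and any stem irregularities. The function names and the default stem length are hypothetical.

```python
import re
from typing import List

# Watson-Crick pairs plus the G-U wobble pair, which is common in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
# Zero-width lookahead so that overlapping CAGUGN loops are all reported.
LOOP = re.compile(r"(?=CAGUG[ACGU])")
LOOP_LEN = 6


def stem_closes(rna: str, loop_start: int, loop_end: int, stem_len: int) -> bool:
    """True if the stem_len bases 5' of the loop pair, innermost first, with the stem_len bases 3' of it."""
    if loop_start - stem_len < 0 or loop_end + stem_len > len(rna):
        return False
    return all(
        (rna[loop_start - j], rna[loop_end + j - 1]) in PAIRS
        for j in range(1, stem_len + 1)
    )


def find_ire_like(seq: str, stem_len: int = 5) -> List[int]:
    """Return start offsets of CAGUGN loops closed by a contiguous stem.

    Simplification: the C bulge of canonical IRE stems is ignored, so this
    only flags candidates for a proper structure-based check.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for m in LOOP.finditer(rna):
        start = m.start()
        if stem_closes(rna, start, start + LOOP_LEN, stem_len):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy hairpin: 5' arm GGAUC, loop CAGUGA, 3' arm GAUCC (its reverse complement).
    print(find_ire_like("AAA" + "GGAUC" + "CAGUGA" + "GAUCC" + "AAA"))
```

A real IRE search would also model the bulged C and evaluate folding energy; the point here is only how a loop consensus and a pairing rule combine into a readable pattern.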

