Test on the Structure of Biological Sequences via Chaos Game Representation

Author(s):  
Peggy Cénac

In this paper biological sequences are modelled by stationary ergodic sequences. A new family of statistical tests to characterize the randomness of the inputs is proposed and analyzed. Tests for independence and for the determination of the appropriate order of a Markov chain are constructed with the Chaos Game Representation (CGR), and applied to several genomes.

Author(s):  
Zu-Guo Yu ◽  
Guo-Sheng Han ◽  
Bo Li ◽  
Vo Anh ◽  
Yi-Quan Li

Mitochondrial genomes have provided much information on the evolution of this organelle and have been used for phylogenetic reconstruction by various methods, with or without sequence alignment. In this paper, we explore mitochondrial genomes by means of the chaos game representation (CGR), a tool derived from chaotic dynamical systems theory. If the DNA sequence is a random collection of bases, the CGR will be a uniformly filled square; on the other hand, any pattern visible in the CGR contains information on the DNA sequence. First, we use Markov chain models to simulate the CGR of mitochondrial genomes. Then we model the noise background in the genome sequences by a Markov chain. A simple correlation-related distance approach, based on the CGR of mitochondrial genomes and requiring no sequence alignment, is proposed to analyze the phylogeny of 64 selected vertebrates.
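As a rough illustration of the CGR construction mentioned in this abstract, the sketch below maps a DNA string to points in the unit square by repeatedly moving halfway toward the corner assigned to each base. The corner assignment, the starting point at the centre of the square, and the function name `cgr_points` are assumptions made for this example; the abstract does not specify them, and conventions differ between papers.

```python
# Assumed corner layout (hypothetical choice; conventions vary between papers).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the list of CGR points for a DNA sequence.

    Each point is the midpoint between the previous point and the
    corner of the unit square assigned to the current base.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

Plotting these points for a long sequence gives the CGR image; for a sequence of independent, equally likely bases the points would be expected to fill the square roughly uniformly, as the abstract notes.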


Fractals ◽  
2006 ◽  
Vol 14 (01) ◽  
pp. 27-35 ◽  
Author(s):  
TOMOYA SUZUKI ◽  
TOHRU IKEGUCHI ◽  
MASUO SUZUKI

Iterated function systems are often used to investigate fractal structures. The method is also referred to as the Chaos Game Representation (CGR) and is applied to represent characteristic structures of DNA sequences visually. In this paper, we propose an original way of plotting the CGR that makes it easy to confirm properties of the temporal evolution of a time series. We also show that spurious characteristic structures can appear if the CGR is applied carelessly to real time series. The source of this spurious identification is non-uniformity in the frequency histogram of the time series, which is often the case when analyzing real time series. Finally, we show how to avoid such spurious identification by applying the method of surrogate data and introducing conditional probabilities of the time series.


2019 ◽  
Vol 36 (1) ◽  
pp. 272-279 ◽  
Author(s):  
Hannah F Löchel ◽  
Dominic Eger ◽  
Theodor Sperlea ◽  
Dominik Heider

Abstract

Motivation: Classification of protein sequences is a major task in bioinformatics and has many applications. Different machine learning methods exist and are applied to these problems, such as support vector machines (SVM), random forests (RF) and neural networks (NN). All of these methods have in common that protein sequences have to be made machine-readable and comparable in the first step, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, due to the outstanding performance of deep neural networks (DNN) on image recognition, we used frequency matrix chaos game representation (FCGR) to encode protein sequences as images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to work for protein sequences as well, resulting in an n-flakes representation, an image with several icosagons.

Results: All applied machine learning techniques (RF, SVM and DNN) show promising results compared to the state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods, and FCGR is a promising new encoding method for protein sequences.

Availability and implementation: https://cran.r-project.org/.

Supplementary information: Supplementary data are available at Bioinformatics online.
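The abstract does not describe how the frequency matrix is built, and the protein variant (the n-flakes representation) is not reproduced here. As a hedged sketch of the general idea only, the code below builds a frequency matrix for the simpler four-letter DNA case: every k-mer is counted in the cell of a 2^k by 2^k grid that its CGR point would fall into. The corner-to-bit assignment and the function name `fcgr` are assumptions made for this example.

```python
# Assumed (x, y) bit for each base, matching a unit-square corner layout.
# This layout is a hypothetical choice; published conventions differ.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Count k-mers of a DNA sequence in a 2**k x 2**k grid.

    The grid cell of a k-mer is the cell that contains its CGR point.
    Integer arithmetic is used instead of floating-point coordinates,
    so long runs of a single base cannot cause rounding problems.
    """
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in BITS for base in kmer):
            continue  # skip windows containing ambiguous symbols
        col = row = 0
        # The most recent base decides the coarsest subdivision,
        # so it becomes the most significant bit of the cell index.
        for base in reversed(kmer):
            bit_x, bit_y = BITS[base]
            col = (col << 1) | bit_x
            row = (row << 1) | bit_y
        counts[row][col] += 1
    return counts


if __name__ == "__main__":
    for matrix_row in fcgr("ACGTACGT", 2):
        print(matrix_row)
```

The resulting matrix can be rescaled and treated as a grey-scale image, which is the kind of input an image classifier expects.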


2018 ◽  
Vol 78 (1-2) ◽  
pp. 441-463 ◽  
Author(s):  
Li Ge ◽  
Jiaguo Liu ◽  
Yusen Zhang ◽  
Matthias Dehmer
