alignment column
Recently Published Documents


TOTAL DOCUMENTS: 4 (FIVE YEARS 1)

H-INDEX: 1 (FIVE YEARS 0)

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nicola De Maio ◽  
Alexander V. Alekseyenko ◽  
William J. Coleman-Smith ◽  
Fabio Pardi ◽  
Marc A. Suchard ◽  
...  

Abstract. Background: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are ‘novel’ compared to the others in the same dataset, and low weights to sequences that are over-represented. Results: We formalise this principle by rigorously defining the evolutionary ‘novelty’ of a sequence within an alignment. This results in new sequence weights that we call ‘phylogenetic novelty scores’. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column (important, for example, in protein family profiling). We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes. Conclusions: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.
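To make the example application concrete, here is a minimal sketch of how per-sequence weights feed into character frequency estimation at a single alignment column. It is not the paper's algorithm: the function name `weighted_frequencies`, the toy column, and the weights are all hypothetical, and the weights are chosen by hand rather than computed as phylogenetic novelty scores.

```python
from collections import defaultdict


def weighted_frequencies(column, weights):
    """Estimate character frequencies at one alignment column.

    `column` holds one character per sequence and `weights` one
    non-negative weight per sequence (hypothetical inputs).
    """
    if len(column) != len(weights):
        raise ValueError("need exactly one weight per sequence")
    totals = defaultdict(float)
    for char, weight in zip(column, weights):
        if char != "-":  # assumption of this sketch: gaps are ignored
            totals[char] += weight
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {char: total / norm for char, total in totals.items()}


# Toy column: three near-identical sequences carry 'A', one distinct one carries 'G'.
column = ["A", "A", "A", "G"]
uniform = weighted_frequencies(column, [1, 1, 1, 1])
# Hand-picked weights that share one unit of weight among the three redundant sequences.
reweighted = weighted_frequencies(column, [1 / 3, 1 / 3, 1 / 3, 1])
```

With uniform weights the over-represented character dominates the estimate; down-weighting the redundant sequences moves the two characters towards equal frequency, which is the effect any such weighting scheme is meant to achieve.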



2020 ◽  
Author(s):  
Nicola De Maio ◽  
Alexander V. Alekseyenko ◽  
William J. Coleman-Smith ◽  
Fabio Pardi ◽  
Marc A. Suchard ◽  
...  

Abstract. Background: Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are ‘novel’ compared to the others in the same dataset, and low weights to sequences that are over-represented. Results: We formalise this principle by rigorously defining the evolutionary ‘novelty’ of a sequence within an alignment. This results in new sequence weights that we call ‘phylogenetic novelty scores’. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column (important, for example, in protein family profiling). We give computationally efficient algorithms for calculating our scores and, using simulations, show that they improve the accuracy of character frequency estimation compared to existing sequence weighting schemes. Conclusions: Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.



2019 ◽  
Vol 14 (1) ◽  
Author(s):  
Hisanori Kiryu ◽  
Yuto Ichikawa ◽  
Yasuhiro Kojima

Abstract. Background: As the number of sequenced genomes grows, researchers have access to an increasingly rich source for discovering detailed evolutionary information. However, the computational technologies for inferring biologically important evolutionary events are not sufficiently developed. Results: We present algorithms to estimate the evolutionary time ($t_{\text{MRS}}$) to the most recent substitution event from a multiple alignment column by using a probabilistic model of sequence evolution. As the confidence in estimated $t_{\text{MRS}}$ values varies depending on gap fractions and nucleotide patterns of alignment columns, we also compute the standard deviation $\sigma$ of $t_{\text{MRS}}$ by using a dynamic programming algorithm. We identified, with confidence, a number of human genomic sites at which the last substitutions occurred between two speciation events in the human lineage. A large fraction of such sites have substitutions that occurred between the concestor nodes of Hominoidea and Euarchontoglires. We investigated the correlation between tissue-specific transcribed enhancers and the distribution of the sites with specific substitution time intervals, and found that brain-specific transcribed enhancers are threefold enriched in the density of substitutions in the human lineage relative to expectations. Conclusions: We have presented algorithms to estimate the evolutionary time ($t_{\text{MRS}}$) to the most recent substitution event from a multiple alignment column by using a probabilistic model of sequence evolution. Our algorithms will be useful for Evo-Devo studies, as they facilitate screening potential genomic sites that have played an important role in the acquisition of unique biological features by target species.



2008 ◽  
Vol 363 (1512) ◽  
pp. 4041-4047 ◽  
Author(s):  
Steffen Klaere ◽  
Tanja Gesell ◽  
Arndt von Haeseler

We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First, we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second, we allocate a Poisson-distributed number of substitutions on the branches. The probability of placing a mutation on a branch is proportional to its relative length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
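The doubly stochastic property can be illustrated with a small sketch. The abstract does not spell out the construction, so the following rests on stated assumptions: characters are binary, an alignment column for three leaves is encoded as a 3-bit pattern, and a mutation on a branch flips the state of every leaf below that branch. The tree, its branch lengths, and the names `BRANCHES`, `one_step_matrix`, and `is_doubly_stochastic` are hypothetical.

```python
# Hypothetical rooted tree ((A,B),C). Each branch is given as
# (bitmask of the leaves below it, branch length).
# Leaf A = bit 0, leaf B = bit 1, leaf C = bit 2.
BRANCHES = [
    (0b001, 0.1),  # branch leading to A
    (0b010, 0.2),  # branch leading to B
    (0b011, 0.3),  # internal branch above A and B
    (0b100, 0.4),  # branch leading to C
]
N_LEAVES = 3


def one_step_matrix(branches, n_leaves):
    """Pattern-to-pattern transition matrix for exactly one mutation.

    Assumption: a mutation on a branch flips all leaves below it, and
    the branch is chosen with probability equal to its relative length.
    """
    total = sum(length for _, length in branches)
    size = 2 ** n_leaves
    matrix = [[0.0] * size for _ in range(size)]
    for mask, length in branches:
        weight = length / total  # relative branch length
        for pattern in range(size):
            matrix[pattern][pattern ^ mask] += weight
    return matrix


def is_doubly_stochastic(matrix, tol=1e-12):
    """Check that every row and every column sums to one."""
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in matrix)
    cols_ok = all(abs(sum(col) - 1.0) < tol for col in zip(*matrix))
    return rows_ok and cols_ok


M = one_step_matrix(BRANCHES, N_LEAVES)
```

Under these assumptions each branch acts as a permutation of the column patterns, so the matrix is a weighted average of permutation matrices, and its rows and columns each sum to one.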


