substitution matrix
Recently Published Documents


TOTAL DOCUMENTS: 91 (FIVE YEARS 23)
H-INDEX: 16 (FIVE YEARS 2)

Diversity ◽ 2021 ◽ Vol 13 (11) ◽ pp. 555
Author(s): Emily L. Gordon ◽ Rebecca T. Kimball ◽ Edward L. Braun

Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimates of trees for avian orders differ if protein-coding or non-coding data are analyzed. The bird tree is a good study system because the historical signal for relationships among orders is very weak, which should permit subtle non-historical signals to be identified, while monophyly of orders is strongly corroborated, allowing identification of strong non-historical signals. Hydrophobic amino acids in mitochondrially-encoded proteins, which are expected to be found in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated using a substitution matrix for transmembrane helices estimated using a variety of nuclear- and mitochondrially-encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially-encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.


2021 ◽ Vol 22 (1)
Author(s): Tomasz Woźniak ◽ Małgorzata Sajek ◽ Jadwiga Jaruzelska ◽ Marcin Piotr Sajek

Background: The functions of RNA molecules are mainly determined by their secondary structures. These functions can also be predicted using bioinformatic tools that align multiple RNAs to determine functional domains and/or classify RNA molecules into RNA families. However, the existing multiple RNA alignment tools that use structural information are slow when aligning long molecules and/or a large number of molecules. Therefore, a more rapid tool for multiple RNA alignment may improve the classification of known RNAs and help to reveal the functions of newly discovered RNAs.
Results: Here, we introduce an extremely fast Python-based tool called RNAlign2D. It converts RNA sequences to pseudo-amino acid sequences, which incorporate structural information, and uses a customizable scoring matrix to align these RNA molecules via the multiple protein sequence alignment tool MUSCLE.
Conclusions: RNAlign2D produces accurate RNA alignments in a very short time. The pseudo-amino acid substitution matrix approach utilized in RNAlign2D is applicable to virtually all protein aligners.
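The idea of folding structural information into a pseudo-amino acid alphabet can be illustrated with a minimal Python sketch. It assumes a dot-bracket secondary structure string and a hypothetical 12-letter mapping (4 nucleotides × 3 structure states); the names `ENCODE`, `to_pseudo_protein`, and `from_pseudo_protein` are illustrative, and the actual encoding used by RNAlign2D may differ.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STRUCTURE = ".()"  # dot-bracket: unpaired, opening pair, closing pair

# Hypothetical alphabet: any 12 distinct amino acid letters would work,
# as long as the scoring matrix given to the protein aligner matches it.
PSEUDO_LETTERS = "ACDEFGHIKLMN"

# Each (nucleotide, structure state) pair gets its own pseudo-amino acid.
ENCODE = dict(zip(product(NUCLEOTIDES, STRUCTURE), PSEUDO_LETTERS))
DECODE = {letter: pair for pair, letter in ENCODE.items()}


def to_pseudo_protein(seq: str, dot_bracket: str) -> str:
    """Combine an RNA sequence and its structure into one pseudo-protein string."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    return "".join(ENCODE[(n, s)] for n, s in zip(seq.upper(), dot_bracket))


def from_pseudo_protein(pseudo: str) -> tuple[str, str]:
    """Recover the RNA sequence and dot-bracket structure from a pseudo-protein."""
    pairs = [DECODE[letter] for letter in pseudo]
    return "".join(n for n, _ in pairs), "".join(s for _, s in pairs)
```

Because the encoding is one-to-one, an alignment computed by a protein aligner on the pseudo-protein strings can be decoded back into aligned RNA sequences and structures column by column (gap characters would need to be passed through unchanged).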


2021 ◽ Vol 8
Author(s): Kuan Pern Tan ◽ Tejashree Rajaram Kanitkar ◽ Chee Keong Kwoh ◽ Mallur Srivatsan Madhusudhan

Predicting the functional consequences of single point mutations has relevance to protein function annotation and to clinical analysis/diagnosis. We developed and tested Packpred, which makes use of a multi-body clique statistical potential in combination with a depth-dependent amino acid substitution matrix (FADHM) and positional Shannon entropy to predict the functional consequences of point mutations in proteins. Parameters were trained over a saturation mutagenesis data set of T4-lysozyme (1,966 mutations). The method was tested over another saturation mutagenesis data set (CcdB; 1,534 mutations) and the Missense3D data set (4,099 mutations). The performance of Packpred was compared against those of six other contemporary methods. With MCC values of 0.42, 0.47, and 0.36 on the training and testing data sets, respectively, Packpred outperforms all methods in all data sets, with the exception of marginally underperforming in comparison to FADHM in the CcdB data set. A meta server analysis was performed that chose the best-performing methods for wild-type amino acids and for wild-type-mutant amino acid pairs. This led to an increase in the MCC value of 0.40 and 0.51 for the two meta predictors, respectively, on the Missense3D data set. We conjecture that it is possible to improve accuracy with better meta predictors because, among the seven methods compared, at least one method or another is able to correctly predict ∼99% of the data.
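Two of the quantities named in this abstract have standard definitions that are easy to state in code. The sketch below computes the Matthews correlation coefficient (MCC) from a binary confusion matrix and the Shannon entropy of one alignment column. The function names are illustrative, gaps are not treated specially, and Packpred's own definition of positional entropy may differ in detail.

```python
import math
from collections import Counter


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal is empty (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column.

    Assumes a non-empty column; every character, including any gap
    symbol, is counted as its own category.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A fully conserved column has zero entropy, while a column split evenly between two residues has an entropy of one bit, which is why positional entropy is a natural proxy for how tolerant a site is to substitution.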


2021
Author(s): Amin Rahmani ◽ Fatemeh Zare-Mirakabad

Human life depends on the functionality of molecules in the body. Among these essential molecules are proteins, which play such a vital role that their malfunction can cause severe damage. These roles make it necessary to understand protein structure and its functionality. One of the problems that helps us understand the relation between protein structure and its functionality is the well-known protein design problem, which attempts to find an amino acid sequence that can fold into a desired tertiary structure. However, although current methods achieve acceptable accuracy in protein design, this accuracy is measured as the percentage of amino acids recovered identically. At the same time, it is well known that amino acids can replace each other during evolution while the function and structure of the protein stay the same. Thus, the designed sequence has no opportunity to be close to the target in an evolutionary sense. This paper presents an extension to Wang's deep learning model that uses evolutionary information from the BLOSUM62 substitution matrix to take amino acid replacement probability into account while designing a sequence.


2021
Author(s): Yin Yao ◽ Martin C. Frith

Protein fossils, i.e. noncoding DNA descended from coding DNA, arise frequently from transposable elements (TEs), decayed genes, and viral integrations. They can reveal, and mislead about, evolutionary history and relationships. They have been detected by comparing DNA to protein sequences, but current methods are not optimized for this task. We describe a powerful DNA-protein homology search method. We use a 64×21 substitution matrix, which is fitted to sequence data, automatically learning the genetic code. We detect subtly homologous regions by considering alternative possible alignments between them, and calculate significance (probability of occurring by chance between random sequences). Our method detects TE protein fossils much more sensitively than blastx, and > 10× faster. Of the ~7 major categories of eukaryotic TE, three have not been found in mammals: we find two of them in the human genome, polinton and DIRS/Ngaro. This method increases our power to find ancient fossils, and perhaps to detect non-standard genetic codes. The alternative-alignments and significance paradigm is not specific to DNA-protein comparison, and could benefit homology search generally.
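The shape of a 64×21 matrix follows from scoring each of the 64 codons against the 20 amino acids plus a stop symbol. The Python sketch below only illustrates that layout: it fills the matrix naively from the standard genetic code with hypothetical match/mismatch scores, whereas the method described above fits its scores to sequence data. The names `naive_codon_matrix` and `score` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order with the first base varying slowest.
TRANSLATION = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # 64 rows
GENETIC_CODE = dict(zip(CODONS, TRANSLATION))
COLUMNS = "ACDEFGHIKLMNPQRSTVWY*"  # 20 amino acids plus stop: 21 columns


def naive_codon_matrix(match: int = 1, mismatch: int = -1) -> list[list[int]]:
    """Build a 64x21 matrix that rewards only exact translations.

    The match/mismatch values are placeholders, not fitted scores.
    """
    return [
        [match if GENETIC_CODE[codon] == aa else mismatch for aa in COLUMNS]
        for codon in CODONS
    ]


def score(matrix: list[list[int]], codon: str, aa: str) -> int:
    """Look up the score for aligning one codon against one amino acid."""
    return matrix[CODONS.index(codon)][COLUMNS.index(aa)]
```

A fitted matrix would replace the single positive entry per row with graded scores, so that, for example, a codon one mutation away from encoding a given amino acid could score higher than an unrelated codon.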


2021
Author(s): Shamantha Nasika ◽ Ashish Runthala

To infer evolutionary relationships among several protein sequences, a phylogenetic tree is usually constructed with maximum likelihood-based algorithms. To improve the accuracy of these methodologies, many parameters, such as bootstrap methods, correlation coefficients, and residue-substitution models, are presumably over-ranked to derive biologically credible relationships. Although the accuracy of the protein sequence alignment and the substitution matrix are preliminary constraints that define the biological accuracy of the overlapped sequences/residues, the alignment is not iteratively optimized through statistical testing of residue-substitution models. The study mainly highlights the potential pitfalls that significantly affect the accuracy of an evolutionary protocol. It emphasizes the need for more careful scrutiny of the entire phylogenetic methodology. The need for iterative optimization is illustrated in order to construct a tree for a sequence dataset that is biologically credible rather than merely mathematically optimal.


2021
Author(s): Kuan Pern Tan ◽ Tejashree Rajaram Kanitkar ◽ Kwoh Chee Keong ◽ M.S. Madhusudhan

Predicting the functional consequences of single point mutations has relevance to protein function annotation and to clinical analysis/diagnosis. We developed and tested Packpred, which makes use of a multi-body clique statistical potential in combination with a depth-dependent amino acid substitution matrix (FADHM) and positional Shannon entropy to predict the functional consequences of point mutations in proteins. Parameters were trained over a saturation mutagenesis data set of T4-lysozyme (1966 mutations). The method was tested over another saturation mutagenesis data set (CcdB; 1534 mutations) and the Missense3D data set (4099 mutations). The performance of Packpred was compared against those of six other contemporary methods. With MCC values of 0.42, 0.47 and 0.36 on the training and testing data sets respectively, Packpred outperforms all methods in all data sets, with the exception of marginally underperforming relative to FADHM in the CcdB data set. On analyzing the results, we could build meta servers that chose the best-performing methods for wild-type amino acids and for wild-type-mutant amino acid pairs. This led to an increase of MCC value of 0.40 and 0.51 for the two meta predictors respectively on the Missense3D data set. We conjecture that it is possible to improve accuracy with better meta predictors because, among the 7 methods compared, at least one method or another is able to correctly predict ∼99% of the data.

