substitution matrix Latest Research Papers

Development of a novel monosaccharide substitution matrix for improved comparison of glycan structures

Carbohydrate Research ◽

10.1016/j.carres.2021.108496 ◽

2022 ◽

pp. 108496

Author(s):

Akihiro Fujita ◽

Kiyoko F. Aoki-Kinoshita

Keyword(s):

Substitution Matrix

Protein Structure, Models of Sequence Evolution, and Data Type Effects in Phylogenetic Analyses of Mitochondrial Data: A Case Study in Birds

Diversity ◽

10.3390/d13110555 ◽

2021 ◽

Vol 13 (11) ◽

pp. 555

Author(s):

Emily L. Gordon ◽

Rebecca T. Kimball ◽

Edward L. Braun

Keyword(s):

Amino Acids ◽

Protein Structure ◽

Amino Acid ◽

Bird Species ◽

Data Type ◽

Substitution Matrix ◽

Sequence Evolution ◽

Transmembrane Helices ◽

Encoded Proteins ◽

The Impact

Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimates of trees for avian orders differ if protein-coding or non-coding data are analyzed. The bird tree is a good study system because the historical signal for relationships among orders is very weak, which should permit subtle non-historical signals to be identified, while monophyly of orders is strongly corroborated, allowing identification of strong non-historical signals. Hydrophobic amino acids in mitochondrially-encoded proteins, which are expected to be found in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated using a substitution matrix for transmembrane helices estimated using a variety of nuclear- and mitochondrially-encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially-encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.

RNAlign2D: a rapid method for combined RNA structure and sequence-based alignment using a pseudo-amino acid substitution matrix

BMC Bioinformatics ◽

10.1186/s12859-021-04426-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Tomasz Woźniak ◽

Małgorzata Sajek ◽

Jadwiga Jaruzelska ◽

Marcin Piotr Sajek

Keyword(s):

Amino Acid ◽

Amino Acid Substitution ◽

RNA Structure ◽

Structural Information ◽

Amino Acid Sequences ◽

Substitution Matrix ◽

RNA Sequences ◽

RNA Molecules ◽

Bioinformatic Tools ◽

Amino Acid Substitution Matrix

Background: The functions of RNA molecules are mainly determined by their secondary structures. These functions can also be predicted using bioinformatic tools that enable the alignment of multiple RNAs to determine functional domains and/or classify RNA molecules into RNA families. However, the existing multiple RNA alignment tools, which use structural information, are slow in aligning long molecules and/or a large number of molecules. Therefore, a more rapid tool for multiple RNA alignment may improve the classification of known RNAs and help to reveal the functions of newly discovered RNAs. Results: Here, we introduce an extremely fast Python-based tool called RNAlign2D. It converts RNA sequences to pseudo-amino acid sequences, which incorporate structural information, and uses a customizable scoring matrix to align these RNA molecules via the multiple protein sequence alignment tool MUSCLE. Conclusions: RNAlign2D produces accurate RNA alignments in a very short time. The pseudo-amino acid substitution matrix approach utilized in RNAlign2D is applicable for virtually all protein aligners.

Packpred: Predicting the Functional Effect of Missense Mutations

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.646288 ◽

2021 ◽

Vol 8 ◽

Author(s):

Kuan Pern Tan ◽

Tejashree Rajaram Kanitkar ◽

Chee Keong Kwoh ◽

Mallur Srivatsan Madhusudhan

Keyword(s):

Amino Acid ◽

Single Point ◽

Point Mutations ◽

Substitution Matrix ◽

Saturation Mutagenesis ◽

Data Sets ◽

Missense Mutations ◽

Wild Type ◽

Data Set ◽

Functional Consequences

Predicting the functional consequences of single point mutations has relevance to protein function annotation and to clinical analysis/diagnosis. We developed and tested Packpred that makes use of a multi-body clique statistical potential in combination with a depth-dependent amino acid substitution matrix (FADHM) and positional Shannon entropy to predict the functional consequences of point mutations in proteins. Parameters were trained over a saturation mutagenesis data set of T4-lysozyme (1,966 mutations). The method was tested over another saturation mutagenesis data set (CcdB; 1,534 mutations) and the Missense3D data set (4,099 mutations). The performance of Packpred was compared against those of six other contemporary methods. With MCC values of 0.42, 0.47, and 0.36 on the training and testing data sets, respectively, Packpred outperforms all methods in all data sets, with the exception of marginally underperforming in comparison to FADHM in the CcdB data set. A meta server analysis was performed that chose best performing methods of wild-type amino acids and for wild-type mutant amino acid pairs. This led to an increase in the MCC value of 0.40 and 0.51 for the two meta predictors, respectively, on the Missense3D data set. We conjecture that it is possible to improve accuracy with better meta predictors as among the seven methods compared, at least one method or another is able to correctly predict ∼99% of the data.
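Two of the quantities named in this abstract have standard definitions that can be sketched briefly: the Matthews correlation coefficient (MCC) used to report performance, and the Shannon entropy of an alignment column. The function names below are illustrative assumptions, and the code does not reproduce Packpred's scoring pipeline.

```python
import math
from collections import Counter


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues observed at one alignment position."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A fully conserved column has zero entropy; a 50/50 column has one bit.
print(column_entropy("AAAA"), column_entropy("AALL"))
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), which gives a scale for reading the values reported above.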

An extension of Wang protein design model using Blosum62 substitution matrix

10.1101/2021.06.07.447415 ◽

2021 ◽

Author(s):

Amin Rahmani ◽

Fatemeh Zare-Mirakabad

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Protein Design ◽

Tertiary Structure ◽

Amino Acid Replacement ◽

Vital Role ◽

The Body ◽

Substitution Matrix ◽

Evolutionary Information ◽

Deep Learning Model

Human life depends on the functionality of molecules in the body. One of these essential molecules is the protein, which plays a vital role in life, such that its malfunction can cause severe damage. Such roles make it necessary to understand protein structure and its functionality. One problem that helps us understand the relation between protein structure and function is the well-known protein design problem, which attempts to find an amino acid sequence that can fold into a desired tertiary structure. However, although protein design methods achieve acceptable accuracy, this accuracy is measured as the percentage of amino acids recovered identically. At the same time, it is well known that amino acids can replace each other in evolution while the function and structure of the protein stay the same. Thus, the designed sequence does not have the opportunity to be close to the target in the evolutionary sense. This paper presents an extension to Wang's deep learning model that uses evolutionary information in the Blosum62 substitution matrix to take amino acid replacement probability into account while designing a sequence.
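The contrast this abstract draws between identical recovery and evolutionarily acceptable replacement can be sketched with two toy metrics. The scores in `TOY_SCORES` are a hypothetical stand-in for a real substitution matrix and should not be read as Blosum62 values. The function names are also assumptions, and the code is unrelated to the deep learning model itself.

```python
# Hypothetical pair scores standing in for a real substitution matrix.
# Positive scores mark replacements treated as evolutionarily acceptable.
TOY_SCORES = {
    frozenset("IL"): 2,
    frozenset("IV"): 3,
    frozenset("DE"): 2,
    frozenset("KR"): 2,
}


def pair_score(a: str, b: str) -> int:
    """Toy score: identical residues score 4, listed pairs their value, all others -1."""
    if a == b:
        return 4
    return TOY_SCORES.get(frozenset((a, b)), -1)


def identity_recovery(designed: str, target: str) -> float:
    """Fraction of positions where the designed residue equals the target residue."""
    if len(designed) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(d == t for d, t in zip(designed, target)) / len(target)


def similarity_recovery(designed: str, target: str) -> float:
    """Fraction of positions whose residue pair has a positive substitution score."""
    if len(designed) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(d, t) > 0 for d, t in zip(designed, target)) / len(target)


print(identity_recovery("IVRDD", "LIKED"), similarity_recovery("IVRDD", "LIKED"))
```

In this toy case the designed sequence matches the target identically at only one of five positions, yet every position is a conservative replacement under the toy scores. This is the kind of gap that an identity-only accuracy measure cannot capture.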

New amino acid substitution matrix brings sequence alignments into agreement with structure matches

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.26050 ◽

2021 ◽

Author(s):

Kejue Jia ◽

Robert L Jernigan

Keyword(s):

Amino Acid ◽

Amino Acid Substitution ◽

Substitution Matrix ◽

Sequence Alignments ◽

Amino Acid Substitution Matrix

Improved DNA-versus-Protein Homology Search for Protein Fossils

10.1101/2021.01.25.428050 ◽

2021 ◽

Author(s):

Yin Yao ◽

Martin C. Frith

Keyword(s):

Evolutionary History ◽

Sequence Data ◽

Protein Sequences ◽

Search Method ◽

Substitution Matrix ◽

Homology Search ◽

Protein Homology ◽

Noncoding DNA ◽

Genetic Codes ◽

Versus Protein

Protein fossils, i.e. noncoding DNA descended from coding DNA, arise frequently from transposable elements (TEs), decayed genes, and viral integrations. They can reveal, and mislead about, evolutionary history and relationships. They have been detected by comparing DNA to protein sequences, but current methods are not optimized for this task. We describe a powerful DNA-protein homology search method. We use a 64×21 substitution matrix, which is fitted to sequence data, automatically learning the genetic code. We detect subtly homologous regions by considering alternative possible alignments between them, and calculate significance (probability of occurring by chance between random sequences). Our method detects TE protein fossils much more sensitively than blastx, and >10× faster. Of the ~7 major categories of eukaryotic TE, three have not been found in mammals: we find two of them in the human genome, polinton and DIRS/Ngaro. This method increases our power to find ancient fossils, and perhaps to detect non-standard genetic codes. The alternative-alignments and significance paradigm is not specific to DNA-protein comparison, and could benefit homology search generally.
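The 64×21 shape mentioned in this abstract corresponds to 64 codons scored against 20 amino acids plus a stop symbol. As a rough sketch of that layout only, the code below builds a naive matrix directly from the standard genetic code, with a fixed match and mismatch score. The paper instead fits its matrix to sequence data, so the scores and the function name `naive_codon_matrix` here are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY*"  # 20 amino acids plus stop = 21 columns


def naive_codon_matrix(match: int = 1, mismatch: int = -1) -> dict:
    """Return a 64x21 table: `match` where the codon encodes the symbol, else `mismatch`."""
    return {
        codon: {s: (match if s == encoded else mismatch) for s in SYMBOLS}
        for codon, encoded in zip(CODONS, STANDARD_CODE)
    }


matrix = naive_codon_matrix()
print(len(matrix), len(matrix["ATG"]), matrix["ATG"]["M"])
```

A fitted matrix would replace these two fixed values with data-derived scores, which can reward a codon for matching amino acids it does not strictly encode.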

Current strategic limitations of phylogenetic tools badly impact the inference of an evolutionary tree

10.1101/2021.01.21.427545 ◽

2021 ◽

Author(s):

Shamantha Nasika ◽

Ashish Runthala

Keyword(s):

Maximum Likelihood ◽

Protein Sequence ◽

Evolutionary Relationship ◽

Protein Sequences ◽

Statistical Testing ◽

Substitution Matrix ◽

Protein Sequence Alignment ◽

Substitution Models ◽

Optimal Tree ◽

Residue Substitution

To infer evolutionary relationships among several protein sequences, a phylogenetic tree is usually constructed with maximum likelihood-based algorithms. To improve the accuracy of these methodologies, many parameters like bootstrap methods, correlation coefficient and residue-substitution models are presumably over-ranked to derive biologically credible relationships. Although the accuracy of the protein sequence alignment and the substitution matrix are preliminary constraints that define the biological accuracy of the overlapped sequences/residues, the alignment is not iteratively optimized through statistical testing of residue-substitution models. The study mainly highlights the potential pitfalls that significantly affect the accuracy of an evolutionary protocol. It emphasizes the need for more accurate scrutiny of the entire phylogenetic methodology. The need for iterative optimization is illustrated, with the aim of constructing a tree that is biologically credible rather than merely mathematically optimal for a sequence dataset.

Packpred: Predicting the functional effect of missense mutations

10.1101/2020.12.30.424909 ◽

2021 ◽

Author(s):

Kuan Pern Tan ◽

Tejashree Rajaram Kanitkar ◽

Kwoh Chee Keong ◽

M.S. Madhusudhan

Keyword(s):

Amino Acid ◽

Single Point ◽

Point Mutations ◽

Substitution Matrix ◽

Saturation Mutagenesis ◽

Data Sets ◽

Missense Mutations ◽

Wild Type ◽

Data Set ◽

Functional Consequences

Predicting the functional consequences of single point mutations has relevance to protein function annotation and to clinical analysis/diagnosis. We developed and tested Packpred that makes use of a multi-body clique statistical potential in combination with a depth-dependent amino acid substitution matrix (FADHM) and positional Shannon entropy to predict the functional consequences of point mutations in proteins. Parameters were trained over a saturation mutagenesis data set of T4-lysozyme (1966 mutations). The method was tested over another saturation mutagenesis data set (CcdB; 1534 mutations) and the Missense3D data set (4099 mutations). The performance of Packpred was compared against those of six other contemporary methods. With MCC values of 0.42, 0.47 and 0.36 on the training and testing data sets respectively, Packpred outperforms all methods in all data sets, with the exception of marginally underperforming relative to FADHM in the CcdB data set. On analyzing the results, we could build meta servers that chose best performing methods of wild type amino acids and for wild type-mutant amino acid pairs. This led to an increase of MCC value of 0.40 and 0.51 for the two meta predictors respectively on the Missense3D data set. We conjecture that it is possible to improve accuracy with better meta predictors as among the 7 methods compared, at least one method or another is able to correctly predict ∼99% of the data.

On the Effects of Substitution Matrix Choices for Pairwise Gapped Global Sequence Alignment of DNA Nucleotides

Communications in Computer and Information Science - Advanced Informatics for Computing Research ◽

10.1007/978-981-16-3660-8_11 ◽

2021 ◽

pp. 113-125

Author(s):

Rajashree Chaurasia ◽

Udayan Ghose

Keyword(s):

Sequence Alignment ◽

Substitution Matrix
