The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion

2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Valery Polyanovsky ◽  
Alexander Lifanov ◽  
Natalia Esipova ◽  
Vladimir Tumanyan

Abstract Background: The alignment of character sequences is important in bioinformatics. The quality of this procedure is determined by the substitution matrix and the parameters of the insertion-deletion penalty function. Substitution matrices are derived from sequence alignments and thus reflect the evolutionary process. In addition to evolutionary matrices, a large number of matrices with other backgrounds have now been obtained. To make an optimal choice of the substitution matrix and the penalty parameters, we conducted a numerical experiment using a representative sample of existing matrices of various types and origins. Results: We tested the classical evolutionary matrix series (PAM, BLOSUM, VTML, Pfasum) as well as matrices based on structural alignment, a contact energy matrix, and a matrix based on the properties of the genetic code. This study presents results for two types of test set: first, we simulated sequences that reflect divergent evolution; second, we performed tests on BAliBASE sequences. In both cases, we obtained the dependence of the alignment quality (Accuracy, Confidence) on the evolutionary distance between sequences and on the evolutionary distance to which the substitution matrices correspond. The combination of matrix and penalty function parameters was optimized for both local and global alignment. We found that the best alignment quality is achieved with matrices corresponding to the largest evolutionary distance. These matrices prove to be universal, i.e. suitable for aligning sequences separated by both large and small evolutionary distances. We also analysed how the correlation coefficients of matrices correspond to the alignment quality. Matrices showing high alignment quality have an above-average correlation value, but the converse is not true.
Conclusions: This study showed that the best alignment quality is achieved with evolutionary matrices designed for long distances: Gonnet, VTML250, PAM250, MIQS, and Pfasum050. The same property is inherent not only in matrices of evolutionary origin but also in matrices of other backgrounds that correspond to a large evolutionary distance. Accordingly, matrices based on structural data show alignment quality close to that of evolutionary matrices. This agrees with the idea that spatial structure is more conserved than protein sequence.
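
To illustrate how a substitution matrix and a gap penalty jointly determine an alignment score, here is a minimal sketch of global (Needleman-Wunsch) scoring. It is not the authors' code: the scoring function `toy_score`, the linear gap penalty of -2, and the function names are assumptions made for illustration only. A real experiment would plug in a matrix such as PAM250 and, typically, an affine gap penalty.

```python
def global_alignment_score(a, b, score, gap=-2):
    """Optimal global alignment score of sequences a and b.

    `score(x, y)` plays the role of the substitution matrix;
    `gap` is a linear per-residue gap penalty (an assumption here).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitution
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            )
        prev = cur
    return prev[-1]


def toy_score(x, y):
    """Hypothetical two-valued stand-in for a substitution matrix."""
    return 2 if x == y else -1


if __name__ == "__main__":
    print(global_alignment_score("ACD", "AD", toy_score))
```

Changing either `toy_score` or `gap` changes which alignment is optimal, which is why the study optimizes the matrix and the penalty parameters together.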

2019 ◽  
Vol 36 (1) ◽  
pp. 104-111
Author(s):  
Shuichiro Makigaki ◽  
Takashi Ishida

Abstract Motivation: Template-based modeling, the process of predicting the tertiary structure of a protein by using homologous protein structures, is useful if good templates can be found. Although modern homology detection methods can find remote homologs with high sensitivity, the accuracy of template-based models generated from homology-detection-based alignments is often lower than that from ideal alignments. Results: In this study, we propose a new method that generates pairwise sequence alignments for more accurate template-based modeling. The proposed method trains a machine learning model using the structural alignment of known homologs. It is difficult to directly predict sequence alignments using machine learning. Thus, when calculating sequence alignments, instead of a fixed substitution matrix, this method dynamically predicts a substitution score from the trained model. We evaluate our method by carefully splitting the training and test datasets and comparing the predicted structure’s accuracy with that of state-of-the-art methods. Our method generates more accurate tertiary structure models than those produced from alignments obtained by other methods. Availability and implementation: https://github.com/shuichiro-makigaki/exmachina. Supplementary information: Supplementary data are available at Bioinformatics online.


2008 ◽  
Vol 82 (10) ◽  
pp. 4938-4945 ◽  
Author(s):  
Sergey Kryazhimskiy ◽  
Georgii A. Bazykin ◽  
Jonathan Dushoff

ABSTRACT Influenza A virus is one of the best-studied viruses and a model organism for the study of molecular evolution; in particular, much research has focused on detecting natural selection on influenza virus proteins. Here, we study the dynamics of the synonymous and nonsynonymous nucleotide composition of influenza A virus genes. In several genes, the nucleotide frequencies at synonymous positions drift away from the equilibria predicted from the synonymous substitution matrices. We investigate possible reasons for this unexpected behavior by fitting several regression models. Relaxation toward a mutation-selection equilibrium following a host jump fails to explain the dynamics of the synonymous nucleotide composition, even if we allow for slow temporal changes in the substitution matrix. Instead, we find that deep internal branches of the phylogeny show distinct patterns of nucleotide substitution and that these branches strongly influence the dynamics of nucleotide composition, suggesting that the observed trends are at least in part a result of natural selection acting on synonymous sites. Moreover, we find that the dynamics of the nucleotide composition at synonymous and nonsynonymous sites are highly correlated, providing evidence that even nonsynonymous sites can be influenced by selection pressure for nucleotide composition.


2019 ◽  
Author(s):  
Lidia Cabeza ◽  
Julie Giustiniani ◽  
Thibault Chabin ◽  
Bahrie Ramadan ◽  
Coralie Joucla ◽  
...  

Abstract Decision-making is a conserved evolutionary process that enables an individual to choose one option among several alternatives and relies on reward and cognitive control systems. The Iowa Gambling Task assesses human decision-making under uncertainty by presenting four card decks with various cost-benefit probabilities. Participants seek to maximize their monetary gains by developing long-term optimal choice strategies. Animal versions have been adapted with nutritional rewards, but interspecies data comparisons are still scarce. Our study directly compared physiological decision-making performances between humans and wild-type C57BL/6 mice. Human subjects completed an electronic version of the Iowa Gambling Task, while mice performed a maze-based adaptation with four arms baited in a probabilistic way. Our data show closely matching performances across species, with similar patterns of choice behavior. Moreover, both populations clustered into good, intermediate, and poor decision-making categories in similar proportions. Remarkably, mice that were good decision-makers behaved like humans of the same category, but slight differences between species were observed for the other two subpopulations. Overall, our direct comparative study confirms the good face validity of the rodent gambling task. Extended behavioral characterization and pathological animal models should help strengthen its construct validity and disentangle determinants of decision-making in animals and humans.


2006 ◽  
Vol 65 (1) ◽  
pp. 32-39 ◽  
Author(s):  
Manoj Tyagi ◽  
Venkataraman S. Gowri ◽  
Narayanaswamy Srinivasan ◽  
Alexandre G. de Brevern ◽  
Bernard Offmann

2020 ◽  
Author(s):  
Tair Shauli ◽  
Nadav Brandes ◽  
Michal Linial

Abstract The characterization of human genetic variation in coding regions is fundamental to our understanding of protein function, structure, and evolution. Amino-acid (AA) substitution matrices such as BLOSUM (BLOcks SUbstitution Matrix) and PAM (Point Accepted Mutations) encapsulate the stochastic nature of such proteomic variation and are used in studying protein families and evolutionary processes. However, these matrices were constructed from protein sequences spanning long evolutionary distances and are not designed to reflect polymorphism within species. To accurately represent proteomic variation within the human population, we constructed a set of human-centric substitution matrices derived from genetic variations by analyzing the frequencies of >4.8M single nucleotide variants (SNVs). These human-specific matrices expose short-term evolutionary trends at both codon and AA resolution and therefore present an evolutionary perspective that differs from that implied by the traditional matrices. Specifically, our matrices consider the directionality of variants and uncover a set of AA pairs that exhibit a strong tendency to substitute in a specific direction. We further demonstrate that the substitution rates of nucleotides only partially determine AA substitution rates. Finally, we investigate AA substitutions in post-translational modification (PTM) and ion-binding sites. We confirm a strong propensity towards conservation of the identity of the AA that participates in such functions. The empirically derived human-specific substitution matrices expose purifying selection over a range of residue-based protein properties. The new substitution matrices provide a robust baseline for the analysis of protein variations in health and disease. The underlying methodology is openly available to the biomedical community.


2006 ◽  
Vol 04 (03) ◽  
pp. 769-782 ◽  
Author(s):  
XIN LIU ◽  
WEI-MOU ZHENG

Amino acid substitution matrices play an essential role in protein sequence alignment, a fundamental task in bioinformatics. The most widely used matrices, such as PAM matrices derived from homologous sequences and BLOSUM matrices derived from aligned segments of PROSITE, did not integrate conformation information in their construction. There are a few structure-based matrices, which are derived from limited structure alignment data. Using the PDB_SELECT and DSSP databases, we create a database of sequence-conformation blocks which explicitly represent the sequence-structure relationship. Members of a block are identical in conformation and highly similar in sequence. From this block database, we derive a conformation-specific amino acid substitution matrix, CBSM60. The matrix shows improved performance in conformational segment search and homolog detection.
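
The abstract does not spell out how scores are computed from the block database. As a rough sketch, assuming the common BLOSUM-style log-odds approach, a score for each residue pair can be obtained by comparing its observed pair frequency with the frequency expected from the residue background frequencies. The function name `log_odds_scores`, the two-letter alphabet, and the pair counts below are all hypothetical.

```python
import math
from collections import defaultdict


def log_odds_scores(pair_counts):
    """Log-odds (base 2) scores from counts of unordered aligned pairs.

    Assumed BLOSUM-style formula: s(a, b) = log2(q_ab / e_ab), where
    e_aa = p_a * p_a and e_ab = 2 * p_a * p_b for a != b.
    Pairs with a zero count are skipped (no pseudocounts in this sketch).
    """
    total = sum(pair_counts.values())
    # Observed frequency q_ab of each unordered pair.
    q = defaultdict(float)
    for (a, b), n in pair_counts.items():
        q[tuple(sorted((a, b)))] += n / total
    # Background frequency p_a of each residue.
    p = defaultdict(float)
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2
    scores = {}
    for (a, b), f in q.items():
        if f > 0:
            expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
            scores[(a, b)] = math.log2(f / expected)
    return scores


if __name__ == "__main__":
    # Hypothetical counts of aligned pairs taken from block columns.
    counts = {("A", "A"): 6, ("A", "B"): 2, ("B", "B"): 2}
    for pair, s in sorted(log_odds_scores(counts).items()):
        print(pair, round(s, 2))
```

Pairs seen more often than chance get positive scores and pairs seen less often get negative ones; published matrices additionally scale and round these values.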


2016 ◽  
Vol 34 (4_suppl) ◽  
pp. 590-590
Author(s):  
Ulrich Popper ◽  
Holger Rumpold

Background: Carcinogenesis is considered to be an evolutionary process driven by natural selection of cell clones that have acquired advantageous heritable characteristics. This adaptation process, which resembles Darwin’s evolutionary theory, has also been suggested as a potential mechanism promoting resistance to anti-cancer treatment. Methods: We performed whole exome sequencing of four colon cancers at five morphologically different loci of the primary tumor tissue. To investigate intratumor heterogeneity we conducted structural variant analysis, copy number analysis and ploidy profiling. We propose an evolutionary distance between tumor samples based on a distance function in copy number space. Results: Copy number and phylogenetic analysis demonstrated substantial intratumor heterogeneity, with all 20 samples (5 loci for 4 tumors) possessing a unique copy number profile. Our analysis further revealed cancer-specific, tumor-specific and locus-specific copy number variants. We found that at least one copy of 18q, including the genes DCC and SMAD4, as well as 8p, carrying the CSMD1 gene, was missing in all tumors. Additionally, gains of chromosomes 7, 8q, 13q and 20q and losses of 4q and 21q were frequent. Using our distance measure in copy number space, phylogenetic analysis grouped together the five loci of one tumor and showed their evolutionary branching. We found that our distance measure in copy number space is superior to a distance measure in SNV space for colon cancer. Conclusions: Our analysis reveals that the landscape of the tumor genome cannot be captured by a single tumor biopsy, which is one of the challenges to personalized medicine, as heterogeneity can lead to therapeutic failures. However, there can be characteristics of the tumor genomes that are stable across samples and can potentially be used as biomarkers. Nonetheless, the variability of copy number profiles of colorectal cancers might suggest a personalized therapy.


Cells ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 2425
Author(s):  
Eduardo Gorab

Background: Dipterans exhibit a remarkable diversity of chromosome end structures, in contrast to the conserved system defined by telomerase and short repeats. Within dipteran families, the structure of chromosome termini is usually conserved within genera. To assess whether the evolutionary distance between genera implies chromosome end diversification, this report examines two representatives of Sciaridae: Rhynchosciara americana and Trichomegalosphys pubescens. Methods: Probes and plasmid microlibraries obtained by chromosome end microdissection, in situ hybridization, cloning, and sequencing are among the methodological approaches employed in this work. Results: The data argue for the existence of either specific terminal DNA sequences for each chromosome tip in T. pubescens, or sequences common to all chromosome ends whose extension does not allow detection by in situ hybridization. Both sciarid species share terminal sequences that are significantly underrepresented in chromosome ends of T. pubescens. Conclusions: The data suggest an unusual terminal structure in T. pubescens chromosomes compared to other dipterans investigated. A putative evolutionary process of repetitive DNA expansion that acted differentially to shape the chromosome ends of the two flies is also discussed.


2020 ◽  
Author(s):  
Xiaoli Chen ◽  
Nabila Shahnaz Khan ◽  
Shaojie Zhang

Abstract A fast-growing number of non-coding RNA structures have been resolved and deposited in the Protein Data Bank (PDB). In contrast to the wide range of global alignment and motif search tools, there is still a lack of local alignment tools. Among the global alignment tools for RNA 3D structures, STAR3D has become a valuable tool for its unprecedented speed and accuracy. STAR3D compares the 3D structures of RNA molecules using consecutive base-pairs (stacks) as anchors and generates an optimal global alignment. In this article, we developed a local RNA 3D structural alignment tool, named LocalSTAR3D, which was extended from STAR3D and designed to report multiple local alignments between two RNAs. The benchmarking results show that LocalSTAR3D has better accuracy and coverage than other local alignment tools. Furthermore, the utility of this tool has been demonstrated by rediscovering kink-turn motif instances, conserved domains in group II intron RNAs, and the tRNA mimicry of IRES RNAs.

