ADOPS - Automatic Detection Of Positively Selected Sites

2012 ◽  
Vol 9 (3) ◽  
pp. 18-32 ◽  
Author(s):  
David Reboiro-Jato ◽  
Miguel Reboiro-Jato ◽  
Florentino Fdez-Riverola ◽  
Cristina P. Vieira ◽  
Nuno A. Fonseca ◽  
...  

Summary Maximum-likelihood methods based on models of codon substitution have been widely used to infer positively selected amino acid sites that are responsible for adaptive changes. Nevertheless, such an approach requires software applications to align protein and DNA sequences, infer a phylogenetic tree and run the maximum-likelihood models. Significant effort therefore goes into preparing input files for the different software applications and into analysing the output of each step. In this paper we present the ADOPS (Automatic Detection Of Positively Selected Sites) software. It was developed with the goal of providing an automatic and flexible tool for detecting positively selected sites given a set of unaligned nucleotide sequence data. An example of the usefulness of such a pipeline is given by showing, under different conditions, positively selected amino acid sites in a set of 54 Coffea putative S-RNase sequences. ADOPS software is freely available and can be downloaded from http://sing.ei.uvigo.es/ADOPS.

Genetics ◽  
2000 ◽  
Vol 155 (1) ◽  
pp. 431-449 ◽  
Author(s):  
Ziheng Yang ◽  
Rasmus Nielsen ◽  
Nick Goldman ◽  
Anne-Mette Krabbe Pedersen

Abstract Comparison of relative fixation rates of synonymous (silent) and nonsynonymous (amino acid-altering) mutations provides a means for understanding the mechanisms of molecular sequence evolution. The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important indicator of selective pressure at the protein level, with ω = 1 meaning neutral mutations, ω < 1 purifying selection, and ω > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and have different underlying ω ratios. We develop models that account for heterogeneous ω ratios among amino acid sites and apply them to phylogenetic analyses of protein-coding DNA sequences. These models are useful for testing for adaptive molecular evolution and identifying amino acid sites under diversifying selection. Ten data sets of genes from nuclear, mitochondrial, and viral genomes are analyzed to estimate the distributions of ω among sites. In all data sets analyzed, the selective pressure indicated by the ω ratio is found to be highly heterogeneous among sites. Previously unsuspected Darwinian selection is detected in several genes in which the average ω ratio across sites is <1, but in which some sites are clearly under diversifying selection with ω > 1. Genes undergoing positive selection include the β-globin gene from vertebrates, mitochondrial protein-coding genes from hominoids, the hemagglutinin (HA) gene from human influenza virus A, and HIV-1 env, vif, and pol genes. Tests for the presence of positively selected sites and their subsequent identification appear quite robust to the specific distributional form assumed for ω and can be achieved using any of several models we implement. However, we encountered difficulties in estimating the precise distribution of ω among sites from real data sets.
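The interpretation of ω given in this abstract can be written down as a minimal Python sketch. The function name `classify_omega` and the tolerance parameter `tol` are illustrative assumptions, not part of the paper or of any published software. The paper estimates ω per site with codon-substitution models, whereas this sketch only applies the three-way reading of a ratio that has already been estimated.

```python
def classify_omega(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Classify selective pressure from the nonsynonymous (dN) and
    synonymous (dS) rates, using omega = dN / dS.

    Hypothetical helper for illustration only; `tol` is an assumed
    tolerance for treating omega as equal to 1.
    """
    if ds <= 0:
        # omega is undefined when there are no synonymous changes
        raise ValueError("dS must be positive to compute omega")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "purifying" if omega < 1.0 else "diversifying"
```

In practice a point estimate of ω close to 1 would be judged with a statistical test rather than a fixed tolerance, so the `tol` threshold here is only a placeholder.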


1999 ◽  
Vol 26 (5) ◽  
pp. 495 ◽  
Author(s):  
Kazumasa Yoshida ◽  
Kiyoshi Tazaki

Three genomic clones (Rplec2, Rplec5 and Rplec6) and a cDNA clone (LECRPA4) that encoded lectin or lectin-related polypeptides were isolated from Robinia pseudoacacia L. A comparison of the nucleotide sequences of Rplec2 and a previously reported cDNA for the subunit indicated that Rplec2 encoded the 29 kDa subunit of the inner-bark lectin RPbAI. Rplec5 encoded a polypeptide whose deduced amino acid sequence was 96.1% identical to that of a subunit of seed lectin. The amino acid sequence deduced from the open reading frame of Rplec6 showed 61.1% identity to that encoded by Rplec5. LECRPA4 was isolated from an inner-bark cDNA library and appeared to encode the 26 kDa subunit of inner-bark lectin RPbAII. The expression patterns of the various genes in tissues were examined by the reverse transcriptase-polymerase chain reaction (RT-PCR) with appropriate primers. Rplec2 transcripts were detected in the inner bark and roots. Rplec5 transcripts were detected in the inner bark, seeds and roots. No Rplec6 transcripts were detected in any of the tissues examined. LECRPA4 transcripts were found in leaves and in the inner bark. The level of expression of Rplec2 in the inner bark appeared to be similar in samples collected in different years and from different trees, whereas levels of expression of Rplec5 and LECRPA4 varied. These results suggest the differential regulation of expression of members of the lectin gene family in tissues of R. pseudoacacia. The nucleotide sequence data reported herein will appear in the DDBJ, EMBL and GenBank Nucleotide Sequence Databases under the accession numbers AB012632 (Rplec2), AB012633 (Rplec5), AB012634 (Rplec6) and AB012635 (LECRPA4).


1993 ◽  
Vol 4 (3) ◽  
pp. 287-292 ◽  
Author(s):  
D.L. Kauffman ◽  
P.J. Keller ◽  
A. Bennick ◽  
M. Blum

Human proline-rich proteins (PRPs) constitute a complex family of salivary proteins that are encoded by a small number of genes. The primary gene product is cleaved by proteases, thereby giving rise to about 20 secreted proteins. To determine the genes for the secreted PRPs, therefore, it is necessary to obtain sequences of both the secreted proteins and the DNA encoding these proteins. We have sequenced most PRPs from one donor (D.K.) and aligned the protein sequences with available DNA sequences from unrelated individuals. Partial sequence data have now been obtained for an additional PRP from D.K. named II-1. This protein was purified from parotid saliva by gel filtration and ion-exchange chromatography. Peptides were obtained by cleavage with trypsin, clostripain, and N-bromosuccinimide, followed by column chromatography. The peptides were sequenced on a gas-phase protein sequenator. Overlapping peptide sequences were obtained for most of II-1 and aligned with translated DNA sequences. The best fit was obtained with clones containing sequences for the allele PRB4" (Lyons et al., 1988). However, there was not complete identity of the protein amino acid sequence and the DNA-derived sequences, indicating that II-1 is not encoded by PRB4". Other PRPs isolated from D.K. also fail to conform to any DNA structure so far reported. This shows the need to obtain amino acid sequences and corresponding DNA sequences from the same person to assign genes for the PRPs and to determine the location of the postribosomal cleavage points in the primary translation product.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
A. V. Stolyarova ◽  
E. Nabieva ◽  
V. V. Ptushenko ◽  
A. V. Favorov ◽  
A. V. Popova ◽  
...  

Abstract Amino acid propensities at a site change in the course of protein evolution. This may happen for two reasons. Changes may be triggered by substitutions at epistatically interacting sites elsewhere in the genome. Alternatively, they may arise due to environmental changes that are external to the genome. Here, we design a framework for distinguishing between these alternatives. Using analytical modelling and simulations, we show that they cause opposite dynamics of the fitness of the allele currently occupying the site: it tends to increase with the time since its origin due to epistasis (“entrenchment”), but to decrease due to random environmental fluctuations (“senescence”). By analysing the genomes of vertebrates and insects, we show that the amino acids originating at negatively selected sites experience strong entrenchment. By contrast, the amino acids originating at positively selected sites experience senescence. We propose that senescence of the current allele is a cause of adaptive evolution.


2018 ◽  
Author(s):  
Lys Sanz Moreta ◽  
Rute Andreia Rodrigues da Fonseca

ABSTRACT The visualization of the molecular context of an amino acid mutation in a protein structure is crucial for assessing its functional impact and understanding its evolutionary implications. Currently, searches for fast-evolving amino acid positions using codon substitution models like those implemented in PAML (Z. Yang, 2000) are done in almost complete proteomes, generating large numbers of candidate proteins that require individual structural analyses. Here I present a Python wrapper script that integrates the output of PAML with the PyMOL visualization tool to automate the generation of protein structure models where positively selected sites are mapped along with the location of putative functional domains.


Genetics ◽  
2002 ◽  
Vol 161 (1) ◽  
pp. 447-459 ◽  
Author(s):  
Hua Tang ◽  
David O Siegmund ◽  
Peidong Shen ◽  
Peter J Oefner ◽  
Marcus W Feldman

Abstract This article proposes a method of estimating the time to the most recent common ancestor (TMRCA) of a sample of DNA sequences. The method is based on the molecular clock hypothesis, but avoids assumptions about population structure. Simulations show that in a wide range of situations, the point estimate has small bias and the confidence interval has at least the nominal coverage probability. We discuss conditions that can lead to biased estimates. Performance of this estimator is compared with existing methods based on the coalescence theory. The method is applied to sequences of Y chromosomes and mtDNAs to estimate the coalescent times of human male and female populations.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Yekaterina Shulgina ◽  
Sean R Eddy

The genetic code has been proposed to be a 'frozen accident', but the discovery of alternative genetic codes over the past four decades has shown that it can evolve to some degree. Since most examples were found anecdotally, it is difficult to draw general conclusions about the evolutionary trajectories of codon reassignment and why some codons are affected more frequently. To fill in the diversity of genetic codes, we developed Codetta, a computational method to predict the amino acid decoding of each codon from nucleotide sequence data. We surveyed the genetic code usage of over 250,000 bacterial and archaeal genome sequences in GenBank and discovered five new reassignments of arginine codons (AGG, CGA, and CGG), representing the first sense codon changes in bacteria. In a clade of uncultivated Bacilli, the reassignment of AGG to become the dominant methionine codon likely evolved by a change in the amino acid charging of an arginine tRNA. The reassignments of CGA and/or CGG were found in genomes with low GC content, an evolutionary force which likely helped drive these codons to low frequency and enable their reassignment.


2005 ◽  
Vol 79 (11) ◽  
pp. 7014-7023 ◽  
Author(s):  
Zigui Chen ◽  
Masanori Terai ◽  
Leiping Fu ◽  
Rolando Herrero ◽  
Rob DeSalle ◽  
...  

ABSTRACT Human papillomavirus type 16 (HPV16) is the primary etiological agent of cervical cancer, the second most common cancer in women worldwide. Complete genomes of 12 isolates representing the major lineages of HPV16 were cloned and sequenced from cervicovaginal cells. The sequence variations within the open reading frames (ORFs) and noncoding regions were identified and compared with the HPV16R reference sequence (50). This whole-genome approach gives us unprecedented precision in detailing sequence-level changes that are under selection on a whole-viral-genome scale. Of 7,908 base pair nucleotide positions, 313 (4.0%) were variable. Within the 2,452 amino acids (aa) comprising 8 ORFs, 243 (9.9%) amino acid positions were variable. In order to investigate the molecular evolution of HPV16 variants, maximum likelihood models of codon substitution were used to identify lineages and amino acid sites under selective pressure. Five codon sites in the E5 (aa 48, 65) and E6 (aa 10, 14, 83) ORFs were demonstrated to be under diversifying selective pressure. The E5 ORF had the overall highest nonsynonymous/synonymous substitution rate (ω) ratio (M3 = 0.7965). The E2 gene had the next-highest ω ratio (M3 = 0.5611); however, no specific codons were under positive selection. These data indicate that the E6 and E5 ORFs are evolving under positive Darwinian selection and have done so in a relatively short time period. Whether response to selective pressure upon the E5 and E6 ORFs contributes to the biological success of HPV16, its specific biological niche, and/or its oncogenic potential remains to be established.
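The two variability percentages quoted in this abstract follow directly from the reported counts. The short Python sketch below reproduces that arithmetic. The helper name `percent_variable` is an illustrative assumption, and the counts are those stated in the abstract.

```python
def percent_variable(variable: int, total: int) -> float:
    """Return the share of variable positions as a percentage,
    rounded to one decimal place (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * variable / total, 1)


# Counts reported in the abstract
nucleotide_pct = percent_variable(313, 7908)  # variable nucleotide positions
amino_acid_pct = percent_variable(243, 2452)  # variable amino acid positions
```

Both values round to the figures given in the abstract (4.0% and 9.9%).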

