Effect of Recombination on the Accuracy of the Likelihood Method for Detecting Positive Selection at Amino Acid Sites

Genetics ◽  
2003 ◽  
Vol 164 (3) ◽  
pp. 1229-1236 ◽  
Author(s):  
Maria Anisimova ◽  
Rasmus Nielsen ◽  
Ziheng Yang

Abstract Maximum-likelihood methods based on models of codon substitution accounting for heterogeneous selective pressures across sites have proved to be powerful in detecting positive selection in protein-coding DNA sequences. Those methods are phylogeny based and do not account for the effects of recombination. When recombination occurs, such as in population data, no unique tree topology can describe the evolutionary history of the whole sequence. This violation of assumptions raises serious concerns about the likelihood method for detecting positive selection. Here we use computer simulation to evaluate the reliability of the likelihood-ratio test (LRT) for positive selection in the presence of recombination. We examine three tests based on different models of variable selective pressures among sites. Sequences are simulated using a coalescent model with recombination and analyzed using codon-based likelihood models ignoring recombination. We find that the LRT is robust to low levels of recombination (with fewer than three recombination events in the history of a sample of 10 sequences). However, at higher levels of recombination, the type I error rate can be as high as 90%, especially when the null model in the LRT is unrealistic, and the test often mistakes recombination for evidence of positive selection. The test that compares the more realistic models M7 (β) against M8 (β and ω) is more robust to recombination, where the null model M7 allows ω to vary between 0 and 1 (and so does not account for positive selection), and the alternative model M8 allows an additional discrete class with ω = dN/dS that could be estimated to be >1 (and thus accounts for positive selection). Identification of sites under positive selection by the empirical Bayes method appears to be less affected than the LRT by recombination.
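The M7 versus M8 comparison described above can be sketched as a small calculation. This is a minimal illustration, not the authors' code: it assumes the commonly used χ² approximation with 2 degrees of freedom (M8 adds two parameters to M7), which the abstract itself does not state, and the log-likelihood values are hypothetical placeholders rather than results from the paper.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return 2 * (lnL_alt - lnL_null), floored at zero.

    Small negative values can arise from incomplete optimisation,
    so they are treated as zero here.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 d.f.

    For 2 degrees of freedom this has the closed form exp(-x / 2).
    """
    return math.exp(-x / 2.0)


# Hypothetical log-likelihoods for the null (M7) and alternative (M8) fits.
lnl_m7 = -2345.67
lnl_m8 = -2340.12

stat = lrt_statistic(lnl_m7, lnl_m8)
p_value = chi2_sf_df2(stat)
print(f"2*delta lnL = {stat:.2f}, approximate p = {p_value:.4f}")
```

As the abstract cautions, a small p-value from such a test may reflect recombination rather than positive selection when the sequences come from a recombining population.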

2020 ◽  
Author(s):  
Clayton M. Carey ◽  
Sarah E. Apple ◽  
Zoë A. Hilbert ◽  
Michael S. Kay ◽  
Nels C. Elde

Abstract The pathogenesis of infectious diarrheal diseases is largely attributed to enterotoxin proteins that disrupt intestinal water absorption, causing severe dehydration. Despite profound health consequences, the impacts of diarrhea-causing microbes on the evolutionary history of host species are largely unknown. We investigated patterns of genetic variation in mammalian Guanylate Cyclase-C (GC-C), an intestinal receptor frequently targeted by bacterial enterotoxins, to determine how hosts might adapt in response to diarrheal infections. Under normal conditions, GC-C interacts with endogenous guanylin peptides to promote water secretion in the intestine, but signaling can be hijacked by bacterially encoded heat-stable enterotoxins (STa) during infection, which leads to overstimulation of GC-C and diarrhea. Phylogenetic analysis in mammals revealed evidence of recurrent positive selection in the GC-C ligand-binding domain in primates and bats, consistent with selective pressures to evade interactions with STa. Using in vitro assays and transgenic intestinal organoids to model STa-mediated diarrhea, we show that GC-C diversification in these lineages results in substantial variation in toxin susceptibility. In bats, we observe a unique pattern of compensatory coevolution in the endogenous GC-C ligand uroguanylin, reflecting intense bouts of positive selection at the receptor-ligand interface. These findings demonstrate control of water physiology as a previously unrecognized interface for genetic conflict and reveal diarrheal pathogens as a source of selective pressure among diverse mammals.


2006 ◽  
Vol 34 (2) ◽  
pp. 257-262 ◽  
Author(s):  
C.A.M. Semple ◽  
K. Taylor ◽  
H. Eastwood ◽  
P.E. Barran ◽  
J.R. Dorin

We have examined the evolution of the genes at the major human β-defensin locus and the orthologous loci in a range of other primates and mammals. For the first time, these data allow us to examine selective episodes in the more recent evolutionary history of this locus as well as in the ancient past. We have used a combination of maximum-likelihood-based tests and a maximum-parsimony-based sliding window approach to give a detailed view of the varying modes of selection operating at this locus. We provide evidence for strong positive selection soon after the duplication of these genes within an ancestral mammalian genome. During the divergence of primates, however, variable selective pressures have acted on β-defensin genes in different evolutionary lineages, with episodes of both negative and, more rarely, positive selection. Positive selection appears to have been more common in the rodent lineage, accompanying the birth of novel rodent-specific β-defensin gene clades. Sites in the second exon have been subject to positive selection and, by implication, are important in functional diversity. A small number of sites in the mature human peptides were found to have undergone repeated episodes of selection in different primate lineages. Particular sites were consistently implicated by multiple methods at positions throughout the mature peptides. These sites are clustered at positions that are predicted to be important for the function of β-defensins.


Genetics ◽  
1995 ◽  
Vol 139 (2) ◽  
pp. 993-1005 ◽  
Author(s):  
Z Yang

Abstract We describe a model for the evolution of DNA sequences by nucleotide substitution, whereby nucleotide sites in the sequence evolve over time, whereas the rates of substitution are variable and correlated over sites. The temporal process used to describe substitutions between nucleotides is a continuous-time Markov process, with the four nucleotides as the states. The spatial process used to describe variation and dependence of substitution rates over sites is based on a serially correlated gamma distribution, i.e., an auto-gamma model assuming Markov-dependence of rates at adjacent sites. To achieve computational efficiency, we use several equal-probability categories to approximate the gamma distribution, and the result is an auto-discrete-gamma model for rates over sites. Correlation of rates at sites then is modeled by the Markov chain transition of rates at adjacent sites from one rate category to another, the states of the chain being the rate categories. Two versions of nonparametric models, which place no restrictions on the distributional forms of rates for sites, also are considered, assuming either independence or Markov dependence. The models are applied to data of a segment of mitochondrial genome from nine primate species. Model parameters are estimated by the maximum likelihood method, and models are compared by the likelihood ratio test. Tremendous variation of rates among sites in the sequence is revealed by the analyses, and when rate differences for different codon positions are appropriately accounted for in the models, substitution rates at adjacent sites are found to be strongly (positively) correlated. Robustness of the results to uncertainty of the phylogenetic tree linking the species is examined.


2017 ◽  
Author(s):  
Santiago Sánchez-Ramírez ◽  
Jean-Marc Moncalvo

Abstract Many different evolutionary processes may be responsible for explaining natural variation within genomes, some of which include natural selection at the molecular level and changes in population size. Fungi are highly adaptable organisms, and their relatively small genomes and short generation times make them amenable to evolutionary genomic studies. However, adaptation in wild populations has been relatively less documented compared to experimental or clinical studies. Here, we analyzed DNA sequences from 502 putative single-copy orthologous genes in 63 samples that represent seven recently diverged North American Amanita (jacksonii-complex) lineages. For each gene and each species, we measured the genealogical sorting index (gsi) and infinite-site-based summary statistics, such as , and DTaj in coding and intron regions. MKT-based approaches and likelihood-ratio-test Kn/Ks models were used to measure natural selection in all coding sequences. Multi-locus (Extended) Bayesian Skyline Plots (eBSP) were used to model intraspecific demographic changes through time based on unlinked, putative neutral regions (introns). Most genes show evidence of long-term purifying selection, likely reflecting a functional bias implicit in single-copy genes. We find that two species have strongly negatively skewed Tajima’s D, while three others have a positive skew, corresponding well with patterns of demographic expansion and contraction. Standard MKT analyses resulted in a high incidence of near-zero α with a tendency towards negative values. In contrast, α estimates based on the distribution of fitness effects (DFE), which accounts for demographic effects and slightly deleterious mutations, suggest a higher proportion of sites fixed by positive selection. The difference was more pronounced in species with expansion signatures or with historically low population sizes, evidencing the concealing effects of specific demographic histories. Finally, we examine Gene Ontology term overrepresentation, highlighting the potential adaptive or ecological roles of some genes under positive selection.


2018 ◽  
Vol 35 (15) ◽  
pp. 2545-2554 ◽  
Author(s):  
Joseph Mingrone ◽  
Edward Susko ◽  
Joseph P Bielawski

Abstract Motivation: Likelihood ratio tests are commonly used to test for positive selection acting on proteins. They are usually applied with thresholds for declaring a protein under positive selection determined from a chi-square or mixture of chi-square distributions. Although it is known that such distributions are not strictly justified due to the statistical irregularity of the problem, the hope has been that the resulting tests are conservative and do not lose much power in comparison with the same test using the unknown, correct threshold. We show that commonly used thresholds need not yield conservative tests, but instead give larger than expected Type I error rates. Statistical regularity can be restored by using a modified likelihood ratio test. Results: We give theoretical results to prove that, if the number of sites is not too small, the modified likelihood ratio test gives approximately correct Type I error probabilities regardless of the parameter settings of the underlying null hypothesis. Simulations show that modification gives Type I error rates closer to those stated without a loss of power. The simulations also show that parameter estimation for mixture models of codon evolution can be challenging in certain data-generation settings with very different mixing distributions giving nearly identical site pattern distributions unless the number of taxa and tree length are large. Because mixture models are widely used for a variety of problems in molecular evolution, the challenges and general approaches to solving them presented here are applicable in a broader context. Availability and implementation: https://github.com/jehops/codeml_modl Supplementary information: Supplementary data are available at Bioinformatics online.
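For orientation, one threshold of the kind the abstract mentions can be computed in a few lines. This sketch assumes a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom; the abstract does not say which mixtures it has in mind, so that choice is an assumption. It illustrates a conventional threshold only and is not the modified likelihood ratio test the authors propose.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 d.f.

    Uses the identity P(Z**2 > x) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def mixture_p_value(stat: float) -> float:
    """p-value under an assumed 50:50 mixture of a point mass at 0 and chi2(1).

    A statistic of zero (or below) carries no evidence against the null,
    so the p-value is 1; otherwise only the chi2(1) half contributes.
    """
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_sf_df1(stat)


# The statistic here is an illustrative value, not one from the paper.
print(f"p = {mixture_p_value(2.71):.4f}")
```

Under this assumed mixture, the 5% critical value is roughly 2.71 rather than the 3.84 of a plain chi-square with 1 degree of freedom, and the abstract's point is that thresholds like these are not guaranteed to be conservative.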


2012 ◽  
Vol 367 (1588) ◽  
pp. 483-492 ◽  
Author(s):  
J. N. Young ◽  
R. E. M. Rickaby ◽  
M. V. Kapralov ◽  
D. A. Filatov

Rubisco, the most abundant enzyme on Earth and responsible for all photosynthetic carbon fixation, is often thought of as a highly conserved and sluggish enzyme. Yet, different algal Rubiscos demonstrate a range of kinetic properties hinting at a history of evolution and adaptation. Here, we show that algal Rubisco has indeed evolved adaptively during ancient and distinct geological periods. Using DNA sequences of extant marine algae of the red and Chromista lineage, we define positive selection within the large subunit of Rubisco, encoded by rbcL, to occur basal to the radiation of modern marine groups. This signal of positive selection appears to be responding to changing intracellular concentrations of carbon dioxide (CO2) triggered by physiological adaptations to declining atmospheric CO2. Within the ecologically important Haptophyta (including coccolithophores) and Bacillariophyta (diatoms), positive selection occurred consistently during periods of falling Phanerozoic CO2 and suggests emergence of carbon-concentrating mechanisms. During the Proterozoic, a strong signal of positive selection after secondary endosymbiosis occurs at the origin of the Chromista lineage (approx. 1.1 Ga), with further positive selection events until 0.41 Ga, implying a significant and continuous decrease in atmospheric CO2 encompassing the Cryogenian Snowball Earth events. We surmise that positive selection in Rubisco has been caused by declines in atmospheric CO2 and hence acts as a proxy for ancient atmospheric CO2.


Genome ◽  
2006 ◽  
Vol 49 (7) ◽  
pp. 767-776 ◽  
Author(s):  
Stéphane Aris-Brosou

Codon-based substitution models are routinely used to measure selective pressures acting on protein-coding genes. To this effect, the nonsynonymous to synonymous rate ratio (dN/dS = ω) is estimated. The proportion of amino-acid sites potentially under positive selection, as indicated by ω > 1, is inferred by fitting a probability distribution where some sites are permitted to have ω > 1. These sites are then inferred by means of an empirical Bayes or by a Bayes empirical Bayes approach that, respectively, ignores or accounts for sampling errors in maximum-likelihood estimates of the distribution used to infer the proportion of sites with ω > 1. Here, we extend a previous full-Bayes approach to include models with high power and low false-positive rates when inferring sites under positive selection. We propose some heuristics to alleviate the computational burden, and show that (i) full Bayes can be superior to empirical Bayes when analyzing a small data set or small simulated data, (ii) full Bayes has only a small advantage over Bayes empirical Bayes with our small test data, and (iii) Bayesian methods appear relatively insensitive to mild misspecifications of the random process generating adaptive evolution in our simulations, but in practice can prove extremely sensitive to model specification. We suggest that the codon model used to detect amino acids under selection should be carefully selected, for instance using Akaike information criterion (AIC). Key words: codon substitution models, empirical Bayes, Bayes empirical Bayes, full Bayes, ROC curves, AIC.


Author(s):  
Tupitsyn V.V. ◽  
Bataev Kh.M. ◽  
Men’shikova A.N. ◽  
Godina Z.N.

Relevance. Information about cardiovascular disease risk factors (CVD RF) in men with chronic inflammatory lung disease (CLID) is contradictory and requires clarification. Aim. To evaluate the characteristics of CVD RF in men under 60 years of age with CLID and myocardial infarction (MI), in order to improve prevention. Material and methods. The study included men aged 19-60 years with type I myocardial infarction. Patients were divided into two age-comparable groups: I, the study group, with CLID (142 patients); II, the control group, without CLID (424 patients). A comparative analysis of the frequency of the main and additional cardiovascular risk factors in the groups was performed. Results. In the study group, the following were observed more often than in the control group: hereditary burden of ischemic heart disease (40.8 and 31.6%, respectively; p = 0.0461) and arterial hypertension (54.2 and 44.6%; p = 0.0461), frequent colds (24.6 and 12.0%; p = 0.0003), a history of extrasystoles (19.7 and 12.7%; p = 0.04), chronic foci of infection of internal organs (75.4 and 29.5%; p < 0.0001), non-ulcer lesions of the digestive system (26.1 and 14.6%; p = 0.007), smoking (95.1 and 66.3%; p < 0.0001), and MI in winter (40.8 and 25.9%; p = 0.006). The following were observed less often: oral cavity infections (9.2 and 23.6%; p < 0.0001), hypodynamia (74.5 and 82.5%; p = 0.0358), overweight (44.4 and 55.2%; p = 0.0136), a subjective relationship between worsening of coronary heart disease and the season of the year (43.7 and 55.2%; p = 0.0173), and MI in the autumn (14.1 and 21.9%; p = 0.006). Conclusions. The structure of CVD RF in men under 60 years of age with CLID and MI is characterized by the predominance of smoking, non-ulcer pathology of the digestive system, frequent colds, meteorological dependence, a history of cardiac arrhythmias, and foci of internal organ infections. It is advisable to take the listed factors into account when planning preventive measures in such patients.


2006 ◽  
Vol 194 (5) ◽  
pp. 552-560 ◽  
Author(s):  
Elizabeth Margaret Maloney ◽  
Yoshihisa Yamano ◽  
Paul C. VanVeldhuisen ◽  
Takashi Sawada ◽  
Norma Kim ◽  
...  
