Maximum likelihood estimation of natural selection and allele age from time series data of allele frequencies

2019 ◽  
Author(s):  
Zhangyi He ◽  
Xiaoyang Dai ◽  
Mark Beaumont ◽  
Feng Yu

Temporally spaced genetic data allow for more accurate inference of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel likelihood-based method for jointly estimating selection coefficient and allele age from time series data of allele frequencies. Our approach is based on a hidden Markov model where the underlying process is a Wright-Fisher diffusion conditioned to survive until the time of the most recent sample. This formulation circumvents the assumption required in existing methods that the allele is created by mutation at a certain low frequency. We calculate the likelihood by numerically solving the resulting Kolmogorov backward equation backwards in time while re-weighting the solution with the emission probabilities of the observation at each sampling time point. This procedure reduces the two-dimensional numerical search for the maximum of the likelihood surface for both the selection coefficient and the allele age to a one-dimensional search over the selection coefficient only. We illustrate through extensive simulations that our method can produce accurate estimates of the selection coefficient and the allele age under both constant and non-constant demographic histories. We apply our approach to re-analyse ancient DNA data associated with horse base coat colours. We find that ignoring demographic histories or grouping raw samples can significantly bias the inference results.

Genetics ◽  
2020 ◽  
Vol 216 (2) ◽  
pp. 463-480
Author(s):  
Zhangyi He ◽  
Xiaoyang Dai ◽  
Mark Beaumont ◽  
Feng Yu

Temporally spaced genetic data allow for more accurate inference of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel likelihood-based method for jointly estimating selection coefficient and allele age from time series data of allele frequencies. Our approach is based on a hidden Markov model where the underlying process is a Wright-Fisher diffusion conditioned to survive until the time of the most recent sample. This formulation circumvents the assumption required in existing methods that the allele is created by mutation at a certain low frequency. We calculate the likelihood by numerically solving the resulting Kolmogorov backward equation backward in time while reweighting the solution with the emission probabilities of the observation at each sampling time point. This procedure reduces the two-dimensional numerical search for the maximum of the likelihood surface, for both the selection coefficient and the allele age, to a one-dimensional search over the selection coefficient only. We illustrate through extensive simulations that our method can produce accurate estimates of the selection coefficient and the allele age under both constant and nonconstant demographic histories. We apply our approach to reanalyze ancient DNA data associated with horse base coat colors. We find that ignoring demographic histories or grouping raw samples can significantly bias the inference results.
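The hidden Markov formulation in this abstract can be illustrated with a much simpler discrete analogue. The sketch below is not the authors' method: it replaces the conditioned Wright-Fisher diffusion and the Kolmogorov backward equation with a finite-population haploid Wright-Fisher chain and a forward pass, and it assumes a uniform prior on the initial allele count instead of estimating allele age. The function names, the population size `N`, and the sample format are all hypothetical choices made for illustration. What it does share with the abstract is the re-weighting of the hidden-state distribution by binomial emission probabilities at each sampling time, followed by a one-dimensional search over the selection coefficient.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def transition_matrix(N, s):
    """One-generation haploid Wright-Fisher transitions with selection s."""
    rows = []
    for i in range(N + 1):
        x = i / N
        x_sel = x * (1.0 + s) / (1.0 + s * x)  # frequency after selection
        rows.append([binom_pmf(j, N, x_sel) for j in range(N + 1)])
    return rows

def log_likelihood(samples, N, s):
    """Forward-algorithm log-likelihood.

    samples: list of (generation, sample_size, mutant_count), sorted by generation.
    """
    P = transition_matrix(N, s)
    alpha = [1.0 / (N + 1)] * (N + 1)  # ASSUMPTION: uniform prior on initial count
    loglik = 0.0
    t_prev = samples[0][0]
    for t, n, k in samples:
        # Propagate the hidden allele count through the elapsed generations.
        for _ in range(t - t_prev):
            alpha = [sum(alpha[i] * P[i][j] for i in range(N + 1))
                     for j in range(N + 1)]
        # Re-weight by the binomial emission probability of the observed sample.
        alpha = [a * binom_pmf(k, n, j / N) for j, a in enumerate(alpha)]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
        t_prev = t
    return loglik

def grid_mle(samples, N, grid):
    """One-dimensional grid search over the selection coefficient."""
    return max(grid, key=lambda s: log_likelihood(samples, N, s))
```

A call such as `grid_mle([(0, 20, 2), (10, 20, 8), (20, 20, 16)], 50, [-0.2, 0.0, 0.2])` scans three candidate values of `s` for a made-up rising trajectory; a finer grid or a scalar optimiser would be the natural next step.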


Genetics ◽  
2020 ◽  
Vol 216 (2) ◽  
pp. 521-541
Author(s):  
Zhangyi He ◽  
Xiaoyang Dai ◽  
Mark Beaumont ◽  
Feng Yu

Recent advances in DNA sequencing techniques have made it possible to monitor genomes in great detail over time. This improvement provides an opportunity for us to study natural selection based on time serial samples of genomes while accounting for genetic recombination effect and local linkage information. Such time series genomic data allow for more accurate estimation of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel Bayesian statistical framework for inferring natural selection at a pair of linked loci by capitalising on the temporal aspect of DNA data with the additional flexibility of modeling the sampled chromosomes that contain unknown alleles. Our approach is built on a hidden Markov model where the underlying process is a two-locus Wright-Fisher diffusion with selection, which enables us to explicitly model genetic recombination and local linkage. The posterior probability distribution for selection coefficients is computed by applying the particle marginal Metropolis-Hastings algorithm, which allows us to efficiently calculate the likelihood. We evaluate the performance of our Bayesian inference procedure through extensive simulations, showing that our approach can deliver accurate estimates of selection coefficients, and the addition of genetic recombination and local linkage brings about significant improvement in the inference of natural selection. We also illustrate the utility of our method on real data with an application to ancient DNA data associated with white spotting patterns in horses.
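To make the role of recombination and linkage concrete, here is a minimal sketch of one generation of a discrete two-locus Wright-Fisher model, the finite-population counterpart of the diffusion named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: fitness is taken to be multiplicative across loci on haplotypes, the order of events (selection, recombination, drift) is one common convention, and every function name and parameter is hypothetical. The particle marginal Metropolis-Hastings step is not shown.

```python
import random

def select(freqs, s_A, s_B):
    """Apply selection to haplotype frequencies (x_AB, x_Ab, x_aB, x_ab).

    ASSUMPTION: multiplicative haplotype fitnesses (1+s_A)(1+s_B), 1+s_A, 1+s_B, 1.
    """
    w = ((1.0 + s_A) * (1.0 + s_B), 1.0 + s_A, 1.0 + s_B, 1.0)
    weighted = [x * wi for x, wi in zip(freqs, w)]
    total = sum(weighted)
    return tuple(v / total for v in weighted)

def recombine(freqs, r):
    """Shift haplotype frequencies by r times the linkage disequilibrium D."""
    x_AB, x_Ab, x_aB, x_ab = freqs
    D = x_AB * x_ab - x_Ab * x_aB
    return (x_AB - r * D, x_Ab + r * D, x_aB + r * D, x_ab - r * D)

def drift(freqs, N, rng):
    """Multinomial resampling of N haplotypes (genetic drift)."""
    draws = rng.choices(range(4), weights=freqs, k=N)
    return tuple(draws.count(h) / N for h in range(4))

def simulate(freqs, N, s_A, s_B, r, generations, seed=0):
    """Simulate a haplotype-frequency path, including the starting point."""
    rng = random.Random(seed)
    path = [tuple(freqs)]
    for _ in range(generations):
        freqs = drift(recombine(select(freqs, s_A, s_B), r), N, rng)
        path.append(freqs)
    return path
```

Without selection or drift, each call to `recombine` shrinks the linkage disequilibrium by a factor of `1 - r`, which is why tightly linked loci (small `r`) cannot be analysed independently.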


2019 ◽  
Author(s):  
Zhangyi He ◽  
Xiaoyang Dai ◽  
Mark Beaumont ◽  
Feng Yu

Recent advances in DNA sequencing techniques have made it possible to monitor genomes in great detail over time. This improvement provides an opportunity for us to study natural selection based on time serial samples of genomes while accounting for genetic recombination effect and local linkage information. Such genomic time series data allow for more accurate estimation of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel Bayesian statistical framework for inferring natural selection at a pair of linked loci by capitalising on the temporal aspect of DNA data with the additional flexibility of modelling the sampled chromosomes that contain unknown alleles. Our approach is based on a hidden Markov model where the underlying process is a two-locus Wright-Fisher diffusion with selection, which enables us to explicitly model genetic recombination and local linkage. The posterior probability distribution for the selection coefficients is obtained by using the particle marginal Metropolis-Hastings algorithm, which allows us to efficiently calculate the likelihood. We evaluate the performance of our Bayesian inference procedure through extensive simulations, showing that our method can deliver accurate estimates of selection coefficients, and the addition of genetic recombination and local linkage brings about significant improvement in the inference of natural selection. We illustrate the utility of our approach on real data with an application to ancient DNA data associated with white spotting patterns in horses.


2020 ◽  
Vol 2020 ◽  
pp. 1-10 ◽  
Author(s):  
Hao Du ◽  
Hao Gong ◽  
Suyue Han ◽  
Peng Zheng ◽  
Bin Liu ◽  
...  

Reconstructing realistic economic data often leads social economists to analyze the underlying driving factors in time-series data or to study volatility, and the intrinsic complexity of such data is a large part of its appeal. This paper proposes the bilateral permutation entropy (BPE) index method, built on partly ensemble empirical mode decomposition (PEEMD), a data analysis method for nonlinear and nonstationary time series, and compares it with the T-test method. First, PEEMD is applied to gold prices, decomposing the series into several independent intrinsic mode functions (IMFs) ordered from high to low frequency. Second, the IMFs are grouped into three parts (a high-frequency part, a low-frequency part, and the overall trend) by a fine-to-coarse reconstruction using the BPE index method and the T-test method. Then, a correlation analysis is conducted between the reconstructed data and related macroeconomic factors, including global gold production, world crude oil prices, and world inflation. Finally, the results indicate that the BPE index method is an effective technique for reconstructing IMFs into realistic data in time-series analysis.
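The abstract does not define the bilateral permutation entropy index, so the sketch below shows only the standard (Bandt-Pompe) permutation entropy on which such indices are typically built. It is offered as background under that assumption, not as the paper's BPE method; the function name and the `order`, `delay`, and `normalize` parameters are illustrative choices.

```python
import math
from collections import Counter

def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal patterns found in a time series."""
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [series[start + i * delay] for i in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1
    entropy = -sum((c / n_patterns) * math.log(c / n_patterns)
                   for c in counts.values())
    if normalize:
        entropy /= math.log(math.factorial(order))  # scale to [0, 1]
    return entropy
```

A strictly increasing series contains a single ordinal pattern and so has zero entropy, while an irregular series spreads its mass over many patterns and approaches one after normalisation. Applied to each IMF in turn, a measure like this could separate high-complexity from low-complexity components.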


2020 ◽  
Author(s):  
Iain Mathieson

Time series data of allele frequencies are a powerful resource for detecting and classifying natural and artificial selection. Ancient DNA now allows us to observe these trajectories in natural populations of long-lived species such as humans. Here, we develop a hidden Markov model to infer selection coefficients that vary over time. We show through simulations that our approach can accurately estimate both selection coefficients and the timing of changes in selection. Finally, we analyze some of the strongest signals of selection in the human genome using ancient DNA. We show that the European lactase persistence mutation was selected over the past 5,000 years with a selection coefficient of 2-2.5% in Britain, Central Europe and Iberia, but not Italy. In northern East Asia, selection at the ADH1B locus associated with alcohol metabolism intensified around 4,000 years ago, approximately coinciding with the introduction of rice-based agriculture. Finally, a derived allele at the FADS locus was selected in parallel in both Europe and East Asia, as previously hypothesized. Our approach is broadly applicable to both natural and experimental evolution data and shows how time series data can be used to resolve fine-scale details of selection.


Econometrics ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. 43 ◽  
Author(s):  
Harry Joe

For modeling count time series data, one class of models is generalized integer autoregressive of order p based on thinning operators. It is shown how numerical maximum likelihood estimation is possible by inverting the probability generating function of the conditional distribution of an observation given the past p observations. Two data examples are included and show that thinning operators based on compounding can substantially improve the model fit compared with the commonly used binomial thinning operator.
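As a point of reference for the binomial thinning baseline mentioned in this abstract, the following sketch evaluates the conditional likelihood of the simplest case, an INAR(1) model with Poisson innovations. It is an assumption-laden illustration rather than the paper's procedure: the order is fixed at p = 1, the innovation distribution is chosen to be Poisson, and the conditional distribution is computed by direct convolution instead of by inverting the probability generating function. All names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing k events with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def inar1_cond_pmf(y, y_prev, alpha, lam):
    """P(Y_t = y | Y_{t-1} = y_prev) for Y_t = alpha o Y_{t-1} + e_t.

    'alpha o' is binomial thinning: each of the y_prev units survives with
    probability alpha. ASSUMPTION: innovations e_t ~ Poisson(lam).
    """
    return sum(binom_pmf(k, y_prev, alpha) * poisson_pmf(y - k, lam)
               for k in range(min(y, y_prev) + 1))

def inar1_loglik(counts, alpha, lam):
    """Conditional log-likelihood of a count series given its first value."""
    return sum(math.log(inar1_cond_pmf(y, y_prev, alpha, lam))
               for y_prev, y in zip(counts, counts[1:]))
```

Maximising `inar1_loglik` over `alpha` in [0, 1) and `lam` > 0 gives the conditional maximum likelihood estimates for this special case. For thinning operators based on compounding the convolution is less convenient, which is where the generating-function inversion described in the abstract becomes useful.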


2016 ◽  
Author(s):  
Angelo Valleriani

Time series of allele frequencies are a useful and unique set of data for determining the strength of natural selection against the background of genetic drift. Technically, the selection coefficient is estimated by means of a likelihood function built under the hypothesis that the available trajectory spans a sufficiently large portion of the fitness landscape. Especially for ancient DNA, however, often only a single such trajectory is available and the coverage of the fitness landscape is very limited. In fact, a single trajectory is more representative of a process conditioned on both its initial and its final state than of a process free to visit the available fitness landscape. Based on two models of population genetics, here we show how to build a likelihood function for the selection coefficient that takes the statistical peculiarity of single trajectories into account. We show that this conditional likelihood delivers a precise estimate of the selection coefficient even when allele frequencies are close to fixation, whereas the unconditioned likelihood fails. Finally, we discuss the fact that the traditional, unconditioned likelihood always delivers an answer, which is often unfalsifiable and appears reasonable even when it is not correct.

