Detecting and Quantifying Natural Selection at Two Linked Loci from Time Series Data of Allele Frequencies with Forward-in-Time Simulations

Genetics ◽  
2020 ◽  
Vol 216 (2) ◽  
pp. 521-541
Author(s):  
Zhangyi He ◽  
Xiaoyang Dai ◽  
Mark Beaumont ◽  
Feng Yu

Recent advances in DNA sequencing techniques have made it possible to monitor genomes in great detail over time. This improvement provides an opportunity for us to study natural selection based on time serial samples of genomes while accounting for genetic recombination effect and local linkage information. Such time series genomic data allow for more accurate estimation of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel Bayesian statistical framework for inferring natural selection at a pair of linked loci by capitalising on the temporal aspect of DNA data with the additional flexibility of modeling the sampled chromosomes that contain unknown alleles. Our approach is built on a hidden Markov model where the underlying process is a two-locus Wright-Fisher diffusion with selection, which enables us to explicitly model genetic recombination and local linkage. The posterior probability distribution for selection coefficients is computed by applying the particle marginal Metropolis-Hastings algorithm, which allows us to efficiently calculate the likelihood. We evaluate the performance of our Bayesian inference procedure through extensive simulations, showing that our approach can deliver accurate estimates of selection coefficients, and the addition of genetic recombination and local linkage brings about significant improvement in the inference of natural selection. We also illustrate the utility of our method on real data with an application to ancient DNA data associated with white spotting patterns in horses.
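The forward-in-time simulation idea behind this abstract can be illustrated with a discrete-generation toy model. The sketch below is not the authors' two-locus Wright-Fisher diffusion or their particle marginal Metropolis-Hastings sampler. It assumes haploid-style fitness values per haplotype, a fixed number of chromosomes, and the hypothetical function names `deterministic_step` and `simulate`. It shows how selection, recombination (through the linkage disequilibrium D), and random sampling combine within one generation.

```python
import random


def deterministic_step(x, fitness, r):
    """Apply selection, then recombination, to haplotype frequencies.

    x       : frequencies of haplotypes (AB, Ab, aB, ab), summing to 1
    fitness : relative fitness per haplotype (an illustrative assumption)
    r       : recombination rate between the two loci, 0 <= r <= 0.5
    """
    mean_w = sum(xi * wi for xi, wi in zip(x, fitness))
    y = [xi * wi / mean_w for xi, wi in zip(x, fitness)]
    d = y[0] * y[3] - y[1] * y[2]  # linkage disequilibrium after selection
    return [y[0] - r * d, y[1] + r * d, y[2] + r * d, y[3] - r * d]


def simulate(x0, fitness, r, n_chrom, generations, rng):
    """Simulate haplotype frequency trajectories with multinomial drift."""
    x = list(x0)
    path = [x]
    for _ in range(generations):
        p = deterministic_step(x, fitness, r)
        # Multinomial sampling of n_chrom chromosomes for the next generation.
        draws = rng.choices(range(4), weights=p, k=n_chrom)
        x = [draws.count(h) / n_chrom for h in range(4)]
        path.append(x)
    return path


# Hypothetical parameter values, chosen only for illustration.
trajectory = simulate(
    x0=[0.1, 0.2, 0.3, 0.4],
    fitness=[1.05, 1.02, 1.0, 1.0],
    r=0.01,
    n_chrom=1000,
    generations=50,
    rng=random.Random(1),
)
```

In a particle-filter setting, many such trajectories would be propagated in parallel and reweighted by the probability of the observed samples; that step is omitted here.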

2019 ◽  
Author(s):  
Zhangyi He ◽  
Xiaoyang Dai ◽  
Mark Beaumont ◽  
Feng Yu

Recent advances in DNA sequencing techniques have made it possible to monitor genomes in great detail over time. This improvement provides an opportunity for us to study natural selection based on time serial samples of genomes while accounting for genetic recombination effect and local linkage information. Such genomic time series data allow for more accurate estimation of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel Bayesian statistical framework for inferring natural selection at a pair of linked loci by capitalising on the temporal aspect of DNA data with the additional flexibility of modelling the sampled chromosomes that contain unknown alleles. Our approach is based on a hidden Markov model where the underlying process is a two-locus Wright-Fisher diffusion with selection, which enables us to explicitly model genetic recombination and local linkage. The posterior probability distribution for the selection coefficients is obtained by using the particle marginal Metropolis-Hastings algorithm, which allows us to efficiently calculate the likelihood. We evaluate the performance of our Bayesian inference procedure through extensive simulations, showing that our method can deliver accurate estimates of selection coefficients, and the addition of genetic recombination and local linkage brings about significant improvement in the inference of natural selection. We illustrate the utility of our approach on real data with an application to ancient DNA data associated with white spotting patterns in horses.


Genetics ◽  
2020 ◽  
Vol 216 (2) ◽  
pp. 463-480
Author(s):  
Zhangyi He ◽  
Xiaoyang Dai ◽  
Mark Beaumont ◽  
Feng Yu

Temporally spaced genetic data allow for more accurate inference of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel likelihood-based method for jointly estimating selection coefficient and allele age from time series data of allele frequencies. Our approach is based on a hidden Markov model where the underlying process is a Wright-Fisher diffusion conditioned to survive until the time of the most recent sample. This formulation circumvents the assumption required in existing methods that the allele is created by mutation at a certain low frequency. We calculate the likelihood by numerically solving the resulting Kolmogorov backward equation backward in time while reweighting the solution with the emission probabilities of the observation at each sampling time point. This procedure reduces the two-dimensional numerical search for the maximum of the likelihood surface, for both the selection coefficient and the allele age, to a one-dimensional search over the selection coefficient only. We illustrate through extensive simulations that our method can produce accurate estimates of the selection coefficient and the allele age under both constant and nonconstant demographic histories. We apply our approach to reanalyze ancient DNA data associated with horse base coat colors. We find that ignoring demographic histories or grouping raw samples can significantly bias the inference results.
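The hidden Markov model structure described above can be sketched with a discrete stand-in. The code below is not the authors' method: it replaces the conditioned Wright-Fisher diffusion and the Kolmogorov backward equation with a plain discrete Wright-Fisher Markov chain and the standard HMM forward algorithm, and it assumes a known initial allele count instead of estimating allele age. The names `binom_pmf`, `transition_matrix`, and `log_likelihood` are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def transition_matrix(n_chrom, s):
    """One-generation Wright-Fisher transitions with haploid-style selection."""
    rows = []
    for i in range(n_chrom + 1):
        x = i / n_chrom
        p = x * (1.0 + s) / (1.0 + s * x)  # expected frequency after selection
        rows.append([binom_pmf(j, n_chrom, p) for j in range(n_chrom + 1)])
    return rows


def log_likelihood(samples, n_chrom, s, init_count):
    """Forward algorithm over hidden allele counts.

    samples : list of (generation, sample_size, mutant_count), sorted by
              generation, with generation 0 as the starting point
    """
    trans = transition_matrix(n_chrom, s)
    states = range(n_chrom + 1)
    alpha = [0.0] * (n_chrom + 1)
    alpha[init_count] = 1.0
    loglik, gen = 0.0, 0
    for t, n, k in samples:
        for _ in range(t - gen):  # propagate the hidden state to generation t
            alpha = [sum(alpha[i] * trans[i][j] for i in states) for j in states]
        gen = t
        # Reweight by the binomial emission probability of the observation.
        alpha = [a * binom_pmf(k, n, j / n_chrom) for j, a in enumerate(alpha)]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik
```

A one-dimensional search over the selection coefficient could then be a simple grid scan, for example `max(grid, key=lambda s: log_likelihood(samples, n_chrom, s, init_count))`, where `grid` is a list of candidate values.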


2019 ◽  
Author(s):  
Zhangyi He ◽  
Xiaoyang Dai ◽  
Mark Beaumont ◽  
Feng Yu

Temporally spaced genetic data allow for more accurate inference of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel likelihood-based method for jointly estimating selection coefficient and allele age from time series data of allele frequencies. Our approach is based on a hidden Markov model where the underlying process is a Wright-Fisher diffusion conditioned to survive until the time of the most recent sample. This formulation circumvents the assumption required in existing methods that the allele is created by mutation at a certain low frequency. We calculate the likelihood by numerically solving the resulting Kolmogorov backward equation backwards in time while re-weighting the solution with the emission probabilities of the observation at each sampling time point. This procedure reduces the two-dimensional numerical search for the maximum of the likelihood surface for both the selection coefficient and the allele age to a one-dimensional search over the selection coefficient only. We illustrate through extensive simulations that our method can produce accurate estimates of the selection coefficient and the allele age under both constant and non-constant demographic histories. We apply our approach to re-analyse ancient DNA data associated with horse base coat colours. We find that ignoring demographic histories or grouping raw samples can significantly bias the inference results.


Algorithms ◽  
2020 ◽  
Vol 13 (4) ◽  
pp. 95 ◽  
Author(s):  
Johannes Stübinger ◽  
Katharina Adler

This paper develops the generalized causality algorithm and applies it to a multitude of data from the fields of economics and finance. Specifically, our parameter-free algorithm efficiently determines the optimal non-linear mapping and identifies varying lead–lag effects between two given time series. This procedure allows an elastic adjustment of the time axis to find similar but phase-shifted sequences; structural breaks in their relationship are also captured. A large-scale simulation study validates the outperformance in the vast majority of parameter constellations in terms of efficiency, robustness, and feasibility. Finally, the presented methodology is applied to real data from the areas of macroeconomics, finance, and metal. The pairs showing the highest similarity are gross domestic product and consumer price index (macroeconomics), S&P 500 index and Deutscher Aktienindex (finance), and gold and silver (metal). In addition, the algorithm makes full use of its flexibility and identifies both various structural breaks and regime patterns over time, which are (partly) well documented in the literature.
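The abstract does not spell out the generalized causality algorithm, so the sketch below uses classic dynamic time warping as a stand-in to illustrate what an elastic adjustment of the time axis looks like. It should not be read as the paper's algorithm. The function name `dtw` and the absolute-difference cost are assumptions.

```python
def dtw(a, b):
    """Classic dynamic time warping between two numeric sequences.

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Backtrack from the end to recover the optimal alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Hypothetical series: the second one repeats its first value, so it lags.
distance, path = dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])
lags = [j - i for i, j in path]  # positive values: second series lags
```

Reading `j - i` along the path gives a time-varying lag, which is one simple way to picture a lead–lag effect that changes over time.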


2019 ◽  
Vol 10 (3) ◽  
pp. 915
Author(s):  
Ali Ebrahimi Ghahnavieh

Every player in the market has a growing need to know about even the smallest change in the market. The ability to see what lies ahead is therefore a valuable advantage. This research attempts to understand behavioral patterns and to develop a new hybrid forecasting approach based on ARIMA-ANN for estimating the styrene price. Time series analysis and forecasting is an essential tool that can be widely useful for identifying the significant characteristics needed for future decisions. In this study, ARIMA, ANN, and hybrid ARIMA-ANN models were applied to evaluate the past behavior of a time series in order to draw inferences about the future behavior of the styrene price. Experimental results with real data sets show that the combined model can be more suitable for improving forecasting accuracy than traditional time series forecasting methodologies. Only a small number of studies in the literature have explored new methods for forecasting the styrene price.


2020 ◽  
Vol 15 (3) ◽  
pp. 225-237
Author(s):  
Saurabh Kumar ◽  
Jitendra Kumar ◽  
Vikas Kumar Sharma ◽  
Varun Agiwal

This paper deals with the problem of modelling time series data with structural breaks that occur at multiple time points and may result in a varying order of the model at every structural break. A flexible and generalized class of autoregressive (AR) models with multiple structural breaks is proposed for modelling in such situations. Estimation of the model parameters is discussed in both classical and Bayesian frameworks. Since the joint posterior of the parameters is not analytically tractable, we employ a Markov chain Monte Carlo method, Gibbs sampling, to simulate posterior samples. To verify the order change, a hypothesis test is constructed using posterior probability and compared with that of the model without breaks. The methodologies proposed here are illustrated by means of a simulation study and a real data analysis.
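To make the idea of a structural break in an AR model concrete, the sketch below scans for a single break in an AR(1) series by least squares. This is a deliberately simplified classical stand-in: it is not the paper's Bayesian Gibbs sampler, it handles one break only, and it keeps the model order fixed at one. The names `fit_ar1` and `find_break` are hypothetical.

```python
def fit_ar1(pairs):
    """Least-squares AR(1) coefficient (no intercept) and residual sum of squares."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    phi = sxy / sxx if sxx > 0 else 0.0
    sse = sum((y - phi * x) ** 2 for x, y in pairs)
    return phi, sse


def find_break(series, min_pairs=3):
    """Scan candidate break times and keep the one with the lowest total SSE.

    Returns (break_time, phi_before, phi_after), where break_time is the
    first time index generated by the second regime.
    """
    pairs = list(zip(series[:-1], series[1:]))  # pairs[t - 1] = (y[t-1], y[t])
    best = None
    for c in range(min_pairs, len(pairs) - min_pairs + 1):
        phi1, sse1 = fit_ar1(pairs[:c])
        phi2, sse2 = fit_ar1(pairs[c:])
        if best is None or sse1 + sse2 < best[0]:
            best = (sse1 + sse2, c + 1, phi1, phi2)
    return best[1], best[2], best[3]


# Noise-free toy series (an assumption for illustration): the AR coefficient
# switches from 1.05 to 0.9 at time index 20.
y = [1.0]
for t in range(1, 40):
    y.append((1.05 if t < 20 else 0.9) * y[-1])

break_time, phi_before, phi_after = find_break(y)
```

With noisy data the minimum of the summed squared errors is only an estimate of the break time, and a Bayesian treatment such as the one in the abstract would additionally quantify the uncertainty around it.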


2020 ◽  
Vol 1 (3) ◽  
pp. 1-7
Author(s):  
Oleg Kobylin ◽  
Vyacheslav Lyashenko

Time series are one of the forms of data presentation used in many studies. They are convenient, easy to work with, and informative. Clustering is one of the tasks of data processing, and methods for clustering time series are therefore of particular current relevance. Clustering time series data aims to create clusters with high similarity within a cluster and low similarity between clusters. This work is devoted to clustering time series. Various methods of time series clustering are considered. Examples are given for real data.


2014 ◽  
Vol 926-930 ◽  
pp. 1886-1889
Author(s):  
Bo Tian ◽  
Dian Hong Wang ◽  
Fen Xiong Chen ◽  
Zheng Pu Zhang

This paper presents a new algorithm for the detection of abnormal events in Wireless Sensor Networks (WSN). Abnormal events are sets of data points that correspond to interesting patterns in the underlying phenomenon that the network monitors. The algorithm is inspired by time-series data mining techniques. It transforms a stream of sensor readings into an Extension Temporal Edge Operator (ETEO) pattern representation of the time series, then extracts three eigenvalues of each sub-pattern (pattern length, pattern slope, and pattern mean) to map the time series to a feature space, and finally uses the local outlier factor to detect abnormal patterns in this feature space. Experiments on synthetic and real data show that the definition of pattern outlier is reasonable and that the algorithm is efficient at detecting outliers in WSN.
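The final step, scoring points in a feature space with the local outlier factor, can be sketched in a few lines. The code below implements the standard local outlier factor definition on generic feature vectors. It does not reproduce the ETEO representation, and the function name `local_outlier_factors` and the example feature values are assumptions. It also assumes that no two feature vectors are identical and that `k` is smaller than the number of points.

```python
import math


def local_outlier_factors(points, k):
    """Local outlier factor of each point, using its k nearest neighbours."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: average density of the neighbours relative to the point's own.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical (length, slope, mean) features for six sub-patterns.
patterns = [
    (5, 0.10, 20.1),
    (6, 0.12, 20.3),
    (5, 0.09, 19.8),
    (7, 0.11, 20.0),
    (6, 0.08, 20.2),
    (5, 2.50, 35.0),
]
scores = local_outlier_factors(patterns, k=3)
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, while markedly larger scores flag candidates for abnormal patterns. In practice the three features would usually be rescaled first so that no single one dominates the distance.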


2010 ◽  
Vol 8 (3) ◽  
pp. 263 ◽  
Author(s):  
Pedro Alberto Morettin ◽  
Clélia Maria De Castro Toloi ◽  
Chang Chiann ◽  
José Carlos Simon De Miranda

We introduce copula estimators based on wavelet smoothing of empirical copulas for the case of time series data. We then study the properties of this estimator via simulations and compare its performance with other estimators. Applications to real data are also given.
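The empirical copula that serves as the starting point here can be written down directly from ranks. The sketch below computes only the raw empirical copula; the wavelet smoothing step of the paper is not included. The names `ranks` and `empirical_copula` are hypothetical, and the code assumes there are no ties in either series.

```python
def ranks(values):
    """Ranks from 1 to n (assumes all values are distinct)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def empirical_copula(x, y):
    """Return C(u, v): the fraction of pairs with rank_x/n <= u and rank_y/n <= v."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)

    def copula(u, v):
        hits = sum(1 for a, b in zip(rx, ry) if a <= u * n and b <= v * n)
        return hits / n

    return copula


# Two toy series observed at the same four time points.
same_order = empirical_copula([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
reverse_order = empirical_copula([1.0, 2.0, 3.0, 4.0], [40.0, 30.0, 20.0, 10.0])
```

For perfectly increasing dependence the function equals min(u, v) on the grid of observed ranks, and for perfectly decreasing dependence it equals max(u + v - 1, 0) there, which makes these two toy cases a convenient sanity check.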

