An Approximate Markov Model for the Wright-Fisher Diffusion

2015
Author(s): Anna Ferrer-Admetlla, Christoph Leuenberger, Jeffrey D Jensen, Daniel Wegmann

The joint and accurate inference of selection and demography from genetic data is considered a particularly challenging question in population genetics, since both processes may lead to very similar patterns of genetic diversity. However, additional information for disentangling these effects may be obtained by observing changes in allele frequencies over multiple time points. Such data are common in experimental evolution studies, as well as in comparisons of ancient and contemporary samples. Leveraging this information, however, has been computationally challenging, particularly for multi-locus data sets. To overcome these issues, we introduce a novel discrete approximation for diffusion processes, termed the mean transition time approximation, which preserves the long-term behavior of the underlying continuous diffusion process. We derive this approximation for the particular case of inferring selection and demography from time series data under the classic Wright-Fisher model and demonstrate that it is well suited to describing allele trajectories through time, even when only a few states are used. We then develop a Bayesian inference approach to jointly infer the population size and locus-specific selection coefficients with high accuracy, and further extend this model to also infer the rates of sequencing errors and mutations. Finally, we apply our approach to recent experimental data on the evolution of drug resistance in influenza virus, identifying likely targets of selection and finding evidence for much larger viral population sizes than previously reported.
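As a point of reference for what "a discrete Markov model of allele frequencies" means, the sketch below builds the standard (exact) haploid Wright-Fisher transition matrix with genic selection for a small number of gene copies n, and uses it in a hidden-Markov forward pass to score binomially sampled time-series counts. It is not the mean transition time approximation described above; the state space, the selection parameterization, the uniform initial distribution, and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import binom


def wf_transition_matrix(n: int, s: float) -> np.ndarray:
    """Haploid Wright-Fisher chain with n gene copies and genic selection s.

    Entry [i, j] is the probability of going from i to j derived copies in one generation.
    """
    i = np.arange(n + 1)
    p = i / n
    # Selection shifts the expected frequency before binomial resampling (drift).
    p_sel = p * (1 + s) / (1 + s * p)
    return binom.pmf(i[None, :], n, p_sel[:, None])


def log_likelihood(obs, n: int, s: float) -> float:
    """Forward algorithm over hidden frequency states.

    obs: list of (generation, sample_size, derived_count) with increasing generations.
    """
    P = wf_transition_matrix(n, s)
    freqs = np.arange(n + 1) / n
    alpha = np.full(n + 1, 1.0 / (n + 1))  # assumed uniform initial distribution
    t_prev = obs[0][0]
    loglik = 0.0
    for t, m, k in obs:
        alpha = alpha @ np.linalg.matrix_power(P, t - t_prev)  # propagate t - t_prev generations
        alpha = alpha * binom.pmf(k, m, freqs)  # binomial sampling of the observed counts
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c  # rescale to avoid numerical underflow
        t_prev = t
    return float(loglik)
```

For a rising trajectory such as `[(0, 100, 10), (10, 100, 30), (20, 100, 60)]`, the likelihood with n = 50 is higher for s = 0.1 than for s = -0.1. Because the exact matrix has (n + 1)² entries, this brute-force chain becomes expensive for realistic population sizes, which is the cost a coarse discretization aims to avoid.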


2019
Author(s): Eirini Christodoulaki, Neda Barghi, Christian Schlötterer

Polygenic adaptation is frequently associated with small allele frequency changes at many loci. Recent work suggests that large allele frequency changes can also be expected. Laboratory natural selection (LNS) experiments provide an excellent experimental framework for studying the adaptive architecture under controlled laboratory conditions: time series data from replicate populations evolving independently toward the same trait optimum can be used to identify selected loci. Nevertheless, the choice of the new trait optimum in the laboratory is typically an ad hoc decision that does not consider the distance of the starting population from the new optimum. Here, we used forward simulations to study the selection signatures of polygenic adaptation in populations evolving toward different trait optima. Mimicking LNS experiments, we analyzed the frequencies of the selected alleles and population fitness at multiple time points. We demonstrate that the inferred adaptive architecture strongly depends on the choice of the new trait optimum in the laboratory and on the significance cut-off used to identify selected loci. Our results have a major impact not only on the design of future Evolve and Resequence (E&R) studies but also on the interpretation of current E&R data sets.
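To make the simulation design concrete, here is a minimal individual-based forward simulation in the spirit of the abstract. It assumes diploid individuals, additive loci of equal effect, no environmental variance, Gaussian stabilizing selection around a shifted optimum, fitness-proportional parent sampling, and free recombination. All parameter values and names are hypothetical; the authors' simulator and settings are not reproduced.

```python
import numpy as np


def simulate_polygenic(optimum: float, n_ind: int = 500, n_loci: int = 50, p0: float = 0.2,
                       effect: float = 0.1, omega: float = 1.0, generations: int = 60,
                       sample_every: int = 10, seed: int = 1):
    """Forward simulation of adaptation to a new trait optimum.

    Returns a list of (generation, per-locus allele frequencies, mean fitness, mean trait).
    """
    rng = np.random.default_rng(seed)
    # genotypes indexed as (individual, haplotype, locus); 1 = trait-increasing allele
    geno = (rng.random((n_ind, 2, n_loci)) < p0).astype(np.int8)
    records = []
    for gen in range(generations + 1):
        z = effect * geno.sum(axis=(1, 2))  # additive trait, no environmental noise
        w = np.exp(-((z - optimum) ** 2) / (2 * omega ** 2))  # Gaussian stabilizing selection
        if gen % sample_every == 0:
            records.append((gen, geno.mean(axis=(0, 1)), float(w.mean()), float(z.mean())))
        if gen == generations:
            break
        # each offspring draws two parents with probability proportional to fitness
        parents = rng.choice(n_ind, size=(n_ind, 2), p=w / w.sum())
        # free recombination: at every locus, inherit a random haplotype from each parent
        pick = rng.integers(0, 2, size=(n_ind, 2, n_loci))
        offspring = np.empty_like(geno)
        for k in range(2):
            parent_geno = geno[parents[:, k]]  # shape (n_ind, 2, n_loci)
            chosen = np.take_along_axis(parent_geno, pick[:, k][:, None, :], axis=1)
            offspring[:, k, :] = chosen[:, 0, :]
        geno = offspring
    return records
```

With the defaults the starting trait mean is about 2.0, so `optimum=4.0` places the new optimum two fitness-function widths (`omega`) away. Comparing runs with optima closer to or farther from the starting mean is a toy version of the question the abstract raises.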



2019
Author(s): Srishti Mishra, Zohair Shafi, Santanu Pathak

Data-driven decision making is becoming an increasingly important aspect of successful business execution. More and more organizations are making informed decisions based on the data they generate. Most of this data is temporal, i.e., time series data. Analyzing multiple time series data sets effectively, efficiently, and quickly is a challenge. The most interesting and valuable part of such analysis is generating insights on correlation and causation across multiple time series data sets. This paper examines methods for analyzing such data sets and gaining useful insights from them, primarily in the form of correlation and causation analysis. It focuses on two such methods, a Two Sample Test with Dynamic Time Warping and Hierarchical Clustering, and examines how their results can be used to gain a better understanding of the data. Moreover, the methods are meant to work with any data set, regardless of its subject domain and idiosyncrasies; that is, the approach is data agnostic.
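The two building blocks named above can be illustrated with a short sketch: a textbook dynamic time warping (DTW) distance with absolute-difference local cost, fed as a precomputed distance matrix into SciPy's agglomerative clustering. Average linkage and the function names are assumptions, and the paper's two-sample test is not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def dtw_distance(a, b) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| as the local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def cluster_series(series, n_clusters: int):
    """Average-linkage hierarchical clustering on pairwise DTW distances."""
    k = len(series)
    # condensed distance vector: pairs (i, j) with i < j in row-major order, as SciPy expects
    condensed = np.array(
        [dtw_distance(series[i], series[j]) for i in range(k) for j in range(i + 1, k)]
    )
    Z = linkage(condensed, method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Because DTW lets one series be locally stretched, `[0, 1, 2, 3, 2, 1, 0]` and `[0, 0, 1, 2, 3, 2, 1, 0]` differ only by a repeated leading 0 and have a DTW distance of 0, even though their lengths differ and a point-by-point comparison is undefined.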



2020
Vol 15 (3)
pp. 225-237
Author(s): Saurabh Kumar, Jitendra Kumar, Vikas Kumar Sharma, Varun Agiwal

This paper deals with the problem of modelling time series data with structural breaks occurring at multiple time points, where the order of the model may change at every structural break. A flexible and generalized class of autoregressive (AR) models with multiple structural breaks is proposed for such situations. Estimation of the model parameters is discussed in both the classical and Bayesian frameworks. Since the joint posterior of the parameters is not analytically tractable, we employ a Markov chain Monte Carlo method, Gibbs sampling, to simulate posterior samples. To verify the change in order, a hypothesis test is constructed using posterior probabilities and compared with the corresponding model without breaks. The proposed methodologies are illustrated by means of a simulation study and a real data analysis.
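To illustrate the Bayesian side of the abstract, the sketch below simulates an AR series whose order changes at a single, known break (AR(1) then AR(2)) and runs a two-block Gibbs sampler for each segment. It uses a normal prior on the AR coefficients and an inverse-gamma prior on the noise variance, so both full conditionals are standard distributions. The known break point and orders, the priors and their hyperparameters, and all function names are assumptions; estimating the break locations and testing the order change, as the paper does, are not shown.

```python
import numpy as np


def simulate_ar_with_break(n1, n2, phi1, phi2, sigma=1.0, seed=0):
    """AR(len(phi1)) for the first n1 points, then AR(len(phi2)) for the next n2 points."""
    rng = np.random.default_rng(seed)
    p_max = max(len(phi1), len(phi2))
    y = [0.0] * p_max  # zero initial lags, dropped from the output
    for t in range(n1 + n2):
        phi = phi1 if t < n1 else phi2
        lags = y[-len(phi):][::-1]  # most recent value first: y[t-1], y[t-2], ...
        y.append(float(np.dot(phi, lags)) + sigma * rng.standard_normal())
    return np.array(y[p_max:])


def lagged_design(y, p, start, end):
    """Rows t in [start, end) with columns y[t-1], ..., y[t-p]; requires start >= p."""
    X = np.column_stack([y[start - k:end - k] for k in range(1, p + 1)])
    return X, y[start:end]


def gibbs_ar(X, target, n_iter=2000, tau2=10.0, a0=2.0, b0=1.0, seed=0):
    """Gibbs sampler: phi ~ N(0, tau2 I) prior, sigma2 ~ InvGamma(a0, b0) prior."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ target
    sigma2 = 1.0
    draws = []
    for _ in range(n_iter):
        # phi | sigma2, y is normal (conjugate update)
        cov = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        phi = rng.multivariate_normal(cov @ (Xty / sigma2), cov)
        # sigma2 | phi, y is inverse-gamma; draw its reciprocal from a gamma distribution
        resid = target - X @ phi
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + resid @ resid / 2))
        draws.append(phi)
    return np.array(draws[n_iter // 2:])  # discard the first half as burn-in
```

Fitting each segment separately with its own order is the simplest reading of "varying order at every structural break"; the posterior means of the retained draws serve as point estimates of the segment-specific coefficients.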



2018
Author(s): Tal Zinger, Pleuni S. Pennings, Adi Stern

With the advent of deep sequencing techniques, it is now possible to track the evolution of viruses in ever-increasing detail. Here we present FITS (Flexible Inference from Time-Series), a computational framework that allows inference of either the fitness of a mutation, the mutation rate, or the population size from genomic time-series sequencing data. FITS was designed first and foremost for the analysis of either short-term Evolve & Resequence (E&R) experiments or rapidly recombining populations of viruses. We thoroughly explore the performance of FITS on noisy simulated data and highlight its ability to infer meaningful information even under those circumstances. In particular, FITS is able to categorize a mutation as Advantageous, Neutral, or Deleterious. We next apply FITS to empirical data from an E&R experiment on poliovirus in which parameters were determined experimentally, and demonstrate extremely high inference accuracy. We highlight the ease of use of FITS for step-wise or iterative inference of mutation rates, population size, and fitness values for each mutation sequenced, when deep sequencing data are available at multiple time points. Availability: FITS is written in C++ and is available both with a highly user-friendly graphical user interface and as a command-line program that allows parallel, high-throughput analyses. Source code, binaries (Windows and Mac), and complementary scripts are available from GitHub at https://github.com/SternLabTAU/
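As a generic illustration of inferring a fitness effect from an allele-frequency time series, the sketch below runs rejection approximate Bayesian computation (ABC) under a haploid Wright-Fisher model with selection. The model, the uniform prior on s, the Euclidean distance, the acceptance fraction, and all names are assumptions; this is not the FITS implementation, and it ignores mutation and sequencing noise.

```python
import numpy as np


def simulate_freqs(s, n_pop: int, p0: float, generations: int, rng) -> np.ndarray:
    """Vectorized haploid Wright-Fisher runs; s is an array with one fitness effect per run."""
    p = np.full(s.shape, p0)
    traj = [p]
    for _ in range(generations):
        p_sel = p * (1 + s) / (1 + s * p)  # deterministic selection step
        p = rng.binomial(n_pop, p_sel) / n_pop  # binomial drift
        traj.append(p)
    return np.stack(traj, axis=1)  # shape (n_runs, generations + 1)


def abc_rejection(observed, n_pop: int, n_sims: int = 20000, accept_frac: float = 0.01,
                  prior=(-0.5, 0.5), seed: int = 0) -> np.ndarray:
    """Return the prior draws of s whose simulated trajectories are closest to `observed`."""
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed, dtype=float)
    s = rng.uniform(prior[0], prior[1], size=n_sims)
    sims = simulate_freqs(s, n_pop, observed[0], len(observed) - 1, rng)
    dist = np.sqrt(((sims - observed) ** 2).sum(axis=1))
    keep = np.argsort(dist)[: max(1, int(accept_frac * n_sims))]
    return s[keep]
```

The accepted draws approximate the posterior of s; how much of that posterior lies above, around, or below zero is one informal way to think about labelling a mutation as advantageous, neutral, or deleterious.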



2004
Vol 3 (1)
pp. 1-18
Author(s): Harry Hochheiser, Ben Shneiderman

Timeboxes are rectangular widgets that can be used in direct-manipulation graphical user interfaces (GUIs) to specify query constraints on time series data sets. A timebox specifies two sets of constraints simultaneously: given a set of N time series profiles, a timebox covering time periods x1…x2 (x1 ≤ x2) and values y1…y2 (y1 ≤ y2) will retrieve only those profiles n ∈ N that have values y1 ≤ y ≤ y2 during all times x1 ≤ x ≤ x2. TimeSearcher is an information visualization tool that combines timebox queries with overview displays, query-by-example facilities, and support for queries over multiple time-varying attributes. Query manipulation tools, including pattern inversion and ‘leaders & laggards’ graphical bookmarks, provide additional support for interactive exploration of data sets. Extensions to the basic timebox model that provide additional expressivity include variable-time timeboxes, which can be used to express queries with variability in the time interval, and angular queries, which search for ranges of differentials rather than absolute values. Analysis of the algorithmic requirements for providing dynamic query performance for timebox queries showed that a sequential search outperformed searches based on geometric indices. Design studies helped identify the strengths and weaknesses of the query tools. Extended case studies involving the analysis of two different types of data from molecular biology experiments provided valuable feedback and validated the utility of both the timebox model and the TimeSearcher tool. TimeSearcher is available at http://www.cs.umd.edu/hcil/timesearcher
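The timebox semantics can be sketched as a plain sequential scan, the strategy the abstract reports outperformed geometric indices. The sketch assumes integer time indices, inclusive bounds, and a conjunction of multiple timeboxes; a simple step-to-step differential check stands in for the idea of angular queries. The function names are hypothetical and this is not TimeSearcher's code.

```python
from typing import Dict, List, Sequence, Tuple

# (x1, x2, y1, y2) with x1 <= x2 and y1 <= y2; times are integer indices into each profile
Timebox = Tuple[int, int, float, float]


def in_timebox(series: Sequence[float], box: Timebox) -> bool:
    """True if every value at times x1..x2 (inclusive) lies within [y1, y2]."""
    x1, x2, y1, y2 = box
    return all(y1 <= series[x] <= y2 for x in range(x1, x2 + 1))


def in_angular_box(series: Sequence[float], x1: int, x2: int, d1: float, d2: float) -> bool:
    """True if every step-to-step difference between times x1 and x2 lies within [d1, d2]."""
    return all(d1 <= series[x + 1] - series[x] <= d2 for x in range(x1, x2))


def timebox_query(profiles: Dict[str, Sequence[float]], boxes: List[Timebox]) -> List[str]:
    """Sequential scan over all profiles; multiple timeboxes are combined conjunctively."""
    return [name for name, series in profiles.items()
            if all(in_timebox(series, box) for box in boxes)]
```

For example, with profiles `{"a": [1, 2, 3, 4], "b": [1, 5, 3, 4], "c": [0, 2, 2, 2]}`, the box `(1, 2, 1.5, 3.5)` returns `["a", "c"]`, and adding a second box `(3, 3, 3.0, 5.0)` narrows the result to `["a"]`.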



AI
2021
Vol 2 (1)
pp. 48-70
Author(s): Wei Ming Tan, T. Hui Teo

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data that are periodically measured and recorded into a time series data set. Such multivariate data sets exhibit complex, non-linear inter-dependencies across recorded time steps and between sensors. Many existing prognostic algorithms have begun to explore Deep Neural Networks (DNNs) and their effectiveness in the field. Although Deep Learning (DL) techniques outperform traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with an attention mechanism. The convolution filters extract abstract temporal patterns from the multiple time series, while the attention mechanism reviews the information across the time axis and selects the relevant information. The results suggest that the proposed method not only achieves superior RUL estimation accuracy but also trains many times faster than previously reported methods. The advantages of deploying the network are also demonstrated on a lightweight hardware platform, where it is not only much more compact but also more efficient in a resource-restricted environment.
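The convolution-plus-attention pipeline described above can be sketched as an untrained NumPy forward pass. The single convolution layer, the layer sizes, the dot-product scoring vector `v`, and the linear output head are assumptions chosen for brevity; the paper's actual architecture, training procedure, and hyperparameters are not reproduced.

```python
import numpy as np


def conv1d_relu(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Valid 1-D convolution over time. x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)."""
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    out = np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) for t in range(steps)])
    return np.maximum(out + b, 0.0)  # ReLU, shape (steps, C_out)


def temporal_attention(h: np.ndarray, v: np.ndarray):
    """Score each time step with h @ v, softmax over time, return weighted sum and weights."""
    scores = h @ v
    weights = np.exp(scores - scores.max())  # subtract the max for numerical stability
    weights /= weights.sum()
    return weights @ h, weights


def rul_forward(x: np.ndarray, params: dict):
    """Forward pass: convolution, then temporal attention, then a linear RUL head."""
    h = conv1d_relu(x, params["w"], params["b"])
    context, weights = temporal_attention(h, params["v"])
    return float(context @ params["w_out"] + params["b_out"]), weights
```

The attention weights form a probability distribution over the convolved time steps, so inspecting them shows which part of the sensor window the (trained) model would rely on for its RUL estimate.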



2017
Author(s): Anthony Szedlak, Spencer Sims, Nicholas Smith, Giovanni Paternostro, Carlo Piermarocchi

Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as the cell cycle and the response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of the cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model.
Author Summary: The cell cycle, the process in which a parent cell replicates its DNA and divides into two daughter cells, is upregulated in many forms of cancer. Identifying gene inhibition targets to regulate the cell cycle is important to the development of effective therapies. Although modern high-throughput techniques offer unprecedented resolution of the molecular details of biological processes like the cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
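The cyclic Hopfield idea can be illustrated with a minimal sketch: asymmetric couplings built from an ordered sequence of ±1 expression patterns make the network step from one pattern to the next, and a strongly negative local field h pins a "silenced" gene to the off state. Random patterns stand in for binarized expression data and noise is omitted; the paper's couplings, noise model, and genetic-algorithm search are not reproduced, and all names are hypothetical.

```python
import numpy as np


def cyclic_couplings(patterns: np.ndarray) -> np.ndarray:
    """Asymmetric couplings J = sum_mu xi^(mu+1) (xi^mu)^T / N, mapping each pattern to the next."""
    n_genes = patterns.shape[1]
    nxt = np.roll(patterns, -1, axis=0)  # nxt[mu] = patterns[(mu + 1) % P]
    return nxt.T @ patterns / n_genes


def update(s: np.ndarray, J: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Deterministic synchronous update; h is an external field (strongly negative = silenced)."""
    return np.where(J @ s + h >= 0, 1, -1)


def overlap(s: np.ndarray, pattern: np.ndarray) -> float:
    """Normalized overlap in [-1, 1]; 1 means the state equals the pattern."""
    return float(s @ pattern) / len(s)
```

With P patterns, each local field satisfies |(J s)_i| ≤ P, so any external field below -P is guaranteed to hold the targeted genes at -1; this is the sense in which a local field can model an inhibitor that breaks the encoded cycle.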



Author(s): Pritpal Singh

Forecasting using fuzzy time series has been applied in several areas, including forecasting university enrollments, sales, road accidents, financial variables, and weather. Recently, many researchers have focused on applying fuzzy time series to time series forecasting problems. In this paper, we present a new model, based on one-factor fuzzy time series, to forecast enrollments at the University of Alabama and the daily average temperature in Taipei. In this model, a new frequency-based clustering technique is employed to partition the time series data sets into different intervals. For the defuzzification function, two new principles are also incorporated into this model. For both enrollment and daily temperature forecasting, the proposed model exhibits a very small error rate.
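A one-factor fuzzy time series forecaster can be sketched in the style of Chen's classic enrollment model: partition the universe of discourse into intervals, fuzzify each observation to an interval, group the fuzzy logical relationships A_i → A_j, and defuzzify by averaging interval midpoints. This swaps in equal-width intervals and midpoint averaging as assumptions; the paper's frequency-based clustering and its two new defuzzification principles are not reproduced. The function name is hypothetical, and the input must not be constant.

```python
def chen_fuzzy_forecast(data, n_intervals: int):
    """One-factor fuzzy time series forecasts; forecasts[t] predicts data[t + 1]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_intervals  # equal-width partition (assumption)
    mids = [lo + (k + 0.5) * width for k in range(n_intervals)]

    def fuzzify(x):
        # index of the interval containing x; the maximum falls into the last interval
        return min(int((x - lo) / width), n_intervals - 1)

    states = [fuzzify(x) for x in data]
    # fuzzy logical relationship groups: A_i -> {A_j, ...}
    groups = {}
    for a, b in zip(states, states[1:]):
        groups.setdefault(a, set()).add(b)

    def defuzzify(a):
        # average of consequent midpoints; fall back to A_i's own midpoint if it has no group
        targets = groups.get(a)
        if not targets:
            return mids[a]
        return sum(mids[j] for j in targets) / len(targets)

    return [defuzzify(a) for a in states]
```

For `[1, 2, 3, 4]` with three intervals, the midpoints are 1.5, 2.5, and 3.5, the fuzzified states are A_0, A_1, A_2, A_2, and the forecasts are `[2.5, 3.5, 3.5, 3.5]`; the last entry is the forecast for the next, unobserved value.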



2000
Vol 39 (02)
pp. 101-104
Author(s): A. Lowe, M. J. Harrison, R. W. Jones

The recognition of clinically significant trends in monitored signals plays an important role in many medical diagnostic applications. A template-based technique for identifying characteristic patterns in time-series data, based on fuzzy logic, is described. Fuzzy set theory allows the creation of fuzzy templates from linguistic rules. The resulting fuzzy template system can accommodate multiple time signals and relative or absolute trends, and it automatically generates a normalised “goodness of fit” score. The template approach was originally developed for monitoring during anaesthesia but has the potential to be useful in other domains that require temporal pattern recognition.
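The idea of turning a linguistic rule into a fuzzy template can be sketched as follows: each time segment gets a trapezoidal fuzzy set over its slope (a relative trend), and the goodness of fit is the mean membership across segments, which lies in [0, 1]. The trapezoidal shapes, slope features, averaging rule, and the example template `RISE_THEN_STEADY` are assumptions for illustration, not the authors' system.

```python
def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership with a < b <= c < d: 0 outside (a, d), 1 on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)


def segment_slope(values, start: int, end: int) -> float:
    """Average rate of change between two sample indices."""
    return (values[end] - values[start]) / (end - start)


def template_score(values, template) -> float:
    """Mean membership of each segment's slope in its fuzzy set, normalised to [0, 1].

    template: list of (start, end, (a, b, c, d)) entries, one fuzzy set per time segment.
    """
    memberships = [trapezoid(segment_slope(values, s, e), *mf) for s, e, mf in template]
    return sum(memberships) / len(memberships)


# Hypothetical linguistic rule: "rising steeply, then roughly steady".
RISE_THEN_STEADY = [(0, 5, (0.5, 1.0, 3.0, 4.0)), (5, 10, (-0.6, -0.2, 0.2, 0.6))]
```

A signal that climbs by 2 per sample for five samples and then stays flat scores 1.0 against this template, while a steadily falling signal scores 0.0; intermediate shapes receive graded scores rather than a hard match or mismatch.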


