Distance to trait optimum is a crucial factor determining the genomic signature of polygenic adaptation

2019
Author(s):
Eirini Christodoulaki
Neda Barghi
Christian Schlötterer

Polygenic adaptation is frequently associated with small allele frequency changes at many loci. Recent work suggests that large allele frequency changes can also be expected. Laboratory natural selection (LNS) experiments provide an excellent framework to study the adaptive architecture under controlled laboratory conditions: time series data in replicate populations evolving independently to the same trait optimum can be used to identify selected loci. Nevertheless, the choice of the new trait optimum in the laboratory is typically an ad hoc decision that does not consider the distance of the starting population from the new optimum. Here, we used forward simulations to study the selection signatures of polygenic adaptation in populations evolving to different trait optima. Mimicking LNS experiments, we analyzed allele frequencies of the selected alleles and population fitness at multiple time points. We demonstrate that the inferred adaptive architecture strongly depends on the choice of the new trait optimum in the laboratory and on the significance cut-off used to identify selected loci. Our results not only have a major impact on the design of future Evolve and Resequence (E&R) studies, but also on the interpretation of current E&R data sets.
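To make the simulation design concrete, here is a minimal sketch of a forward simulation of polygenic adaptation toward a shifted trait optimum, assuming a haploid Wright-Fisher population, equal additive effects across loci, and Gaussian stabilizing selection; all parameter values and names are illustrative, not those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

N = 1000          # population size (illustrative)
L = 50            # number of loci contributing to the trait
EFFECT = 0.5      # additive effect size per locus
SIGMA = 2.0       # width of the Gaussian fitness function
OPTIMUM = 10.0    # new trait optimum; vary this to change the distance
GENERATIONS = 100

# haploid genotypes: N individuals x L biallelic loci, starting frequency 0.1
pop = rng.random((N, L)) < 0.1

def next_generation(pop, optimum):
    """Gaussian stabilizing selection around the optimum, then resampling."""
    trait = pop.sum(axis=1) * EFFECT
    fitness = np.exp(-((trait - optimum) ** 2) / (2 * SIGMA ** 2))
    parents = rng.choice(len(pop), size=len(pop), p=fitness / fitness.sum())
    return pop[parents]          # clonal reproduction keeps the sketch short

freq_series = []
for _ in range(GENERATIONS):
    pop = next_generation(pop, OPTIMUM)
    freq_series.append(pop.mean(axis=0))   # per-locus allele frequency

freq_series = np.array(freq_series)        # time series, as sampled in LNS
print(freq_series[-1].round(2))            # final frequencies of selected alleles
```

Rerunning this with different values of OPTIMUM shows how the distance to the optimum shapes which loci show large frequency changes.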

2015
Author(s):
Anna Ferrer-Admetlla
Christoph Leuenberger
Jeffrey D Jensen
Daniel Wegmann

The joint and accurate inference of selection and demography from genetic data is considered a particularly challenging problem in population genetics, since both processes may lead to very similar patterns of genetic diversity. However, additional information for disentangling these effects may be obtained by observing changes in allele frequencies over multiple time points. Such data are common in experimental evolution studies, as well as in comparisons of ancient and contemporary samples. Leveraging this information, however, has been computationally challenging, particularly for multi-locus data sets. To overcome these issues, we introduce a novel, discrete approximation for diffusion processes, termed the mean transition time approximation, which preserves the long-term behavior of the underlying continuous diffusion process. We then derive this approximation for the particular case of inferring selection and demography from time series data under the classic Wright-Fisher model and demonstrate that our approximation is well suited to describe allele trajectories through time, even when only a few states are used. We then develop a Bayesian inference approach to jointly infer the population size and locus-specific selection coefficients with high accuracy, and further extend this model to also infer the rates of sequencing errors and mutations. Finally, we apply our approach to recent experimental data on the evolution of drug resistance in influenza virus, identifying likely targets of selection and finding evidence for much larger viral population sizes than previously reported.
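The idea of a discrete approximation to the Wright-Fisher diffusion can be illustrated with a simple transition-matrix likelihood. Note that the sketch below uses plain binomial transitions between frequency bins rather than the authors' mean transition time approximation, and all names and parameter values are illustrative.

```python
import numpy as np
from scipy.stats import binom

def transition_matrix(n_states, s):
    """One-generation transition matrix over discretized allele frequencies.

    Selection shifts the expected frequency; binomial sampling between the
    n_states bins stands in for genetic drift (the bin count plays the role
    of a scaled population size in this toy version).
    """
    freqs = np.linspace(0.0, 1.0, n_states)
    p_sel = freqs * (1 + s) / (freqs * (1 + s) + (1 - freqs))
    bins = np.arange(n_states)
    return np.array([binom.pmf(bins, n_states - 1, p) for p in p_sel])

def log_likelihood(traj_bins, gens_between, s, n_states=21):
    """Log-likelihood of an observed binned allele trajectory given s."""
    step = np.linalg.matrix_power(transition_matrix(n_states, s), gens_between)
    return sum(np.log(step[a, b] + 1e-300)
               for a, b in zip(traj_bins[:-1], traj_bins[1:]))

# toy trajectory: an allele climbing from bin 2 to bin 15 (of 21)
traj = [2, 5, 9, 12, 15]
for s in (0.0, 0.05, 0.1):
    print(f"s={s}: logL={log_likelihood(traj, gens_between=10, s=s):.2f}")
```

Embedding such a likelihood in an MCMC over the selection coefficient and population size would give a much simplified analogue of the Bayesian approach described.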


2020
Vol 12 (6)
pp. 890-904
Author(s):
Neda Barghi
Christian Schlötterer

In molecular population genetics, adaptation is typically thought to occur via selective sweeps, where targets of selection have independent effects on the phenotype and rise to fixation, whereas in quantitative genetics, many loci contribute to the phenotype and subtle frequency changes occur at many loci during polygenic adaptation. The sweep model makes specific predictions about frequency changes of beneficial alleles, and many test statistics have been developed to detect such selection signatures. Although polygenic adaptation is probably the prevalent mode of adaptation, the traditional focus on the phenotype has left us without a solid understanding of the similarities and differences between selection signatures under the two models. Recent theoretical and empirical studies have shown that both the selective sweep and polygenic adaptation models can result in a sweep-like genomic signature; therefore, additional criteria are needed to distinguish the two models. With replicated populations and time series data, experimental evolution studies have the potential to identify the underlying model of adaptation. Using the framework of experimental evolution, we performed computer simulations to study the pattern of selected alleles for two models: 1) adaptation of a trait via independent beneficial mutations that are conditioned for fixation, that is, the selective sweep model, and 2) the trait optimum model (polygenic adaptation), that is, adaptation of a quantitative trait under stabilizing selection after a sudden shift in the trait optimum. We identify several distinct patterns of the selective sweep and trait optimum models in populations of different sizes. These features could provide the foundation for the development of quantitative approaches to differentiate the two models.
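Complementing the trait-optimum sketch above, the sweep model can be illustrated by simulating a single unconditionally beneficial allele in replicate populations; again, a minimal sketch with illustrative parameters, not the simulation code used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def sweep_trajectory(N, s, p0, max_gen=500):
    """Allele frequency trajectory under directional selection: the allele
    stays beneficial until fixation, unlike the trait optimum model, where
    selection weakens once the population reaches the new optimum."""
    p, traj = p0, [p0]
    for _ in range(max_gen):
        p = p * (1 + s) / (p * (1 + s) + (1 - p))   # selection
        p = rng.binomial(2 * N, p) / (2 * N)        # drift
        traj.append(p)
        if p in (0.0, 1.0):                          # lost or fixed
            break
    return traj

for rep in range(3):  # replicate populations, as in an E&R experiment
    traj = sweep_trajectory(N=1000, s=0.05, p0=0.05)
    print(f"replicate {rep}: final frequency {traj[-1]:.2f} "
          f"after {len(traj) - 1} generations")
```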


2019
Author(s):
Srishti Mishra
Zohair Shafi
Santanu Pathak

Data-driven decision making is becoming an increasingly important aspect of successful business execution. More and more organizations are moving towards taking informed decisions based on the data they generate. Most of these data are temporal, i.e., time series data. Analysing multiple time series data sets effectively and quickly is a challenge. The most interesting and valuable part of such analysis is to generate insights on correlation and causation across multiple time series data sets. This paper looks at methods that can be used to analyze such data sets and gain useful insights from them, primarily in the form of correlation and causation analysis. It focuses on two methods for doing so, Two Sample Test with Dynamic Time Warping and Hierarchical Clustering, and looks at how the results returned from both can be used to gain a better understanding of the data. Moreover, the methods used are meant to work with any data set regardless of the subject domain and idiosyncrasies of the data, i.e., a data-agnostic approach.
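As a rough illustration of the clustering half of this pipeline, the sketch below computes pairwise Dynamic Time Warping distances with a textbook dynamic program and feeds them to hierarchical clustering; the two-sample test step is omitted, and the data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 60)
# toy data set: two groups of series with different shapes plus noise
series = [np.sin(t + rng.normal(0, 0.2)) + rng.normal(0, 0.1, t.size) for _ in range(4)]
series += [np.cos(2 * t) + rng.normal(0, 0.1, t.size) for _ in range(4)]

n = len(series)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])

# hierarchical clustering on the condensed DTW distance matrix
labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(labels)  # the two shape groups should fall into separate clusters
```

Because DTW aligns series before comparing them, the grouping is robust to the kinds of phase shifts that defeat plain pointwise correlation.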


2020
Vol 15 (3)
pp. 225-237
Author(s):
Saurabh Kumar
Jitendra Kumar
Vikas Kumar Sharma
Varun Agiwal

This paper deals with the problem of modelling time series data with structural breaks occurring at multiple time points, which may result in a different model order at every structural break. A flexible and generalized class of autoregressive (AR) models with multiple structural breaks is proposed for modelling in such situations. Estimation of the model parameters is discussed in both classical and Bayesian frameworks. Since the joint posterior of the parameters is not analytically tractable, we employ a Markov chain Monte Carlo method, Gibbs sampling, to simulate posterior samples. To verify the order change, a hypothesis test is constructed using posterior probabilities, and the model is compared with its counterpart without breaks. The proposed methodologies are illustrated by means of a simulation study and a real data analysis.
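The classical side of this estimation problem can be sketched as a grid search over a single break point with a different AR order on each side, scored with a BIC-style penalty; the Bayesian Gibbs-sampling treatment in the paper is not reproduced here, and all names are illustrative.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of an AR(p) model with intercept; returns (coef, RSS)."""
    y = x[p:]
    X = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

def best_single_break(x, orders=(1, 2, 3), min_seg=20):
    """Grid search for one break point, allowing a different AR order per segment."""
    best = None
    for b in range(min_seg, len(x) - min_seg):
        for p1 in orders:
            for p2 in orders:
                _, rss1 = fit_ar(x[:b], p1)
                _, rss2 = fit_ar(x[b:], p2)
                n1, n2 = b - p1, len(x) - b - p2
                # Gaussian log-likelihood up to constants plus a BIC-style penalty
                score = (n1 * np.log(rss1 / n1) + n2 * np.log(rss2 / n2)
                         + (p1 + p2 + 2) * np.log(len(x)))
                if best is None or score < best[0]:
                    best = (score, b, p1, p2)
    return best

rng = np.random.default_rng(3)
# simulate AR(1) switching to AR(2) at t = 100
x = np.empty(200)
x[:2] = rng.normal(size=2)
for t in range(2, 200):
    if t < 100:
        x[t] = 0.8 * x[t - 1] + rng.normal()
    else:
        x[t] = 0.5 * x[t - 1] - 0.4 * x[t - 2] + rng.normal()

score, b, p1, p2 = best_single_break(x)
print(f"estimated break at t={b}, AR({p1}) before, AR({p2}) after")
```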


2019
Author(s):
Kathrin A. Otte
Christian Schlötterer

Shifting from the analysis of single nucleotide polymorphisms to the reconstruction of selected haplotypes greatly facilitates the interpretation of Evolve and Resequence (E&R) experiments. Merging highly correlated hitchhiker SNPs into haplotype blocks reduces thousands of candidates to a few selected regions. Current methods of haplotype reconstruction from Pool-Seq data need a variety of data-specific parameters that are typically defined ad hoc and require haplotype sequences for validation. Here, we introduce haplovalidate, a tool which detects selected haplotypes in a broad range of Pool-Seq time series data without the need for sequenced haplotypes. Haplovalidate makes data-driven choices of two key parameters for the clustering procedure: the minimum correlation between SNPs constituting a cluster and the window size. Applied to simulated and experimental E&R data, haplovalidate reliably detects selected haplotype blocks with low false discovery rates, independently of whether few or many selection targets are included. Our analyses identified an important restriction of the haplotype block-based approach to describing the genomic architecture of adaptation: we detected a substantial fraction of haplotypes containing multiple selection targets. These blocks were considered a single region of selection and therefore led to an underestimation of the number of selection targets. We demonstrate that the separate analysis of earlier time points can significantly improve the separation of selection targets into individual haplotype blocks. We conclude that the analysis of selected haplotype blocks has a large potential for the characterisation of the adaptive architecture with E&R experiments.
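The core clustering idea, grouping SNPs whose allele-frequency trajectories move together within a genomic window, can be sketched as follows. This is not haplovalidate itself, and unlike haplovalidate the correlation cut-off and window size here are fixed ad hoc rather than chosen from the data.

```python
import numpy as np

def haplotype_blocks(freqs, positions, min_corr=0.9, window=50_000):
    """Greedily group SNPs with highly correlated frequency trajectories
    that lie within `window` bp of a seed SNP.

    freqs: (n_snps, n_timepoints) allele frequency matrix from Pool-Seq
    positions: genomic position of each SNP
    """
    corr = np.corrcoef(freqs)
    blocks, assigned = [], set()
    # seed clusters with the SNPs showing the largest frequency change
    for i in np.argsort(-np.ptp(freqs, axis=1)):
        if i in assigned:
            continue
        members = [j for j in range(len(freqs))
                   if j not in assigned
                   and abs(positions[j] - positions[i]) <= window
                   and corr[i, j] >= min_corr]
        assigned.update(members)
        blocks.append(sorted(members))
    return blocks

rng = np.random.default_rng(7)
t = np.linspace(0, 1, 6)
rising = 0.1 + 0.8 * t                     # trajectory of a selected haplotype
freqs = np.vstack([rising + rng.normal(0, 0.02, 6) for _ in range(5)] +
                  [rng.uniform(0.2, 0.4, 6) for _ in range(5)])
positions = np.arange(10) * 10_000
print(haplotype_blocks(freqs, positions))  # hitchhikers should merge into one block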


2018
Author(s):
Tal Zinger
Pleuni S. Pennings
Adi Stern

With the advent of deep sequencing techniques, it is now possible to track the evolution of viruses with ever-increasing detail. Here we present FITS (Flexible Inference from Time-Series), a computational framework that allows inference of the fitness of a mutation, the mutation rate, or the population size from genomic time-series sequencing data. FITS was designed first and foremost for the analysis of either short-term Evolve & Resequence (E&R) experiments or rapidly recombining populations of viruses. We thoroughly explore the performance of FITS on noisy simulated data and highlight its ability to infer meaningful information even in those circumstances. In particular, FITS is able to categorize a mutation as advantageous, neutral or deleterious. We next apply FITS to empirical data from an E&R experiment on poliovirus where parameters were determined experimentally and demonstrate extremely high accuracy in inference. We highlight the ease of use of FITS for step-wise or iterative inference of mutation rates, population size, and fitness values for each mutation sequenced, when deep sequencing data are available at multiple time points.
Availability: FITS is written in C++ and is available both with a highly user-friendly graphical user interface and as a command line program that allows parallel high-throughput analyses. Source code, binaries (Windows and Mac) and complementary scripts are available from GitHub at https://github.com/SternLabTAU/[email protected]
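FITS itself is a C++ tool with its own inference machinery; as a rough Python analogue, the sketch below infers a mutation's fitness by rejection sampling against simulated Wright-Fisher trajectories and categorizes it from the accepted values. All function names and parameters are illustrative, not part of FITS.

```python
import numpy as np

rng = np.random.default_rng(11)

def simulate(w, N, p0, n_gen):
    """Wright-Fisher trajectory for a mutant with relative fitness w."""
    p, traj = p0, [p0]
    for _ in range(n_gen):
        p = p * w / (p * w + (1 - p))   # selection
        p = rng.binomial(N, p) / N      # drift
        traj.append(p)
    return np.array(traj)

def infer_fitness(observed, N, p0, n_sims=5_000, keep=250):
    """Toy rejection-sampling analogue: draw candidate fitness values,
    keep those whose simulated trajectories best match the observed
    frequencies, and summarise the accepted values."""
    w_candidates = rng.uniform(0.5, 1.5, n_sims)
    dists = np.array([np.abs(simulate(w, N, p0, len(observed) - 1) - observed).sum()
                      for w in w_candidates])
    accepted = w_candidates[np.argsort(dists)[:keep]]
    lo, hi = np.percentile(accepted, [2.5, 97.5])
    label = "advantageous" if lo > 1 else "deleterious" if hi < 1 else "neutral"
    return accepted.mean(), (lo, hi), label

observed = np.array([0.01, 0.03, 0.08, 0.2, 0.4])  # toy rising mutant frequency
print(infer_fitness(observed, N=10_000, p0=0.01))
```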


2004
Vol 3 (1)
pp. 1-18
Author(s):
Harry Hochheiser
Ben Shneiderman

Timeboxes are rectangular widgets that can be used in direct-manipulation graphical user interfaces (GUIs) to specify query constraints on time series data sets. Timeboxes specify two sets of constraints simultaneously: given a set of N time series profiles, a timebox covering time periods x1…x2 (x1 ≤ x2) and values y1…y2 (y1 ≤ y2) will retrieve only those profiles (a subset n of the N profiles) whose values satisfy y1 ≤ y ≤ y2 during all times x1 ≤ x ≤ x2. TimeSearcher is an information visualization tool that combines timebox queries with overview displays, query-by-example facilities, and support for queries over multiple time-varying attributes. Query manipulation tools, including pattern inversion and 'leaders & laggards' graphical bookmarks, provide additional support for interactive exploration of data sets. Extensions to the basic timebox model that provide additional expressivity include variable time timeboxes, which can be used to express queries with variability in the time interval, and angular queries, which search for ranges of differentials rather than absolute values. Analysis of the algorithmic requirements for providing dynamic query performance for timebox queries showed that a sequential search outperformed searches based on geometric indices. Design studies helped identify the strengths and weaknesses of the query tools. Extended case studies involving the analysis of two different types of data from molecular biology experiments provided valuable feedback and validated the utility of both the timebox model and the TimeSearcher tool. TimeSearcher is available at http://www.cs.umd.edu/hcil/timesearcher
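The conjunctive semantics of a timebox translate directly into a short sequential search, which is also the strategy the analysis above found fastest; a minimal sketch on synthetic profiles, with illustrative names.

```python
import numpy as np

def timebox_query(profiles, x1, x2, y1, y2):
    """Sequential-search timebox query: return indices of the profiles whose
    values stay within [y1, y2] at every time step in [x1, x2]."""
    window = profiles[:, x1:x2 + 1]
    hits = np.all((window >= y1) & (window <= y2), axis=1)
    return np.flatnonzero(hits)

rng = np.random.default_rng(5)
profiles = rng.normal(0, 1, (100, 50)).cumsum(axis=1)  # 100 random walks
print(timebox_query(profiles, x1=10, x2=20, y1=-2.0, y2=2.0))
```

Several timeboxes are combined conjunctively by intersecting the returned index sets, matching the way multiple widgets narrow the display in TimeSearcher.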


AI
2021
Vol 2 (1)
pp. 48-70
Author(s):
Wei Ming Tan
T. Hui Teo

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data which are periodically measured and recorded into a time series data set. Such multivariate data sets form complex and non-linear inter-dependencies across recorded time steps and between sensors. Many current prognostic algorithms have started to explore Deep Neural Networks (DNNs) and their effectiveness in the field. Although Deep Learning (DL) techniques outperform traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with an attention mechanism. The convolution filters extract abstract temporal patterns from the multiple time series, while the attention mechanism reviews the information across the time axis and selects the relevant information. The results suggest that the proposed method not only produces superior RUL estimation accuracy but also trains many times faster than the reported works. Deployment on a lightweight hardware platform further demonstrates that the network is not just more compact but also more efficient in resource-restricted environments.
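A generic PyTorch sketch of this architecture family, convolution filters over the sensor time axis followed by attention pooling for RUL regression; layer sizes and names are illustrative and not the authors' exact network.

```python
import torch
import torch.nn as nn

class AttentionCNN(nn.Module):
    """Lightweight 1-D CNN with attention pooling for RUL regression."""
    def __init__(self, n_sensors, n_filters=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors, n_filters, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.attn_score = nn.Linear(n_filters, 1)  # one score per time step
        self.head = nn.Linear(n_filters, 1)        # RUL regression output

    def forward(self, x):                 # x: (batch, n_sensors, time)
        h = self.conv(x)                  # (batch, n_filters, time)
        h = h.transpose(1, 2)             # (batch, time, n_filters)
        w = torch.softmax(self.attn_score(h), dim=1)  # attention over time
        context = (w * h).sum(dim=1)      # weighted summary of the window
        return self.head(context).squeeze(-1)

model = AttentionCNN(n_sensors=14)        # e.g. 14 sensor channels
x = torch.randn(8, 14, 30)                # batch of 30-step sensor windows
print(model(x).shape)                     # torch.Size([8])
```

The attention weights double as a diagnostic: inspecting them shows which time steps in the window drove a given RUL prediction.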


2017
Author(s):
Anthony Szedlak
Spencer Sims
Nicholas Smith
Giovanni Paternostro
Carlo Piermarocchi

Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of the cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model.
Author Summary: The cell cycle, the process in which a parent cell replicates its DNA and divides into two daughter cells, is an upregulated process in many forms of cancer. Identifying gene inhibition targets to regulate the cell cycle is important to the development of effective therapies. Although modern high-throughput techniques offer unprecedented resolution of the molecular details of biological processes like the cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the "edge of chaos" is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the process of cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
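The cyclic Hopfield dynamics at the heart of this model can be sketched with an asymmetric Hebbian rule that maps each expression pattern onto the next phase; this is a generic toy version with random patterns, not the fitted HeLa or yeast model, and all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def hopfield_cycle(states, n_steps=60, noise=0.2, field=None):
    """Synchronous Hopfield dynamics driven around a cycle.

    states: (n_patterns, n_genes) matrix of +/-1 expression patterns
    representing successive cell-cycle phases. The asymmetric weight rule
    maps each pattern onto the next, so the network steps through them;
    `field` is an optional external input modelling gene inhibition.
    """
    P, n = states.shape
    # asymmetric Hebbian rule: pattern k points to pattern k+1 (mod P)
    W = sum(np.outer(states[(k + 1) % P], states[k]) for k in range(P)) / n
    s = states[0].astype(float).copy()
    traj = [s.copy()]
    for _ in range(n_steps):
        h = W @ s + (field if field is not None else 0.0)
        s = np.sign(h + rng.normal(0, noise, n))  # noisy threshold update
        s[s == 0] = 1.0
        traj.append(s.copy())
    return np.array(traj)

# toy "cell cycle": 4 random phase patterns over 40 genes
phases = rng.choice([-1.0, 1.0], size=(4, 40))
traj = hopfield_cycle(phases)
# overlap with each phase shows the network stepping through the cycle
overlaps = traj @ phases.T / phases.shape[1]
print(overlaps.round(2)[:8])
```

Setting a strong negative `field` on a subset of genes mimics the silencing perturbations the paper searches over with its genetic algorithm.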

