Multi-locus analysis of genomic time series data from experimental evolution

2014 ◽  
Author(s):  
Jonathan Terhorst ◽  
Yun S. Song

Genomic time series data generated by evolve-and-resequence (E&R) experiments offer a powerful window into the mechanisms that drive evolution. However, standard population genetic inference procedures do not account for sampling serially over time, and new methods are needed to make full use of modern experimental evolution data. To address this problem, we develop a Gaussian process approximation to the multi-locus Wright-Fisher process with selection over a time course of tens of generations. The mean and covariance structure of the Gaussian process are obtained by computing the corresponding moments in discrete-time Wright-Fisher models conditioned on the presence of a linked selected site. This enables our method to account for the effects of linkage and selection, both along the genome and across sampled time points, in an approximate but principled manner. Using simulated data, we demonstrate the power of our method to correctly detect, locate and estimate the fitness of a selected allele from among several linked sites. We also study how this power changes for different values of selection strength, initial haplotypic diversity, population size, sampling frequency, experimental duration, number of replicates, and sequencing coverage depth. In addition to providing quantitative estimates of selection parameters from experimental evolution data, our model can be used by practitioners to design E&R experiments with requisite power. Finally, we explore how our likelihood-based approach can be used to infer other model parameters, including effective population size and recombination rate, and discuss extensions to more complex models.
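
The abstract above builds its Gaussian process from the moments of a discrete-time Wright-Fisher model. As a much-reduced illustration, the sketch below simulates a single selected locus and returns the one-generation mean and variance of its allele frequency. The haploid fitness scheme (1 + s versus 1), the function names, and the parameter values are assumptions made for illustration only; the paper's multi-locus model with linkage is not reproduced here.

```python
import random


def expected_next_freq(x, s):
    """Allele frequency after selection, assuming haploid fitness 1 + s vs 1."""
    return x * (1 + s) / (1 + s * x)


def one_step_moments(x, s, pop_size):
    """Mean and variance of the next-generation frequency, given frequency x."""
    p = expected_next_freq(x, s)
    return p, p * (1 - p) / pop_size


def wright_fisher_trajectory(x0, s, pop_size, generations, seed=0):
    """Simulate allele frequencies under selection followed by binomial drift."""
    rng = random.Random(seed)
    freqs = [x0]
    x = x0
    for _ in range(generations):
        p = expected_next_freq(x, s)
        # Binomial sampling of pop_size offspring, each carrying the allele w.p. p.
        count = sum(rng.random() < p for _ in range(pop_size))
        x = count / pop_size
        freqs.append(x)
    return freqs


if __name__ == "__main__":
    # Hypothetical values: 30 generations, N = 200, s = 0.1, starting at 20%.
    print(wright_fisher_trajectory(0.2, 0.1, 200, 30, seed=1))
    print(one_step_moments(0.2, 0.1, 200))
```

Averaging many such trajectories should approach the recursion in `expected_next_freq`, while the spread between replicates reflects the drift variance returned by `one_step_moments`.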

2021 ◽  
Author(s):  
Zachariah Gompert ◽  
Amy Springer ◽  
Megan Brady ◽  
Samridhi Chaturvedi ◽  
Lauren K. Lucas

Abstract Effective population size affects the efficacy of selection, rate of evolution by drift, and neutral diversity levels. When species are subdivided into multiple populations connected by gene flow, evolutionary processes can depend on global or local effective population sizes. Theory predicts that high levels of diversity might be maintained by gene flow, even very low levels of gene flow, consistent with a species' long-term effective population size, but tests of this idea are mostly lacking. Here, we show that Lycaeides butterfly populations maintain low contemporary (variance) effective population sizes (e.g., ∼200 individuals) and thus evolve rapidly by genetic drift. Contemporary effective sizes were consistent with local census population sizes. In contrast, populations harbored high levels of genetic diversity consistent with an effective population size several orders of magnitude larger. We hypothesized that the differences in the magnitude and variability of contemporary versus long-term effective population sizes were caused by gene flow of sufficient magnitude to maintain diversity but only subtly affect evolution on generational time scales. Consistent with this hypothesis, we detected low but non-trivial gene flow among populations. Furthermore, using population-genomic time-series data, we documented patterns consistent with predictions from this hypothesis, including a weak but detectable excess of evolutionary change in the direction of the mean (migrant gene pool) allele frequencies across populations, and consistency in the direction of allele frequency change over time. The documented decoupling of diversity levels and short-term change by drift in Lycaeides has implications for our understanding of contemporary evolution and the maintenance of genetic variation in the wild.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Hitoshi Iuchi ◽  
Michiaki Hamada

Abstract Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. polynomial degree or degrees of freedom) for a whole dataset. This approach risks modeling linearly increasing genes with higher-order functions, or fitting cyclic gene expression with linear functions, thereby leading to false positives/negatives. Here, we present a Jonckheere–Terpstra–Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks, using simulation data, show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK in the analysis of time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern contributes to the TEG identification of JTK, not the difference in expression levels. This result suggests that JTK is a suitable algorithm when focusing on expression patterns over time rather than expression levels, such as comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Y Zhang ◽  
Y W Zhao ◽  
C C Wang ◽  
T C Li

Abstract Study question What are the differences in serum metabolomic profiles between pregnant and non-pregnant women during the early implantation period? Summary answer Levels of progesterone-related hormones rise from ET day 3 in pregnant women compared with non-pregnant women. What is known already Metabolomics uses high-throughput analytical methods to identify and quantify metabolites. Compared with other omics approaches, metabolomics is the closest to the phenotype, allowing dynamic changes in phenotype to be observed at specific time points. So far there is no published work on the metabolomic profile of the human early implantation period. Study design, size, duration Study design: comparative study. Size: 14 pregnant women and 14 non-pregnant women. Duration: time course. Participants/materials, setting, methods Participants: pregnant and non-pregnant women after embryo transfer (ET). Setting: university-based study. Methods: Peripheral blood was collected at ET day 0, 3, 6 and 9. Metabolomic profiling of serum was performed by capillary electrophoresis-mass spectrometry (CE-MS) and liquid chromatography–mass spectrometry (LC-MS). Main results and the role of chance There were no statistically significant differences between the two groups in age, BMI, basal FSH level, endometrial thickness on the day of embryo transfer, distribution of primary and secondary infertility, embryo transfer cycle, or type of infertility. After excluding metabolites with more than 50% missing data, 310 metabolites entered the statistical analysis. Among these 310 metabolites, lipids accounted for the largest share, nearly half of all metabolites. The second largest class was organic acids. Combining the results of repeated-measures ANOVA (RM-ANOVA), ANOVA-simultaneous component analysis (ASCA) and multivariate empirical Bayes time-series analysis (MEBA), we found that progesterone-related hormones were the most important metabolites across the whole time series. These metabolites showed significant downregulation from ET day 0 to ET day 3 and upregulation from ET day 3 to ET day 9. Limitations, reasons for caution The sample size of this study is limited, and further validation is necessary. Wider implications of the findings The upregulation of progesterone-related hormones from day 3 in the pregnant group might be related to embryo-derived hCG, because by day 3 the embryo has entered the endometrium and produces cytokines, hCG and other signals that interact with the endometrium. Trial registration number NA


2007 ◽  
Vol 9 (1) ◽  
pp. 30-41 ◽  
Author(s):  
Nikhil S. Padhye ◽  
Sandra K. Hanneman

The application of cosinor models to long time series requires special attention. With increasing length of the time series, the presence of noise and drifts in rhythm parameters from cycle to cycle leads to rapid deterioration of cosinor models. The sensitivity of amplitude and model fit to the data length is demonstrated for body temperature data from ambulatory menstrual cycling and menopausal women and from ambulatory male swine. It follows that amplitude comparisons between studies cannot be made independent of consideration of the data length. Cosinor analysis may be carried out on serial sections of the series for improved model fit and for tracking changes in rhythm parameters. Noise and drift reduction can also be achieved by folding the series onto a single cycle, which leads to substantial gains in the model fit but lowers the amplitude. Central values of model parameters are negligibly changed by consideration of the autoregressive nature of residuals.
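
For readers unfamiliar with the cosinor model discussed above, the following is a minimal sketch of a single-component fit, y(t) = M + A cos(2πt/τ + φ), by ordinary least squares on the linearized form M + β cos(ωt) + γ sin(ωt). The function names and the synthetic temperature series are assumptions for illustration; this is not the authors' code and it ignores autoregressive residuals.

```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_cosinor(times, values, period):
    """Return (mesor, amplitude, acrophase) for a known period."""
    omega = 2 * math.pi / period
    rows = [(1.0, math.cos(omega * t), math.sin(omega * t)) for t in times]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_3x3(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    acrophase = math.atan2(-gamma, beta)  # beta = A cos(phi), gamma = -A sin(phi)
    return mesor, amplitude, acrophase


if __name__ == "__main__":
    # Synthetic, noise-free series: two 24 h cycles sampled every 30 minutes.
    times = [0.5 * i for i in range(96)]
    values = [37.0 + 0.5 * math.cos(2 * math.pi * t / 24.0 - 1.0) for t in times]
    print(fit_cosinor(times, values, 24.0))
```

Serial-section analysis, as described in the abstract, would amount to calling `fit_cosinor` on consecutive windows of the series rather than on the whole record.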


Author(s):  
Puneet Agarwal ◽  
William Walker ◽  
Kenneth Bhalla

The most probable maximum (MPM) is the extreme value statistic commonly used in the offshore industry. Extreme values of vessel motions, structural response, and environment are often expressed using the MPM. For a Gaussian process, the MPM is a function of the root-mean-square and the zero-crossing rate of the process. Accurate estimates of the MPM may be obtained in the frequency domain from spectral moments of the known power spectral density. If the MPM is to be estimated from the time series of a random process, either from measurements or from simulations, the time series data should be of long enough duration, sampled at an adequate rate, and have an ensemble of multiple realizations. This is not the case when measured data is recorded for an insufficient duration, or when one wants to make decisions (requiring an estimate of the MPM) in real time based on observing the data only for a short duration. Sometimes, the instrumentation system may not be properly designed to measure the dynamic vessel motions with a fine sampling rate, or it may be a legacy instrumentation system. The question then becomes whether the short-duration and/or undersampled data is useful at all, or whether some useful information (i.e., an estimate of the MPM) can be extracted, and if so, what the accuracy and uncertainty of such estimates are. In this paper, a procedure for estimation of the MPM from the short-time maxima, i.e., the maximum value from a time series of short duration (say, 10 or 30 minutes), is presented. For this purpose, pitch data is simulated from the vessel RAOs (response amplitude operators). Factors to convert the short-time maxima to the MPM are computed for various non-exceedance levels. It is shown that the factors estimated from simulation can also be obtained from the theory of extremes of a Gaussian process. Afterwards, estimation of the MPM from the short-time maxima is explored for an undersampled process, with the caveat that adequately sampled data should be used whenever it is available. It is found that the undersampled data can be somewhat useful and that factors to convert the short-time maxima to the MPM can be derived for an associated non-exceedance level. However, compared to the adequately sampled data, the factors for the undersampled data are less useful since they depend on more variables and have more uncertainty. While the vessel pitch data was the focus of this paper, the results and conclusions are valid for any adequately sampled narrow-banded Gaussian process.
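
The dependence of the MPM on the root-mean-square and the zero-crossing rate can be illustrated with the textbook result for a narrow-banded Gaussian process with Rayleigh-distributed, independent peaks: MPM = σ sqrt(2 ln N), where N is the number of cycles in the exposure duration. The sketch below assumes that standard result and uses hypothetical function names and input values; it does not reproduce the paper's conversion factors for short-time maxima.

```python
import math


def zero_crossing_rate(m0, m2):
    """Zero up-crossing rate in Hz from spectral moments taken over rad/s."""
    return math.sqrt(m2 / m0) / (2 * math.pi)


def most_probable_maximum(rms, rate_hz, duration_s):
    """MPM of a narrow-banded Gaussian process: rms * sqrt(2 ln N)."""
    n_cycles = rate_hz * duration_s
    return rms * math.sqrt(2 * math.log(n_cycles))


def extreme_at_nonexceedance(rms, rate_hz, duration_s, prob):
    """Level not exceeded with probability prob, assuming independent Rayleigh peaks."""
    n_cycles = rate_hz * duration_s
    return rms * math.sqrt(-2 * math.log(1 - prob ** (1 / n_cycles)))


if __name__ == "__main__":
    # Hypothetical pitch response: rms 2.0 deg, 0.1 Hz zero-crossing rate, 3 h exposure.
    print(most_probable_maximum(2.0, 0.1, 3 * 3600))
    print(extreme_at_nonexceedance(2.0, 0.1, 3 * 3600, 0.90))
```

Under these assumptions the MPM corresponds to a non-exceedance probability of (1 - 1/N)^N, which approaches 1/e (about 37%) for large N, so higher non-exceedance levels give larger design values than the MPM itself.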


Author(s):  
Nobuhiko Yamaguchi ◽  

Gaussian Process Dynamical Models (GPDMs) constitute a nonlinear dimensionality reduction technique that provides a probabilistic representation of time series data in terms of Gaussian process priors. In this paper, we report a method based on GPDMs to visualize the states of time-series data. Conventional GPDMs are unsupervised, and therefore, even when the labels of data are available, it is not possible to use this information. To overcome the problem, we propose a supervised GPDM (S-GPDM) that utilizes both the data and their corresponding labels. We demonstrate experimentally that the S-GPDM can locate related motion data closer together than conventional GPDMs.


2018 ◽  
Vol 2 (2) ◽  
pp. 49-57
Author(s):  
Dwi Yulianti ◽  
I Made Sumertajaya ◽  
Itasia Dina Sulvianti

The generalized space time autoregressive integrated moving average (GSTARIMA) model is a time series model of multiple variables with spatial and temporal linkages (space time). The GSTARIMA model is an extension of the space time autoregressive integrated moving average (STARIMA) model with the assumption that each location has unique model parameters, so the GSTARIMA model is more flexible than the STARIMA model. The purposes of this research are to determine the best model and to predict the time series of rice prices in all provincial capitals of Sumatra island using the GSTARIMA model. This research used weekly rice price data from all provincial capitals of Sumatra island from January 2010 to December 2017. The spatial weights used in this research are inverse distance and queen contiguity. The modeling results show that the best model is GSTARIMA (1,1,0) with the queen contiguity weight matrix, which has the smallest MAPE value of 1.17817%.
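
The model comparison above rests on the mean absolute percentage error (MAPE). As a reminder of how that criterion is computed, here is a minimal sketch using the standard definition; the function name and the price values are hypothetical and are not taken from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical weekly prices and forecasts, for illustration only.
    observed = [10000.0, 10200.0, 10150.0, 10300.0]
    predicted = [10100.0, 10150.0, 10250.0, 10280.0]
    print(round(mape(observed, predicted), 3))
```

A lower MAPE indicates forecasts that are closer to the observed series in relative terms, which is how the competing weight matrices are ranked.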


2016 ◽  
Author(s):  
Luis F. Jover ◽  
Justin Romberg ◽  
Joshua S. Weitz

In communities with bacterial viruses (phage) and bacteria, the phage-bacteria infection network establishes which virus types infect which host types. The structure of the infection network is a key element in understanding community dynamics. Yet, this infection network is often difficult to ascertain. Introduced over 60 years ago, the plaque assay remains the gold standard for establishing who infects whom in a community. This culture-based approach does not scale to environmental samples with increased levels of phage and bacterial diversity, much of which is currently unculturable. Here, we propose an alternative method of inferring phage-bacteria infection networks. This method uses time series data of fluctuating population densities to estimate the complete interaction network without having to test each phage-bacteria pair individually. We use in silico experiments to analyze the factors affecting the quality of network reconstruction and find robust regimes where accurate reconstructions are possible. In addition, we present a multi-experiment approach where time series from different experiments are combined to improve estimates of the infection network and mitigate against the possibility of evolutionary changes to infection during the time-course of measurement.


2020 ◽  
Author(s):  
Iain Mathieson

Abstract Time series data of allele frequencies are a powerful resource for detecting and classifying natural and artificial selection. Ancient DNA now allows us to observe these trajectories in natural populations of long-lived species such as humans. Here, we develop a hidden Markov model to infer selection coefficients that vary over time. We show through simulations that our approach can accurately estimate both selection coefficients and the timing of changes in selection. Finally, we analyze some of the strongest signals of selection in the human genome using ancient DNA. We show that the European lactase persistence mutation was selected over the past 5,000 years with a selection coefficient of 2-2.5% in Britain, Central Europe and Iberia, but not Italy. In northern East Asia, selection at the ADH1B locus associated with alcohol metabolism intensified around 4,000 years ago, approximately coinciding with the introduction of rice-based agriculture. Finally, a derived allele at the FADS locus was selected in parallel in both Europe and East Asia, as previously hypothesized. Our approach is broadly applicable to both natural and experimental evolution data and shows how time series data can be used to resolve fine-scale details of selection.

