A generalised approach to detect selected haplotype blocks in Evolve and Resequence experiments

2019
Author(s): Kathrin A. Otte, Christian Schlötterer

Shifting from the analysis of single nucleotide polymorphisms to the reconstruction of selected haplotypes greatly facilitates the interpretation of Evolve and Resequence (E&R) experiments. Merging highly correlated hitchhiker SNPs into haplotype blocks reduces thousands of candidates to a few selected regions. Current methods of haplotype reconstruction from Pool-Seq data need a variety of data-specific parameters that are typically defined ad hoc and require haplotype sequences for validation. Here, we introduce haplovalidate, a tool which detects selected haplotypes in a broad range of Pool-Seq time series data without the need for sequenced haplotypes. Haplovalidate makes data-driven choices of two key parameters for the clustering procedure: the minimum correlation between SNPs constituting a cluster and the window size. Applied to simulated and experimental E&R data, haplovalidate reliably detects selected haplotype blocks with low false discovery rates, independently of whether few or many selection targets are included. Our analyses identified an important restriction of the haplotype block-based approach for describing the genomic architecture of adaptation: we detected a substantial fraction of haplotypes containing multiple selection targets. Such blocks are treated as a single selected region and therefore lead to an underestimation of the number of selection targets. We demonstrate that the separate analysis of earlier time points can significantly increase the separation of selection targets into individual haplotype blocks. We conclude that the analysis of selected haplotype blocks has a large potential for the characterisation of the adaptive architecture with E&R experiments.
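The core idea of the clustering step (merging SNPs whose allele-frequency trajectories are highly correlated into candidate haplotype blocks within a genomic window) can be illustrated with a small sketch. This is not the haplovalidate implementation (an R tool); it is a minimal, hypothetical Python illustration in which `freqs` holds a replicates-by-timepoints allele-frequency matrix per SNP and `min_corr` / `window_size` stand in for the two parameters the tool chooses from the data.

```python
import numpy as np

def cluster_snps(positions, freqs, min_corr=0.9, window_size=500_000):
    """Greedy clustering of SNPs whose allele-frequency trajectories
    (flattened over replicates x time points) are highly correlated,
    restricted to a genomic window.  Hypothetical sketch only."""
    order = np.argsort(positions)
    positions, freqs = np.asarray(positions)[order], np.asarray(freqs)[order]
    blocks = []                      # each block is a list of SNP indices
    for i, pos in enumerate(positions):
        traj = freqs[i].ravel()
        placed = False
        for block in blocks:
            j = block[-1]            # compare with the block's last SNP
            close = pos - positions[j] <= window_size
            r = np.corrcoef(traj, freqs[j].ravel())[0, 1]
            if close and r >= min_corr:
                block.append(i)
                placed = True
                break
        if not placed:
            blocks.append([i])
    return blocks

# toy example: 5 SNPs, 3 replicates x 4 time points each
rng = np.random.default_rng(0)
pos = np.array([100, 2_000, 3_500, 900_000, 905_000])
base = rng.random((3, 4))
freqs = np.stack([base + rng.normal(0, 0.01, (3, 4)) for _ in range(5)])
print(cluster_snps(pos, freqs))      # e.g. [[0, 1, 2], [3, 4]]
```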


2019
Author(s): Eirini Christodoulaki, Neda Barghi, Christian Schlötterer

Polygenic adaptation is frequently associated with small allele frequency changes at many loci. Recent work suggests, however, that large allele frequency changes can also be expected. Laboratory natural selection (LNS) experiments provide an excellent framework to study the adaptive architecture under controlled laboratory conditions: time series data from replicate populations evolving independently to the same trait optimum can be used to identify selected loci. Nevertheless, the choice of the new trait optimum in the laboratory is typically an ad hoc decision that does not consider the distance of the starting population from the new optimum. Here, we used forward simulations to study the selection signatures of polygenic adaptation in populations evolving to different trait optima. Mimicking LNS experiments, we analyzed allele frequencies of the selected alleles and population fitness at multiple time points. We demonstrate that the inferred adaptive architecture strongly depends on the choice of the new trait optimum in the laboratory and on the significance cut-off used to identify selected loci. Our results not only have a major impact on the design of future Evolve and Resequence (E&R) studies, but also on the interpretation of current E&R data sets.
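The experimental setup being mimicked (replicate populations adapting to a new trait optimum, with allele frequencies and fitness recorded at several time points) can be approximated by a very small forward simulation. The sketch below is a simplified stand-in with arbitrary parameters (haploid loci, a Gaussian fitness function, free recombination), not the simulation code used in the study.

```python
import numpy as np

def simulate(n_loci=100, pop_size=1000, generations=60, optimum=2.0,
             sigma_fit=1.0, effect=0.05, sample_gens=(0, 10, 20, 40, 60),
             seed=1):
    """Toy forward simulation of polygenic adaptation to a trait optimum.
    Individuals are haploid; the trait is the sum of per-locus effects."""
    rng = np.random.default_rng(seed)
    p0 = rng.uniform(0.05, 0.2, n_loci)          # starting frequencies
    geno = (rng.random((pop_size, n_loci)) < p0).astype(float)
    freq_track, fitness_track = {}, {}
    for g in range(generations + 1):
        trait = geno.sum(axis=1) * effect
        fitness = np.exp(-(trait - optimum) ** 2 / (2 * sigma_fit ** 2))
        if g in sample_gens:                       # sampled time points
            freq_track[g] = geno.mean(axis=0)
            fitness_track[g] = fitness.mean()
        # parents sampled proportional to fitness, loci recombine freely
        parents = rng.choice(pop_size, size=(pop_size, 2),
                             p=fitness / fitness.sum())
        pick = rng.random((pop_size, n_loci)) < 0.5
        geno = np.where(pick, geno[parents[:, 0]], geno[parents[:, 1]])
    return freq_track, fitness_track

freqs, fit = simulate()
print({g: round(f, 3) for g, f in fit.items()})   # mean fitness per time point
```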



2019
Vol 6 (3)
pp. 181089
Author(s): S. J. Salamon, H. J. Hansen, D. Abbott

The eye may perceive a significant trend in plotted time-series data, but if the model errors of nearby data points are correlated, the trend may be an illusion. We examine generalized least-squares (GLS) estimation, finding that error correlation may be underestimated in highly correlated small datasets by conventional techniques. This risks indicating a significant trend when there is none. A new correlation estimate based on the Durbin–Watson statistic is developed, leading to an improved estimate of autoregression with highly correlated data, thus reducing this risk. These techniques are generalized to randomly located data points in space, through the new concept of the nearest new neighbour path. We describe tests on the validity of the GLS schemes, allowing verification of the models employed. Examples illustrating our method include a 40-year record of atmospheric carbon dioxide, and Antarctic ice core data. While more conservative than existing techniques, our new GLS estimate finds a statistically significant increase in background carbon dioxide concentration, with an accelerating trend. We conclude with an example of a worldwide empirical climate model for radio propagation studies, to illustrate dealing with spatial correlation in unevenly distributed data points over the surface of the Earth. The method is generally applicable, not only to climate-related data, but to many other kinds of problems (e.g. biological, medical and geological data), where there are unequally (or randomly) spaced observations in temporally or spatially distributed datasets.
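A minimal illustration of the baseline workflow the paper refines (fit OLS, estimate residual autocorrelation via the conventional Durbin–Watson based estimate rho ≈ 1 − DW/2, then refit the trend with AR(1) generalized least squares) is sketched below using statsmodels on synthetic data. The improved correlation estimator proposed in the paper is not reproduced here; this only shows the standard pipeline it builds on.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(42)
n = 60
t = np.arange(n, dtype=float)

# synthetic series: weak trend plus AR(1) noise, so OLS errors are correlated
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = 0.7 * eps[i - 1] + rng.normal(scale=1.0)
y = 0.03 * t + eps

X = sm.add_constant(t)
ols = sm.OLS(y, X).fit()

# conventional AR(1) estimate derived from the Durbin-Watson statistic
dw = durbin_watson(ols.resid)
rho_hat = 1.0 - dw / 2.0

# refit the trend with AR(1) generalized least squares
gls = sm.GLSAR(y, X, rho=rho_hat).iterative_fit(maxiter=10)
print(f"OLS slope {ols.params[1]:.4f} (p={ols.pvalues[1]:.3f})")
print(f"GLS slope {gls.params[1]:.4f} (p={gls.pvalues[1]:.3f}), rho≈{rho_hat:.2f}")
```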



2005
Vol 289 (5)
pp. E870-E882
Author(s): Richard R. Almon, William Lai, Debra C. DuBois, William J. Jusko

Kidney is a major target for adverse effects associated with corticosteroids. A microarray dataset was generated to examine changes in gene expression in rat kidney in response to methylprednisolone. Four control and 48 drug-treated animals were killed at 16 time points after drug administration. Kidney RNA was used to query 52 individual Affymetrix chips, generating data for 15,967 different probe sets per chip. Data mining techniques applicable to time series data were applied to identify drug-regulated changes in gene expression. Four sequential filters eliminated probe sets that were not expressed in the tissue, were not regulated by the drug, or did not meet defined quality control standards. These filters eliminated 14,890 probe sets (94%) from further consideration. Application of judiciously chosen filters is an effective tool for data mining of time series datasets; the remaining data can then be further analyzed by clustering and mathematical modeling. Initial analysis of this filtered dataset identified a group of genes whose pattern of regulation was highly correlated with prototype corticosteroid-enhanced genes. Twenty genes in this group, as well as selected genes exhibiting either downregulation or no regulation, were analyzed for 5′ GRE half-sites conserved across species. In general, the results support the hypothesis that the existence of conserved DNA binding sites can serve as an important adjunct to purely analytic approaches to clustering genes into groups with common mechanisms of regulation. This dataset, together with similar datasets on liver and muscle, is available online in a format amenable to further analysis by others.
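The sequential-filter strategy (drop probe sets that are not expressed, not drug-regulated, or fail quality control) can be sketched generically with pandas. The thresholds, column layout, and the exact filter definitions below are hypothetical placeholders, not the filters used in the study.

```python
import numpy as np
import pandas as pd

def filter_probesets(expr, control_cols, treated_cols,
                     min_signal=100.0, min_fold_change=2.0, max_cv=0.5):
    """Sequential filters on a probe-set x chip expression matrix (sketch).
    expr: DataFrame indexed by probe set, columns = individual chips."""
    kept = expr
    # 1. expressed in the tissue: mean signal above a detection floor
    kept = kept[kept[control_cols + treated_cols].mean(axis=1) >= min_signal]
    # 2. regulated by drug: treated signal deviates from the control baseline
    control_mean = kept[control_cols].mean(axis=1)
    ratio = kept[treated_cols].div(control_mean, axis=0)
    regulated = (ratio.max(axis=1) >= min_fold_change) | \
                (ratio.min(axis=1) <= 1.0 / min_fold_change)
    kept = kept[regulated]
    # 3. quality control: reproducible control measurements (low CV)
    cv = kept[control_cols].std(axis=1) / kept[control_cols].mean(axis=1)
    kept = kept[cv <= max_cv]
    return kept

# toy data: 6 probe sets, 4 control chips, 4 treated chips
rng = np.random.default_rng(0)
cols_c = [f"ctrl{i}" for i in range(4)]
cols_t = [f"drug{i}" for i in range(4)]
expr = pd.DataFrame(rng.uniform(10, 500, (6, 8)),
                    index=[f"probe{i}" for i in range(6)],
                    columns=cols_c + cols_t)
print(filter_probesets(expr, cols_c, cols_t).index.tolist())
```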



2020
Vol 34 (6)
pp. 999-1016
Author(s): Alexander F. Danvers, Richard Wundrack, Matthias Mehl

We provide a basic, step-by-step introduction to the core concepts and mathematical fundamentals of dynamic systems modelling by applying the Change as Outcome model, a simple dynamical systems model, to personality state data. This model characterizes changes in personality states with respect to equilibrium points, estimating attractors and their strength in time series data. Using data from the Personality and Interpersonal Roles study, we find that mean state is highly correlated with attractor position but weakly correlated with attractor strength, suggesting that strength provides added information not captured by summaries of the distribution. We then discuss how taking a dynamic systems approach to personality states also entails a theoretical shift. Instead of emphasizing the partitioning of trait and state variance, dynamic systems analyses of personality states emphasize characterizing patterns generated by mutual, ongoing interactions. Change as Outcome modelling also allows the effects of personality development after significant life changes to be estimated in a more nuanced way, separating effects on a person's characteristic states after the change from how strongly the person is drawn towards those states (an aspect of resiliency). Estimating this model demonstrates core dynamics principles and provides quantitative grounding for measures of 'repulsive' personality states and 'ambivert' personality structures.
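At its core, the Change as Outcome model regresses successive change scores on the current state: the state at which predicted change is zero is the attractor, and the negative of the slope indexes attractor strength. A minimal sketch on synthetic personality-state data (made-up values, not the PAIRS data) could look like this.

```python
import numpy as np

def change_as_outcome(x):
    """Fit change(t) = b0 + b1 * x(t) by least squares (sketch).
    Attractor position = -b0/b1, attractor strength = -b1 (when b1 < 0)."""
    x = np.asarray(x, dtype=float)
    change = np.diff(x)          # x(t+1) - x(t)
    state = x[:-1]               # x(t)
    b1, b0 = np.polyfit(state, change, 1)
    return {"attractor": -b0 / b1, "strength": -b1}

# synthetic momentary extraversion states drifting around an attractor at 4
rng = np.random.default_rng(3)
x = [3.0]
for _ in range(200):
    x.append(x[-1] + 0.3 * (4.0 - x[-1]) + rng.normal(0, 0.4))
print(change_as_outcome(x))      # attractor near 4, strength near 0.3
```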



2019
Author(s): Alexander Francois Danvers, Richard Wundrack, Matthias R. Mehl

We provide a basic, step-by-step introduction to the core concepts and mathematical fundamentals of dynamic systems modeling through applying the Change as Outcome model, a simple dynamical systems model, to personality state data. This model characterizes changes in personality states with respect to equilibrium points, estimating attractors and their strength in time series data. Using data from the Personality and Interpersonal Roles (PAIRS) study, we find that mean state is highly correlated with attractor position but weakly correlated with attractor strength, suggesting strength provides added information not captured by summaries of the distribution. We then discuss how taking a dynamic systems approach to personality states also entails a theoretical shift. Instead of emphasizing partitioning trait and state variance, dynamic systems analyses of personality states emphasize characterizing patterns generated by mutual, ongoing interactions. Change as Outcome modeling also allows for the effects of personality development after significant life changes to be conceptualized in more nuanced ways, separating effects on characteristic states after the significant change and how people are drawn towards those states (an aspect of resiliency). Estimating this model demonstrates core dynamics principles and provides quantitative grounding for measures of "repulsive" personality states and "ambivert" personality structures. Supplementary materials: https://osf.io/dps4w.



Water
2020
Vol 12 (11)
pp. 3032
Author(s): Limei Dong, Desheng Fang, Xi Wang, Wei Wei, Robertas Damaševičius, ...

The streamflow of the upper reaches of the Yangtze River exhibits different timing and periodicity characteristics in different quarters and months of the year, which makes it difficult to predict. Existing sliding window-based methods usually use a fixed-size window whose size is chosen arbitrarily, resulting in large errors. This paper proposes a dynamic sliding window method that reflects the different timing and periodicity characteristics of the streamflow in different months of the year. First, multiple datasets for different months are generated using a dynamic window; then long short-term memory (LSTM) networks are used to select the optimal window; finally, the dataset of the optimal window size is used for verification. The proposed method was tested using the hydrological data of Zhutuo Hydrological Station (China). A comparison between the predicted and measured flow shows that the dynamic sliding window LSTM is 8.63% and 3.85% more accurate than the fixed-window LSTM and the dynamic sliding window back-propagation neural network, respectively. The method can be applied generally to the prediction of time series data with different periodic characteristics.
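A minimal PyTorch sketch of the window-selection idea (build candidate datasets with several window sizes, train a small LSTM on each, keep the window with the lowest validation error) is given below. The data, candidate window sizes, and network size are all placeholders; this illustrates the procedure, not the authors' configuration for the Zhutuo station data.

```python
import numpy as np
import torch
import torch.nn as nn

def make_windows(series, window):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return (torch.tensor(X, dtype=torch.float32).unsqueeze(-1),
            torch.tensor(y, dtype=torch.float32))

class SmallLSTM(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1]).squeeze(-1)

def validation_error(series, window, epochs=30):
    """Train on the first 80% of windows, return MSE on the last 20%."""
    X, y = make_windows(series, window)
    split = int(0.8 * len(X))
    model, loss_fn = SmallLSTM(), nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X[:split]), y[:split])
        loss.backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(X[split:]), y[split:]).item()

# toy streamflow-like series with a yearly cycle; try candidate window sizes
rng = np.random.default_rng(0)
t = np.arange(400)
series = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(len(t))
errors = {w: validation_error(series, w) for w in (3, 6, 12, 24)}
best = min(errors, key=errors.get)
print(errors, "-> optimal window:", best)
```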



2005
Vol 17 (2)
pp. 453-485
Author(s): A. Menchero, R. Montes Diez, D. Ríos Insua, P. Müller

We show how Bayesian neural networks can be used for time-series analysis. We consider a block-based model building strategy to model linear and nonlinear features within the time series: a linear combination of a linear autoregression term and a feedforward neural network (FFNN) with an unknown number of hidden nodes. To allow for simpler models, we also consider these terms separately as competing models to select from. Model identifiability problems arise when FFNN sigmoidal activation functions exhibit almost linear behavior or when there are almost duplicate or irrelevant neural network nodes. New reversible-jump moves are proposed to facilitate model selection, mitigating model identifiability problems. We illustrate this methodology by analyzing several time-series data examples.
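The block-based mean function the authors place priors over (a linear autoregression term plus a single-hidden-layer FFNN on the same lag vector) can be written down compactly. The sketch below only defines and evaluates that mean function on toy data with arbitrary parameter values; the Bayesian inference and reversible-jump MCMC over the number of hidden nodes are not reproduced.

```python
import numpy as np

def ar_ffnn_mean(lags, phi, W, b, v, c):
    """Hybrid mean function: linear AR term plus FFNN with sigmoidal hidden
    nodes, both driven by the same lag vector (all parameters hypothetical).
        y_hat = phi . lags + v . sigmoid(W @ lags + b) + c
    """
    hidden = 1.0 / (1.0 + np.exp(-(W @ lags + b)))   # sigmoid activations
    return phi @ lags + v @ hidden + c

# toy setup: order-3 autoregression, 2 hidden nodes
rng = np.random.default_rng(0)
p, h = 3, 2
phi = np.array([0.5, -0.2, 0.1])
W, b = rng.normal(size=(h, p)), rng.normal(size=h)
v, c = rng.normal(size=h), 0.0

series = list(rng.normal(size=p))            # initial lags
for _ in range(20):                          # simulate forward with noise
    lags = np.array(series[-p:][::-1])       # most recent value first
    series.append(ar_ffnn_mean(lags, phi, W, b, v, c) + rng.normal(0, 0.1))
print(np.round(series[-5:], 3))
```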



2020
Vol 10 (5)
pp. 1876
Author(s): Zhongya Fan, Huiyun Feng, Jingang Jiang, Changjin Zhao, Ni Jiang, ...

Outliers are often present in large datasets of water quality monitoring time series data. A method that combines the sliding window technique with the Dixon detection criterion for the automatic detection of outliers in time series data is limited by the empirical determination of the sliding window size; determining the optimal window size in a principled way is therefore an important research problem. This paper presents a new Monte Carlo Search Method (MCSM), based on random sampling, that optimizes the size of the sliding window by taking full advantage of computational and statistical resources. The MCSM was applied in a case study to automatic monitoring data of water quality factors in order to test its validity and usefulness. A comparison of accuracy and efficiency shows that the new method is sound and effective. The experimental results show that, at different sample sizes, the average accuracy is between 58.70% and 75.75%, and the average increase in computation time is between 17.09% and 45.53%. In the era of big data in environmental monitoring, the proposed method can meet the required accuracy of outlier detection and improve the efficiency of calculation.
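The underlying detector (Dixon's Q test applied inside a sliding window) and the Monte Carlo idea of scoring randomly sampled window sizes on data with injected outliers can be sketched as follows. The critical values, the F1 scoring, and the injected-outlier setup are simplified placeholders, not the authors' implementation.

```python
import numpy as np

# approximate two-sided 95% Dixon Q critical values for small n (sketch)
Q_CRIT = {4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}

def dixon_flags(series, window):
    """Flag the newest value in each sliding window when Dixon's Q
    identifies it as the outlying extreme (sketch)."""
    flags = np.zeros(len(series), dtype=bool)
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        new, s = w[-1], np.sort(w)
        spread = s[-1] - s[0]
        if spread == 0:
            continue
        if new == s[-1]:
            q = (s[-1] - s[-2]) / spread
        elif new == s[0]:
            q = (s[1] - s[0]) / spread
        else:
            continue
        if q > Q_CRIT.get(window, 0.4):
            flags[end - 1] = True
    return flags

def monte_carlo_window(series, true_outliers, candidates, n_draws=50, seed=0):
    """Randomly sample window sizes and keep the one with the best F1 (sketch)."""
    rng = np.random.default_rng(seed)
    best, best_f1 = None, -1.0
    for w in rng.choice(candidates, size=n_draws):
        pred = dixon_flags(series, int(w))
        tp = np.sum(pred & true_outliers)
        prec = tp / max(pred.sum(), 1)
        rec = tp / max(true_outliers.sum(), 1)
        f1 = 2 * prec * rec / max(prec + rec, 1e-9)
        if f1 > best_f1:
            best, best_f1 = int(w), f1
    return best, best_f1

# toy water-quality series (e.g. pH readings) with a few injected spikes
rng = np.random.default_rng(1)
series = rng.normal(7.2, 0.05, 500)
truth = np.zeros(500, dtype=bool)
truth[[60, 180, 320, 440]] = True
series[truth] += 1.0
print(monte_carlo_window(series, truth, candidates=[5, 6, 7, 8, 9, 10]))
```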



2010
Vol 10 (02)
pp. 235-250
Author(s): Van-Hanh Nguyen, Frederic Merienne, Jean-Luc Martinez

One of the most effective applications of virtual reality (VR) in physical rehabilitation is training, where patients are trained for sequential decision-making in special situations presented in a virtual environment. In this application, evaluating the movement of the subject performing a physical task is crucial: a good evaluation of the motion is necessary to follow the patient's progression during training sessions and helps the therapist to better supervise therapeutic planning. At present, the performance of the patient's training is determined by the therapist's subjective observation. We propose a system that allows the patient to perform the training and evaluates the progress of training autonomously. The system consists of a motion analysis technique for a rehabilitation application in which the patient is represented by his or her own avatar in the virtual environment. The task required from the patient is to reproduce a reference movement in real time. The real-time motion evaluation technique is based on the time series matching method known as the Longest Common Sub-Sequence (LCSS). It is used to calculate the distance between the reference motion of the virtual avatar and the captured motion data of the patient, and thus to determine how well the patient is doing during training. The complexity of the proposed technique is of the order O(δ), where δ is a constant matching window size. Our prototype application is based on Tai-chi movements, which have shown many health benefits and are increasingly used for therapeutic purposes.
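The LCSS similarity used for matching a reference motion against a captured motion can be sketched with the standard dynamic program, restricted to a constant matching window δ (only frame pairs at most δ apart are compared). The joint-angle inputs and thresholds below are hypothetical; this illustrates the distance measure, not the authors' system.

```python
import numpy as np

def lcss_similarity(a, b, eps=0.1, delta=5):
    """Longest Common Sub-Sequence similarity between two 1-D motion
    trajectories, with a matching window of width delta (sketch)."""
    n, m = len(a), len(b)
    dp = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        lo, hi = max(1, i - delta), min(m, i + delta)
        for j in range(lo, hi + 1):
            if abs(a[i - 1] - b[j - 1]) < eps:      # frames match
                dp[i, j] = dp[i - 1, j - 1] + 1
            else:
                dp[i, j] = max(dp[i - 1, j], dp[i, j - 1])
    return dp[n, m] / min(n, m)                      # 1.0 = perfect match

def lcss_distance(a, b, **kw):
    return 1.0 - lcss_similarity(a, b, **kw)

# toy example: captured motion is a slightly noisy, slightly lagged copy
# of the reference joint-angle trajectory
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 100)
reference = np.sin(t)
captured = np.sin(t - 0.1) + rng.normal(0, 0.02, len(t))
print(round(lcss_distance(reference, captured, eps=0.15, delta=5), 3))
```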



2020
Vol 58 (3)
pp. 375-383
Author(s): Tomohiro Mitani, Shunsuke Doi, Shinichiroh Yokota, Takeshi Imai, Kazuhiko Ohe

Background: The delta check is widely used for detecting specimen mix-ups. Owing to its inadequate specificity and the rarity of mix-ups, the positive predictive value (PPV) of the delta check is considerably low, and identifying true mix-up errors among a large number of false alerts is labor intensive. To overcome this problem, we developed a new, more accurate detection model through machine learning. Methods: Inspired by the delta check, we compared results with past examinations over a broader time range. Fifteen common items were selected from complete blood cell counts and biochemical tests. We considered examinations in which ≥11 of the 15 items were measured simultaneously in our hospital and created individual partial time-series data of consecutive examinations with a sliding window size of 4. The last examinations of the partial time-series data were shuffled to generate artificial mix-up cases. After splitting the dataset into development and validation sets, we trained a gradient-boosting decision tree (GBDT) model on the development set to detect whether the last examination results of the partial time-series data were artificial mix-up results. The model's performance was evaluated on the validation set. Results: The area under the receiver operating characteristic curve (ROC AUC) of our model was 0.9983 (bootstrap confidence interval [bsCI]: 0.9983–0.9985). Conclusions: The GBDT model was more effective at detecting specimen mix-ups than the conventional delta check. The improved accuracy will enable more facilities to perform more efficient and centralized mix-up detection, leading to improved patient safety.
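The construction described in the Methods (take consecutive examinations per patient with a sliding window of 4, swap the last examination across patients to create artificial mix-ups, and train a gradient boosting classifier to flag the swapped cases) can be sketched with scikit-learn on synthetic data. The feature layout, item count, and model settings below are placeholders, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients, window, n_items = 400, 4, 15

# synthetic patients: each has a stable individual baseline per lab item,
# and `window` consecutive examinations scatter around that baseline
baselines = rng.normal(0.0, 1.0, (n_patients, n_items))
exams = baselines[:, None, :] + rng.normal(0.0, 0.3,
                                           (n_patients, window, n_items))

# positives: replace the last examination with another patient's results
labels = rng.random(n_patients) < 0.5
swap = rng.permutation(n_patients)            # may rarely map to itself
mixed = exams.copy()
mixed[labels, -1, :] = exams[swap[labels], -1, :]

# features: flatten the partial time series (history + last examination)
X = mixed.reshape(n_patients, window * n_items)
y = labels.astype(int)
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_dev, y_dev)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation ROC AUC: {auc:.3f}")
```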


