Efficient statistical significance approximation for local similarity analysis of high-throughput time series data

2012 ◽  
Vol 29 (2) ◽  
pp. 230-237 ◽  
Author(s):  
Li C. Xia ◽  
Dongmei Ai ◽  
Jacob Cram ◽  
Jed A. Fuhrman ◽  
Fengzhu Sun
2011 ◽  
Vol 5 (Suppl 2) ◽  
pp. S15 ◽  
Author(s):  
Li C Xia ◽  
Joshua A Steele ◽  
Jacob A Cram ◽  
Zoe G Cardon ◽  
Sheri L Simmons ◽  
...  

2018 ◽  
Author(s):  
Alexander M Crowell ◽  
Jennifer J. Loros ◽  
Jay C Dunlap

Abstract
Motivation: Identification of constitutive reference genes is critical for the analysis of gene expression. Large amounts of high-throughput time series expression data are available, but current methods for identifying invariant expression are not tailored to time series. Identification of reference genes from these data sets can benefit from methods that incorporate the additional information such data provide. Results: Here we show that identification of invariant expression from time series can be improved by modelling the time component of the data. We implement the Prediction Interval Ranking Score (PIRS) software, which screens high-throughput time series data and provides a ranked list of reference gene candidates. We expect that PIRS will improve the quality of gene expression analysis by allowing researchers to identify the best reference genes for their system from publicly available time series. Availability: PIRS can be downloaded and installed with its dependencies using ‘pip install pirs’; Python code and documentation are available for download at https://github.com/aleccrowell/[email protected]


Author(s):  
Elangovan Ramanujam ◽  
S. Padmavathi

Innovations in time series data mining techniques, and their growing applicability, have significantly increased researchers' interest in the problem of time series classification. Several algorithms have been proposed for this purpose, categorized as shapelet-, interval-, motif-, and whole-series-based techniques. Among these, the bag-of-words technique, an adaptation of the text mining approach, performs well owing to its simplicity and effectiveness. To further improve the efficiency of the bag-of-words technique, this paper proposes a discriminative supervised weighting scheme that identifies the characteristic, representative patterns of each class for efficient classification. The paper uses a modified weighting matrix that discriminates between representative and non-representative patterns, which makes the classification interpretable. Experiments compare the performance of the proposed technique with state-of-the-art techniques in terms of accuracy and statistical significance.
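As a rough illustration of the bag-of-words idea this abstract builds on, the sketch below turns a series into SAX words with a sliding window, weights the words of each class with a tf-idf scheme in the style of SAX-VSM, and classifies by cosine similarity. This is a swapped-in, well-known weighting, not the paper's discriminative supervised scheme; the window length, word length, four-letter alphabet and all function names are assumptions made for illustration.

```python
from collections import Counter

import numpy as np

# Standard-normal quartiles split z-normalized values into 4 equiprobable symbols.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"


def sax_word(window, word_len):
    """Z-normalize a window, average it into word_len segments (PAA), map each to a letter."""
    w = np.asarray(window, dtype=float)
    sd = w.std()
    w = (w - w.mean()) / sd if sd > 0 else w - w.mean()
    paa = np.array([seg.mean() for seg in np.array_split(w, word_len)])
    return "".join(ALPHABET[i] for i in np.searchsorted(BREAKPOINTS, paa))


def bag_of_words(series, window=16, word_len=4):
    """Count the SAX words produced by every sliding window of the series."""
    s = np.asarray(series, dtype=float)
    return Counter(sax_word(s[i:i + window], word_len)
                   for i in range(len(s) - window + 1))


def class_weights(bags_by_class):
    """tf-idf per class: a word present in every class gets weight 0 (not discriminative)."""
    class_tf = {c: sum(bags, Counter()) for c, bags in bags_by_class.items()}
    n_classes = len(class_tf)
    df = Counter(w for tf in class_tf.values() for w in tf)  # classes containing each word
    return {c: {w: np.log1p(f) * np.log(n_classes / df[w]) for w, f in tf.items()}
            for c, tf in class_tf.items()}


def cosine(bag, weights):
    """Cosine similarity between raw word counts and one class's weight vector."""
    dot = sum(f * weights.get(w, 0.0) for w, f in bag.items())
    norm = (np.sqrt(sum(f * f for f in bag.values()))
            * np.sqrt(sum(v * v for v in weights.values())))
    return dot / norm if norm > 0 else 0.0


def classify(series, weights_by_class, **kw):
    """Assign the class whose weight vector is most similar to the series' word counts."""
    bag = bag_of_words(series, **kw)
    return max(weights_by_class, key=lambda c: cosine(bag, weights_by_class[c]))
```

For example, training on one upward and one downward ramp yields a single word per class (`abcd` and `dcba`), and a new upward ramp is assigned to the upward class. Zeroing out words shared by all classes is what makes this weighting "discriminative" in a loose sense; the paper's modified weighting matrix may work quite differently.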


Circulation ◽  
2015 ◽  
Vol 132 (suppl_3) ◽  
Author(s):  
Shaker M Eid ◽  
Aiham Albaeni ◽  
Rebeca Rios ◽  
May Baydoun ◽  
Bolanle Akinyele ◽  
...  

Background: The Resuscitation Guidelines, updated every 5 years, are intended to improve outcomes. Previous studies have yielded conflicting reports on whether the 2005 guidelines had a beneficial impact on out-of-hospital cardiac arrest (OHCA) survival. Using a national database, we examined survival before and after the introduction of both the 2005 and 2010 guidelines. Methods: We used the 2000 through 2012 National Inpatient Sample database to select patients aged ≥18 years admitted to hospitals in the United States with non-traumatic OHCA (ICD-9-CM codes 427.5 and 427.41). A quasi-experimental (interrupted time series) design was used to compare monthly survival trends. Outcomes for OHCA were compared before and after the release of the 2005 and 2010 resuscitation guidelines as follows: 01/2000 to 09/2005 vs. 10/2005 to 09/2010, and 10/2005 to 09/2010 vs. 10/2010 to 12/2012. Segmented regression analyses of the interrupted time series data were performed to examine changes in survival to hospital discharge. Results: In the three periods (01/2000 to 09/2005, 10/2005 to 09/2010, and 10/2010 to 12/2012), 81,600, 69,139, and 36,556 patients, respectively, survived to hospital admission following OHCA. After the release of the 2005 guidelines, survival trends worsened significantly (β = −0.089, 95% CI −0.163 to −0.016, p = 0.018) until the release of the 2010 guidelines, when a sharp increase in survival was noted that persisted for the rest of the study period (β = 0.054, 95% CI −0.143 to 0.251, p = 0.588) but did not reach statistical significance (Figure). Conclusion: National clinical guidelines developed to improve outcomes must include mechanisms to assess whether benefit actually occurs. The worsening in OHCA survival following the 2005 guidelines is thought-provoking, but the improvement following the release of the 2010 guidelines is reassuring and worth sustaining.
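A segmented regression of the kind described in the Methods can be sketched with ordinary least squares: each interruption adds a level-change term and a slope-change term. The sketch below is a minimal illustration assuming a monthly outcome series indexed 0, 1, 2, … and independent errors; it returns point estimates only, whereas the abstract also reports confidence intervals and p-values, which need a variance estimate (and possibly an autocorrelation correction) not shown here. The function name and its arguments are hypothetical.

```python
import numpy as np


def segmented_regression(y, breakpoints):
    """OLS fit of an interrupted time series with one or more interruptions.

    Model: y_t = b0 + b1*t + sum_k [ c_k * post_k(t) + d_k * (t - T_k) * post_k(t) ],
    where post_k(t) = 1 if t >= T_k else 0.
    Returns [b0, b1, c_1, d_1, c_2, d_2, ...]: baseline level and slope, then the
    level change c_k and slope change d_k at each breakpoint T_k.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    columns = [np.ones_like(t), t]
    for tk in breakpoints:
        post = (t >= tk).astype(float)       # indicator for months after interruption k
        columns += [post, (t - tk) * post]   # level change, then slope change
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

With breakpoints at the first month after each guideline release, `d_1` and `d_2` play the role of post-2005 and post-2010 trend changes, although the abstract does not state exactly how the authors parameterised their β coefficients.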


Author(s):  
Fang Zhang ◽  
Ang Shan ◽  
Yihui Luan

Abstract
In recent years, large amounts of microbial community time series data have been produced in molecular biology studies, especially in metagenomics. Among the statistical methods for time series, local similarity analysis is used in a wide range of environments to capture potential local and time-shifted associations that traditional correlation analysis cannot detect. The permutation test was initially the most popular way to obtain the statistical significance of local similarity analysis; more recently, a theoretical method has also been developed for this purpose. However, all of these methods assume that the observations in each time series are independent and identically distributed. In this paper, we propose a new approach based on the moving block bootstrap to approximate the statistical significance of local similarity scores for dependent time series. Simulations show that our method controls the type I error rate reasonably well, whereas the theoretical approximation and the permutation test perform less well. Finally, we apply our method to human and marine microbial community datasets, showing that it can identify potential relationships among operational taxonomic units (OTUs) and significantly decrease the rate of false positives.
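To make the two ingredients concrete, the sketch below computes a local similarity (LS) score within a bounded time delay, scanning each diagonal of the delay band with running sums that reset at zero, and then estimates a p-value by resampling each series independently with a moving block bootstrap, which keeps short-range autocorrelation inside blocks while breaking any association between the two series. It is a minimal sketch under stated assumptions, not the authors' procedure: z-score normalization (rather than the rank-based normal-score transformation commonly applied in LSA), the block length, the maximum delay, the number of resamples and all function names are illustrative choices.

```python
import numpy as np


def zscore(x):
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else x - x.mean()


def local_similarity(x, y, max_delay=3):
    """Signed LS score: best positive or negative local run of x_i * y_(i+d), |d| <= max_delay, over n."""
    x, y = zscore(x), zscore(y)
    n = len(x)
    best_pos = best_neg = 0.0
    for d in range(-max_delay, max_delay + 1):
        # Products along one diagonal of the delay band (x shifted d steps against y).
        prods = x[:n - d] * y[d:] if d >= 0 else x[-d:] * y[:n + d]
        pos = neg = 0.0
        for p in prods:
            pos = max(0.0, pos + p)  # running positive association, reset at 0
            neg = max(0.0, neg - p)  # running negative association, reset at 0
            best_pos, best_neg = max(best_pos, pos), max(best_neg, neg)
    return (best_pos if best_pos >= best_neg else -best_neg) / n


def moving_block_resample(x, block_len, rng):
    """Concatenate randomly chosen overlapping blocks of length block_len, trimmed to len(x)."""
    n = len(x)
    n_blocks = -(-n // block_len)  # ceiling division
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:n]


def mbb_pvalue(x, y, max_delay=3, block_len=5, n_boot=1000, seed=0):
    """Observed LS score and a bootstrap p-value for |LS| under no association."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    observed = local_similarity(x, y, max_delay)
    exceed = 0
    for _ in range(n_boot):
        xb = moving_block_resample(x, block_len, rng)
        yb = moving_block_resample(y, block_len, rng)  # independent of xb
        exceed += int(abs(local_similarity(xb, yb, max_delay)) >= abs(observed))
    return observed, (exceed + 1) / (n_boot + 1)
```

The block length is the key tuning choice: blocks that are too short destroy the within-series dependence the method is meant to respect, while blocks that are too long leave too few distinct resamples.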


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Simon L. Turner ◽  
Amalia Karahalios ◽  
Andrew B. Forbes ◽  
Monica Taljaard ◽  
Jeremy M. Grimshaw ◽  
...  

Abstract
Background: The interrupted time series (ITS) is a quasi-experimental design commonly used in public health to evaluate the impact of interventions or exposures. Multiple statistical methods are available to analyse data from ITS studies, but no empirical investigation has examined how the different methods compare when applied to real-world datasets. Methods: A random sample of 200 ITS studies identified in a previous methods review was included. Time series data from each of these studies were sought. Each dataset was re-analysed using six statistical methods. Point and confidence interval estimates for level and slope changes, standard errors, p-values and estimates of autocorrelation were compared between methods. Results: From the 200 ITS studies, which included 230 time series, 190 datasets were obtained. We found that the choice of statistical method can substantially affect the point estimates of level and slope changes, their standard errors, the width of confidence intervals and p-values. Statistical significance (categorised at the 5% level) often differed across pairwise comparisons of methods, with disagreement ranging from 4% to 25%. Estimates of autocorrelation differed depending on the method used and the length of the series. Conclusions: The choice of statistical method in ITS studies can lead to substantially different conclusions about the impact of the interruption. Pre-specification of the statistical method is encouraged, and naive conclusions based on statistical significance should be avoided.
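Autocorrelation estimates can differ between methods partly because each method estimates it in its own way. As a simple illustration (not one of the six methods compared in the study, which the abstract does not name), the sketch below fits a single-interruption segmented regression by ordinary least squares and summarises the residuals with the lag-1 sample autocorrelation r₁ and the Durbin-Watson statistic, which is roughly 2(1 − r₁) for long series. The function names and the single-interruption model are assumptions.

```python
import numpy as np


def its_residuals(y, t0):
    """OLS residuals of y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= t0).astype(float)
    X = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation r1 of a residual series."""
    e = np.asarray(e, dtype=float) - np.mean(e)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e * e))


def durbin_watson(e):
    """Durbin-Watson statistic; values near 2 suggest little lag-1 autocorrelation."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e * e))
```

Because r₁ is computed from residuals of a fitted model, it is biased towards zero in short series, which is one plausible reason the abstract finds that autocorrelation estimates depend on series length.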


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4681
Author(s):  
Daniel A. Cuevas ◽  
Robert A. Edwards

High-throughput phenotype assays are a cornerstone of systems biology because they allow direct phenotypic measurements across mutations, genes, strains, or even different genera. High-throughput methods also require data analysis methods that reduce complex time series data to a single numeric value. Here, we present the Growth Score, an improvement on the previous Growth Level formula. Growth Score and Growth Level are strongly correlated, but the new Growth Score contains only essential growth curve properties, whereas the previous Growth Level formula was convoluted and not easily interpretable. Several programs, including our PMAnalyzer pipeline, can be used to estimate the parameters required to calculate the Growth Score metric.
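The abstract does not give the Growth Score formula, so the sketch below does not compute it. It only illustrates how some basic growth curve properties (which may or may not be the ones the Growth Score uses) can be read off an optical density (OD) time series: the maximum OD, the maximum specific growth rate from finite differences of ln(OD), and a lag time from the tangent at the steepest point. The function name and this estimation approach are hypothetical choices, not PMAnalyzer's method, and the sketch assumes noise-free, strictly positive readings.

```python
import numpy as np


def growth_properties(t, od):
    """Return (max_od, mu_max, lag) from an OD time series with strictly positive readings."""
    t = np.asarray(t, dtype=float)
    od = np.asarray(od, dtype=float)
    log_od = np.log(od)
    # Specific growth rate between consecutive samples: d ln(OD) / dt.
    rates = np.diff(log_od) / np.diff(t)
    k = int(np.argmax(rates))
    mu_max = float(rates[k])
    if mu_max <= 0:
        return float(od.max()), mu_max, float("nan")  # no growth detected
    # The tangent through (t[k], ln OD[k]) with slope mu_max meets the initial level ln OD[0] at the lag time.
    lag = float(t[k] - (log_od[k] - log_od[0]) / mu_max)
    return float(od.max()), mu_max, lag
```

On real plate-reader data, finite differences amplify noise, so a smoothed or model-based fit (for example, a logistic curve) is usually preferable.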

