Efficient statistical significance approximation for local similarity analysis of high-throughput time series data

Abstract Motivation: Identification of constitutive reference genes is critical for analysis of gene expression. Large numbers of high throughput time series expression data are available, but current methods for identifying invariant expression are not tailored for time series. Identification of reference genes from these data sets can benefit from methods which incorporate the additional information they provide. Results: Here we show that we can improve identification of invariant expression from time series by modelling the time component of the data. We implement the Prediction Interval Ranking Score (PIRS) software, which screens high throughput time series data and provides a ranked list of reference candidates. We expect that PIRS will improve the quality of gene expression analysis by allowing researchers to identify the best reference genes for their system from publicly available time series. Availability: PIRS can be downloaded and installed with dependencies using ‘pip install pirs’, and Python code and documentation are available for download at https://github.com/aleccrowell/[email protected]

Discriminate Supervised Weighted Scheme for the Classification of Time Series Signals

International Journal of Sociotechnology and Knowledge Development ◽ 10.4018/ijskd.2021070101 ◽ 2021 ◽ Vol 13 (3) ◽ pp. 1-16

Author(s): Elangovan Ramanujam ◽ S. Padmavathi

Keyword(s): Time Series ◽ Time Series Data ◽ State Of The Art ◽ Statistical Significance ◽ Series Data ◽ Bag Of Words ◽ Time Series Classification ◽ Problem Of Time ◽ Weighted Matrix

Innovations in time series data mining techniques, and their growing applicability, have significantly increased researchers' interest in the problem of time series classification. Several algorithms have been proposed for this purpose, categorized as shapelet, interval, motif, and whole-series-based techniques. Among these, the bag-of-words technique, an extended application of the text mining approach, performs well due to its simplicity and effectiveness. To improve the efficiency of the bag-of-words technique, this paper proposes a discriminate supervised weighted scheme to identify the characteristic and representative patterns of a class for efficient classification. The paper uses a modified weighted matrix that discriminates between representative and non-representative patterns, which enables interpretability in classification. Experiments have been carried out to compare the performance of the proposed technique with state-of-the-art techniques in terms of accuracy and statistical significance.

Abstract 19225: Impact of Change in Resuscitation Guidelines on National Out-of-hospital Cardiac Arrest Outcomes: Fulfilled Expectations?

Circulation ◽ 10.1161/circ.132.suppl_3.19225 ◽ 2015 ◽ Vol 132 (suppl_3)

Author(s): Shaker M Eid ◽ Aiham Albaeni ◽ Rebeca Rios ◽ May Baydoun ◽ Bolanle Akinyele ◽ ...

Keyword(s): Time Series ◽ Cardiac Arrest ◽ Time Series Data ◽ Statistical Significance ◽ Interrupted Time Series ◽ The United States ◽ Series Data ◽ National Database ◽ Resuscitation Guidelines ◽ Hospital Cardiac Arrest

Background: The intent of the 5-yearly Resuscitation Guidelines is to improve outcomes. Previous studies have yielded conflicting reports of a beneficial impact of the 2005 guidelines on out-of-hospital cardiac arrest (OHCA) survival. Using a national database, we examined survival before and after the introduction of both the 2005 and 2010 guidelines. Methods: We used the 2000 through 2012 National Inpatient Sample database to select patients ≥18 years admitted to hospitals in the United States with non-traumatic OHCA (ICD-9-CM codes 427.5 and 427.41). A quasi-experimental (interrupted time series) design was used to compare monthly survival trends. Outcomes for OHCA were compared before and after the release of the 2005 and 2010 resuscitation guidelines as follows: 01/2000-09/2005 vs. 10/2005-09/2010 and 10/2005-09/2010 vs. 10/2010-12/2012. Segmented regression analyses of interrupted time series data were performed to examine changes in survival to hospital discharge. Results: For the pre- and post-guidelines periods, 81600, 69139 and 36556 patients respectively survived to hospital admission following OHCA. Subsequent to the release of the 2005 guidelines, there was a statistically significant worsening in survival trends (β = -0.089, 95% CI -0.163 to -0.016, p = 0.018) until the release of the 2010 guidelines, when a sharp increase in survival was noted that persisted for the period of study (β = 0.054, 95% CI -0.143 to 0.251, p = 0.588) but did not achieve statistical significance (Figure). Conclusion: National clinical guidelines developed to impact outcomes must include mechanisms to assess whether benefit actually occurs. The worsening in OHCA survival following the 2005 guidelines is thought-provoking, but the improvement following the release of the 2010 guidelines is reassuring and worthy of perpetuation.
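
The segmented regression named in the Methods can be illustrated with a minimal sketch. It assumes a single change point and an ordinary least squares fit of baseline level, baseline slope, level change and slope change; it ignores autocorrelation and seasonality, and it is not the authors' actual model. The function names `solve_linear` and `segmented_regression` are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def segmented_regression(y, change_point):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*(t - change_point)*post_t by OLS.

    post_t is 1 from change_point onwards and 0 before it, so b2 is the
    immediate level change and b3 is the change in slope after the interruption.
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= change_point else 0.0
        rows.append([1.0, float(t), post, (t - change_point) * post])
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear(xtx, xty)
    return {
        "baseline_level": b0,
        "baseline_slope": b1,
        "level_change": b2,
        "slope_change": b3,
    }
```

Calling `segmented_regression(monthly_survival, change_point)` on a monthly series returns the four coefficients; the `slope_change` term plays the role of the trend coefficients (β) reported in the Results, although a real analysis would also need standard errors and a correction for serial correlation.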

Curve Fitting for Short Time Series Data from High Throughput Experiments with Correction for Biological Variation

Advances in Intelligent Data Analysis XI - Lecture Notes in Computer Science ◽ 10.1007/978-3-642-34156-4_15 ◽ 2012 ◽ pp. 150-160 ◽ Cited By ~ 2

Author(s): Frank Klawonn ◽ Nada Abidi ◽ Evelin Berger ◽ Lothar Jänsch

Keyword(s): Time Series ◽ High Throughput ◽ Curve Fitting ◽ Time Series Data ◽ Biological Variation ◽ Series Data ◽ Short Time Series ◽ Short Time ◽ High Throughput Experiments

A novel method to accurately calculate statistical significance of local similarity analysis for high-throughput time series

Statistical Applications in Genetics and Molecular Biology ◽ 10.1515/sagmb-2018-0019 ◽ 2018 ◽ Vol 17 (6) ◽ Cited By ~ 1

Author(s): Fang Zhang ◽ Ang Shan ◽ Yihui Luan

Keyword(s): Time Series ◽ Microbial Community ◽ Permutation Test ◽ Statistical Significance ◽ Theoretical Method ◽ Similarity Analysis ◽ Type I ◽ Local Similarity ◽ Biological Studies ◽ Wide Range

Abstract: In recent years, a large amount of time series microbial community data has been produced in molecular biological studies, especially in metagenomics. Among the statistical methods for time series, local similarity analysis is used in a wide range of environments to capture potential local and time-shifted associations that cannot be distinguished by traditional correlation analysis. Initially, the permutation test was widely applied to obtain the statistical significance of local similarity analysis. More recently, a theoretical method has also been developed to achieve this aim. However, all these methods require the assumption that the time series are independent and identically distributed. In this paper, we propose a new approach based on the moving block bootstrap to approximate the statistical significance of local similarity scores for dependent time series. Simulations show that our method can control the type I error rate reasonably, while the theoretical approximation and the permutation test perform less well. Finally, our method is applied to human and marine microbial community datasets, indicating that it can identify potential relationships among operational taxonomic units (OTUs) and significantly decrease the rate of false positives.
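
The idea of a moving block bootstrap for local similarity scores can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes a simplified local similarity score (z-scored series, dynamic programming over alignments within a maximum delay), resamples each series independently in contiguous blocks to break the cross-series association while keeping short-range dependence, and uses hypothetical names and defaults (`local_similarity`, `moving_block_resample`, `mbb_p_value`, `block_len=5`, `n_boot=200`).

```python
import random
import statistics


def zscore(x):
    """Standardise a series to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)
    return [(v - mu) / sd for v in x]


def local_similarity(x, y, max_delay=2):
    """Simplified local similarity score for two standardised series.

    Finds the strongest positively or negatively associated run of aligned
    points with |i - j| <= max_delay, scaled by the series length.
    """
    n = len(x)
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best = max(best, pos[i][j], neg[i][j])
    return best / n


def moving_block_resample(x, block_len, rng):
    """Concatenate randomly chosen overlapping blocks until the length matches."""
    n = len(x)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(x[start:start + block_len])
    return out[:n]


def mbb_p_value(x, y, block_len=5, n_boot=200, max_delay=2, seed=0):
    """Approximate a p-value for the observed score from bootstrap replicates."""
    rng = random.Random(seed)
    x, y = zscore(x), zscore(y)
    observed = local_similarity(x, y, max_delay)
    exceed = 0
    for _ in range(n_boot):
        xb = zscore(moving_block_resample(x, block_len, rng))
        yb = zscore(moving_block_resample(y, block_len, rng))
        if local_similarity(xb, yb, max_delay) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)
```

The block length controls how much serial dependence each replicate retains: a block length of 1 reduces to an ordinary i.i.d. bootstrap, which is the assumption the abstract says fails for dependent series.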

Comparison of six statistical methods for interrupted time series studies: empirical evaluation of 190 published series

BMC Medical Research Methodology ◽ 10.1186/s12874-021-01306-w ◽ 2021 ◽ Vol 21 (1)

Author(s): Simon L. Turner ◽ Amalia Karahalios ◽ Andrew B. Forbes ◽ Monica Taljaard ◽ Jeremy M. Grimshaw ◽ ...

Keyword(s): Time Series ◽ Statistical Method ◽ Statistical Methods ◽ Time Series Data ◽ Statistical Significance ◽ Empirical Evaluation ◽ Interrupted Time Series ◽ Series Data ◽ Standard Errors ◽ The Impact

Abstract Background: The Interrupted Time Series (ITS) is a quasi-experimental design commonly used in public health to evaluate the impact of interventions or exposures. Multiple statistical methods are available to analyse data from ITS studies, but no empirical investigation has examined how the different methods compare when applied to real-world datasets. Methods: A random sample of 200 ITS studies identified in a previous methods review was included. Time series data from each of these studies were sought. Each dataset was re-analysed using six statistical methods. Point and confidence interval estimates for level and slope changes, standard errors, p-values and estimates of autocorrelation were compared between methods. Results: From the 200 ITS studies, including 230 time series, 190 datasets were obtained. We found that the choice of statistical method can importantly affect the level and slope change point estimates, their standard errors, width of confidence intervals and p-values. Statistical significance (categorised at the 5% level) often differed across the pairwise comparisons of methods, ranging from 4 to 25% disagreement. Estimates of autocorrelation differed depending on the method used and the length of the series. Conclusions: The choice of statistical method in ITS studies can lead to substantially different conclusions about the impact of the interruption. Pre-specification of the statistical method is encouraged, and naive conclusions based on statistical significance should be avoided.

Growth Score: a single metric to define growth in 96-well phenotype assays

PeerJ ◽ 10.7717/peerj.4681 ◽ 2018 ◽ Vol 6 ◽ pp. e4681

Author(s): Daniel A. Cuevas ◽ Robert A. Edwards

Keyword(s): Time Series ◽ Systems Biology ◽ High Throughput ◽ Growth Curve ◽ Time Series Data ◽ Series Data ◽ Direct Measurements ◽ Essential Growth ◽ Data Analytic ◽ Require Data

High-throughput phenotype assays are a cornerstone of systems biology as they allow direct measurements of mutations, genes, strains, or even different genera. High-throughput methods also require data analytic methods that reduce complex time-series data to a single numeric evaluation. Here, we present the Growth Score, an improvement on the previous Growth Level formula. There is a strong correlation between Growth Score and Growth Level, but the new Growth Score contains only essential growth curve properties, while the formula of the previous Growth Level was convoluted and not easily interpretable. Several programs can be used to estimate the parameters required to calculate the Growth Score metric, including our PMAnalyzer pipeline.
