Efficient Time Series Clustering and Its Application to Social Network Mining

2014, Vol 23 (2), pp. 213-229
Author(s): Cangqi Zhou, Qianchuan Zhao

Mining time series data is of great significance in various areas. To efficiently find representative patterns in these data, this article focuses on the definition of a valid dissimilarity measure and on the acceleration of partitioning clustering, a common group of techniques used to discover typical shapes of time series. The dissimilarity measure is a crucial component of clustering, and some applications require it to be invariant to specific transformations. We analyze the rationale for using the angle between two time series to define a dissimilarity. Moreover, our proposed measure satisfies the triangle inequality under specific restrictions, a property that can be used to accelerate clustering. An integrated algorithm is proposed. The experiments show that the angle-based dissimilarity captures the essence of time series patterns that are invariant to amplitude scaling. In addition, the accelerated algorithm outperforms the standard one because redundant computations are pruned. Our approach has been applied to discover typical patterns of information diffusion in an online social network, and our analyses revealed the mechanisms by which the different patterns form.
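The abstract does not spell out the measure or the pruning rule, so the following is only a sketch under stated assumptions. It takes the dissimilarity to be the plain angle arccos(<x, y> / (|x| |y|)) in [0, π]. This angle ignores positive amplitude scaling and, being the great-circle distance between normalized vectors, satisfies the triangle inequality for non-zero series; the paper's own measure and its restrictions may differ. The assignment step then uses a generic Elkan-style bound to skip center comparisons. The function names `angle_dissimilarity` and `assign_with_pruning` are hypothetical.

```python
import numpy as np


def angle_dissimilarity(x, y):
    """Angle in radians between two equal-length, non-zero series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Rounding can push |cos| slightly above 1, so clip before arccos.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def assign_with_pruning(series, centers):
    """Assign each series to its nearest center (k-means assignment step),
    skipping distance computations that the triangle inequality rules out.

    Returns (labels, number_of_skipped_distance_computations).
    """
    k = len(centers)
    # Center-to-center angles, computed once per assignment step.
    cc = np.array([[angle_dissimilarity(a, b) for b in centers] for a in centers])
    labels, skipped = [], 0
    for x in series:
        best = 0
        d_best = angle_dissimilarity(x, centers[0])
        for j in range(1, k):
            # d(x, c_j) >= d(c_best, c_j) - d(x, c_best), so when
            # d(c_best, c_j) >= 2 * d(x, c_best), c_j cannot be closer.
            if cc[best, j] >= 2.0 * d_best:
                skipped += 1
                continue
            d = angle_dissimilarity(x, centers[j])
            if d < d_best:
                best, d_best = j, d
        labels.append(best)
    return labels, skipped
```

For example, `angle_dissimilarity([1, 2, 3], [10, 20, 30])` is zero up to rounding, because one series is a scaled copy of the other. The pruned assignment returns the same labels as a brute-force search; it only avoids distance evaluations that cannot change the result.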

2021
Author(s): Samantha J Gleich, Jacob A Cram, Jake L Weissman, David A Caron

Ecological network analyses are used to identify potential biotic interactions between microorganisms from species abundance data. These analyses are often carried out using time-series data; however, time-series networks pose unique statistical challenges. Time-dependent species abundance data can produce species co-occurrence patterns that do not result from direct biotic associations and may therefore lead to inaccurate network predictions. Here, we describe a data transformation based on a generalized additive model (GAM) that removes time-series signals from species abundance data before network analyses are run. We validated the transformation by generating mock time-series datasets with an underlying covariance structure, running network analyses on these datasets with and without our GAM transformation, and comparing the network outputs to the known covariance structure of the simulated data. The results revealed that seasonal abundance patterns substantially decreased the accuracy of the inferred networks. Additionally, the GAM transformation increased the F1 score of inferred ecological networks on average and improved the ability of network inference methods to capture important features of network structure. This study underscores the importance of considering temporal features when carrying out network analyses and describes a simple, effective tool that can be used to improve results.
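The abstract does not give the model specification, so the sketch below swaps in a simpler stand-in for the GAM: an ordinary least-squares fit of each taxon's abundance on an intercept, a linear trend, and a few Fourier harmonics of an assumed 12-step (monthly) seasonal cycle, keeping the residuals for network inference. A real GAM would instead fit a penalized smooth function of time, for example with a spline basis. The names `time_basis` and `remove_time_signal` and the `period` and `n_harmonics` defaults are hypothetical.

```python
import numpy as np


def time_basis(t, period=12.0, n_harmonics=2):
    """Design matrix: intercept, linear trend, and Fourier terms for one
    seasonal cycle of length `period` (assumed monthly here)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t]
    for h in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * h * t / period
        cols.extend([np.sin(w), np.cos(w)])
    return np.column_stack(cols)


def remove_time_signal(abundance, t, period=12.0, n_harmonics=2):
    """Residuals after regressing each column of `abundance`
    (time x taxa, or a single series) on the time basis."""
    X = time_basis(t, period, n_harmonics)
    y = np.asarray(abundance, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef
```

Passing a time × taxa matrix detrends all taxa in one call. Two taxa that share only a seasonal cycle are strongly correlated before this step but nearly uncorrelated afterwards, which is the kind of spurious edge the transformation is meant to remove.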


Author(s): Steven M. Rock

Instrumentation is one of the threats to the validity of experiments. Four possible cases of instrumentation in a time series of Illinois traffic accident statistics from the mid-1970s onward were tested, primarily with autoregressive integrated moving average (ARIMA) methods. Two of these cases, a 1977 change in the reporting threshold for property-damage-only (PDO) accidents and a 1989 change in the definition of a fatality, were not found to be significant. A 1989 change in the method of tabulating monthly data and a 1992 change in the reporting threshold for PDO accidents were statistically significant. Together, these two changes could account for a decline of more than 15 percent in PDO accidents.
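A full ARIMA intervention analysis is more involved than fits here. As a simplified stand-in, the sketch below regresses the series on its own first lag plus a step indicator that switches on at a hypothetical change date, and reports the t-statistic of the step coefficient. It assumes a single known change point and AR(1)-like dynamics; it is not the procedure used in the study. The function name `step_intervention_test` is hypothetical.

```python
import numpy as np


def step_intervention_test(y, change_index):
    """Estimate a level shift at `change_index` in the model
        y_t = a + phi * y_{t-1} + omega * S_t + e_t,
    where S_t = 1 for t >= change_index and 0 before.
    Returns (omega, t_statistic_for_omega)."""
    y = np.asarray(y, dtype=float)
    step = (np.arange(y.size) >= change_index).astype(float)
    Y = y[1:]
    X = np.column_stack([np.ones(Y.size), y[:-1], step[1:]])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    sigma2 = resid @ resid / (Y.size - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    # omega is the immediate shift; the long-run shift is omega / (1 - phi).
    return float(coef[2]), float(coef[2] / np.sqrt(cov[2, 2]))
```

A strongly negative t-statistic points to a downward level shift at the candidate date, such as a change in a reporting threshold; a value near zero is consistent with no instrumentation effect at that date.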


Author(s): Harleigh C. Seyffert, Armin W. Troesch

This paper addresses the existence of rare wave groups, as defined by Kim and Troesch [1], by examining time series data from the Pt. Reyes buoy. The buoy is operated by the Coastal Data Information Program (CDIP), University of California San Diego. The definition of rare wave groups [1] used in this paper differs from the more commonly used wave group definition based on threshold crossings. With the time series data from the Pt. Reyes buoy, these rare wave groups are shown to be a naturally occurring phenomenon. The nature of the data is examined, as well as the analysis methods and findings. By sifting through 17 years of wave elevation data from the Pt. Reyes buoy, this preliminary work addresses not only the question of to what extent rare wave groups exist in nature but also what their probability of occurrence is.
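The Kim and Troesch definition of rare wave groups is not given in the abstract, so the sketch below only illustrates the conventional threshold-crossing definition it is contrasted with. It splits a surface-elevation record into individual waves at zero up-crossings and then reports the lengths of runs of consecutive waves higher than a chosen threshold. The function names and the threshold are hypothetical.

```python
import numpy as np


def zero_upcrossing_heights(eta):
    """Individual wave heights (crest to trough) between successive
    zero up-crossings of a surface-elevation record `eta`."""
    eta = np.asarray(eta, dtype=float)
    # Index i is an up-crossing if eta is negative at i and >= 0 at i + 1.
    up = np.where((eta[:-1] < 0) & (eta[1:] >= 0))[0]
    heights = []
    for start, end in zip(up[:-1], up[1:]):
        wave = eta[start + 1:end + 1]
        heights.append(wave.max() - wave.min())
    return np.array(heights)


def threshold_groups(heights, threshold):
    """Lengths of runs of consecutive waves whose height exceeds
    `threshold` (the conventional threshold-crossing wave group)."""
    runs, count = [], 0
    for h in heights:
        if h > threshold:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs
```

Given the run lengths from a long record, the fraction of runs with length at least k is a crude empirical estimate of how often groups of k or more large waves occur.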


Author(s): Yan Zhu, Makoto Imamura, Daniel Nikovski, Eamonn Keogh

Since their introduction over a decade ago, time series motifs have become a fundamental tool for time series analytics, finding diverse uses in dozens of domains. In this work we introduce Time Series Chains, which are related to, but distinct from, time series motifs. Informally, time series chains are a temporally ordered set of subsequence patterns, such that each pattern is similar to the pattern that preceded it, but the first and last patterns can be arbitrarily dissimilar. In the discrete space, this is similar to extracting the text chain “hit, hot, dot, dog” from a paragraph. The first and last words have nothing in common, yet they are connected by a chain of words with a small mutual difference. Time Series Chains can capture the evolution of systems and help predict the future. As such, they potentially have implications for prognostics. In this work, we introduce a robust definition of time series chains and a scalable algorithm that allows us to discover them in massive datasets.
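The scalable algorithm is not described in the abstract. The brute-force sketch below only illustrates the chain idea as we understand it from the authors' related work: each subsequence's nearest neighbor to its right is the next link, and that next link's nearest neighbor to its left points back, so consecutive links are mutually closest in their temporal directions. Distances are z-normalized Euclidean, the exclusion zone of `m // 2` is an assumption, and the cost is O(n² m), far from scalable. All names are hypothetical.

```python
import numpy as np


def znorm(x):
    """Z-normalize a subsequence (mean 0, std 1); constant input -> zeros."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()


def left_right_nn(ts, m):
    """Brute-force left and right nearest neighbors of every length-m
    subsequence under z-normalized Euclidean distance. Index -1 means
    no valid neighbor exists on that side."""
    ts = np.asarray(ts, dtype=float)
    n = ts.size - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = m // 2  # trivial-match exclusion zone (assumed half-window)
    left = np.full(n, -1)
    right = np.full(n, -1)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf
        if np.isfinite(d[:i]).any():
            left[i] = int(np.argmin(d[:i]))
        if np.isfinite(d[i + 1:]).any():
            right[i] = i + 1 + int(np.argmin(d[i + 1:]))
    return left, right


def chain_from(start, left, right):
    """Follow links j -> right[j] while the link is bidirectional,
    i.e. left[right[j]] == j."""
    chain = [int(start)]
    j = int(start)
    while right[j] != -1 and left[right[j]] == j:
        j = int(right[j])
        chain.append(j)
    return chain
```

Because every link moves strictly to the right, a chain always terminates; in the word analogy, each link plays the role of one word in “hit, hot, dot, dog”.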


Author(s): Harleigh C. Seyffert, Armin W. Troesch

This paper addresses the existence of rare wave groups by examining time series data from the Pt. Reyes buoy. The buoy is operated by the Coastal Data Information Program (CDIP), University of California San Diego. The definition of rare wave groups used in this paper, due to Kim and Troesch, differs from the more commonly used wave group definition based on threshold crossings. With the time series data from the Pt. Reyes buoy, these rare wave groups are shown to be a naturally occurring phenomenon. The essential features of the data are examined, as well as the analysis methods and findings. By sifting through 17 years of wave elevation data from the Pt. Reyes buoy, this preliminary work addresses not only the question of to what extent rare wave groups exist in nature but also what their probability of occurrence is.


Genes, 2019, Vol 10 (3), pp. 216
Author(s): Dongmei Ai, Xiaoxin Li, Gang Liu, Xiaoyi Liang, Li Xia

The increasing availability of large-scale time series data allows microbial community dynamics to be inferred by association network analysis. However, correlation-based association network analyses are uninformative about causal, mediating, and time-dependent relationships between microbial community functional factors. To address this shortcoming, we introduced the Granger causality model to the analysis of a recent marine microbial time series dataset. We systematically constructed a directed acyclic network representing both internal and external causal relationships among the microbial and environmental factors. We further optimized the network by removing false causal associations using conditional Granger causality. The final network was visualized as a Granger graph, which was analyzed to identify causal relationships driven by key functional operators in the environment, such as Gammaproteobacteria, which were Granger-caused by total organic nitrogen and primary production (p < 0.05 and Q < 0.05).
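The pairwise test behind such a Granger graph can be sketched as a comparison of two least-squares autoregressions: one predicting y from its own p lags (and, for the conditional version, lags of other series z), and one that also adds p lags of x. A large F statistic suggests that x Granger-causes y. The lag order, the function names `lag_matrix` and `granger_f`, and the use of plain OLS are assumptions, not the study's exact setup.

```python
import numpy as np


def lag_matrix(x, p):
    """Columns x_{t-1}, ..., x_{t-p} for t = p, ..., n-1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])


def granger_f(y, x, p=2, z=None):
    """F statistic for the hypothesis that lags of x add no predictive
    power for y beyond lags of y (and of the optional conditioning
    series in z, one series per column)."""
    y = np.asarray(y, dtype=float)
    Y = y[p:]
    blocks = [np.ones((Y.size, 1)), lag_matrix(y, p)]
    if z is not None:
        z = np.asarray(z, dtype=float).reshape(y.size, -1)
        blocks += [lag_matrix(z[:, j], p) for j in range(z.shape[1])]
    X_restricted = np.hstack(blocks)
    X_full = np.hstack([X_restricted, lag_matrix(x, p)])

    def rss(X):
        # Residual sum of squares of an OLS fit of Y on X.
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return float(r @ r)

    rss_r, rss_f = rss(X_restricted), rss(X_full)
    df1, df2 = p, Y.size - X_full.shape[1]
    return ((rss_r - rss_f) / df1) / (rss_f / df2)
```

The statistic can be compared with an F distribution with `df1` and `df2` degrees of freedom as computed inside `granger_f`, for instance via `scipy.stats.f.sf`, with the resulting p-values corrected for multiple testing before edges are drawn.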


Author(s):  
James B. Elsner ◽  
Thomas H. Jagger

In this chapter, we consider time series models. A time series is an ordered sequence of numbers indexed by time. In climatology, you encounter time-series data in the format

\{h_t\}_{t=1}^{T} = \{h_1, h_2, \ldots, h_T\} \qquad (10.1)

where t indexes a season, month, week, or day and T is the length of the series. The aim is to understand the physical processes that produced the series. A trend is one example: often, simply by looking at a time series plot, you can pick out a trend that tells you the process generating the data is changing. A single time series gives you one sample from the process. Yet under the ergodic hypothesis, a single time series of infinite length contains (loosely speaking) the same information as the collection of all possible series of finite length. In this case, you can use your series to learn about the nature of the process. This is analogous to the spatial interpolation in Chapter 9, where the variogram was computed under the assumption that the rainfall field is stationary.

Here we consider a selection of techniques and models for time series data. We begin by showing you how to overlay plots as a tool for exploratory analysis, in order to compare the variation in two series qualitatively. We then show that a constant-rate process can produce large variation in hurricane counts, and we present techniques for smoothing. We continue with a change-point model and techniques for decomposing a continuous-valued series. We conclude with a unique way to create a network graph from a time series of counts and suggest a new definition of a climate anomaly.

A plot showing your variables on a common time axis is an informative exploratory graph. Values from the two series are scaled to the same relative range so that their covariation can be compared visually. Here you do this with hurricane counts and sea-surface temperature (SST). Begin by loading annual.RData; these data were assembled in Chapter 6.
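The chapter itself works in R (it loads an .RData file). As a language-neutral illustration, the Python sketch below maps one series linearly onto the minimum-to-maximum range of another so that both can be drawn against one vertical axis. The function name `rescale_to_range` is hypothetical, and reading "the same relative range" as min-max scaling is an assumption.

```python
import numpy as np


def rescale_to_range(series, reference):
    """Linearly map `series` onto the [min, max] range of `reference`
    so both can share one vertical axis in an overlay plot."""
    s = np.asarray(series, dtype=float)
    r = np.asarray(reference, dtype=float)
    span = s.max() - s.min()
    if span == 0:
        raise ValueError("series is constant; cannot rescale")
    unit = (s - s.min()) / span  # now in [0, 1]
    return r.min() + unit * (r.max() - r.min())
```

For instance, SST values could be rescaled onto the range of the annual hurricane counts and plotted on the same time axis with any plotting library. Because the map is linear, the shape of the rescaled series, and hence its covariation with the counts, is unchanged.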

