T-Cube: A Data Structure for Fast Extraction of Time Series from Large Datasets

Author(s):  
Maheshkumar Sabhnani ◽  
Andrew W. Moore ◽  
Artur W. Dubrawski
2019 ◽  
Author(s):  
Juan G. Diaz Ochoa

Abstract: It is common to consider a data-intensive strategy to be an appropriate way to develop systemic analyses in biology and physiology. Options for data collection, sampling, standardization, visualization, and interpretation therefore determine how causes are identified in time series to build mathematical models. However, there are often biases in the collected data that can affect the validity of the model: while collecting sufficiently large datasets seems to be a good strategy for reducing bias, persistent and dynamical anomalies in the data structure can affect the overall validity of the model. In this work we present a methodology based on the definition of homological groups to evaluate persistent anomalies in the structure of the sampled time series. In this evaluation, relevant patterns in the combination of different time series are clustered and grouped to customize the identification of causal relationships between parameters. We test this methodology on data collected from patients using mobile sensors to measure the response to physical exercise in real-world conditions outside the lab. With this methodology we aim to obtain a patient stratification of the time series to customize models in medicine.


2020 ◽  
Vol 8 ◽  
Author(s):  
Juan G. Diaz Ochoa

It is common to consider a data-intensive strategy as a way to develop systemic and quantitative analyses of complex systems, so that data collection, sampling, standardization, visualization, and interpretation determine how causal relationships are identified and incorporated into mathematical models. Collecting sufficiently large datasets seems to be a good strategy for reducing bias in the collected data; but persistent and dynamic anomalies in the data structure, generated from variations in intrinsic mechanisms, can induce persistent entropy, thus affecting the overall validity of quantitative models. In this research, we introduce a method based on the definition of homological groups that evaluates this persistent entropy as a complexity measure to estimate the observability of the system. The method identifies patterns with persistent topology, extracted from the combination of different time series, and clusters them to identify persistent bias in the data. We tested this method on accumulated data from patients using mobile sensors to measure the response to physical exercise in real-world conditions outside the lab. With this method, we aim to better stratify time series and customize models in complex biological systems.
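In the persistent-homology literature, the persistent entropy this abstract refers to is the Shannon entropy of the normalized bar lengths of a persistence barcode. A minimal sketch of that measure (the barcode input and function name are illustrative, not taken from the paper):

```python
import math

def persistent_entropy(barcode):
    """Persistent entropy of a persistence barcode, given as (birth, death)
    pairs: the Shannon entropy of the normalized bar lengths. A single
    dominant feature yields low entropy; many equally long-lived features
    yield high entropy."""
    lengths = [death - birth for birth, death in barcode if death > birth]
    total = sum(lengths)
    return -sum((l / total) * math.log(l / total) for l in lengths)
```

Two equally long bars give the maximal entropy log 2, while a single bar gives 0, so the value summarizes how evenly "persistence" is spread across topological features.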


2015 ◽  
Vol 733 ◽  
pp. 867-870
Author(s):  
Zhen Zhong Jin ◽  
Zheng Huang ◽  
Hua Zhang

The suffix tree is a useful data structure constructed for indexing strings. However, when it comes to large datasets of discrete contents, most existing algorithms become very inefficient. Discrete datasets need to be indexed in many fields, such as record analysis, data analysis in sensor networks, and association analysis. This paper presents an algorithm, STD (Suffix Tree for Discrete contents), that performs very efficiently on discrete input datasets. It introduces several effective intermediate data structures for discrete strings, and it also handles the case where the discrete input strings share similar characteristics. Moreover, STD keeps the advantages of existing implementations designed for successive input strings. Experiments were conducted to evaluate the performance and show that the method works well.
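As a rough illustration of suffix indexing over discrete contents (not the STD algorithm itself, whose intermediate structures the abstract does not detail), a naive suffix trie over token sequences answers substring queries:

```python
def build_suffix_trie(tokens):
    """Naive suffix trie over a sequence of discrete tokens (e.g. sensor
    readings or record fields). Each node is a dict keyed by token. This
    O(n^2) build only illustrates the indexing idea; a compressed suffix
    tree such as STD targets is far more space- and time-efficient."""
    root = {}
    for i in range(len(tokens)):
        node = root
        for tok in tokens[i:]:
            node = node.setdefault(tok, {})
    return root

def contains(trie, pattern):
    """Check whether `pattern` occurs as a contiguous subsequence."""
    node = trie
    for tok in pattern:
        if tok not in node:
            return False
        node = node[tok]
    return True
```

Because every suffix is inserted, any contiguous subsequence of the input is reachable as a path from the root, which is what makes suffix structures attractive for record and sensor-stream analysis.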


2011 ◽  
Vol 03 (01n02) ◽  
pp. 167-186 ◽  
Author(s):  
YING JIANG ◽  
DONG MAO ◽  
YUESHENG XU

Sample entropy is a widely used tool for quantifying the complexity of a biological system. Computing sample entropy directly from its definition incurs a large computational cost. We propose a fast algorithm based on a k-d tree data structure for computing sample entropy. We prove that the time complexity of the proposed algorithm is [Formula: see text] and its space complexity is O(N log N), where N is the length of the input time series and m is the length of its pattern templates. We present a numerical experiment that demonstrates significant improvement of the proposed algorithm in computing time.
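For reference, sample entropy computed directly from its definition looks like the following sketch; the O(N²) pairwise template comparison below is exactly the cost the proposed k-d tree algorithm reduces (parameter defaults and names are illustrative):

```python
import math

def sample_entropy(series, m=2, r=0.2):
    """Sample entropy by direct definition: -ln(A/B), where B counts
    template pairs of length m and A pairs of length m+1 that match
    within tolerance r under the Chebyshev distance, excluding
    self-matches. Quadratic in the series length N."""
    n = len(series)

    def count_matches(length):
        templates = [series[i:i + length] for i in range(n - length)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float('inf')  # no matches: entropy undefined/infinite
    return -math.log(a / b)
```

A perfectly periodic series yields a small value (many length-(m+1) matches survive from the length-m matches), while an irregular series yields a larger one, which is why the measure is used as a complexity index for physiological signals.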


2018 ◽  
Vol 37 (3) ◽  
pp. 23-35 ◽  
Author(s):  
Fabio Miranda ◽  
Marcos Lage ◽  
Harish Doraiswamy ◽  
Charlie Mydlarz ◽  
Justin Salamon ◽  
...  

2012 ◽  
Vol 3 (2) ◽  
pp. 279-283
Author(s):  
Rahul Sharma ◽  
Dr. Manish Manoria

The essential aspect of mining association rules is mining the frequent patterns. Due to its inherent difficulty, it is impossible to mine complete frequent patterns from a dense database. The FP-growth algorithm has been implemented using an array-based structure, known as the FP-tree, for storing compressed frequency information, and numerous experimental results have demonstrated that the algorithm performs extremely well. However, in the FP-growth algorithm, two traversals of the FP-tree are needed to construct each new conditional FP-tree. In this paper we present a novel Array Based Without Scanning Frequent Pattern (ABWSFP) tree technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. The technique works especially well for large datasets. We then present a new algorithm which uses the QFP-tree data structure in combination with the FP-tree. Experimental results show that the new algorithm outperforms other algorithms not only in speed, but also in CPU consumption and scalability.
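For context, the baseline two-scan FP-tree construction that FP-growth relies on can be sketched as follows (this shows the standard structure only, not the ABWSFP or QFP-tree variants, whose details the abstract does not give):

```python
from collections import defaultdict

class FPNode:
    """One prefix-tree node: an item, its count, a parent link, children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Standard two-scan FP-tree build. First scan counts item
    frequencies; second scan inserts each transaction's frequent items,
    ordered by descending frequency, into a shared prefix tree so that
    common prefixes are compressed."""
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root, freq
```

Mining then proceeds by repeatedly building conditional FP-trees from this structure; the two traversals that step requires are the overhead the paper's technique targets.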


2021 ◽  
pp. 267-284
Author(s):  
Ye-In Chang ◽  
◽  
Cheng-An Fu ◽  
Jia-Zhen Que

Periodic pattern mining in time series databases plays an important part in data mining. However, most existing algorithms consider only the count of each item, not its value. To take item values into account in periodic pattern mining over time series databases, Chanda et al. proposed an algorithm called WPPM. Their algorithm first constructs a suffix trie to store the candidate patterns, but the suffix trie uses too much storage space. To reduce the processing time for constructing the data structure, in this paper we propose two data structures to store the candidates. The first is the Weighted Paired Matrix: after scanning the database, we transform it into this matrix form, which is then used to build the second data structure. Our algorithm therefore reduces not only memory usage but also processing time, since it does not spend time constructing large numbers of nodes and edges. Moreover, we also consider incremental mining as the data length grows. The performance study shows that our proposed algorithm, based on the Weighted Direction Graph, is more efficient than the WPPM algorithm.
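As a toy illustration of the underlying task only (plain periodic-pattern support, without the item weighting or the Weighted Paired Matrix / Weighted Direction Graph structures the paper introduces), one can measure how often a pattern recurs at a fixed period:

```python
def periodic_support(series, pattern, period):
    """Fraction of period-aligned windows matching `pattern`, where '*'
    is a wildcard position. E.g. pattern ('a', '*') with period 2 asks
    how often 'a' opens each 2-step cycle. Names and the wildcard
    convention are illustrative, not from the paper."""
    p = len(pattern)
    windows = [series[i:i + p] for i in range(0, len(series) - p + 1, period)]
    hits = sum(all(q == '*' or q == s for q, s in zip(pattern, w)) for w in windows)
    return hits / len(windows)
```

Weighted variants such as WPPM additionally multiply each match by a weight derived from the item's value, so high-value but less frequent patterns can still rank highly.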

