T-Cube: A Data Structure for Fast Extraction of Time Series from Large Datasets

Author(s):  
Maheshkumar Sabhnani ◽  
Andrew W. Moore ◽  
Artur W. Dubrawski
2019 ◽  
Author(s):  
Juan G. Diaz Ochoa

Abstract: It is common to consider a data-intensive strategy to be an appropriate way to develop systemic analyses in biology and physiology. Options for data collection, sampling, standardization, visualization, and interpretation therefore determine how causes are identified in time series to build mathematical models. However, there are often biases in the collected data that can affect the validity of the model: while collecting sufficiently large datasets seems to be a good strategy for reducing bias, persistent and dynamical anomalies in the data structure can affect the overall validity of the model. In this work we present a methodology based on the definition of homological groups to evaluate persistent anomalies in the structure of the sampled time series. In this evaluation, relevant patterns in the combination of different time series are clustered and grouped to customize the identification of causal relationships between parameters. We test this methodology on data collected from patients using mobile sensors to measure the response to physical exercise in real-world conditions outside the lab. With this methodology we aim to obtain a patient stratification of the time series to customize models in medicine.


2020 ◽  
Vol 8 ◽  
Author(s):  
Juan G. Diaz Ochoa

It is common to consider a data-intensive strategy as a way to develop systemic and quantitative analyses of complex systems, so that data collection, sampling, standardization, visualization, and interpretation determine how causal relationships are identified and incorporated into mathematical models. Collecting sufficiently large datasets seems to be a good strategy for reducing bias in the collected data; but persistent and dynamic anomalies in the data structure, generated from variations in intrinsic mechanisms, can induce persistent entropy, thus affecting the overall validity of quantitative models. In this research, we introduce a method based on the definition of homological groups that evaluates this persistent entropy as a complexity measure to estimate the observability of the system. The method identifies patterns with persistent topology, extracted from the combination of different time series, and clusters them to identify persistent bias in the data. We tested this method on accumulated data from patients using mobile sensors to measure the response to physical exercise in real-world conditions outside the lab. With this method, we aim to better stratify time series and customize models in complex biological systems.
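In the persistent-homology literature, the persistent entropy this abstract refers to is the Shannon entropy of the normalized bar lengths of a persistence barcode. A minimal sketch of that measure (the barcode input and function name are illustrative, not taken from the paper):

```python
import math

def persistent_entropy(barcode):
    """Persistent entropy of a persistence barcode, given as (birth, death)
    pairs: the Shannon entropy of the normalized bar lengths. A single
    dominant feature yields low entropy; many equally long-lived features
    yield high entropy."""
    lengths = [death - birth for birth, death in barcode if death > birth]
    total = sum(lengths)
    return -sum((l / total) * math.log(l / total) for l in lengths)
```

Two equally long bars give the maximal entropy log 2, while a single bar gives 0, so the value summarizes how evenly "persistence" is spread across topological features.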


2015 ◽  
Vol 733 ◽  
pp. 867-870
Author(s):  
Zhen Zhong Jin ◽  
Zheng Huang ◽  
Hua Zhang

The suffix tree is a useful data structure constructed for indexing strings. However, when it comes to large datasets of discrete contents, most existing algorithms become very inefficient. Discrete datasets need to be indexed in many fields, such as record analysis, data analysis in sensor networks, and association analysis. This paper presents an algorithm, STD (Suffix Tree for Discrete contents), that performs very efficiently on discrete input datasets. It introduces several effective intermediate data structures for discrete strings, and it also handles the case where the discrete input strings share similar characteristics. Moreover, STD keeps the advantages of existing implementations designed for successive input strings. Experiments were conducted to evaluate the performance and show that the method works well.
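As a rough illustration of suffix indexing over discrete contents (not the STD algorithm itself, whose intermediate structures the abstract does not detail), a naive suffix trie over token sequences answers substring queries:

```python
def build_suffix_trie(tokens):
    """Naive suffix trie over a sequence of discrete tokens (e.g. sensor
    readings or record fields). Each node is a dict keyed by token. This
    O(n^2) build only illustrates the indexing idea; a compressed suffix
    tree such as STD targets is far more space- and time-efficient."""
    root = {}
    for i in range(len(tokens)):
        node = root
        for tok in tokens[i:]:
            node = node.setdefault(tok, {})
    return root

def contains(trie, pattern):
    """Check whether `pattern` occurs as a contiguous subsequence."""
    node = trie
    for tok in pattern:
        if tok not in node:
            return False
        node = node[tok]
    return True
```

Because every suffix is inserted, any contiguous subsequence of the input is reachable as a path from the root, which is what makes suffix structures attractive for record and sensor-stream analysis.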


2011 ◽  
Vol 03 (01n02) ◽  
pp. 167-186 ◽  
Author(s):  
YING JIANG ◽  
DONG MAO ◽  
YUESHENG XU

Sample entropy is a widely used tool for quantifying the complexity of a biological system. Computing sample entropy directly from its definition incurs a large computational cost. We propose a fast algorithm based on a k-d tree data structure for computing sample entropy. We prove that the time complexity of the proposed algorithm is [Formula: see text] and its space complexity is O(N log N), where N is the length of the input time series and m is the length of its pattern templates. We present a numerical experiment that demonstrates significant improvement of the proposed algorithm in computing time.
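For reference, sample entropy computed directly from its definition looks like the following sketch; the O(N²) pairwise template comparison below is exactly the cost the proposed k-d tree algorithm reduces (parameter defaults and names are illustrative):

```python
import math

def sample_entropy(series, m=2, r=0.2):
    """Sample entropy by direct definition: -ln(A/B), where B counts
    template pairs of length m and A pairs of length m+1 that match
    within tolerance r under the Chebyshev distance, excluding
    self-matches. Quadratic in the series length N."""
    n = len(series)

    def count_matches(length):
        templates = [series[i:i + length] for i in range(n - length)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float('inf')  # no matches: entropy undefined/infinite
    return -math.log(a / b)
```

A perfectly periodic series yields a small value (many length-(m+1) matches survive from the length-m matches), while an irregular series yields a larger one, which is why the measure is used as a complexity index for physiological signals.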


2018 ◽  
Vol 37 (3) ◽  
pp. 23-35 ◽  
Author(s):  
Fabio Miranda ◽  
Marcos Lage ◽  
Harish Doraiswamy ◽  
Charlie Mydlarz ◽  
Justin Salamon ◽  
...  

2012 ◽  
Vol 3 (2) ◽  
pp. 279-283
Author(s):  
Rahul Sharma ◽  
Dr. Manish Manoria

The essential aspect of mining association rules is mining the frequent patterns. Due to its inherent difficulty, it is impossible to mine complete frequent patterns from a dense database. The FP-growth algorithm has been implemented using an array-based structure, known as the FP-tree, for storing compressed frequency information, and numerous experimental results have demonstrated that the algorithm performs extremely well. However, in the FP-growth algorithm, two traversals of the FP-tree are needed to construct each new conditional FP-tree. In this paper we present a novel Array Based Without Scanning Frequent Pattern (ABWSFP) tree technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. The technique works especially well for large datasets. We then present a new algorithm which uses the QFP-tree data structure in combination with the FP-tree. Experimental results show that the new algorithm outperforms other algorithms not only in speed, but also in CPU consumption and scalability.
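For context, the baseline two-scan FP-tree construction that FP-growth relies on can be sketched as follows (this shows the standard structure only, not the ABWSFP or QFP-tree variants, whose details the abstract does not give):

```python
from collections import defaultdict

class FPNode:
    """One prefix-tree node: an item, its count, a parent link, children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Standard two-scan FP-tree build. First scan counts item
    frequencies; second scan inserts each transaction's frequent items,
    ordered by descending frequency, into a shared prefix tree so that
    common prefixes are compressed."""
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_support}

    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root, freq
```

Mining then proceeds by repeatedly building conditional FP-trees from this structure; the two traversals that step requires are the overhead the paper's technique targets.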


2021 ◽  
pp. 267-284
Author(s):  
Ye-In Chang ◽  
◽  
Cheng-An Fu ◽  
Jia-Zhen Que

Periodic pattern mining in time series databases plays an important part in data mining. However, most existing algorithms consider only the count of each item, not its value. To take item values into account in periodic pattern mining over time series databases, Chanda et al. proposed an algorithm called WPPM. Their algorithm first constructs a suffix trie to store the candidate patterns, but the suffix trie uses too much storage space. To reduce the processing time for constructing the data structure, in this paper we propose two data structures to store the candidates. The first is the Weighted Paired Matrix: after scanning the database, we transform it into this matrix form, which is then used to build the second data structure. Our algorithm therefore reduces not only memory usage but also processing time, since it does not spend time constructing large numbers of nodes and edges. Moreover, we also consider incremental mining as the data length grows. The performance study shows that our proposed algorithm, based on the Weighted Direction Graph, is more efficient than the WPPM algorithm.
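As a toy illustration of the underlying task only (plain periodic-pattern support, without the item weighting or the Weighted Paired Matrix / Weighted Direction Graph structures the paper introduces), one can measure how often a pattern recurs at a fixed period:

```python
def periodic_support(series, pattern, period):
    """Fraction of period-aligned windows matching `pattern`, where '*'
    is a wildcard position. E.g. pattern ('a', '*') with period 2 asks
    how often 'a' opens each 2-step cycle. Names and the wildcard
    convention are illustrative, not from the paper."""
    p = len(pattern)
    windows = [series[i:i + p] for i in range(0, len(series) - p + 1, period)]
    hits = sum(all(q == '*' or q == s for q, s in zip(pattern, w)) for w in windows)
    return hits / len(windows)
```

Weighted variants such as WPPM additionally multiply each match by a weight derived from the item's value, so high-value but less frequent patterns can still rank highly.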

