HOVA-FPPM: Flexible Periodic Pattern Mining in Time Series Databases Using Hashed Occurrence Vectors and Apriori Approach

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Muhammad Fasih Javed ◽  
Waqas Nawaz ◽  
Kifayat Ullah Khan

Finding flexible periodic patterns in a time series database is nontrivial due to the irregular occurrence of unimportant events, which makes the task intractable or computationally intensive for large datasets. Various solutions based on Apriori, projection, tree, and other techniques exist to mine these patterns. However, they suffer from unacceptable space and time complexity because they keep a constant-size tree structure (i.e., a suffix tree) with extra information in memory throughout the mining process, generate redundant and invalid patterns, mine only limited types of flexible periodic patterns, and repeatedly traverse the tree data structure for pattern discovery. To overcome these issues, we introduce an efficient approach called HOVA-FPPM, based on the Apriori approach with hashed occurrence vectors, to find all types of flexible periodic patterns. Rather than relying on a complex tree structure, we manage the necessary information in a hash table for efficient lookup during the mining process. We measured the performance of our proposed approach and compared the results with the baseline approach, i.e., FPPM. The results show that our approach requires less time and space, regardless of the data size or period value.
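To make the idea of hashed occurrence vectors concrete, here is a minimal sketch. It is not the HOVA-FPPM algorithm itself: it assumes the time series is a string of single-character event symbols, and the function names `build_occurrence_vectors` and `periodic_support` are hypothetical. It only illustrates keeping each event's occurrence positions in a hash table and then checking how often one event repeats at a fixed period.

```python
from collections import defaultdict


def build_occurrence_vectors(series):
    """Map each event symbol to the sorted list of positions where it occurs.

    Assumption: `series` is a sequence of hashable event symbols.
    """
    occurrences = defaultdict(list)
    for position, symbol in enumerate(series):
        occurrences[symbol].append(position)
    return dict(occurrences)


def periodic_support(positions, period):
    """Return (offset, count) for the offset that holds the most occurrences.

    Each position is assigned to the offset `position % period`. Ties are
    broken in favour of the smallest offset. An empty list gives (None, 0).
    """
    if not positions:
        return None, 0
    counts = defaultdict(int)
    for position in positions:
        counts[position % period] += 1
    best_offset = max(counts, key=lambda k: (counts[k], -k))
    return best_offset, counts[best_offset]
```

For the series `"abcabdabc"`, the occurrence vector of `a` is `[0, 3, 6]`, and `periodic_support([0, 3, 6], 3)` returns `(0, 3)`: the event appears at offset 0 in all three segments of period 3. A single pass over the series builds the hash table, after which each lookup needs no traversal of a tree.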

Author(s):  
Wynne Hsu ◽  
Mong Li Lee ◽  
Junmei Wang

In this chapter, we describe a new periodicity detection algorithm to efficiently discover short period patterns that may exist in only a limited range of the time series. We refer to these patterns as the dense periodic patterns, where the periodicity is focused on part of the time series. We present a dense periodic pattern mining algorithm called DPMiner to find dense periodic patterns, and design a pruning strategy to limit the search space to the feasible periods. Experimental results on both real-life and synthetic datasets indicate that DPMiner is both scalable and efficient.


2021 ◽  
pp. 267-284
Author(s):  
Ye-In Chang ◽  
Cheng-An Fu ◽  
Jia-Zhen Que

Periodic pattern mining in time series databases plays an important part in data mining. However, most existing algorithms consider only the count of each item and ignore its value. To take the value of each item into account when mining periodic patterns in time series databases, Chanda et al. proposed an algorithm called WPPM. Their algorithm first constructs a suffix trie to store the candidate patterns. However, the suffix trie uses too much storage space. To reduce the processing time for constructing the data structure, in this paper we propose two data structures to store the candidates. The first data structure is the Weighted Paired Matrix: after scanning the database, we transform it into matrix form, which is then used for the second data structure. As a result, our algorithm reduces both memory usage and processing time, because it does not need to spend time constructing a large number of nodes and edges. Moreover, we also consider the case of incremental mining as the data length increases. Our performance study shows that the proposed algorithm, based on the Weighted Direction Graph, is more efficient than the WPPM algorithm.


2011 ◽  
Vol 03 (01n02) ◽  
pp. 167-186 ◽  
Author(s):  
YING JIANG ◽  
DONG MAO ◽  
YUESHENG XU

Sample entropy is a widely used tool for quantifying the complexity of a biological system. Computing sample entropy directly from its definition is computationally expensive. We propose a fast algorithm based on a k-d tree data structure for computing sample entropy. We prove that the time complexity of the proposed algorithm is [Formula: see text] and its space complexity is O(N log N), where N is the length of the input time series and m is the length of its pattern templates. We present a numerical experiment that demonstrates a significant improvement in computing time for the proposed algorithm.
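For reference, the direct computation that the abstract describes as costly can be sketched as follows. This is the naive pairwise method, not the k-d tree algorithm proposed by the authors. It assumes the common definition SampEn = -ln(A/B), where B and A count pairs of templates of length m and m + 1 whose Chebyshev distance is at most a tolerance r; the function name `sample_entropy` and the choice of `<=` for the tolerance test are assumptions of this sketch.

```python
import math


def sample_entropy(x, m, r):
    """Naive sample entropy of sequence x with template length m and tolerance r.

    Compares every pair of templates, so the cost grows quadratically with
    the length of x. Returns math.inf when no matches are found, since the
    logarithm is then undefined.
    """
    n = len(x)

    def count_matches(length):
        # Use the first n - m start positions for both template lengths,
        # so that the two counts are taken over the same set of indices.
        templates = [x[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance: largest coordinate-wise difference.
                distance = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    matches += 1
        return matches

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return math.inf
    return -math.log(a / b)
```

A perfectly regular series such as `[0, 1, 0, 1, 0, 1, 0, 1]` with `m = 1` and `r = 0.5` yields a sample entropy of zero, because every template that matches at length 1 also matches at length 2. The nested loop over template pairs is the part that a spatial index such as a k-d tree is meant to speed up.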


Partial periodic patterns are an important class of regularities that exist in a time series. A key property of these patterns is that they can start, stop, and restart anywhere within a series. Partial periodic patterns are classified into two types: (i) regular patterns, which exhibit periodic behavior throughout a series with some exceptions, and (ii) periodic patterns, which exhibit periodic behavior only for particular time intervals within a series. Past studies on partial periodic search have focused primarily on finding regular patterns. The knowledge pertaining to periodic patterns cannot be ignored, however, because these patterns provide useful information about seasonal or time-based associations between events. Finding periodic patterns is a non-trivial task for two main reasons. (i) Each periodic pattern is associated with time-based information pertaining to the durations of its periodic appearances in a series. Since this information can vary within and across patterns, obtaining it is challenging. (ii) Because periodic patterns do not satisfy the anti-monotonic property, finding all of them is computationally expensive. In this paper, we propose a periodic pattern model that addresses the above issues. We also propose a Periodic Pattern growth algorithm, along with an efficient pruning technique, to discover these patterns. Experimental results show that periodic patterns can be useful and that our algorithm is efficient.


2013 ◽  
Vol 40 (8) ◽  
pp. 3015-3027 ◽  
Author(s):  
Manziba Akanda Nishi ◽  
Chowdhury Farhan Ahmed ◽  
Md. Samiullah ◽  
Byeong-Soo Jeong

Author(s):  
Imam Mukhlash ◽  
Desna Yuanda ◽  
Mohammad Iqbal

A convergence of technologies in data mining, machine learning, and pervasive computing has led to an interest in developing smart environments that help people with functions such as monitoring and remote health interventions, activity recognition, and energy saving. The need for such technology is further confirmed by the aging population and the importance of individuals living independently in their own homes. Pattern mining on smart-home sensor data is widely applied in data mining research. In this paper, we propose a periodic pattern mining method for smart-home data that integrates the FP-Growth PrefixSpan algorithm with a fuzzy approach, which we call fuzzy-time-interval periodic pattern mining. Our purpose is to obtain the periodic patterns of activity at various time intervals. The simulation results show that resident activities can be recognized by analyzing the triggered sensor patterns, and they show the impact of minimum support values on the number of fuzzy-time-interval periodic patterns generated. Moreover, the generated fuzzy-time-interval periodic patterns help identify residents' daily or anomalous habits.


2020 ◽  
Author(s):  
Maren Kaluza ◽  
Luis Samaniego ◽  
Stephan Thober ◽  
Robert Schweppe ◽  
Rohini Kumar ◽  
...  

Parameter estimation of a global-scale, high-resolution hydrological model requires a powerful supercomputer and an optimized parallelization algorithm. Improving the efficiency of such an implementation is essential to advance hydrological science and to minimize the uncertainty of the major hydrologic fluxes and storages at continental and global scales. Within the ESM project [1], the main transfer-function parameters of the mHM model will be estimated by jointly assimilating evapotranspiration (ET) from FLUXNET, the TWS anomaly from GRACE (NASA) and streamflow time series from 5500 GRDC gauges to achieve this goal.

For the parallelization of the objective functions, a hybrid MPI-OpenMP scheme is implemented. While the parallelization into equally sized subdomains for cell-wise computations of fluxes (e.g., ET, TWS) is trivial, cell-to-cell fluxes need to be computed for streamflow routing. For time series datasets, the advanced parallelization algorithm MPI parallelized Decomposition of Forest (MDF) will be used.

In this study, we go beyond the standard approach which decomposes the river into tributaries (e.g. the Pfaffenstetter System [2]). We apply a non-trivial graph algorithm to decompose each river network into a tree data structure with nodes representing subbasin domains of almost equal size [3].

We analyze several aspects affecting the MDF parallelization: (1) the communication time between nodes; (2) buffering data before sending; (3) optimizing total node idle time and total run time; (4) memory imbalance between master processes and other processes.

We run the mHM model on the high-performance JUWELS supercomputer at Jülich Supercomputing Center (JSC), where the (routing) code efficiently scales up to ~180 nodes with 96 CPUs each. We discuss different parallelization aspects, including the effect of parameters on the scaling of MDF, and we show the benefits of MDF over a non-parallelized routing module.

[1] https://www.esm-project.net/
[2] http://proceedings.esri.com/library/userconf/proc01/professional/papers/pap1008/p1008.htm
[3] https://meetingorganizer.copernicus.org/EGU2019/EGU2019-8129-1.pdf

