With the development of streaming data processing technology, real-time event monitoring and querying has become a hot issue in this field. In this article, an investigation based on coal mine disaster events is carried out, and a new anti-aliasing model for abnormal events is proposed, as well as a multistage identification method. Coal mine micro-seismic signal is of great importance in the investigation of vibration characteristic, attenuation law, and disaster assessment of coal mine disasters. However, as affected by factors like geological structure and energy losses, the micro-seismic signals of the same kind of disasters may produce data drift in the time domain transmission, such as weak or enhanced signals, which affects the accuracy of the identification of abnormal events (“the coal mine disaster events”). The current mine disaster event monitoring method is a lagged identification, which is based on monitoring a series of sensors with a 10-s-long data waveform as the monitoring unit. The identification method proposed in this article first takes advantages of the dynamic time warping algorithm, which is widely applied in the field of audio recognition, to build an anti-aliasing model and identifies whether the perceived data are disaster signal based on the similarity fitting between them and the template waveform of historical disaster data, and second, since the real-time monitoring data are continuous streaming data, it is necessary to identify the start point of the disaster waveform before the identification of the disaster signal. Therefore, this article proposes a strategy based on a variable sliding window to align two waveforms, locating the start point of perceptual disaster wave and template wave by gradually sliding the perceptual window, which can guarantee the accuracy of the matching. Finally, this article proposes a multistage identification mechanism based on the sliding window matching strategy and the characteristics of the waveforms of coal mine disasters, adjusting the early warning level according to the identification extent of the disaster signal, which increases the early warning level gradually with the successful result of the matching of 1/ N size of the template, and the piecewise aggregate approximation method is used to optimize the calculation process. Experimental results show that the method proposed in this article is more accurate and be used in real time.
Online mining of frequent closed itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we proposed a novel sliding window based algorithm. The algorithm exploits lattice properties to limit the search to frequent close itemsets which share at least one item with the new transaction. Experiments results on synthetic datasets show that our proposed algorithm is both time and space efficient.
The data stream is a series of data generated at sequential time from different sources. Processing such data is very important in many contemporary applications such as sensor networks, RFID technology, mobile computing and many more. The huge amount data generated and frequent changes in a short time makes the conventional processing methods insufficient. The Sliding Window Model (SWM) was introduced by Datar et. al to handle this problem. Avoiding multiple scans of the whole data sets, optimizing memory usage, and processing only the most recent tuple are the main challenges. The number of possible world instances grows exponentially in uncertain data and it is highly difficult to comprehend what it takes to meet Top-k query processing in the shortest amount of time. Following the generation of rules and the probability theory of this model, a framework was anticipated to sustain top-k processing algorithm over the SWM approach until the candidates expired. Based on the literature review study, none of the existing work have been made to tackle the issue arises from the top-k query processing of the possible world instance of the uncertain data streams within the SWM. The major issue resulted from these scenarios need to be addressed especially in the computation redundancy area that contributed to the increases of computational cost within the SWM. Therefore, the main objective of this research work is to propose the top-k query processing methods over uncertain data streams in SWM utilizing the score and the Possible World (PW) setting. In this study, a novel expiration and object indexing method is introduced to address the computational redundancy issues. We believed the proposed method can reduce computational costs and by managing insertion and exit policy on the right tuple candidates within a specified window frame. This research work will contribute to the area of computational query processing.
Finding frequent itemsets in a continuous streaming data is an important data mining task which is widely used in network monitoring, Internet of Things data analysis and so on. In the era of big data, it is necessary to develop a distributed frequent itemset mining algorithm to meet the needs of massive streaming data processing. Apache Spark is a unified analytic engine for massive data processing which has been successfully used in many data mining fields. In this paper, we propose a distributed algorithm for mining frequent itemsets over massive streaming data named SWEclat. The algorithm uses sliding window to process streaming data and uses vertical data structure to store the dataset in the sliding window. This algorithm is implemented by Apache Spark and uses Spark RDD to store streaming data and dataset in vertical data format, so as to divide these RDDs into partitions for distributed processing. Experimental results show that SWEclat algorithm has good acceleration, parallel scalability and load balancing.