Privacy Preserving Data Mining Using Time Series Data Aggregation

Author(s):  
Sivaranjani Reddi

This article proposes a mechanism for protecting the privacy of mined results when the data is distributed across many nodes. The first objective is to have each node in a cluster mine its query results and communicate them to the cluster head, which aggregates the data collected from all the cluster nodes and communicates it to the group controller. The second objective is to incorporate privacy at each level: the cluster node, the cluster head, and the group controller. The final objective is to support a dynamic network, in which nodes can join or leave the distributed network without disturbing its functionality. The proposed algorithm was implemented in Java and evaluated in terms of communication cost and computational complexity.
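
The node, cluster head, and group controller flow described above can be illustrated with a small sketch. The abstract does not say which privacy technique the article uses, so the pairwise additive masking below is a swapped-in, commonly used secure-aggregation idea and not the author's algorithm. The names `MODULUS`, `mask_values`, `cluster_head_aggregate`, and `group_controller_aggregate` are hypothetical, and the per-node values are made-up non-negative counts.

```python
import random

# Hypothetical modulus; assumed to be larger than any true total.
MODULUS = 2**31 - 1


def mask_values(values, rng):
    """Return masked copies of the node values.

    Every pair of nodes (i, j) shares a random mask r: node i adds r and
    node j subtracts r, so all masks cancel in the cluster sum. In a real
    deployment each pair would agree on r privately; here one function
    simulates all nodes for brevity.
    """
    masked = [v % MODULUS for v in values]
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            r = rng.randrange(MODULUS)
            masked[i] = (masked[i] + r) % MODULUS
            masked[j] = (masked[j] - r) % MODULUS
    return masked


def cluster_head_aggregate(masked_values):
    """Cluster head: sums masked values without seeing the raw ones."""
    return sum(masked_values) % MODULUS


def group_controller_aggregate(cluster_sums):
    """Group controller: combines the per-cluster aggregates."""
    return sum(cluster_sums) % MODULUS


# Made-up local mining results (for example, pattern counts per node).
clusters = {"A": [12, 7, 30], "B": [5, 5], "C": [41]}
rng = random.Random(0)

cluster_sums = [
    cluster_head_aggregate(mask_values(values, rng))
    for values in clusters.values()
]
total = group_controller_aggregate(cluster_sums)
print(total)
```

In this sketch a cluster with a single node (cluster "C") receives no mask, so its value is visible to the cluster head, and a node leaving mid-round would break the mask cancellation. Handling joins and leaves, which the article lists as an objective, is not modelled here.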


Author(s):  
Shadi Aljawarneh ◽  
Aurea Anguera ◽  
John William Atwood ◽  
Juan A. Lara ◽  
David Lizcano

Nowadays, large amounts of data are generated in the medical domain. Various physiological signals generated from different organs can be recorded to extract interesting information about patients’ health. The analysis of physiological signals is a hard task that requires the use of specific approaches such as the Knowledge Discovery in Databases process. Applying such a process in the domain of medicine has a series of implications and difficulties, especially when data mining techniques are applied to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors in applying data mining techniques to real medical patient data, including time series. In this research, we carried out an exhaustive case study working on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purposes, obtaining good results in terms of classification accuracy (greater than 99% in both fields). The good results obtained in our research are the groundwork for the lessons learned and recommendations made in this position paper, which is intended as a guide for experts who have to face similar medical data mining projects.


Author(s):  
Malcolm J. Beynon

The seminal work of Zadeh (1965), namely fuzzy set theory (FST), has developed into a methodology fundamental to analysis that incorporates vagueness and ambiguity. Data mining, in turn, endeavours to find potentially meaningful patterns in data (Hu & Tzeng, 2003). This includes the construction of if-then decision rule systems, which attempt to give a level of inherent interpretability to the antecedents and consequents identified for object classification (see Breiman, 2001). Within a fuzzy environment this is extended to allow a linguistic facet to the possible interpretation, examples including mining time series data (Chiang, Chow, & Wang, 2000) and multi-objective optimisation (Ishibuchi & Yamamoto, 2004). One approach to if-then rule construction has been through the use of decision trees (Quinlan, 1986), where the path down a branch of a decision tree (through a series of nodes) is associated with a single if-then rule. A key characteristic of traditional decision tree analysis is that the antecedents described in the nodes are crisp; this restriction is mitigated when operating in a fuzzy environment (Crockett, Bandar, Mclean, & O’Shea, 2006). This chapter investigates the use of fuzzy decision trees as an effective tool for data mining. Pertinent to data mining and decision making, Mitra, Konwar and Pal (2002) succinctly describe a most important feature of decision trees, crisp and fuzzy, which is their capability to break down a complex decision-making process into a collection of simpler decisions, thereby providing an easily interpretable solution.
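
The contrast between crisp and fuzzy antecedents can be made concrete with a short sketch. It is illustrative only and is not taken from the chapter: the triangular membership function, the use of `min` to combine antecedents along a path, and the variable names and breakpoints are all assumptions.

```python
def crisp_at_most(x, threshold):
    """Crisp antecedent: the condition is either fully met or not at all."""
    return 1.0 if x <= threshold else 0.0


def triangular(x, a, b, c):
    """Fuzzy antecedent: triangular membership rising from a to a peak at b,
    then falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def path_strength(memberships):
    """Strength of one if-then rule (one root-to-leaf path), using min as
    the conjunction operator. This is an assumed, commonly used choice."""
    return min(memberships)


# Hypothetical rule: IF temperature is medium AND load is high THEN class = alert
temperature_is_medium = triangular(22.5, 15.0, 20.0, 25.0)
load_is_high = triangular(0.875, 0.5, 1.0, 1.5)
strength = path_strength([temperature_is_medium, load_is_high])
print(temperature_is_medium, load_is_high, strength)
```

A crisp tree would send the object down exactly one branch, whereas here the object satisfies the rule to a degree, so several rules can fire partially for the same object.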


Author(s):  
Anne Denton

Time series data is of interest to most science and engineering disciplines, and analysis techniques have been developed for hundreds of years. In recent years, however, there have been new developments in data mining techniques, such as frequent pattern mining, that take a different perspective on data. Traditional techniques were not meant for such pattern-oriented approaches. As a result, there is a significant need for research that extends traditional time-series analysis, in particular clustering, to the requirements of the new data mining algorithms.


Axioms ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 49
Author(s):  
Anton Romanov ◽  
Valeria Voronina ◽  
Gleb Guskov ◽  
Irina Moshkina ◽  
Nadezhda Yarushkina

The development of the economy and the transition to Industry 4.0 create new challenges for artificial intelligence methods. Such challenges include the processing of large volumes of data, the analysis of various dynamic indicators, the discovery of complex dependencies in the accumulated data, and the forecasting of the state of processes. The main point of this study is the development of a set of analytical and prognostic methods. The methods described in this article are based on fuzzy logic, statistics, and time series data mining, because data extracted from dynamic systems are initially incomplete and have a high degree of uncertainty. The ultimate goal of the study is to improve the quality of data analysis in industrial and economic systems. The advantages of the proposed methods are flexibility and an orientation toward high interpretability of dynamic data. The high level of interpretability and interoperability of dynamic data is achieved through a combination of time series data mining and knowledge base engineering methods. Merging a set of rules extracted from the time series with knowledge base rules allows a forecast to be made even when the length and nature of the time series are insufficient. The proposed methods are also based on the summarization of the results of process modeling for diagnosing technical systems, forecasting the economic condition of enterprises, and approaches to the technological preparation of production in a multi-product production program, with the application of type 2 fuzzy sets for time series modeling. Intelligent systems based on the proposed methods demonstrate an increase in the quality and stability of their functioning. This article contains a set of experiments that support this statement.


Author(s):  
T. Warren Liao

In this chapter, we present genetic algorithm (GA) based methods developed for clustering univariate time series of equal or unequal length as an exploratory step of data mining. These methods basically implement the k-medoids algorithm. Each chromosome encodes in binary the data objects serving as the k-medoids. To compare their performance, both fixed-parameter and adaptive GAs were used. We first employed the synthetic control chart data set to investigate the performance of three fitness functions, two distance measures, and other GA parameters such as population size, crossover rate, and mutation rate. Two more sets of time series, with or without a known number of clusters, were also used in experiments: one is the cylinder-bell-funnel data and the other is the novel battle simulation data. The clustering results are presented and discussed.
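
To illustrate how a binary chromosome can stand for a set of k-medoids, the sketch below evaluates one candidate. It rests on assumptions the abstract does not confirm: the chromosome is read as a bit mask with one bit per time series (1 means "this series is a medoid"), the distance is Euclidean on equal-length series, and the cost is the sum of distances from each series to its nearest medoid. The chapter's three fitness functions and two distance measures are not specified here, and all names and data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def decode(chromosome):
    """Indices of the series selected as medoids (bits set to 1)."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]


def total_cost(series, chromosome):
    """k-medoids objective: each series contributes its distance to the
    nearest selected medoid. Lower is better, so a GA would minimise this
    (or maximise some decreasing transform of it)."""
    medoids = decode(chromosome)
    if not medoids:
        raise ValueError("chromosome must select at least one medoid")
    return sum(
        min(euclidean(s, series[m]) for m in medoids)
        for s in series
    )


# Four tiny made-up series forming two obvious groups.
series = [[0, 0, 0], [0, 1, 0], [5, 5, 5], [5, 6, 5]]
one_medoid_per_group = [1, 0, 1, 0]
both_medoids_in_one_group = [1, 1, 0, 0]

print(total_cost(series, one_medoid_per_group))
print(total_cost(series, both_medoids_in_one_group))
```

Crossover and mutation would then operate on the bit masks, with a repair or penalty step needed whenever the number of set bits drifts away from k. That step is omitted from this sketch.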

