An efficient perturbation approach for multivariate data in sensitive and reliable data mining

2021 ◽  
Vol 62 ◽  
pp. 102954
Author(s):  
Mahit Kumar Paul ◽  
Md. Rabiul Islam ◽  
A.H.M. Sarowar Sattar

With improvements in information processing and memory capacity, vast amounts of data are collected for various data analysis purposes. Data mining techniques are used to extract useful knowledge from these data. During extraction, however, the data can become publicly exposed, which leads to breaches of private data. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyse the problem of discovering similarity checks for functional dependencies in a given dataset, such that applying the (l, d) inference algorithm with generalization can anonymise the microdata without loss of utility [8]. This work presents a functional-dependency-based perturbation approach that hides sensitive information from the user by applying the (l, d) inference model to the dependency attributes based on Information Gain. The approach works on both categorical and numerical attributes. The perturbed dataset does not affect the original dataset; it maintains the same or very comparable patterns as the original. Hence the utility of the application stays high when compared to other data mining techniques. The accuracy on the original and perturbed datasets is compared and analysed using data mining classification algorithms.
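The abstract above relies on Information Gain to rank attributes but does not show how it is computed. The following is a minimal sketch of the standard entropy-based definition on toy data; the attribute names (`zip_code`, `gender`, `disease`) and values are hypothetical, and the sketch does not implement the authors' (l, d) inference model or their perturbation step.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels inside each attribute value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: two candidate attributes and one sensitive label.
zip_code = ["A", "A", "B", "B"]
gender = ["M", "F", "M", "F"]
disease = ["flu", "flu", "cold", "cold"]

gain_zip = information_gain(zip_code, disease)
gain_gender = information_gain(gender, disease)
print(f"gain(zip_code) = {gain_zip:.3f}, gain(gender) = {gain_gender:.3f}")
```

In this toy data `zip_code` fully determines `disease` while `gender` carries no information about it, so an approach that ranks attributes by Information Gain would treat `zip_code` as the stronger candidate for protection.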


Author(s):  
Dries Verlet ◽  
Carl Devos

Although policy evaluation has always been important, today there is growing attention to policy evaluation in the public sector. In order to provide a solid base for so-called evidence-based policy, valid and reliable data are needed to depict the performance of organisations within the public sector. Without a solid empirical base, one needs to be very careful with data mining in the public sector. When measuring performance, several unintended and negative effects can occur. In this chapter, the authors focus on a few common pitfalls that occur when measuring performance in the public sector. They also discuss possible strategies to prevent them by setting up and adjusting the right measurement systems for performance in the public sector. Data mining is about knowledge discovery. The question is: what do we want to know? What are the consequences of asking that question?


2015 ◽  
Vol 18 (1) ◽  
pp. 96-114 ◽  
Author(s):  
S. R. Mounce ◽  
E. J. M. Blokker ◽  
S. P. Husband ◽  
W. R. Furnass ◽  
P. G. Schaap ◽  
...  

Particulate material accumulates over time as cohesive layers on internal pipeline surfaces in water distribution systems (WDS). When mobilised, this material can cause discolouration. This paper explores factors expected to be involved in this accumulation process. Two complementary machine learning methodologies are applied to significant amounts of real world field data from both a qualitative and a quantitative perspective. First, Kohonen self-organising maps were used for integrative and interpretative multivariate data mining of potential factors affecting accumulation. Second, evolutionary polynomial regression (EPR), a hybrid data-driven technique, was applied that combines genetic algorithms with numerical regression for developing easily interpretable mathematical model expressions. EPR was used to explore producing novel simple expressions to highlight important accumulation factors. Three case studies are presented: UK national and two Dutch local studies. The results highlight bulk water iron concentration, pipe material and looped network areas as key descriptive parameters for the UK study. At the local level, a significantly increased third data set allowed K-fold cross validation. The mean cross validation coefficient of determination was 0.945 for training data and 0.930 for testing data for an equation utilising amount of material mobilised and soil temperature for estimating daily regeneration rate. The approach shows promise for developing transferable expressions usable for pro-active WDS management.
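The K-fold cross validation and coefficient of determination reported above can be illustrated with a short sketch. It assumes a single-predictor ordinary least squares fit as a stand-in for EPR (which is not implemented here) and uses synthetic data, so the scores it produces are unrelated to the 0.945 and 0.930 figures in the abstract. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k):
    """Return (mean training R^2, mean testing R^2) over k folds."""
    n = len(xs)
    train_scores, test_scores = [], []
    for fold in range(k):
        # Every k-th record, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        pairs = list(enumerate(zip(xs, ys)))
        train = [p for i, p in pairs if i not in test_idx]
        test = [p for i, p in pairs if i in test_idx]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        for scores, part in ((train_scores, train), (test_scores, test)):
            preds = [slope * x + intercept for x, _ in part]
            scores.append(r_squared([y for _, y in part], preds))
    return sum(train_scores) / k, sum(test_scores) / k


# Synthetic data: a linear trend with a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

mean_train_r2, mean_test_r2 = k_fold_r2(xs, ys, k=5)
print(f"mean training R^2 = {mean_train_r2:.3f}, mean testing R^2 = {mean_test_r2:.3f}")
```

Reporting the training and testing means side by side, as the abstract does, makes overfitting visible: a large gap between the two would suggest the fitted expression does not transfer to unseen data.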


Author(s):  
Rosaria Lombardo

By the early 1990s, the term “data mining” had come to mean the process of finding information in large data sets. In the framework of Total Quality Management, earlier studies have suggested that enterprises could harness the predictive power of Learning Management System (LMS) data to develop reporting tools that identify at-risk customers/consumers and allow for more timely interventions (Macfadyen & Dawson, 2009). The Learning Management System data and the subsequent Customer Interaction System data can help to provide “early warning system data” for risk detection in enterprises. This chapter confirms and extends this proposition by providing data from an international research project investigating customer satisfaction in public-utility services to individuals, such as education, training and health care services, by means of explorative multivariate data analysis tools such as Ordered Multiple Correspondence Analysis, Boosting regression, Partial Least Squares regression and its generalizations.


Author(s):  
Gary Smith ◽  
Jay Cordes

The traditional statistical analysis of data follows what has come to be known as the scientific method: collecting reliable data to test plausible theories. Data mining goes in the other direction, analyzing data without being motivated or encumbered by theories. The fundamental problem with data mining is simple: We think that data patterns are unusual and therefore meaningful. Patterns are, in fact, inevitable and therefore meaningless. This is why data mining is not usually knowledge discovery, but noise discovery. Finding correlations is easy. Good data scientists are not seduced by discovered patterns because they don’t put data before theory. They do not commit Texas Sharpshooter Fallacies or fall into the Feynman Trap.
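The claim that patterns are inevitable can be checked with a small simulation. The sketch below is an illustration under assumed settings (10 observations, 200 candidate series, a fixed seed), not an example taken from the authors: every series is independent random noise, yet searching across the candidates still turns up a strong correlation with the target.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n_obs, n_candidates = 10, 200

# The target and every candidate are independent noise: no real relationship exists.
target = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
candidates = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_candidates)]

strongest = max(abs(pearson(target, c)) for c in candidates)
print(f"strongest |correlation| among {n_candidates} random series: {strongest:.2f}")
```

Because the "best" series is chosen after looking at the data, its correlation says nothing about a real relationship, which is the Texas Sharpshooter Fallacy the abstract mentions.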


2021 ◽  
Vol 9 (2) ◽  
pp. 131-135
Author(s):  
G. Srinivas Reddy et al.

As the use of the internet and web applications grows rapidly, the security and privacy of data is the most challenging issue we face, since data can easily be compromised. Various conventional techniques are used for privacy preservation, such as condensation, randomization and tree structures. The limitations of these existing approaches are that they cannot maintain a proper balance between data utility and privacy, and they may still lead to privacy violations. This paper presents an Additive Rotation Perturbation approach for Privacy Preserving Data Mining (PPDM). In the proposed work, various datasets from the UCI Machine Learning Repository were collected and protected with a new Additive Rotational Perturbation technique under Privacy Preserving Data Mining. Experimental results show that the proposed algorithm’s strength is high for all the datasets, as estimated using the DoV (Difference of Variance) method.
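The abstract does not define the perturbation or the DoV measure precisely, so the following is only a sketch under stated assumptions: a pair of numeric attributes is rotated by a fixed angle, Gaussian noise is then added, and the difference of variance is taken as the absolute difference between an attribute's variance before and after perturbation. The attribute names, the 30 degree angle and the noise level are hypothetical, and this is not the authors' algorithm.

```python
import math
import random
from statistics import pvariance


def rotate_pair(xs, ys, theta):
    """Rotate each (x, y) record about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    rotated_x = [c * x - s * y for x, y in zip(xs, ys)]
    rotated_y = [s * x + c * y for x, y in zip(xs, ys)]
    return rotated_x, rotated_y


def add_noise(values, sigma, rng):
    """Additive perturbation: add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def variance_difference(original, perturbed):
    """Assumed DoV: absolute difference of the population variances."""
    return abs(pvariance(original) - pvariance(perturbed))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical numeric attributes for five records.
age = [23.0, 35.0, 41.0, 52.0, 67.0]
income = [31.0, 48.0, 52.0, 75.0, 60.0]

rot_age, rot_income = rotate_pair(age, income, math.radians(30))
pert_age = add_noise(rot_age, 0.5, rng)
pert_income = add_noise(rot_income, 0.5, rng)

dov_age = variance_difference(age, pert_age)
dov_income = variance_difference(income, pert_income)
print(f"DoV(age) = {dov_age:.3f}, DoV(income) = {dov_income:.3f}")
```

The rotation step alone preserves distances between records and the combined variance of the two attributes, which is why rotation is attractive for distance-based mining; the added noise then trades a little of that utility for extra protection.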


2015 ◽  
Vol 12 (12) ◽  
pp. 5463-5466
Author(s):  
G Manikandan ◽  
N Sairam ◽  
P Rajendiran ◽  
R Balakrishnan ◽  
N. Rajesh Kumar ◽  
...  
