Data Security and Chase

Author(s):  
Zbigniew W. Ras ◽  
Seunghyun Im

This article describes requirements and approaches necessary for ensuring data confidentiality in knowledge discovery systems. Data mining systems should provide knowledge extracted from their data that can be used to identify underlying trends and patterns, but this knowledge should not be usable to compromise data confidentiality. In conventional database systems, confidentiality for sensitive data is generally achieved by hiding it from unauthorized users (e.g., data encryption and/or access control methods can be considered forms of data hiding). However, hiding the confidential data is not sufficient in knowledge discovery systems (KDSs) due to Chase (Dardzinska & Ras, 2003a, 2003c). Chase is a missing-value prediction tool enhanced by data mining technologies. For example, if an attribute in an information system is incomplete, we can use Chase to approximate the missing values and make the attribute more complete. Chase is also used to answer user queries containing non-local attributes (Ras & Joshi, 1997). If attributes in a query are locally unknown, we search for their definitions in the KDS and use the results to replace the non-local part of the query.
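The imputation mechanism described above can be sketched as follows. This is a minimal illustration, not the published Chase algorithm: rules associating a condition attribute with a decision attribute are extracted from complete rows, then applied to fill nulls. All attribute names and values are hypothetical.

```python
# Toy sketch of Chase-style null imputation (illustrative, not the published
# algorithm): rules mined from complete rows predict the missing values.

def extract_rules(rows, cond_attr, dec_attr):
    """Collect (condition value -> decision value) pairs from complete rows."""
    rules = {}
    for row in rows:
        c, d = row.get(cond_attr), row.get(dec_attr)
        if c is not None and d is not None:
            rules.setdefault(c, d)  # keep the first observed association
    return rules

def chase(rows, cond_attr, dec_attr):
    """Fill missing decision values using the extracted rules."""
    rules = extract_rules(rows, cond_attr, dec_attr)
    for row in rows:
        if row.get(dec_attr) is None and row.get(cond_attr) in rules:
            row[dec_attr] = rules[row[cond_attr]]
    return rows

table = [
    {"symptom": "fever", "diagnosis": "flu"},
    {"symptom": "rash",  "diagnosis": "allergy"},
    {"symptom": "fever", "diagnosis": None},  # missing value
]
filled = chase(table, "symptom", "diagnosis")  # third row becomes "flu"
```

The sketch shows why hiding a value as null is not enough: any rule mined from the non-confidential rows immediately predicts it back.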

Author(s):  
Seunghyun Im ◽  
Zbigniew W. Ras

This article discusses data security in knowledge discovery systems (KDS). In particular, we present the problem of confidential data reconstruction by Chase (Dardzinska & Ras, 2003c) in KDS, and discuss protection methods. In conventional database systems, data confidentiality is achieved by hiding sensitive data from unauthorized users (e.g., data encryption or access control). However, hiding is not sufficient in KDS due to Chase. Chase is a generalized null value imputation algorithm designed to predict null or missing values, and it has many application areas. For example, we can use Chase in a medical decision support system to handle difficult medical situations (e.g., a dangerous invasive medical test for patients who cannot undergo it). The results derived from the decision support system can help doctors diagnose and treat patients. The data approximated by Chase are particularly reliable because they reflect the actual characteristics of the data set in the information system. Chase, however, can create data security problems if an information system contains confidential data (Im & Ras, 2005; Im, 2006). Suppose that an attribute in an information system S contains medical information about patients; some portions of the data are not confidential while others have to be kept confidential. In this case, part or all of the confidential data in the attribute can be revealed by Chase using knowledge extracted at S. In other words, self-generated rules extracted from the non-confidential portions of the data can be used to find secret data. Knowledge is often extracted from remote sites in a Distributed Knowledge Discovery System (DKDS) (Ras, 1994). The key concept of DKDS is to generate global knowledge through knowledge sharing. Each site in a DKDS develops knowledge independently, and these are used jointly to produce global knowledge without complex data integration.
Assume that two sites S1 and S2 in a DKDS share the same ontology of their attributes and exchange knowledge in order to obtain global knowledge, and that an attribute of site S1 is confidential. The confidential data in S1 can be hidden by replacing them with null values. However, users at S1 may treat them as missing data and reconstruct them with Chase using the knowledge extracted from S2. A distributed medical information system is an example in which an attribute is confidential at one site while the same attribute is not considered secret at another. These examples show that hiding confidential data in an information system does not guarantee data confidentiality due to Chase, and methods that protect against these problems are essential to building a security-aware KDS.
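The cross-site risk can be made concrete with a small sketch. The rule shared by S2 is hypothetical and the attribute names are invented; the point is only that nulls at S1 are recoverable from remote knowledge over the same ontology.

```python
# Sketch of the DKDS confidentiality risk (illustrative only): an attribute
# hidden at site S1 is reconstructed with a rule mined at remote site S2.

s1 = [{"test": "positive", "disease": None},   # confidential value hidden as null
      {"test": "negative", "disease": None}]

# Rule shared by S2 over the same ontology (hypothetical): test -> disease.
s2_rule = {"positive": "present", "negative": "absent"}

for record in s1:
    if record["disease"] is None:                    # null looks like missing data,
        record["disease"] = s2_rule[record["test"]]  # so Chase fills it back in
```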


2013 ◽  
Vol 694-697 ◽  
pp. 2317-2321
Author(s):  
Hui Wang

The goal of knowledge discovery is to extract hidden or previously unknown useful knowledge from databases, while the objective of knowledge hiding is to prevent certain confidential data or knowledge from being extracted through data mining techniques. This paper focuses on hiding sensitive association rules. The side effects of existing data mining technology are investigated, the problem of sensitive association rule hiding is described formally, and representative sanitizing strategies for sensitive association rule hiding are discussed.
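One common sanitizing strategy is support reduction: delete the consequent item from supporting transactions until the sensitive rule's support falls below the mining threshold. The sketch below is a generic illustration of that idea with invented transactions, not the paper's specific strategies.

```python
# Support-reduction sanitization sketch: remove the consequent of a sensitive
# rule ({bread} -> {butter}) from transactions until its support drops below
# the minimum support threshold used by the miner.

def support(db, itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def sanitize(db, antecedent, consequent, min_sup):
    rule_items = antecedent | consequent
    for t in db:
        if support(db, rule_items) < min_sup:
            break                  # rule is no longer minable; stop distorting
        if rule_items <= t:
            t -= consequent        # drop the consequent item from this transaction
    return db

db = [{"bread", "butter"}, {"bread", "butter"}, {"bread", "milk"}, {"milk"}]
# support({bread, butter}) = 0.5 before sanitizing
sanitize(db, {"bread"}, {"butter"}, min_sup=0.3)
```

Note the trade-off the abstract calls a side effect: each deleted item also lowers the support of every non-sensitive rule that used it.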


2008 ◽  
pp. 3694-3699
Author(s):  
William Perrizo ◽  
Qiang Ding ◽  
Masum Serazi ◽  
Taufik Abidin ◽  
Baoying Wang

For several decades and especially with the preeminence of relational database systems, data is almost always formed into horizontal record structures and then processed vertically (vertical scans of files of horizontal records). This makes good sense when the requested result is a set of horizontal records. In knowledge discovery and data mining, however, researchers are typically interested in collective properties or predictions that can be expressed very briefly. Therefore, the approaches for scan-based processing of horizontal records are known to be inadequate for data mining in very large data repositories (Han & Kamber, 2001; Han, Pei, & Yin, 2000; Shafer, Agrawal, & Mehta, 1996).
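The vertical alternative can be sketched in miniature. This is a simplified illustration in the spirit of vertical bitmap layouts (the authors' actual structure is the P-tree); storing each attribute bit as one integer bitmap turns a count query into a bitwise AND plus a popcount, instead of a scan over horizontal records.

```python
# Vertical-processing sketch: one bitmap per column (bit i = record i), so
# "how many records satisfy a AND b" is a single AND plus a bit count.

records = [  # horizontal records: (has_a, has_b)
    (1, 1), (1, 0), (0, 1), (1, 1),
]

# Vertical layout: pack each column into one integer bitmap.
col_a = sum(bit << i for i, (bit, _) in enumerate(records))
col_b = sum(bit << i for i, (_, bit) in enumerate(records))

# Count records having both a and b without scanning the records.
both = bin(col_a & col_b).count("1")  # records 0 and 3
```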


Author(s):  
Wenhao Shu ◽  
Wenbin Qian ◽  
Yonghong Xie ◽  
Zhaoping Tang

Attribute reduction plays an important role in knowledge discovery and data mining. Confronted with data characterized by interval and missing values in many data analysis tasks, it is worthwhile to study attribute reduction for interval-valued data with missing values. Uncertainty measures supply effective viewpoints that help disclose the essential characteristics of such data. Therefore, this paper addresses the attribute reduction problem based on an uncertainty measure for interval-valued data with missing values. First, an uncertainty measure is provided for evaluating candidate attributes, and then an efficient attribute reduction algorithm is developed for interval-valued data with missing values. To improve the efficiency of attribute reduction, objects that fall within the positive region are deleted from the object set during attribute selection. Finally, experimental results demonstrate that the proposed algorithm finds a subset of attributes in a much shorter time than existing attribute reduction algorithms, without losing classification performance.
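The efficiency trick described above (shrinking the universe by deleting positive-region objects) can be sketched generically. This toy uses plain categorical values rather than the paper's interval-valued data and uncertainty measure; attribute names and objects are invented.

```python
# Sketch of positive-region deletion during forward attribute selection:
# objects whose equivalence class is pure are removed, so each later
# iteration works on a smaller universe.
from collections import defaultdict

def positive_region(objs, attrs):
    """Indices of objects whose equivalence class under attrs is pure."""
    classes = defaultdict(list)
    for i, (x, d) in enumerate(objs):
        classes[tuple(x[a] for a in attrs)].append(i)
    pure = set()
    for members in classes.values():
        if len({objs[i][1] for i in members}) == 1:  # one decision value only
            pure |= set(members)
    return pure

# Objects: (attribute-value dict, decision).
objs = [({"a": 0, "b": 0}, "yes"),
        ({"a": 0, "b": 1}, "no"),
        ({"a": 1, "b": 0}, "yes")]

reduct, remaining = [], list(objs)
for a in ["a", "b"]:
    reduct.append(a)
    pos = positive_region(remaining, reduct)
    remaining = [o for i, o in enumerate(remaining) if i not in pos]
    if not remaining:      # every object is consistently classified
        break
```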


Recent developments in wireless technology have attracted many users and found wide application. As the number of users grows, the security of data is a main concern. Wireless networks are very common for both organizations and individuals, yet the transmission of confidential data such as e-mails, banking transactions, and credit card details over a shared transmission medium is insecure. Protecting data during transmission is essential for the successful operation of systems that rely on this data. This paper proposes an enhanced method for data encryption and decryption that guarantees data confidentiality during transmission over a network. User data is encrypted before transmission by assigning fewer bits to the plain ASCII text. The key consists of all plain ASCII characters in random order and is treated as a two-dimensional array. In this way, data is transmitted in a secure and efficient manner, accomplishing the main goal of cryptography. The results of the two-dimensional array method are compared with the Advanced Encryption Standard (AES) algorithm. The use of a two-dimensional array provides security and reduces the effort required to encrypt the data.
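One plausible reading of the scheme is sketched below: the key is the printable ASCII set in a secret random order, viewed as a two-dimensional array, and each plaintext character is encoded by its (row, column) position. The column width, seed, and coordinate encoding are our assumptions for illustration, not the paper's exact construction.

```python
# Hedged sketch of a 2-D array substitution cipher: the shared secret is the
# shuffled printable-ASCII table; ciphertext is a list of (row, col) positions.
import random
import string

COLS = 16
alphabet = list(string.printable[:95])  # 95 printable ASCII characters
rng = random.Random(42)                 # shared secret seed stands in for the key
rng.shuffle(alphabet)

pos = {ch: divmod(i, COLS) for i, ch in enumerate(alphabet)}  # char -> (row, col)

def encrypt(text):
    return [pos[ch] for ch in text]

def decrypt(coords):
    return "".join(alphabet[r * COLS + c] for r, c in coords)

ct = encrypt("secret")  # e.g. [(3, 7), ...] depending on the shuffle
```

As with any static substitution, this sketch would be vulnerable to frequency analysis; it only illustrates the data layout, not a security claim.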


Author(s):  
QingXiang Wu ◽  
Martin McGinnity ◽  
Girijesh Prasad ◽  
David Bell

Data mining and knowledge discovery aim at finding useful information in typically massive collections of data, and then extracting useful knowledge from that information. To date a large number of approaches have been proposed to find useful information and discover useful knowledge; for example, decision trees, Bayesian belief networks, evidence theory, rough set theory, fuzzy set theory, the kNN (k-nearest-neighbor) classifier, neural networks, and support vector machines. However, each of these approaches is based on a specific data type. In the real world, an intelligent system often encounters mixed data types, incomplete information (missing values), and imprecise information (fuzzy conditions). In the UCI (University of California, Irvine) Machine Learning Repository, it can be seen that many real-world data sets have missing values and mixed data types. It is a challenge to enable machine learning or data mining approaches to deal with mixed data types (Ching, 1995; Coppock, 2003) because it is difficult to find a measure of similarity between objects with mixed-data-type attributes. The problem of mixed data types is a long-standing issue in data mining. The emerging techniques targeted at this issue can be classified into three classes: (1) symbolic data mining approaches plus discretizers (e.g., Dougherty et al., 1995; Wu, 1996; Kurgan et al., 2004; Diday, 2004; Darmont et al., 2006; Wu et al., 2007) for transforming continuous data into symbolic data; (2) numerical data mining approaches plus transformation from symbolic data to numerical data (e.g., Kasabov, 2003; Darmont et al., 2006; Hadzic et al., 2007); (3) hybrids of symbolic and numerical data mining approaches (e.g., Tung, 2002; Kasabov, 2003; Leng et al., 2005; Wu et al., 2006).
Since hybrid approaches have the potential to exploit the advantages of both symbolic and numerical data mining, this chapter, after discussing the merits and shortcomings of current approaches, focuses on applying the Self-Organizing Computing Network Model to construct a hybrid system that solves the problems of knowledge discovery from databases with a diversity of data types. Future trends for data mining on mixed-type data are then discussed. Finally, a conclusion is presented.
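Approach (1) above, a discretizer in front of a symbolic miner, can be sketched with the simplest variant, equal-width binning; the function and bin labels are illustrative, not any cited author's method.

```python
# Equal-width discretization sketch: a continuous attribute becomes symbolic
# labels so that a symbolic data mining approach can consume mixed-type data.

def equal_width_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # guard against a constant column
    labels = []
    for v in values:
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        labels.append(f"bin{b}")
    return labels

ages = [21.0, 34.5, 47.0, 60.0]
symbols = equal_width_bins(ages, n_bins=3)
```

Equal-frequency or entropy-based discretizers (as in Dougherty et al., 1995) follow the same interface but choose the cut points differently.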



Author(s):  
Shahab Wahhab Kareem ◽  
Raghad Zuhair Yousif ◽  
Shadan Mohammed Jihad Abdalwahid

<p class="Abstract">The amount of data processed and stored in the cloud is growing dramatically. Traditional storage devices, at both the hardware and software levels, cannot meet the requirements of the cloud. This fact motivates the need for a platform that can handle this problem. Hadoop is a widely deployed platform proposed to overcome this big data problem, which often uses the MapReduce architecture to process vast amounts of data in the cloud. However, Hadoop has no strategy to assure the safety and confidentiality of files saved in the Hadoop Distributed File System (HDFS). In the cloud, protecting sensitive data is a critical issue in which data encryption schemes play a vital role. This research proposes a hybrid system combining two well-known asymmetric key cryptosystems (RSA and Paillier) to encrypt files stored in HDFS. Thus, before data is saved in HDFS, the proposed cryptosystem is used to encrypt it. Each user of the cloud may upload files in one of two modes, insecure or secure. The hybrid system shows higher computational complexity and lower latency in comparison to the RSA cryptosystem alone.</p>
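The encrypt-before-store step can be illustrated with textbook RSA, one of the two asymmetric components named above. This toy uses tiny primes and per-byte encryption purely for illustration; a real deployment needs a vetted cryptographic library, large keys, and padding, and the Paillier half of the hybrid is omitted here.

```python
# Textbook RSA sketch (toy primes, insecure, illustrative only): data is
# encrypted with the public key (e, n) before it would be written to HDFS,
# and recovered with the private key d.

p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
e = 17                      # public exponent, coprime with phi
d = pow(e, -1, phi)         # private exponent via modular inverse (Python 3.8+)

def encrypt_bytes(data):
    return [pow(b, e, n) for b in data]   # each byte b < n

def decrypt_bytes(cipher):
    return bytes(pow(c, d, n) for c in cipher)

blob = encrypt_bytes(b"hdfs block")       # this is what would be stored in HDFS
```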


2017 ◽  
Vol 7 (1.5) ◽  
pp. 221
Author(s):  
Pankaj Singh ◽  
Sachin Kumar

Cryptography is about protecting confidential data from being read by third parties or the public. It mainly focuses on encrypting (converting) data and decrypting (reconverting) it by different methods. These encryption and decryption methods are based on mathematical theories and are implemented by computer science practices. But as cryptography progressed, ways were found to decode secured data and view the actual data, likewise through mathematical theories and computer science practices. Popular algorithms in use today include AES (Advanced Encryption Standard), Blowfish, DES (Data Encryption Standard), and T-DES (Triple Data Encryption Standard); earlier well-known algorithms include RSA (Rivest-Shamir-Adleman) and ECC (elliptic curve cryptography). These algorithms have their own advantages and drawbacks. As attackers progressed in breaking them, these algorithms were supplemented by digital signatures or hashes produced by algorithms such as MD5 and SHA. By these means, data integrity, data confidentiality, and authentication of data are maintained. But as things progress, new advancements are always needed in the field of cryptography to keep data secure.
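The integrity check mentioned above can be shown with SHA-256 from Python's standard hashlib; the messages are invented for illustration.

```python
# Integrity via a cryptographic hash: a SHA-256 digest of the message lets the
# receiver detect any tampering by recomputing and comparing.
import hashlib

message = b"confidential payload"
digest = hashlib.sha256(message).hexdigest()   # sent alongside the ciphertext

# Receiver recomputes the digest; a mismatch means the data was altered.
tampered = b"confidential payload!"
ok = hashlib.sha256(message).hexdigest() == digest
bad = hashlib.sha256(tampered).hexdigest() == digest
```

A plain hash gives integrity only; authentication additionally requires a keyed construction (e.g., an HMAC) or a digital signature, as the abstract notes.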

