Attribute reduction for dynamic data sets

2013 ◽  
Vol 13 (1) ◽  
pp. 676-689 ◽  
Author(s):  
Feng Wang ◽  
Jiye Liang ◽  
Chuangyin Dang


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 138 ◽  
Author(s):  
Lin Sun ◽  
Lanying Wang ◽  
Jiucheng Xu ◽  
Shiguang Zhang

For continuous numerical data sets, attribute reduction based on neighborhood rough sets is an important step for improving classification performance. However, most traditional reduction algorithms can only handle finite sets and tend to yield low accuracy and high-cardinality reducts. In this paper, a novel attribute reduction method using Lebesgue and entropy measures in neighborhood rough sets is proposed, which can deal with continuous numerical data while maintaining the original classification information. First, the Fisher score method is employed to eliminate irrelevant attributes and thereby significantly reduce computational complexity for high-dimensional data sets. Then, the Lebesgue measure is introduced into neighborhood rough sets to investigate uncertainty measures. To analyze the uncertainty and noise of neighborhood decision systems, several neighborhood entropy-based uncertainty measures are presented based on the Lebesgue and entropy measures, and by combining the algebra view with the information view in neighborhood rough sets, a neighborhood roughness joint entropy is developed for neighborhood decision systems. Moreover, some of their properties are derived and their relationships are established, which helps to clarify the essence of knowledge and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is designed to improve the classification performance of large-scale complex data. Experimental results on an illustrative example and several public data sets show that the proposed method is very effective at selecting the most relevant attributes with high classification accuracy.
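
The heuristic search described above can be illustrated with a short sketch. The following Python fragment is a minimal, hedged approximation: it uses a Chebyshev-distance delta-neighborhood and an average neighborhood conditional entropy as stand-ins for the paper's Lebesgue/entropy-based measures, and greedily adds the attribute that lowers the entropy most. The radius delta, the stopping threshold eps, and the toy data are illustrative assumptions rather than the authors' exact formulation.

# Minimal sketch of heuristic attribute reduction driven by a neighborhood
# entropy measure. The radius `delta`, threshold `eps`, and the toy data are
# illustrative assumptions, not the authors' exact formulation.
import numpy as np

def neighborhoods(X, attrs, delta=0.2):
    """delta-neighborhood of every sample under the selected attributes."""
    sub = X[:, attrs]
    # Chebyshev distance keeps each attribute on its own scale.
    dist = np.max(np.abs(sub[:, None, :] - sub[None, :, :]), axis=2)
    return [np.flatnonzero(row <= delta) for row in dist]

def neighborhood_entropy(X, y, attrs, delta=0.2):
    """Average conditional entropy of the decision within each neighborhood
    (a stand-in for the paper's Lebesgue/entropy-based uncertainty measure)."""
    n = len(y)
    total = 0.0
    for nb in neighborhoods(X, attrs, delta):
        labels, counts = np.unique(y[nb], return_counts=True)
        p = counts / counts.sum()
        total += -np.sum(p * np.log2(p))
    return total / n

def reduce_attributes(X, y, delta=0.2, eps=1e-3):
    """Greedy forward selection: add the attribute that lowers entropy most."""
    remaining = list(range(X.shape[1]))
    reduct, current = [], np.inf
    while remaining:
        best, best_h = None, current
        for a in remaining:
            h = neighborhood_entropy(X, y, reduct + [a], delta)
            if h < best_h - eps:
                best, best_h = a, h
        if best is None:
            break
        reduct.append(best)
        remaining.remove(best)
        current = best_h
    return reduct

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((60, 5))
    y = (X[:, 0] + X[:, 2] > 1.0).astype(int)   # only attributes 0 and 2 matter
    print("selected attributes:", reduce_attributes(X, y))

The greedy loop mirrors the usual significance-driven forward selection in neighborhood rough set reduction; swapping in the paper's own uncertainty measure would only change neighborhood_entropy.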


2019 ◽  
Vol 9 (14) ◽  
pp. 2841 ◽  
Author(s):  
Nan Zhang ◽  
Xueyi Gao ◽  
Tianyou Yu

Attribute reduction is a challenging problem in rough set theory, which has been applied in many research fields, including knowledge representation, machine learning, and artificial intelligence. The main objective of attribute reduction is to obtain a minimal attribute subset that retains the same classification or discernibility properties as the original information system. Recently, many attribute reduction algorithms have been proposed, such as positive region preservation, generalized decision preservation, and distribution preservation. The existing attribute reduction algorithms for generalized decision preservation are mainly based on the discernibility matrix and are therefore computationally expensive and difficult to apply to large-scale, high-dimensional data sets. To overcome this problem, we introduce a similarity degree for generalized decision preservation. On this basis, inner and outer significance measures are proposed. Using heuristic strategies, we develop two quick reduction algorithms for generalized decision preservation. Finally, theoretical and experimental results show that the proposed heuristic reduction algorithms are effective and efficient.
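
As a rough illustration of how a similarity-driven heuristic of this kind can replace discernibility-matrix computation, the sketch below groups objects by their condition values, takes the set of decisions observed in each class as the generalized decision, and greedily adds the attribute whose inclusion most increases a Jaccard-style similarity to the generalized decision of the full attribute set. The similarity formula, the greedy criterion, and the toy table are assumptions for illustration, not the paper's exact definitions.

# Hedged sketch of heuristic reduction that preserves the generalized decision.
# The Jaccard-style similarity, the toy table, and the function names are
# illustrative assumptions rather than the paper's exact definitions.
from collections import defaultdict

def generalized_decision(table, decisions, attrs):
    """Map each object to the set of decisions seen in its indiscernibility class."""
    classes = defaultdict(set)
    for row, d in zip(table, decisions):
        classes[tuple(row[a] for a in attrs)].add(d)
    return [classes[tuple(row[a] for a in attrs)] for row in table]

def similarity(table, decisions, attrs, full_gd):
    """Average set similarity between the generalized decisions under `attrs`
    and under the full attribute set."""
    gd = generalized_decision(table, decisions, attrs)
    return sum(len(a & b) / len(a | b) for a, b in zip(gd, full_gd)) / len(table)

def reduce_generalized_decision(table, decisions):
    all_attrs = list(range(len(table[0])))
    full_gd = generalized_decision(table, decisions, all_attrs)
    reduct = []
    # Greedy step: pick the attribute that raises the similarity the most.
    while similarity(table, decisions, reduct, full_gd) < 1.0:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: similarity(table, decisions, reduct + [a], full_gd))
        reduct.append(best)
    return reduct

if __name__ == "__main__":
    table = [("a", 0, "x"), ("a", 1, "x"), ("b", 0, "y"), ("b", 1, "y")]
    decisions = ["yes", "no", "no", "no"]
    print(reduce_generalized_decision(table, decisions))   # e.g. [0, 1]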


Author(s):  
Qing-Hua Zhang ◽  
Long-Yang Yao ◽  
Guan-Sheng Zhang ◽  
Yu-Ke Xin

In this paper, a new incremental knowledge acquisition method is proposed based on rough set theory, decision trees and granular computing. To process dynamic data effectively, the paper first analyzes how to describe the data with rough set theory and how to compute equivalence classes and the positive region with a hash algorithm. Then, attribute reduction, value reduction and the extraction of the rule set are carried out efficiently with the hash algorithm. Finally, for each newly added record, the incremental knowledge acquisition method is used to update the original rules. Both algorithm analysis and experiments show that, for dynamic information systems, the proposed algorithm has lower time complexity than traditional algorithms and incremental knowledge acquisition algorithms based on granular computing, owing to the efficiency of the hash algorithm, and that it is more effective on very large data sets.
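
The core hashing idea can be sketched in a few lines: a dictionary keyed by the condition-attribute values yields the equivalence classes in a single pass, the positive region is the union of the classes with a single decision value, and a newly added record only touches the one class it hashes to. The helper names and the incremental update shown below are illustrative assumptions, not the paper's exact procedure.

# Rough sketch of the hash-grouping idea: a dictionary keyed by the condition
# attribute values gives the equivalence classes in a single pass, and the
# positive region falls out of the classes whose decision value is unique.
# The incremental update shown for a new record is an illustrative assumption.
from collections import defaultdict

def build_classes(rows, decisions):
    """One hashing pass: condition-value tuple -> (object ids, decision values)."""
    classes = defaultdict(lambda: ([], set()))
    for i, (row, d) in enumerate(zip(rows, decisions)):
        ids, ds = classes[tuple(row)]
        ids.append(i)
        ds.add(d)
    return classes

def positive_region(classes):
    """Objects in consistent classes (exactly one decision value)."""
    return sorted(i for ids, ds in classes.values() if len(ds) == 1 for i in ids)

def add_record(classes, new_id, row, decision):
    """Incremental update: only the class hit by the new record changes."""
    ids, ds = classes[tuple(row)]
    ids.append(new_id)
    ds.add(decision)

if __name__ == "__main__":
    rows = [("sunny", "hot"), ("sunny", "hot"), ("rain", "mild")]
    decisions = ["no", "no", "yes"]
    classes = build_classes(rows, decisions)
    print(positive_region(classes))          # [0, 1, 2]: all classes are consistent
    add_record(classes, 3, ("sunny", "hot"), "yes")
    print(positive_region(classes))          # [2]: the ("sunny", "hot") class became inconsistent

Because each lookup and update is a constant-time hash operation, the cost of absorbing a new record is proportional to the size of the affected class rather than to the whole table, which is where the claimed speed-up over recomputation comes from.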


2018 ◽  
Vol 465 ◽  
pp. 202-218 ◽  
Author(s):  
Yunge Jing ◽  
Tianrui Li ◽  
Hamido Fujita ◽  
Baoli Wang ◽  
Ni Cheng

2014 ◽  
Vol 644-650 ◽  
pp. 2120-2123 ◽  
Author(s):  
De Zhi An ◽  
Guang Li Wu ◽  
Jun Lu

There are currently many data mining methods. This paper studies the application of rough set methods in data mining, focusing on attribute reduction algorithms based on rough sets in the rule extraction stage. In data mining, rough sets are often used for knowledge reduction and, in turn, for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, the traditional attribute reduction algorithm based on rough sets is studied and improved, and a new attribute reduction algorithm is proposed for data mining on large data sets.


2017 ◽  
Vol 312 ◽  
pp. 66-86 ◽  
Author(s):  
Yanyan Yang ◽  
Degang Chen ◽  
Hui Wang ◽  
Eric C.C. Tsang ◽  
Deli Zhang

2016 ◽  
Vol 16 (4) ◽  
pp. 13-28 ◽  
Author(s):  
Cao Chinh Nghia ◽  
Demetrovics Janos ◽  
Nguyen Long Giang ◽  
Vu Duc Thi

According to the traditional rough set theory approach, attribute reduction methods are performed on decision tables with a discretized value domain, i.e., decision tables obtained by data discretization methods. In recent years, researchers have proposed methods based on the fuzzy rough set approach to solve the attribute reduction problem in decision tables with a numerical value domain. In this paper, we propose a fuzzy distance between two partitions and an attribute reduction method for numerical decision tables based on the proposed fuzzy distance. Experiments on data sets show that the proposed method achieves higher classification accuracy than methods based on fuzzy entropy.
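
To make the idea of a distance between fuzzy partitions concrete, the sketch below induces a fuzzy similarity relation from the normalised per-attribute differences of a numerical table and measures how far apart the fuzzy equivalence classes of two attribute subsets are. The min/max-based relation and the particular distance formula are assumptions for illustration and need not coincide with the distance defined in the paper.

# Illustrative sketch of a fuzzy distance between the partitions induced by two
# attribute subsets on a numerical decision table. The similarity relation and
# the distance formula are assumptions, not necessarily the paper's definition.
import numpy as np

def fuzzy_relation(X, attrs):
    """Fuzzy similarity matrix: 1 minus the largest normalised attribute difference."""
    sub = X[:, attrs]
    rng = sub.max(axis=0) - sub.min(axis=0) + 1e-12
    diff = np.abs(sub[:, None, :] - sub[None, :, :]) / rng
    return 1.0 - diff.max(axis=2)

def fuzzy_partition_distance(X, attrs_p, attrs_q):
    """Normalised gap between fuzzy equivalence classes of the two partitions:
    d(P, Q) = (1/n^2) * sum_x ( |[x]_P union [x]_Q| - |[x]_P intersect [x]_Q| )."""
    Rp, Rq = fuzzy_relation(X, attrs_p), fuzzy_relation(X, attrs_q)
    union = np.maximum(Rp, Rq).sum(axis=1)
    inter = np.minimum(Rp, Rq).sum(axis=1)
    n = X.shape[0]
    return float((union - inter).sum()) / (n * n)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((30, 4))
    # The distance to the full attribute set typically shrinks as the subset grows.
    print(fuzzy_partition_distance(X, [0], [0, 1, 2, 3]))
    print(fuzzy_partition_distance(X, [0, 1, 2], [0, 1, 2, 3]))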


2017 ◽  
Author(s):  
Harun Mustafa ◽  
André Kahles ◽  
Mikhail Karasikov ◽  
Gunnar Rätsch

Much of the DNA and RNA sequencing data available is in the form of high-throughput sequencing (HTS) reads and is currently unindexed by established sequence search databases. Recent succinct data structures for indexing both reference sequences and HTS data, along with associated metadata, have been based on either hashing or graph models, but many of these structures are static in nature and thus not well suited as backends for dynamic databases.

We propose a parallel construction method for, and novel application of, the wavelet trie as a dynamic data structure for compressing and indexing graph metadata. By developing an algorithm for merging wavelet tries, we are able to construct large tries in parallel by merging smaller tries constructed concurrently from batches of data.

When compared against general compression algorithms and those developed specifically for graph colors (VARI and Rainbowfish), our method achieves compression ratios superior to gzip and VARI, converging to compression ratios of 6.5% to 2% on data sets constructed from over 600 virus genomes. While marginally worse than compression by bzip2 or Rainbowfish, this structure allows for both fast extension and query. We also found that additionally encoding graph topology metadata improved compression ratios, particularly on data sets consisting of several mutually exclusive reference genomes. It was also observed that the compression ratio of wavelet tries grew sublinearly with the density of the annotation matrices.

This work is a significant step towards implementing a dynamic data structure for indexing large annotated sequence data sets that supports fast query and update operations. At the time of writing, no established standard tool has filled this niche.
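
The batch-and-merge construction pattern described above can be mimicked with an ordinary trie, which is what the simplified Python sketch below does: small counting tries are built concurrently from batches of annotation strings and then merged pairwise. A real wavelet trie over binary label vectors is considerably more involved; the data, batch layout, and helper names here are purely illustrative.

# Simplified illustration of batch-and-merge construction: build small tries
# from batches concurrently, then merge them. A real wavelet trie over binary
# label vectors is far more involved; this plain counting trie only mirrors the
# merge-based workflow.
from concurrent.futures import ProcessPoolExecutor
from functools import reduce

def build_trie(batch):
    """Trie as nested dicts; '#' counts how many strings end at a node."""
    root = {}
    for s in batch:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node["#"] = node.get("#", 0) + 1
    return root

def merge_tries(a, b):
    """Recursively merge trie b into trie a and return a."""
    for key, sub in b.items():
        if key == "#":
            a["#"] = a.get("#", 0) + sub
        else:
            a[key] = merge_tries(a.get(key, {}), sub)
    return a

def count(trie, s):
    """How many times the exact string s was inserted."""
    node = trie
    for ch in s:
        node = node.get(ch)
        if node is None:
            return 0
    return node.get("#", 0)

if __name__ == "__main__":
    batches = [["ACGT", "ACGA", "AC"], ["ACGT", "TTGA"], ["AC", "ACG"]]
    with ProcessPoolExecutor() as pool:
        tries = list(pool.map(build_trie, batches))   # build per-batch tries concurrently
    merged = reduce(merge_tries, tries)               # fold the partial tries together
    print(count(merged, "ACGT"))   # 2
    print(count(merged, "AC"))     # 2

Because merging only walks the overlap of two tries, partial tries can be combined pairwise in any order, which is what makes the construction easy to parallelise.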


2019 ◽  
Vol 18 (3) ◽  
pp. 305-326
Author(s):  
Vanessa Chang

Created with digital motion capture, or mocap, the virtual dances Ghostcatching and as.phyx.ia render movement abstracted from choreographic bodies. These depictions of gestural doubles or ‘ghosts’ trigger a sense of the uncanny rooted in mocap’s digital processes. Examining these material processes, this article argues that this digital optical uncanny precipitates from the intersubjective relationship of performer, technology, and spectator. Mocap interpolates living bodies into a technologized visual field that parses these bodies as dynamic data sets, a process by which performing bodies and digital capture technologies coalesce into the film’s virtual body. This virtual body signals a computational agency at its heart, one that choreographs the intersubjective embodiments of real and virtual dancers, and spectators. Destabilizing the human body as a locus of perception, movement, and sensation, mocap triggers uncanny uncertainty in human volition. In this way, Ghostcatching and as.phyx.ia reflect the infiltration of computer vision technologies, such as facial recognition, into numerous aspects of contemporary life. Through these works, the author hopes to show how the digital gaze of these algorithms, imperceptible to the human eye, threatens individual autonomy with automation.

