Pattern classification using grey tolerance rough sets

Kybernetes ◽  
2016 ◽  
Vol 45 (2) ◽  
pp. 266-281 ◽  
Author(s):  
Yi-Chung Hu

Purpose – The purpose of this paper is to propose the grey tolerance rough set (GTRS) and to construct GTRS-based classifiers. Design/methodology/approach – The authors use grey relational analysis to implement a relationship-based similarity measure for tolerance rough sets. Findings – The proposed classification method has been tested on several real-world data sets. Its classification performance is comparable to that of other rough-set-based methods. Originality/value – The authors design a variant of a similarity measure that estimates the relationship between any two patterns, such that the closer the relationship, the greater the similarity.
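The relationship-based similarity measure builds on grey relational analysis, whose classical form is Deng's grey relational grade. A minimal sketch, assuming the conventional distinguishing coefficient rho = 0.5 (function and variable names are illustrative, not the paper's):

```python
import numpy as np

def grey_relational_grade(reference, candidate, rho=0.5):
    """Deng's classical grey relational grade between two equal-length sequences.

    rho is the distinguishing coefficient; 0.5 is the conventional choice.
    """
    x0 = np.asarray(reference, dtype=float)
    x1 = np.asarray(candidate, dtype=float)
    delta = np.abs(x0 - x1)               # point-wise absolute differences
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:                        # identical sequences: maximal grade
        return 1.0
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # grey relational coefficients
    return float(coeff.mean())            # grade = average coefficient
```

The closer two patterns are, the larger the grade (always in (0, 1]), which is the property the paper exploits: closer relationship, greater similarity.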

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Peng Jiang ◽  
Wenbao Wang ◽  
Yi-Chung Hu ◽  
Yu-Jing Chiu ◽  
Shu-Ju Tsao

Purpose It is challenging to derive an appropriate tolerance relation for tolerance rough set-based classifiers (TRSCs). The traditional tolerance rough set employs a simple distance function to determine the tolerance relation. However, such a simple function takes into account neither criterion weights nor the interaction among criteria. Further, the traditional tolerance relation ignores the interdependencies arising from direct and indirect influences among patterns. This study aimed to incorporate interaction and interdependencies into the tolerance relation to develop non-additive grey TRSCs (NG-TRSCs). Design/methodology/approach For pattern classification, this study applied non-additive grey relational analysis (GRA) and the decision-making trial and evaluation laboratory (DEMATEL) technique to address interaction and interdependencies, respectively. Findings The classification accuracy rates of the proposed NG-TRSC were compared to those of other TRSCs with distinctive features. The results showed that the proposed classifier was superior to the other TRSCs considered. Practical implications In addition to pattern classification, the proposed non-additive grey DEMATEL can further benefit managerial decision-making applications because it simplifies the operations for decision-makers and enhances the applicability of DEMATEL. Originality/value This paper contributes to the field by proposing the non-additive grey tolerance rough set (NG-TRS) for pattern classification. The proposed NG-TRSC can be constructed by integrating non-additive GRA with DEMATEL, using a genetic algorithm to determine the relevant parameters.
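The DEMATEL step referred to above derives a total influence matrix, capturing direct plus indirect influences, from a direct influence matrix. A minimal sketch of standard DEMATEL (the paper's non-additive grey extension is not reproduced; normalization by the largest row/column sum is the usual convention):

```python
import numpy as np

def dematel_total_influence(direct):
    """Total influence matrix T = N (I - N)^(-1) from a direct influence matrix.

    direct: square matrix of non-negative pairwise direct influences (not all zero).
    """
    D = np.asarray(direct, dtype=float)
    s = max(D.sum(axis=1).max(), D.sum(axis=0).max())  # DEMATEL normalization factor
    N = D / s                                          # normalized direct influence
    I = np.eye(D.shape[0])
    return N @ np.linalg.inv(I - N)                    # sums the series N + N^2 + N^3 + ...
```

The matrix inverse closes the geometric series of indirect influence paths of every length, which is what "interdependencies concerning direct and indirect influences" refers to.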


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 155 ◽  
Author(s):  
Lin Sun ◽  
Xiaoyu Zhang ◽  
Jiucheng Xu ◽  
Shiguang Zhang

Attribute reduction is an important preprocessing step for data mining and has become a hot research topic in rough set theory. Neighborhood rough set theory can overcome the shortcoming that classical rough set theory may lose useful information when discretizing continuous-valued data sets. In this paper, to improve the classification performance of complex data, a novel attribute reduction method using neighborhood entropy measures, combining the algebra view with the information view in neighborhood rough sets, is proposed; it can handle continuous data while maintaining the classification information of the original attributes. First, to efficiently analyze the uncertainty of knowledge in neighborhood rough sets, a new average neighborhood entropy is presented by combining neighborhood approximation precision with neighborhood entropy, based on the strong complementarity between the algebraic definition of attribute significance and the definition from the information view. Then, a concept of decision neighborhood entropy is investigated for handling the uncertainty and noisiness of neighborhood decision systems; it integrates the credibility degree with the coverage degree of neighborhood decision systems to fully reflect the decision ability of attributes. Moreover, some of their properties are derived and the relationships among these measures are established, which helps to understand the essence of knowledge content and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is proposed to improve the classification performance of complex data sets. Experimental results on an illustrative example and several public data sets demonstrate that the proposed method is very effective at selecting the most relevant attributes with great classification performance.
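The neighborhood construction that underlies these measures can be sketched simply: each sample's δ-neighborhood is the set of samples within distance δ, and the approximation quality (positive-region ratio) checks how consistently neighborhoods agree with the decision classes. A minimal illustration under Euclidean distance, not the paper's full entropy machinery:

```python
import numpy as np

def neighborhoods(X, delta):
    """delta-neighborhood of each sample under Euclidean distance."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    return [set(np.flatnonzero(row <= delta)) for row in d]

def approximation_quality(X, y, delta):
    """Fraction of samples whose whole neighborhood shares their class
    (the positive region of the neighborhood decision system)."""
    y = np.asarray(y)
    nbrs = neighborhoods(X, delta)
    consistent = sum(1 for i, n in enumerate(nbrs)
                     if all(y[j] == y[i] for j in n))
    return consistent / len(y)
```

Because the neighborhood is computed directly on real values, no discretization step (and hence no discretization information loss) is needed.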


2018 ◽  
Vol 8 (7) ◽  
pp. 1173 ◽  
Author(s):  
Yi-Chung Hu ◽  
Yu-Jing Chiu

Tolerance-rough-set-based classifiers (TRSCs) are known to operate effectively on real-valued attributes for classification problems. This involves creating a tolerance relation, defined by a distance function, to estimate the proximity between any pair of patterns. However, distance may not be an appropriate means of estimating similarity, which leaves room to improve the classification performance of TRSCs. Since certain relations hold among the patterns, it is interesting to consider similarity from the perspective of these relations. Thus, this study uses grey relational analysis to identify direct influences among patterns and generates a total influence matrix to capture their interdependence. In particular, to maintain the balance between the direct and the total influence matrix, an aggregated influence matrix is proposed to form the basis of the proposed grey-total-influence-based tolerance rough set (GTI-TRS) for pattern classification. A real-valued genetic algorithm is designed to generate the grey tolerance class of a pattern so as to yield high classification accuracy. Experimental results showed that the classification accuracy obtained by the proposed method was comparable to that of other rough-set-based methods.
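The aggregated influence matrix mentioned above balances direct and total influence. Its exact form, and the parameters the paper tunes with a genetic algorithm, are not reproduced here; a plausible sketch uses a convex combination of the normalized direct matrix and its total influence matrix, with alpha = 0.5 as an arbitrary illustrative choice:

```python
import numpy as np

def aggregated_influence(direct, alpha=0.5):
    """Convex combination of a normalized direct influence matrix and its
    total influence matrix. alpha and the aggregation form are assumptions;
    the paper tunes the relevant parameters with a genetic algorithm.
    """
    D = np.asarray(direct, dtype=float)
    s = max(D.sum(axis=1).max(), D.sum(axis=0).max())   # normalization factor
    N = D / s                                           # normalized direct influence
    T = N @ np.linalg.inv(np.eye(len(D)) - N)           # total (direct + indirect) influence
    return alpha * N + (1.0 - alpha) * T
```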


2016 ◽  
Vol 12 (2) ◽  
pp. 126-149 ◽  
Author(s):  
Masoud Mansoury ◽  
Mehdi Shajari

Purpose This paper aims to improve recommendation performance for cold-start users and controversial items. Collaborative filtering (CF) generates recommendations on the basis of similarity between users: it uses the opinions of similar users to generate the recommendation for an active user. As the similarity model, or neighbor selection function, is the key element for the effectiveness of CF, many variations of CF have been proposed. However, these methods are not very effective, especially for users who provide few ratings (i.e. cold-start users). Design/methodology/approach A new user similarity model is proposed that focuses on improving recommendation performance for cold-start users and controversial items. To show the validity of their similarity model, the authors conducted experiments and showed the effectiveness of this model in calculating similarity values between users even when only a few ratings are available. In addition, the authors applied their user similarity model to a recommender system and analyzed its results. Findings Experiments on two real-world data sets are implemented and compared with some other CF techniques. The results show that the authors’ approach outperforms previous CF techniques on the coverage metric while preserving accuracy for cold-start users and controversial items. Originality/value The proposed approach addresses the conditions in which CF is unable to generate accurate recommendations. These conditions affect CF performance adversely, especially for cold-start users. The authors show that their similarity model overcomes CF’s weaknesses effectively and improves its performance even for cold-start users.
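The cold-start weakness shows up already in the classic user-similarity computation: Pearson correlation over co-rated items is unreliable when the overlap is tiny. A common baseline remedy, shown only for context and not as the authors' model, damps the similarity by the number of co-rated items (the shrink factor of 10 is an illustrative choice):

```python
import numpy as np

def pearson_similarity(ratings_u, ratings_v, shrink=10):
    """Pearson similarity between two users' rating dicts {item: rating},
    damped by the number of co-rated items (significance weighting)."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0                                     # too few co-rated items
    u = np.array([ratings_u[i] for i in common], dtype=float)
    v = np.array([ratings_v[i] for i in common], dtype=float)
    u -= u.mean()
    v -= v.mean()
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        return 0.0                                     # a user rated everything the same
    sim = float(u @ v) / denom
    return sim * len(common) / (len(common) + shrink)  # shrink toward 0 for small overlap
```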


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sifeng Liu

Purpose The purpose of this paper is to construct negative grey relational analysis models to measure the relationship between reverse sequences. Design/methodology/approach The definition of a reverse sequence is first given, based on an analysis of the relative position and change trend of sequences. Then, several different negative grey relational analysis models, namely the negative grey similarity relational analysis model, the negative grey absolute relational analysis model, the negative grey relative relational analysis model, the negative grey comprehensive relational analysis model and the negative Deng’s grey relational analysis model, are put forward based on the corresponding common grey relational analysis models. The properties of the new models are studied. Findings The negative grey relational analysis models proposed in this paper can effectively solve the problem of measuring the relationship between reverse sequences. All the new negative grey relational degrees satisfy the requirements of normalization and reversibility. Practical implications The proposed negative grey relational analysis models can be used to measure the relationship between reverse sequences. As a living example, the reverse incentive effect of winning the Fields Medal on the research output of winners is measured, based on the research output data of the medalists and the contenders, using the proposed negative grey relational analysis model. Originality/value The definition of a reverse sequence and the five negative grey relational analysis models listed above are first proposed in this paper.
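The paper's five negative models are not reproduced here. As a purely illustrative construction (an assumption, not any of the models above), a negative Deng-style degree for reverse sequences can be obtained by mirroring the reference sequence about its mid-level and negating the classical grade, so that an exact reverse pair scores -1:

```python
import numpy as np

def deng_grade(x0, x1, rho=0.5):
    """Deng's classical grey relational grade (positive relationship measure)."""
    delta = np.abs(np.asarray(x0, dtype=float) - np.asarray(x1, dtype=float))
    if delta.max() == 0:
        return 1.0
    return float(np.mean((delta.min() + rho * delta.max())
                         / (delta + rho * delta.max())))

def negative_deng_grade(x0, x1, rho=0.5):
    """HYPOTHETICAL negative grey relational degree for reverse sequences:
    grade of x1 against the mirror of x0, with a negative sign. Illustrative
    only; the paper's exact definitions differ."""
    x0 = np.asarray(x0, dtype=float)
    mirrored = x0.max() + x0.min() - x0          # reflect x0 about its mid-level
    return -deng_grade(mirrored, x1, rho)
```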


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Wenguang Yang ◽  
Lianhai Lin ◽  
Hongkui Gao

Purpose To solve the problem of simulation evaluation with small samples, a fresh grey estimation approach is presented based on classical statistical theory and grey system theory. The purpose of this paper is to make full use of differences in data distribution and to avoid marginal data being ignored. Design/methodology/approach Based on the grey distribution characteristics of small-sample data, a new concept of grey relational similarity measure is defined. At the same time, the concept of sample weight is proposed according to the grey relational similarity measure. Based on the new definition of grey weight, grey point estimation and the grey confidence interval are studied. Then an improved Bootstrap resampling is designed, using uniform distribution and randomness, as an important supplement to the grey estimation. In addition, the accuracy of grey bilateral and unilateral confidence intervals is examined using the new grey relational similarity measure approach. Findings The new small-sample evaluation method can effectively expand and enrich the data and avoid its excessive concentration. The method is an organic fusion of grey estimation and the improved Bootstrap method. Several examples demonstrate the feasibility and validity of the proposed methods in assessing the credibility of simulation data, with no need to know the probability distribution of the small samples. Originality/value This research combines grey estimation with an improved Bootstrap, which makes more reasonable use of the value of different data than the unimproved method.
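The unimproved Bootstrap that the paper builds on is easy to sketch: resample uniformly with replacement and take percentile bounds for the bilateral confidence interval. The grey-weighted resampling itself is not reproduced here:

```python
import numpy as np

def bootstrap_ci(sample, stat=np.mean, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of a small sample.

    Plain uniform resampling with replacement; no distributional assumption
    about the underlying population is needed.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(sample, dtype=float)
    boots = [stat(rng.choice(data, size=len(data), replace=True))
             for _ in range(n_boot)]
    lo, hi = np.quantile(boots, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)
```

A unilateral interval would instead take a single quantile of the bootstrap distribution as the bound.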


Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance. Design/methodology/approach A measurement is first developed for measuring and identifying any significant heterogeneity in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For data sets with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification. Findings The proposed approach has two main advantages over previous methods. The first advantage lies in feature transformation using orthogonal factor analysis, which results in new features without redundancy or irrelevance. The second advantage rests on partitioning samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets. Research limitations/implications The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue. Practical implications Measuring and eliminating any feature space heterogeneity in the data is important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applying classification techniques to real-world problems.
Originality/value A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.
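The factor-analysis-plus-clustering pipeline described above can be sketched with a PCA-style stand-in for orthogonal factor analysis (the paper's exact factor model and its choice of cluster count are not reproduced; one factor and two clusters below are illustrative):

```python
import numpy as np

def factor_scores(X, k):
    """Scores on the first k orthogonal factors via SVD of the centered data.
    A PCA-style stand-in for orthogonal factor analysis."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                               # (n_samples, k) factor scores

def kmeans_partition(S, k, iters=50, seed=0):
    """Plain k-means on the factor scores: samples with different factor-score
    profiles (the heterogeneity signal) land in different partitions."""
    rng = np.random.default_rng(seed)
    centers = S[rng.choice(len(S), size=k, replace=False)]  # init at k distinct samples
    labels = np.zeros(len(S), dtype=int)
    for _ in range(iters):
        dist = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)                   # assign to nearest center
        centers = np.array([S[labels == j].mean(axis=0) for j in range(k)])
    return labels
```

A separate classifier could then be trained on each partition, which is the ensemble idea the abstract refers to.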


2019 ◽  
Vol 38 (1) ◽  
pp. 155-169
Author(s):  
Chihli Hung ◽  
You-Xin Cao

Purpose This paper aims to propose a novel approach that integrates collocations and domain concepts for Chinese cosmetic word-of-mouth (WOM) sentiment classification. Most sentiment analysis approaches work by collecting sentiment scores from each unigram or bigram. However, not every unigram or bigram in a WOM document carries sentiment; Chinese collocations convey the main sentiments of WOM. This paper reduces document dimensionality and improves sentiment classification. Design/methodology/approach This paper builds two contextual lexicons, for feature words and sentiment words, respectively. Based on these contextual lexicons, this paper uses association rules and mutual information to build candidate Chinese collocation sets. This paper applies preference vector modelling as the vector representation approach to capture the relationship between Chinese collocations and their associated concepts. Findings This paper compares the proposed preference vector models with benchmarks, using three classification techniques (i.e. support vector machine, J48 decision tree and multilayer perceptron). According to the experimental results, the proposed models outperform all benchmarks on the criterion of accuracy. Originality/value This paper focuses on Chinese collocations and proposes a novel research approach for sentiment classification. The Chinese collocations used in this paper adapt to the content and domain. Finally, this paper integrates collocations with the preference vector modelling approach, which not only achieves better sentiment classification performance for Chinese WOM documents but also avoids the curse of dimensionality.
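Candidate collocation extraction by mutual information, one of the two techniques named above, can be sketched as scoring adjacent word pairs by pointwise mutual information (the tokenized English toy input and the min_count threshold are illustrative; the paper works on Chinese WOM text and also uses association rules):

```python
import math
from collections import Counter

def pmi_collocations(tokenized_docs, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).
    High-PMI pairs that occur often enough are collocation candidates."""
    unigrams, bigrams = Counter(), Counter()
    for doc in tokenized_docs:
        unigrams.update(doc)
        bigrams.update(zip(doc, doc[1:]))            # adjacent word pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:                            # filter rare pairs
            continue
        p_pair = c / n_bi
        p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores
```

Representing documents by a limited set of scored collocations, rather than all unigrams and bigrams, is what reduces the document dimensionality.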


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 138 ◽  
Author(s):  
Lin Sun ◽  
Lanying Wang ◽  
Jiucheng Xu ◽  
Shiguang Zhang

For continuous numerical data sets, neighborhood rough set-based attribute reduction is an important step for improving classification performance. However, most traditional reduction algorithms can only handle finite sets, and they yield low accuracy and high cardinality. In this paper, a novel attribute reduction method using Lebesgue and entropy measures in neighborhood rough sets is proposed, which can handle continuous numerical data while maintaining the original classification information. First, the Fisher score method is employed to eliminate irrelevant attributes and thereby significantly reduce the computational complexity for high-dimensional data sets. Then, the Lebesgue measure is introduced into neighborhood rough sets to investigate uncertainty measures. To analyze the uncertainty and noise of neighborhood decision systems, some neighborhood entropy-based uncertainty measures are presented based on Lebesgue and entropy measures, and, by combining the algebra view with the information view in neighborhood rough sets, a neighborhood roughness joint entropy is developed for neighborhood decision systems. Moreover, some of their properties are derived and the relationships among them are established, which helps to understand the essence of knowledge and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is designed to improve the classification performance of large-scale complex data. Experimental results on an illustrative example and several public data sets show that the proposed method is very effective at selecting the most relevant attributes with high classification accuracy.
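The Fisher score pre-filter in the first step ranks each attribute by between-class scatter relative to within-class variance; a minimal sketch (the selection threshold applied afterwards is not specified here):

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score of each attribute: between-class scatter over within-class
    variance; a higher score means a more discriminative attribute."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2   # class-mean spread
        within += len(Xc) * Xc.var(axis=0)                      # in-class variance
    return between / (within + 1e-12)                           # avoid division by zero
```

Attributes with near-zero scores are irrelevant and can be dropped before the more expensive neighborhood-entropy reduction.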


2017 ◽  
Vol 7 (1) ◽  
pp. 45-59 ◽  
Author(s):  
Engin Duran ◽  
Burcu Uzgur Duran ◽  
Diyar Akay ◽  
Fatih Emre Boran

Purpose It is of great importance for economic policy makers to comprehend the relationship between macroeconomic indicators and domestic savings, and to find out which indicators are most determinative of the dynamics of domestic savings. The purpose of this paper is to analyze the degree of relationship between Turkey’s domestic savings and selected macroeconomic indicators. Design/methodology/approach To examine the relationship, grey relational analysis (GRA) is applied together with the entropy method, which determines the weight of each indicator according to the level of information it provides. The analysis covers data for the period from 1990 to 2014. In practice, however, the data set is divided into two separate periods, before and after the 2001 crisis. Findings The results indicate that the unemployment rate and gross domestic product (GDP) per capita growth stand out with a relatively high degree of relationship for the period before 2001. For the post-2001 period, the current balance ratio and GDP growth are ascertained as the indicators with a high degree of relationship with domestic savings. Practical implications These indicators have different aspects affecting both public and private savings. Therefore, it may be beneficial to concentrate on them when designing policy to increase the domestic saving rate. Originality/value Many econometric models have been used to investigate the causality between Turkey’s macroeconomic indicators and domestic savings, but no previous study has examined this relationship using GRA. Using one of the most recently developed theories (grey systems theory) for this subject is the significance of this research.
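The entropy weighting step can be sketched as follows: each indicator (column) receives a weight proportional to one minus the normalized Shannon entropy of its value proportions, so an indicator whose values vary more across years carries more information and gets more weight. The year-by-indicator toy matrix in the test is illustrative, not the paper's data:

```python
import numpy as np

def entropy_weights(X):
    """Entropy-method weights for the columns (indicators) of a non-negative
    rows-by-columns data matrix (e.g. years by macroeconomic indicators)."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=0)                        # column-wise value proportions
    n = X.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(P > 0, np.log(P), 0.0)   # treat 0*log(0) as 0
    E = -(P * logs).sum(axis=0) / np.log(n)      # normalized entropy per column
    d = 1.0 - E                                  # degree of diversification
    return d / d.sum()                           # weights summing to 1
```

These weights would then multiply the grey relational coefficients before averaging, replacing the equal weighting of plain GRA.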

