Anytime Subgroup Discovery in High Dimensional Numerical Data

Author(s):  
Romain Mathonat ◽  
Diana Nurbakova ◽  
Jean-Francois Boulicaut ◽  
Mehdi Kaytoue
2021 ◽  
Vol 7 ◽  
pp. e512
Author(s):  
Reynald Eugenie ◽  
Erick Stattner

In this paper, we focus on the problem of searching for subgroups in numerical data. This approach aims to identify subsets of objects, called subgroups, that exhibit interesting characteristics compared to the average, according to a quality measure calculated on a target variable. We present DISGROU, a new approach that identifies subgroups whose attribute intervals may be discontinuous. Unlike the main algorithms in the field, the originality of our proposal lies in the way it breaks down the attribute intervals during the subgroup search phase. The basic assumption of our approach is that the ranges of the attributes defining the groups can be disjoint, which improves the quality of the identified subgroups. Indeed, traditional methods in the field perform the subgroup search only over continuous intervals, which yields subgroups defined over wider intervals that contain irrelevant objects and thus degrade the quality function. Another advantage of our approach is that it does not require a prior discretization of the attributes, since it works directly on the numerical attributes. The efficiency of our proposal is demonstrated first by comparing its results with those of two reference algorithms in the field and then by applying it to a case study.
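
The claim that a disjoint range can beat the enclosing continuous interval can be illustrated with a minimal sketch. This is not the DISGROU algorithm: the quality measure used here (coverage to the power `a`, times the gap between the subgroup mean and the overall mean of the target) is only one common choice and is an assumption, as are the function names and the toy data.

```python
from statistics import mean

def in_intervals(x, intervals):
    """True if x falls inside at least one closed interval (lo, hi)."""
    return any(lo <= x <= hi for lo, hi in intervals)

def quality(values, targets, intervals, a=0.5):
    """Assumed mean-based quality: coverage**a * (subgroup mean - overall mean)."""
    covered = [t for v, t in zip(values, targets) if in_intervals(v, intervals)]
    if not covered:
        return 0.0
    coverage = len(covered) / len(targets)
    return coverage ** a * (mean(covered) - mean(targets))

# Toy data (hypothetical): one numerical attribute and one target variable.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
targets = [9, 9, 9, 1, 1, 1, 9, 9, 1, 1]

continuous = [(1, 8)]         # one wide interval, includes low-target objects
disjoint = [(1, 3), (7, 8)]   # same extremes, the irrelevant middle is cut out

q_cont = quality(values, targets, continuous)
q_disj = quality(values, targets, disjoint)
print(q_cont, q_disj)
```

On this toy data the disjoint description scores higher because the objects with attribute values 4 to 6 no longer pull the subgroup mean down.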


2019 ◽  
Vol 18 (01) ◽  
pp. 1950007
Author(s):  
Bhagyashri A. Kelkar ◽  
Sunil F. Rodd ◽  
Umakant P. Kulkarni

Subspace clustering is a challenging high-dimensional data mining task. Several approaches have been proposed in the literature to identify clusters in subspaces; however, their performance and quality are highly affected by input parameters. Little research has been done so far on identifying proper parameter values automatically. Other observed drawbacks are the requirement of multiple database scans, which increases the demand for computing resources, and the generation of many redundant clusters. Here, we propose a parameter-light subspace clustering method for numerical data, hereafter referred to as CLUSLINK. The algorithm is based on the single-linkage clustering method and works in a bottom-up, greedy fashion. The only input the user has to provide is how coarse or fine the resulting clusters should be; if it is not given, the algorithm operates with default values. The empirical results obtained over synthetic and real benchmark datasets show significant improvement in terms of accuracy and execution time.
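
For readers unfamiliar with the building block named above, the following is a minimal sketch of plain single-linkage clustering done bottom-up and greedily. It is not CLUSLINK and performs no subspace search; the `threshold` parameter is a hypothetical stand-in for the "coarse or fine" setting, and the function name and sample points are assumptions.

```python
from itertools import combinations
from math import dist

def single_linkage(points, threshold):
    """Merge points bottom-up; stop once the closest remaining pair exceeds threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Greedy order: closest pairs first.
    pairs = sorted(
        combinations(range(len(points)), 2),
        key=lambda p: dist(points[p[0]], points[p[1]]),
    )
    for i, j in pairs:
        if dist(points[i], points[j]) > threshold:
            break  # every remaining pair is farther apart
        parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical sample: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (0, 2), (10, 10), (10, 11)]
coarse = single_linkage(points, threshold=1.5)
fine = single_linkage(points, threshold=0.5)
print(coarse, fine)
```

A larger threshold yields coarser clusters and a smaller one yields finer clusters, which is the kind of single knob the abstract describes.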


Author(s):  
Hisao Ishibuchi ◽  
Tadahiko Murata ◽  
Tomoharu Nakashima

We discuss linguistic rule extraction from numerical data for high-dimensional classification problems. Difficulties in handling high-dimensional problems stem from the curse of dimensionality: the number of combinations of antecedent linguistic values increases exponentially with the number of attributes. Our goal is to extract a small number of simple linguistic rules with high classification ability. In this paper, rule extraction means finding a set of linguistic rules according to three criteria: its classification ability, its compactness, and the simplicity of each rule. Our approach consists of two phases: candidate rule generation and rule selection. We first propose a pre-screening method for generating a tractable number of promising candidate rules for high-dimensional classification problems where it is impossible to examine all combinations of antecedent linguistic values. Next, we show how genetic algorithms can be applied to rule selection. Then we combine a heuristic rule elimination procedure with genetic algorithms to improve their search ability. Finally, the performance of our approach is examined by computer simulations on commonly used data sets.
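
The exponential growth mentioned above is easy to check numerically. The sketch below assumes each attribute takes one of `n_values` linguistic values, so a full antecedent has `n_values ** n_attributes` combinations. The second function counts rules that constrain at most `max_length` attributes; that restriction is a hypothetical way to keep the candidate set tractable and is not stated in the abstract. All names and the example sizes are assumptions.

```python
from math import comb

def n_combinations(n_attributes, n_values):
    """Antecedent combinations when every attribute is assigned a linguistic value."""
    return n_values ** n_attributes

def n_short_rules(n_attributes, n_values, max_length):
    """Rules that constrain at most max_length attributes (the rest are ignored)."""
    return sum(
        comb(n_attributes, k) * n_values ** k
        for k in range(max_length + 1)
    )

# Hypothetical problem size: 13 attributes, 5 linguistic values each.
full = n_combinations(13, 5)
short = n_short_rules(13, 5, max_length=2)
print(full, short)  # 1220703125 2016
```

With these sizes the full space has over a billion combinations, while rules limited to two conditions number about two thousand, which shows why some form of pre-screening is needed before rule selection.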


2017 ◽  
Vol 31 (5) ◽  
pp. 1391-1418
Author(s):  
Mario Boley ◽  
Bryan R. Goldsmith ◽  
Luca M. Ghiringhelli ◽  
Jilles Vreeken
