exploratory data mining
Recently Published Documents

TOTAL DOCUMENTS: 51 (five years: 10)
H-INDEX: 8 (five years: 1)

2021 ◽  
Author(s):  
Hansapani Rodrigo ◽  
Eldré W. Beukes ◽  
Gerhard Andersson ◽  
Vinaya Manchaiah

BACKGROUND There is considerable variability in how individuals with tinnitus respond to interventions. These experiential variations, together with a range of associated etiologies, make tinnitus a highly heterogeneous condition. Despite this heterogeneity, a "one size fits all" approach is often taken when making management recommendations. Although various management approaches exist, not all are equally effective. Psychological approaches such as cognitive behavioral therapy (CBT) have the strongest evidence base.

OBJECTIVE Managing tinnitus is challenging due to the significant variation in tinnitus experiences and treatment success. Tailored interventions based on individual tinnitus profiles may improve outcomes, but predictive models of treatment success are lacking. The current study aimed to use exploratory data mining techniques (i.e., decision tree models) to identify the variables associated with treatment success in Internet-based cognitive behavioral therapy (ICBT) for tinnitus.

METHODS Individuals (n = 228) who underwent ICBT in three separate clinical trials were included in this analysis. The primary outcome variable was a reduction of 13 points in tinnitus severity, as measured by the Tinnitus Functional Index, following the intervention. Predictor variables included demographic characteristics, tinnitus- and hearing-related variables, and clinical factors (i.e., anxiety, depression, insomnia, hyperacusis, hearing disability, cognitive function, and life satisfaction). Analyses were undertaken using several exploratory machine learning algorithms to identify the most informative variables. Five decision tree models were implemented, namely CART, C5.0, Gradient Boosting, AdaBoost, and Random Forest. The SHapley Additive exPlanations (SHAP) framework was applied to the two best models to identify relative predictor importance.
RESULTS Of the five decision tree models, CART (accuracy 74%, sensitivity 74%, specificity 64%, AUC .69) and Gradient Boosting (accuracy 72%, sensitivity 78%, specificity 59%, AUC .68) were the best predictive models. Although the other models had acceptable accuracy (56% to 66%) and sensitivity (69% to 75%), they all had relatively weak specificity (31% to 50%) and AUC (.52 to .60). Higher baseline tinnitus severity and higher education level were the most influential factors in the ICBT outcome. The CART decision tree model identified three participant groups with at least an 85% probability of success after undertaking ICBT.

CONCLUSIONS In this study, decision tree models, especially CART and Gradient Boosting, appear promising for predicting ICBT outcomes. Their predictive power may be improved in future studies by using larger sample sizes and including a wider range of predictive factors.
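As a rough illustration of the kind of model the abstract describes, the sketch below fits a shallow CART-style decision tree (scikit-learn's `DecisionTreeClassifier`) to synthetic data. The predictor names (baseline TFI severity, education) mirror the study's two influential factors, but the data, the outcome rule, and any resulting scores are fabricated for demonstration and do not reproduce the study's results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 228  # matches the study's sample size, but the data here are synthetic

# Hypothetical predictors: baseline tinnitus severity (TFI-like, 25-100)
# and education level (years of schooling)
severity = rng.uniform(25, 100, n)
education = rng.integers(8, 21, n)
X = np.column_stack([severity, education])

# Synthetic binary outcome: "success" (>=13-point TFI reduction) made more
# likely at higher baseline severity and education, plus noise
p = 1 / (1 + np.exp(-(0.05 * (severity - 60) + 0.1 * (education - 14))))
y = (rng.uniform(size=n) < p).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
cart = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow, CART-style tree
cart.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, cart.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.2f}")
```

A tree this shallow yields human-readable splits (e.g., "severity above some threshold"), which is what makes CART attractive for identifying participant subgroups with high success probability.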


2020 ◽  
Vol 5 (2) ◽  
pp. 117-127
Author(s):  
Ahamed Shafeeq ◽  

Data clustering is the method of grouping data points so that more similar points fall in the same group. It plays a key role in exploratory data mining and is a popular technique used in many fields to analyze statistical data. High-quality clusters are the key requirement of a cluster analysis result, yet there is a trade-off between the speed of a clustering algorithm and the quality of the clusters it produces; a state-of-the-art clustering algorithm must satisfy both criteria in applications. Bio-inspired techniques ensure that the process is not trapped in local minima, which is the main bottleneck of traditional clustering algorithms, and the results they produce are better than those of traditional algorithms. The newly introduced Whale Optimization-based clustering is one of the promising algorithms from the bio-inspired family. The quality of clusters produced by Whale Optimization-based clustering is compared with k-means, the Kohonen self-organizing feature map, and Grey Wolf Optimization. Popular quality measures such as the Silhouette index, Davies-Bouldin index, and Calinski-Harabasz index are used in the evaluation.
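The three quality indices named in the abstract are all available off the shelf; the sketch below computes them for a plain k-means clustering of synthetic blob data (not the paper's Whale Optimization-based algorithm), just to show how such an evaluation is wired up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

# Synthetic, well-separated data stands in for a real benchmark set
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

sil = silhouette_score(X, labels)         # in [-1, 1]; higher is better
db = davies_bouldin_score(X, labels)      # >= 0; lower is better
ch = calinski_harabasz_score(X, labels)   # >= 0; higher is better
print(f"Silhouette: {sil:.3f}  Davies-Bouldin: {db:.3f}  Calinski-Harabasz: {ch:.1f}")
```

Note the indices disagree on direction: Davies-Bouldin rewards low values while the other two reward high values, so comparisons across algorithms must keep the orientation of each index straight.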


2020 ◽  
Vol 24 (6) ◽  
pp. 1403-1439
Author(s):  
Marvin Meeng ◽  
Harm de Vries ◽  
Peter Flach ◽  
Siegfried Nijssen ◽  
Arno Knobbe

Subgroup Discovery is a supervised, exploratory data mining paradigm that aims to identify subsets of a dataset that show interesting behaviour with respect to some designated target attribute. The way in which such distributional differences are quantified varies with the target attribute type. This work concerns continuous targets, which are important in many practical applications. For such targets, differences are often quantified using z-score and similar measures that compare simple statistics such as the mean and variance of the subset and the data. However, most distributions are not fully determined by their mean and variance alone. As a result, measures of distributional difference solely based on such simple statistics will miss potentially interesting subgroups. This work proposes methods to recognise distributional differences in a much broader sense. To this end, density estimation is performed using histogram and kernel density estimation techniques. In the spirit of Exceptional Model Mining, the proposed methods are extended to deal with multiple continuous target attributes, such that comparisons are not restricted to univariate distributions, but are available for joint distributions of any dimensionality. The methods can be incorporated easily into existing Subgroup Discovery frameworks, so no new frameworks are developed.
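The core point of the abstract, that mean and variance alone can miss interesting subgroups, can be sketched with a toy histogram-based measure. The code below is not the authors' exact quality measure: it simply compares a subgroup's target distribution to the overall data via total variation distance between normalized histograms, for a subgroup whose mean and variance roughly match the data but whose shape is bimodal.

```python
import numpy as np

rng = np.random.default_rng(1)
# Overall data: a standard-normal continuous target
data = rng.normal(0.0, 1.0, 10000)
# A subgroup whose target is bimodal but has roughly the same mean and variance
subgroup = np.concatenate([rng.normal(-1.0, 0.2, 500),
                           rng.normal(1.0, 0.2, 500)])

def hist_distance(a, b, bins=30):
    """Total variation distance between normalized histograms on a shared grid."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    return 0.5 * np.abs(pa - pb).sum()  # in [0, 1]

# A mean/variance-based measure sees almost no difference...
print("mean diff:", abs(subgroup.mean() - data.mean()))
print("variance diff:", abs(subgroup.var() - data.var()))
# ...while the histogram-based measure clearly flags the subgroup
print("TV distance:", hist_distance(subgroup, data))
```

Kernel density estimation, as used in the paper, plays the same role as the histograms here but yields smooth density estimates and extends more gracefully to multivariate targets.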


Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 1624-P
Author(s):  
ERIN M. TALLON ◽  
MARK A. CLEMENTS ◽  
DANLU LIU ◽  
KATRINA BOLES ◽  
RACHEL A. STUCK ◽  
...  

2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Wael Bahia ◽  
Ismael Soltani ◽  
Anouar Abidi ◽  
Anis Haddad ◽  
Salima Ferchichi ◽  
...  

2020 ◽  
Vol 24 (5) ◽  
pp. 1456-1468 ◽  
Author(s):  
Danlu Liu ◽  
William Baskett ◽  
David Beversdorf ◽  
Chi-Ren Shyu

Author(s):  
Harendra Kumar

Clustering is a process of grouping a set of data points in such a way that data points in the same group (called a cluster) are more similar to each other than to data points in other groups (clusters). Clustering is a main task of exploratory data mining, and it has been widely used in many areas such as pattern recognition, image analysis, machine learning, bioinformatics, and information retrieval. Clusters are always identified by similarity measures, which include intensity, distance, and connectivity; depending on the application and the data, different similarity measures may be chosen. The purpose of this chapter is to give an overview of many (certainly not all) clustering algorithms. The chapter covers valuable surveys, the types of clusters, and the methods used for constructing clusters.
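The chapter's point that the choice of similarity measure shapes the clusters can be illustrated with a small sketch (not taken from the chapter): on two interleaving half-moons, a distance-based criterion (k-means) splits the data into compact halves, while a connectivity-based criterion (single-linkage agglomerative clustering) follows each moon's shape.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: the true clusters are connected, not compact
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# Distance-based similarity: k-means minimizes within-cluster squared distance
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Connectivity-based similarity: single linkage merges nearest neighbours first
sl = AgglomerativeClustering(n_clusters=2, linkage="single").fit_predict(X)

km_ari = adjusted_rand_score(y_true, km)  # agreement with true moons, in [-1, 1]
sl_ari = adjusted_rand_score(y_true, sl)
print(f"k-means ARI: {km_ari:.2f}   single-linkage ARI: {sl_ari:.2f}")
```

Neither measure is "better" in general: single linkage traces connected shapes but is fragile to noise bridges between clusters, while k-means is robust but assumes compact, roughly spherical clusters.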

