Adaptive kernel fuzzy clustering for missing data

PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259266
Author(s):  
Anny K. G. Rodrigues ◽  
Raydonal Ospina ◽  
Marcelo R. P. Ferreira

Many machine learning procedures, including cluster analysis, are affected by missing values. This work proposes and evaluates a Kernel Fuzzy C-means clustering algorithm based on kernelization of the metric with local adaptive distances (VKFCM-K-LP) under three strategies for dealing with missing data. The first, the Whole Data Strategy (WDS), clusters only the complete part of the dataset, i.e., it discards all instances with missing data. The second, the Partial Distance Strategy (PDS), computes partial distances over all available (observed) values and then rescales them by the reciprocal of the proportion of observed values. The third, the Optimal Completion Strategy (OCS), computes missing values iteratively as auxiliary variables in the optimization of a suitable objective function. The clustering results were evaluated according to different metrics. The algorithm performed best under the PDS and OCS strategies. Under the OCS approach, new datasets were derived and the missing values were estimated dynamically during the optimization process. Clustering under the OCS strategy also outperformed the clusters obtained by applying the VKFCM-K-LP algorithm to a version of the data in which missing values were first imputed by the mean or the median of the observed values.
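
As a rough illustration of the Partial Distance Strategy described in this abstract, the sketch below computes a squared Euclidean distance over the observed coordinates only and rescales it by the reciprocal of the proportion of observed values. The function name `partial_distance_sq` and the use of `None` as the missing-value marker are assumptions made for this example; the paper's kernelized, locally adaptive distance is not reproduced here.

```python
def partial_distance_sq(x, v):
    """Squared Euclidean partial distance between a data point x and a prototype v.

    Coordinates of x equal to None are treated as missing (an assumption of
    this sketch). The sum over observed coordinates is rescaled by
    len(x) / n_observed, the reciprocal of the proportion of observed values.
    """
    observed = [(xi, vi) for xi, vi in zip(x, v) if xi is not None]
    if not observed:
        raise ValueError("x has no observed values")
    total = sum((xi - vi) ** 2 for xi, vi in observed)
    return total * len(x) / len(observed)
```

For a point with no missing values the scaling factor is 1, so the function reduces to the ordinary squared Euclidean distance.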

2019 ◽  
Vol 37 (2) ◽  
pp. 2453-2471
Author(s):  
Geqi Qi ◽  
Wei Guan ◽  
Zhengbing He ◽  
Ailing Huang

2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Danielle M. Rodgers ◽  
Ross Jacobucci ◽  
Kevin J. Grimm

Decision trees (DTs) are a machine learning technique that searches the predictor space for the variable and observed value that lead to the best prediction when the data are split into two nodes based on that variable and splitting value. The algorithm repeats its search within each partition of the data until a stopping rule ends the search. Missing data can be problematic in DTs because an observation with a missing value on the chosen splitting variable cannot be placed into a node. Moreover, missing data can alter the selection process itself, because observations with missing values cannot be placed when candidate splits are evaluated. Simple missing data approaches (e.g., listwise deletion, majority rule, and surrogate split) have been implemented in DT algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. We propose a modified multiple imputation approach to handling missing data in DTs, and compare this approach with simple missing data approaches as well as single imputation and multiple imputation with prediction averaging via Monte Carlo simulation. This study evaluated the performance of each missing data approach when data were missing at random (MAR) or missing completely at random (MCAR). The proposed multiple imputation approach and surrogate splits had superior performance, with the proposed multiple imputation approach performing best in the more severe missing data conditions. We conclude with recommendations for handling missing data in DTs.
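
The split search described in this abstract can be sketched for a single numeric predictor as follows. This is a minimal illustration, not the authors' method: it assumes a regression tree scored by the sum of squared errors, uses `None` to mark a missing predictor value, and simply skips such rows when evaluating a split (a deletion-style simplification). The names `sse` and `best_split` are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations of ys from their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold, score) minimizing left SSE + right SSE.

    Rows whose predictor value is None cannot be assigned to either node,
    so this sketch leaves them out of the search. Returns None when no
    threshold yields two non-empty nodes.
    """
    rows = [(x, y) for x, y in zip(xs, ys) if x is not None]
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if not left or not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

Because the skipped rows never contribute to the score, a large share of missing values can change which threshold wins, which is the selection problem the abstract points out.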


2011 ◽  
Vol 467-469 ◽  
pp. 629-634
Author(s):  
Yi Li Fu ◽  
Guang Cai Zhang ◽  
Qiu Yue Chang ◽  
Shu Guo Wang ◽  
Xian Wei Han

To label T2-weighted (T2W) MR images using a human brain atlas, the Talairach space must first be established for the T2W MR images, and the basic condition for establishing the Talairach space is locating the Talairach cortical landmarks in those images. A method for locating the Talairach cortical landmarks in T2W MR images is proposed. It consists of three steps: first, determine the planes containing the six cortical landmarks; second, segment the planes using the fuzzy C-means clustering algorithm, gray-level projection, the watershed algorithm, region merging, thresholding, and morphological operations; third, locate the cortical landmarks in the segmented planes. The algorithm has been validated quantitatively on 20 T2W MR image data sets. The mean errors of the Talairach cortical landmarks were below 1.00 mm. Identifying them took about 8 seconds on a P4 3.0 GHz machine. This fast, robust algorithm is potentially useful in clinical practice and in research.


2012 ◽  
Vol 538-541 ◽  
pp. 3240-3243
Author(s):  
Wei Guo Zhao ◽  
Chun Yang ◽  
Li Wang

Consumers’ perceptions of and preferences for product style are vague and uncertain. To identify consumers’ needs more accurately, this paper developed a questionnaire based on fuzzy data, surveyed a sample of consumers on their style preferences and perceptions of twelve office chairs with typical form styles, and then carried out mean and distance calculations and fuzzy clustering analysis using Excel, SPSS, and Matlab. Comparing the results with statistics from traditional questionnaire data, the paper concludes that fuzzy data statistics are suitable for calculating means for small samples and for clustering with few preference variables.


2012 ◽  
Vol 57 (1) ◽  
Author(s):  
HO MING KANG ◽  
FADHILAH YUSOF ◽  
ISMAIL MOHAMAD

This paper presents a study on the estimation of missing data. Data samples with different missingness mechanisms, namely Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), are simulated. The expectation maximization (EM) algorithm and mean imputation (MI) are applied to these data sets and compared, with performance evaluated by the mean absolute error (MAE) and root mean square error (RMSE). The results showed that EM estimates the missing data with smaller errors than mean imputation for all three missingness mechanisms. However, the graphical results showed that EM failed to estimate the missing values in the missing quadrants under MNAR.
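
To make the evaluation described in this abstract concrete, the sketch below applies mean imputation to a small series and scores the imputed entries with MAE and RMSE against the known true values. The data values and function names are invented for illustration, and `None` as the missing marker is an assumption; the EM algorithm compared in the paper is not implemented here.

```python
import math


def mean_impute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def mae(truth, estimate):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


def rmse(truth, estimate):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


# Hypothetical example: the true values are known, two of them are masked.
truth = [2.0, 4.0, 6.0, 8.0]
masked = [2.0, None, 6.0, None]
imputed = mean_impute(masked)

# Score only the positions that were actually missing.
missing_idx = [i for i, v in enumerate(masked) if v is None]
true_missing = [truth[i] for i in missing_idx]
est_missing = [imputed[i] for i in missing_idx]
```

Scoring only the masked positions keeps the observed entries, which are reproduced exactly, from diluting the error.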


Author(s):  
Guang Hu ◽  
Zhenbin Du

To address the shortcomings of the fuzzy C-means (FCM) clustering algorithm for image segmentation, an improved kernel-based fuzzy C-means (KFCM) clustering algorithm is proposed. First, the rationale for introducing the kernel function is examined on the basis of classical KFCM clustering. Then, using the spatial neighborhood constraint property of image pixels, an adaptive weighting coefficient is introduced into KFCM to automatically control the influence of neighboring pixels on the central pixel. Finally, a rule for judging the number of fuzzy clustering partitions is proposed, which determines the best number of clusters and provides a basis for optimizing the clustering algorithm. The resulting adaptive kernel-based fuzzy C-means clustering with spatial constraints (AKFCMS) model is proposed to improve the efficiency of image segmentation. Experimental results show that the proposed approach captures the spatial information of an image accurately and segments images robustly.
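
For readers unfamiliar with kernel-based fuzzy C-means, the sketch below shows the kernel-induced distance that classical KFCM commonly uses with a Gaussian kernel, together with the standard fuzzy membership update. It assumes a Gaussian kernel with bandwidth `sigma` and fuzzifier `m`; the spatial neighborhood weighting that the abstract introduces is not included, and all function names are hypothetical.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance in feature space.

    For a kernel with K(x, x) = 1, such as the Gaussian kernel,
    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v) = 2 * (1 - K(x, v)).
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


def memberships(x, centers, m=2.0, sigma=1.0):
    """Fuzzy memberships of x in each cluster, using kernel-induced distances."""
    d = [kernel_distance_sq(x, v, sigma) for v in centers]
    # If x coincides with a center, assign it fully to that cluster.
    for i, di in enumerate(d):
        if di == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(d))]
    power = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]
```

The memberships of a point sum to one, and a point closer to a center in feature space receives a larger membership in that cluster.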


2018 ◽  
Author(s):  
Monique A Ladds ◽  
Nokuthaba Sibanda ◽  
Richard Arnold ◽  
Matthew R Dunn

Background. Functional groups serve two important functions in ecology: they allow for simplification of ecosystem models and can aid in understanding diversity. Despite their important applications, there has not been a universally accepted method for defining them. A common approach is to cluster species on a set of traits, validated through visual confirmation of resulting groups based primarily on expert opinion. The goal of this research is to determine a suitable procedure for creating and evaluating functional groups that arise from clustering nominal traits. Methods. To do so, we produced a species-by-trait matrix of 22 traits from 116 fish species from Tasman Bay and Golden Bay, New Zealand. Data collected from photographs and published literature were predominantly nominal, and a small number of continuous traits were discretized. Some data were missing, so the benefit of imputing data was assessed using four approaches on data with known missing values. Hierarchical clustering is utilised to search for underlying structure in the data that may represent functional groups. Within this clustering paradigm there are a number of distance matrices and linkage methods available, several combinations of which we test. The resulting clusters are evaluated using internal metrics developed specifically for nominal clustering. This revealed that the choice of number of clusters, distance matrix and linkage method greatly affected the overall within- and between-cluster variability. We visualise the clustering in two dimensions, and the stability of clusters is assessed through bootstrapping. Results. Missing data imputation showed up to 90% accuracy using polytomous imputation, so it was used to impute the real missing data. A division of the species information into three functional groups was the most separated, compact and stable result. Increasing the number of clusters increased the inconsistency of group membership, and selection of the appropriate distance matrix and linkage method improved the fit. Discussion. We show that the methodologies commonly used for the creation of functional groups are fraught with subjectivity, ultimately causing significant variation in the composition of resulting groups. The research goal dictates the appropriate strategy for selecting the number of groups, distance matrix and clustering algorithm combination.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5795 ◽  
Author(s):  
Monique A. Ladds ◽  
Nokuthaba Sibanda ◽  
Richard Arnold ◽  
Matthew R. Dunn

Background. Functional groups serve two important functions in ecology: they allow for simplification of ecosystem models and can aid in understanding diversity. Despite their important applications, there has not been a universally accepted method for defining them. A common approach is to cluster species on a set of traits, validated through visual confirmation of resulting groups based primarily on expert opinion. The goal of this research is to determine a suitable procedure for creating and evaluating functional groups that arise from clustering nominal traits. Methods. To do so, we produced a species-by-trait matrix of 22 traits from 116 fish species from Tasman Bay and Golden Bay, New Zealand. Data collected from photographs and published literature were predominantly nominal, and a small number of continuous traits were discretized. Some data were missing, so the benefit of imputing data was assessed using four approaches on data with known missing values. Hierarchical clustering is utilised to search for underlying structure in the data that may represent functional groups. Within this clustering paradigm there are a number of distance matrices and linkage methods available, several combinations of which we test. The resulting clusters are evaluated using internal metrics developed specifically for nominal clustering. This revealed that the choice of number of clusters, distance matrix and linkage method greatly affected the overall within- and between-cluster variability. We visualise the clustering in two dimensions, and the stability of clusters is assessed through bootstrapping. Results. Missing data imputation showed up to 90% accuracy using polytomous imputation, so it was used to impute the real missing data. A division of the species information into three functional groups was the most separated, compact and stable result. Increasing the number of clusters increased the inconsistency of group membership, and selection of the appropriate distance matrix and linkage method improved the fit. Discussion. We show that the methodologies commonly used for the creation of functional groups are fraught with subjectivity, ultimately causing significant variation in the composition of resulting groups. The research goal dictates the appropriate strategy for selecting the number of groups, distance matrix and clustering algorithm combination.
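
The abstract does not say which distance measures were compared, so the following is only an illustrative sketch of one common choice for nominal traits: the simple matching dissimilarity, computed over the traits observed in both species. The trait values, the use of `None` for a missing trait, and the function name are all assumptions made for this example.

```python
def simple_matching_distance(a, b):
    """Proportion of mismatching traits among those observed in both species.

    Traits recorded as None in either species are left out of the comparison
    (an assumption of this sketch, not a rule taken from the paper).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no traits observed in both species")
    mismatches = sum(1 for x, y in pairs if x != y)
    return mismatches / len(pairs)


# Hypothetical trait vectors: habitat, body size class, diet.
species_a = ["benthic", "small", "carnivore"]
species_b = ["benthic", "large", None]
distance_ab = simple_matching_distance(species_a, species_b)
```

A matrix of such pairwise dissimilarities is the kind of input a hierarchical clustering routine takes, and swapping in a different dissimilarity changes the resulting groups, which is the sensitivity the abstract reports.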


Author(s):  
Krzysztof Simiński

Neuro-rough-fuzzy approach for regression modelling from missing data

Real-life data sets often suffer from missing data. The neuro-rough-fuzzy systems proposed hitherto often cannot handle such situations. The paper presents a complete neuro-fuzzy system for data sets with missing values. The system creates a rough fuzzy model from the presented data (both complete and with missing values) and can produce an answer for both complete examples and examples with missing values. The paper also describes the dedicated clustering algorithm and is accompanied by the results of numerical experiments.

