Adaptive kernel fuzzy clustering for missing data

Decision trees (DTs) are a machine learning technique that searches the predictor space for the variable and observed value that lead to the best prediction when the data are split into two nodes based on that variable and splitting value. The algorithm repeats its search within each partition of the data until a stopping rule ends the search. Missing data can be problematic in DTs because an observation with a missing value on the chosen splitting variable cannot be placed into a node. Moreover, missing data can alter the selection process itself, because the algorithm cannot place observations with missing values when it evaluates candidate splits. Simple missing data approaches (e.g., listwise deletion, majority rule, and surrogate splits) have been implemented in DT algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. We propose a modified multiple imputation approach to handling missing data in DTs and compare it, via Monte Carlo simulation, with simple missing data approaches as well as single imputation and multiple imputation with prediction averaging. This study evaluated the performance of each missing data approach when data were missing at random (MAR) or missing completely at random (MCAR). The proposed multiple imputation approach and surrogate splits had superior performance, with the proposed multiple imputation approach performing best in the more severe missing data conditions. We conclude with recommendations for handling missing data in DTs.
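The split search described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function names `sse` and `best_split`, the squared-error criterion, and the toy data are all assumptions made for the example. Missing values are encoded as `None`, and each candidate variable is scored only on the rows where it is observed, which is one simple way to see how missingness can influence which variable gets selected.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Search every variable and observed value for the two-node split
    with the lowest total squared error.

    rows    : list of feature lists; None marks a missing value
    targets : list of numeric outcomes, one per row
    Returns (feature_index, threshold, score), or None if no split exists.
    """
    best = None
    for j in range(len(rows[0])):
        # Rows missing on feature j cannot be sent left or right,
        # so they are left out when this feature is scored.
        observed = [(r[j], y) for r, y in zip(rows, targets) if r[j] is not None]
        for threshold in sorted({x for x, _ in observed}):
            left = [y for x, y in observed if x <= threshold]
            right = [y for x, y in observed if x > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty nodes
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


# Hypothetical toy data: feature 0 separates the outcomes perfectly,
# feature 1 is noisy and has one missing value.
rows = [[1.0, 5.0], [2.0, None], [3.0, 1.0], [4.0, 7.0], [5.0, 2.0], [6.0, 6.0]]
targets = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
split = best_split(rows, targets)
print(split)
```

Because each variable is scored on a different subset of rows, scores are not strictly comparable across variables, which is one reason more careful treatments such as surrogate splits or imputation are of interest.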

Automatic Location of the Talairach Cortical Landmarks from T2-Weighted MR Images

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.467-469.629 ◽

2011 ◽

Vol 467-469 ◽

pp. 629-634

Author(s):

Yi Li Fu ◽

Guang Cai Zhang ◽

Qiu Yue Chang ◽

Shu Guo Wang ◽

Xian Wei Han

Keyword(s):

Clustering Algorithm ◽

Brain Atlas ◽

Watershed Algorithm ◽

Data Sets ◽

Gray Level ◽

Region Merging ◽

Mr Images ◽

Automatic Location ◽

Fuzzy C Means Clustering ◽

The Mean

Labeling T2-weighted (T2W) MR images with a human brain atlas requires establishing the Talairach space for those images, which in turn requires locating the Talairach cortical landmarks in them. A method for locating the Talairach cortical landmarks in T2W MR images is proposed. It consists of three steps: first, determine the planes containing the six cortical landmarks; second, segment the planes using the fuzzy C-means clustering algorithm, gray-level projection, the watershed algorithm, region merging, thresholding, and morphological operations; third, locate the cortical landmarks in the segmented planes. The algorithm was validated quantitatively on 20 T2W MR image data sets. The mean errors of the Talairach cortical landmarks were below 1.00 mm, and identifying them took about 8 seconds on a P4 3.0 GHz machine. This fast, robust algorithm is potentially useful in clinical practice and in research.

A Comparative Study on Statistics Based on Binary Data and Fuzzy Data in Office Chair Design

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.538-541.3240 ◽

2012 ◽

Vol 538-541 ◽

pp. 3240-3243

Author(s):

Wei Guo Zhao ◽

Chun Yang ◽

Li Wang

Keyword(s):

Clustering Analysis ◽

Binary Data ◽

Clustering Algorithm ◽

Small Sample ◽

Fuzzy Data ◽

Fuzzy Clustering Analysis ◽

Spot Check ◽

Data Statistics ◽

The Mean ◽

Chair Design

Consumers’ perceptions of and preferences for product style are vague and uncertain. To identify consumers’ needs more accurately, this paper designs a questionnaire based on fuzzy data, carries out a spot check of consumers’ style preferences and perceptions for twelve office chairs with typical form styles, and then calculates means and distances and performs fuzzy clustering analysis using Excel, SPSS, and Matlab. Comparing the results with statistics from traditional questionnaire data, the paper finds that fuzzy data statistics are suitable for calculating means from small samples and for clustering on few preference variables.

IMPUTATION OF MISSING DATA WITH DIFFERENT MISSINGNESS MECHANISM

Jurnal Teknologi ◽

10.11113/jt.v57.1523 ◽

2012 ◽

Vol 57 (1) ◽

Author(s):

HO MING KANG ◽

FADHILAH YUSOF ◽

ISMAIL MOHAMAD

Keyword(s):

Missing Data ◽

Missing Values ◽

Missing At Random ◽

Absolute Error ◽

Data Sets ◽

Missing Completely At Random ◽

Missingness Mechanism ◽

Mean Imputation ◽

The Mean ◽

Estimation Of Missing Data

This paper presents a study on the estimation of missing data. Data samples are simulated under three missingness mechanisms: Missing Completely At Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR). The expectation maximization (EM) algorithm and mean imputation (MI) are applied to these data sets, and their performance is compared using the mean absolute error (MAE) and root mean square error (RMSE). The results showed that EM estimates the missing data with smaller errors than mean imputation (MI) under all three missingness mechanisms. However, the graphical results showed that EM failed to estimate the missing values in the missing quadrants when the mechanism is MNAR.
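The evaluation loop described in this abstract (simulate missingness, impute, then score with MAE and RMSE) can be sketched for the simplest case, mean imputation under MCAR. Everything specific here is an assumption for illustration: the function names, the 20% missingness rate, the Gaussian data, and the fixed seed are not taken from the paper, and the EM comparison is not reproduced.

```python
import math
import random
import statistics


def make_mcar(values, rate, rng):
    """Blank out each entry independently with probability `rate`.
    Missingness does not depend on any data value, i.e. MCAR."""
    return [None if rng.random() < rate else v for v in values]


def mean_impute(values):
    """Replace every missing entry with the mean of the observed entries."""
    fill = statistics.fmean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def mae_rmse(truth, imputed, mask):
    """MAE and RMSE computed over the originally missing positions only."""
    errors = [imputed[i] - truth[i] for i in range(len(truth)) if mask[i]]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


rng = random.Random(0)                           # fixed seed for repeatability
truth = [rng.gauss(50.0, 10.0) for _ in range(500)]
with_gaps = make_mcar(truth, 0.2, rng)           # hypothetical 20% missingness
mask = [v is None for v in with_gaps]
imputed = mean_impute(with_gaps)
mae, rmse = mae_rmse(truth, imputed, mask)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

RMSE is never smaller than MAE on the same errors, so a large gap between the two points to a few large imputation errors rather than uniformly poor estimates.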

Adaptive Kernel-Based Fuzzy C-Means Clustering with Spatial Constraints for Image Segmentation

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800141954003x ◽

2018 ◽

Vol 33 (01) ◽

pp. 1954003 ◽

Cited By ~ 10

Author(s):

Guang Hu ◽

Zhenbin Du

Keyword(s):

Image Segmentation ◽

Clustering Algorithm ◽

Spatial Information ◽

Spatial Constraints ◽

Fuzzy C Means ◽

Central Pixel ◽

Fuzzy C Means Clustering ◽

Adaptive Kernel ◽

Fcm Clustering ◽

Image Pixels

To address the shortcomings of the fuzzy C-means (FCM) clustering algorithm for image segmentation, an improved kernel-based fuzzy C-means (KFCM) clustering algorithm is proposed. First, the rationale for introducing the kernel function is examined on the basis of classical KFCM clustering. Then, using the spatial neighborhood constraints of image pixels, an adaptive weighting coefficient is introduced into KFCM to automatically control the influence of neighboring pixels on the central pixel. Finally, a rule for judging the number of fuzzy clusters is proposed; it selects the best number of clusters and provides a basis for optimizing the clustering algorithm. The resulting adaptive kernel-based fuzzy C-means clustering with spatial constraints (AKFCMS) model is intended to improve the efficiency of image segmentation. Experimental results show that the proposed approach captures the spatial information of an image accurately and segments images robustly.
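For orientation, the classical KFCM baseline that this abstract builds on can be sketched for one-dimensional gray levels. This is not the AKFCMS model: the adaptive spatial weighting and the cluster-number rule are omitted, and the update rules shown are the commonly stated Gaussian-kernel KFCM form, assumed here rather than taken from the paper. The function names, `sigma`, the fuzzifier `m`, the starting centers, and the toy intensities are all illustrative choices.

```python
import math


def gaussian_kernel(x, v, sigma):
    """Gaussian kernel K(x, v) = exp(-(x - v)^2 / sigma^2)."""
    return math.exp(-((x - v) ** 2) / (sigma ** 2))


def kfcm(data, centers, m=2.0, sigma=50.0, iters=50):
    """Plain kernel fuzzy C-means on 1-D values (no spatial term).

    Returns (centers, memberships); memberships[k][i] is the degree to
    which data[k] belongs to cluster i.
    """
    centers = list(centers)
    memberships = []
    for _ in range(iters):
        # Membership update: proportional to (1 - K)^(-1/(m-1)),
        # where 1 - K acts as a kernel-induced distance.
        memberships = []
        for x in data:
            w = [(1.0 - gaussian_kernel(x, v, sigma) + 1e-12) ** (-1.0 / (m - 1.0))
                 for v in centers]
            total = sum(w)
            memberships.append([wi / total for wi in w])
        # Center update: mean of the data weighted by u^m * K(x, v).
        new_centers = []
        for i, v in enumerate(centers):
            num = den = 0.0
            for x, u in zip(data, memberships):
                weight = (u[i] ** m) * gaussian_kernel(x, v, sigma)
                num += weight * x
                den += weight
            new_centers.append(num / den)
        centers = new_centers
    return centers, memberships


# Hypothetical gray levels: a dark group and a bright group.
pixels = [10.0, 12.0, 11.0, 13.0, 200.0, 205.0, 198.0, 202.0]
centers, memberships = kfcm(pixels, centers=[0.0, 255.0])
print([round(c, 1) for c in centers])
```

A spatially constrained variant would additionally let each pixel's membership depend on its neighbors' values, which is where an adaptive weighting coefficient of the kind the abstract describes would enter.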

Creating functional groups of marine fish from categorical traits

10.7287/peerj.preprints.27148v1 ◽

2018 ◽

Author(s):

Monique A Ladds ◽

Nokuthaba Sibanda ◽

Richard Arnold ◽

Matthew R Dunn

Keyword(s):

Missing Data ◽

Functional Groups ◽

Missing Values ◽

Clustering Algorithm ◽

Distance Matrix ◽

Two Dimensions ◽

Number Of Clusters ◽

Missing Data Imputation ◽

Linkage Method ◽

Visual Confirmation

Background. Functional groups serve two important functions in ecology: they allow for simplification of ecosystem models and can aid in understanding diversity. Despite their important applications, there has not been a universally accepted method of how to define them. A common approach is to cluster species on a set of traits, validated through visual confirmation of resulting groups based primarily on expert opinion. The goal of this research is to determine a suitable procedure for creating and evaluating functional groups that arise from clustering nominal traits. Methods. To do so, we produced a species by trait matrix of 22 traits from 116 fish species from Tasman Bay and Golden Bay, New Zealand. Data collected from photographs and published literature were predominantly nominal, and a small number of continuous traits were discretized. Some data were missing, so the benefit of imputing data was assessed using four approaches on data with known missing values. Hierarchical clustering is utilised to search for underlying structure in the data that may represent functional groups. Within this clustering paradigm there are a number of distance matrices and linkage methods available, several combinations of which we test. The resulting clusters are evaluated using internal metrics developed specifically for nominal clustering. This revealed that the choice of number of clusters, distance matrix and linkage method greatly affected the overall within- and between-cluster variability. We visualise the clustering in two dimensions and the stability of clusters is assessed through bootstrapping. Results. Missing data imputation showed up to 90% accuracy using polytomous imputation, so it was used to impute the real missing data. A division of the species information into three functional groups was the most separated, compact and stable result. Increasing the number of clusters increased the inconsistency of group membership, and selection of the appropriate distance matrix and linkage method improved the fit. Discussion. We show that the methodologies commonly used for the creation of functional groups are fraught with subjectivity, ultimately causing significant variation in the composition of resulting groups. The research goal dictates the appropriate strategy for selecting the number of groups, distance matrix and clustering algorithm combination.

Creating functional groups of marine fish from categorical traits

PeerJ ◽

10.7717/peerj.5795 ◽

2018 ◽

Vol 6 ◽

pp. e5795 ◽

Cited By ~ 2

Author(s):

Monique A. Ladds ◽

Nokuthaba Sibanda ◽

Richard Arnold ◽

Matthew R. Dunn

Keyword(s):

Missing Data ◽

Functional Groups ◽

Missing Values ◽

Clustering Algorithm ◽

Distance Matrix ◽

Two Dimensions ◽

Number Of Clusters ◽

Missing Data Imputation ◽

Linkage Method ◽

Visual Confirmation

Background. Functional groups serve two important functions in ecology: they allow for simplification of ecosystem models and can aid in understanding diversity. Despite their important applications, there has not been a universally accepted method of how to define them. A common approach is to cluster species on a set of traits, validated through visual confirmation of resulting groups based primarily on expert opinion. The goal of this research is to determine a suitable procedure for creating and evaluating functional groups that arise from clustering nominal traits. Methods. To do so, we produced a species by trait matrix of 22 traits from 116 fish species from Tasman Bay and Golden Bay, New Zealand. Data collected from photographs and published literature were predominantly nominal, and a small number of continuous traits were discretized. Some data were missing, so the benefit of imputing data was assessed using four approaches on data with known missing values. Hierarchical clustering is utilised to search for underlying structure in the data that may represent functional groups. Within this clustering paradigm there are a number of distance matrices and linkage methods available, several combinations of which we test. The resulting clusters are evaluated using internal metrics developed specifically for nominal clustering. This revealed that the choice of number of clusters, distance matrix and linkage method greatly affected the overall within- and between-cluster variability. We visualise the clustering in two dimensions and the stability of clusters is assessed through bootstrapping. Results. Missing data imputation showed up to 90% accuracy using polytomous imputation, so it was used to impute the real missing data. A division of the species information into three functional groups was the most separated, compact and stable result. Increasing the number of clusters increased the inconsistency of group membership, and selection of the appropriate distance matrix and linkage method improved the fit. Discussion. We show that the methodologies commonly used for the creation of functional groups are fraught with subjectivity, ultimately causing significant variation in the composition of resulting groups. The research goal dictates the appropriate strategy for selecting the number of groups, distance matrix and clustering algorithm combination.



Neuro-rough-fuzzy approach for regression modelling from missing data

International Journal of Applied Mathematics and Computer Science ◽

10.2478/v10006-012-0035-4 ◽

2012 ◽

Vol 22 (2) ◽

pp. 461-476 ◽

Cited By ~ 12

Author(s):

Krzysztof Simiński

Keyword(s):

Missing Data ◽

Fuzzy System ◽

Missing Values ◽

Clustering Algorithm ◽

Real Life ◽

Fuzzy Model ◽

Data Sets ◽

Fuzzy Approach ◽

Regression Modelling ◽

Neuro Fuzzy

Real-life data sets often suffer from missing data. The neuro-rough-fuzzy systems proposed hitherto often cannot handle such situations. The paper presents a neuro-fuzzy system for data sets with missing values. The proposed solution is a complete neuro-fuzzy system. The system creates a rough fuzzy model from the presented data (both full and with missing values) and can produce an answer for examples with full and with missing data. The paper also describes the dedicated clustering algorithm. The paper is accompanied by results of numerical experiments.
