Special Issue on Recent Methodological Developments in Fuzzy Clustering and Related Topics

Author(s):  
Sadaaki Miyamoto

Various applications of data analysis and their effects have been reported recently. With the remarkable progress in classification methods, one example being support vector machines, clustering as the main method of unsupervised classification has also been studied extensively. Consequently, fuzzy methods of clustering are becoming a standard technique. However, unsolved theoretical and methodological problems in fuzzy clustering remain and have to be studied more deeply. This issue collects five papers concerned with fuzzy clustering and related fields, and in all of them the main interest is methodology. Kondo and Kanzawa consider fuzzy clustering with a new objective function using q-divergence, which is a generalization of the well-known Kullback-Leibler divergence. Among different data types, they focus on categorical data. They also show the relations among different methods of fuzzy c-means. Thus, this study further generalizes methods of fuzzy clustering, trying to find the methodological boundaries of the capabilities of fuzzy clustering models. Kitajima, Endo, and Hamasuna propose a method of controlling cluster sizes so that the resulting clusters have an even size, which differs from the optimization of cluster sizes dealt with in other studies. This technique broadens the application fields of clustering in which cluster sizes are more important than cluster shapes. Hamasuna et al. study the validity measures of clusters for network data. Cluster validity measures are generally proposed for points in Euclidean spaces, but the authors consider the application of validity measures to network data. Several validity measures are modified and adapted to network data, and their effectiveness is examined using simple network examples. Ubukata et al. propose a new method of c-means related to rough sets, a method based on a different idea from the well-known rough c-means of Lingras. Finally, Kusunoki, Wakou, and Tatsumi study the maximum margin model for the nearest prototype classifier, which leads to the optimization of a difference of convex functions. All papers include methodologically important ideas that have to be further investigated and applied to real-world problems.

2020 ◽  
Vol 15 (6) ◽  
pp. 517-527
Author(s):  
Yunyun Liang ◽  
Shengli Zhang

Background: Apoptosis proteins have a key role in the development and the homeostasis of the organism, and are very important for understanding the mechanism of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict the apoptosis protein subcellular location by using the PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence and over-sampling algorithms. This model is named SOMAPKLNMF-OS and constructed on the ZD98, ZW225 and CL317 benchmark datasets. Then, the support vector machine is adopted as the classifier, and the bias-free jackknife test method is used to evaluate the accuracy. Results: Our prediction system achieves favorable and promising overall accuracy on the three datasets and also outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.


2020 ◽  
Vol 12 (7) ◽  
pp. 1218
Author(s):  
Laura Tuşa ◽  
Mahdi Khodadadzadeh ◽  
Cecilia Contreras ◽  
Kasra Rafiezadeh Shahi ◽  
Margret Fuchs ◽  
...  

Due to the extensive drilling performed every year in exploration campaigns for the discovery and evaluation of ore deposits, drill-core mapping is becoming an essential step. While valuable mineralogical information is extracted during core logging by on-site geologists, the process is time-consuming and dependent on the observer and individual background. Hyperspectral short-wave infrared (SWIR) data is used in the mining industry as a tool to complement traditional logging techniques and to provide a rapid and non-invasive analytical method for mineralogical characterization. Additionally, Scanning Electron Microscopy-based image analyses using a Mineral Liberation Analyser (SEM-MLA) provide exhaustive high-resolution mineralogical maps, but can only be performed on small areas of the drill-cores. We propose to use machine learning algorithms to combine the two data types and upscale the quantitative SEM-MLA mineralogical data to drill-core scale. This way, quasi-quantitative maps over entire drill-core samples are obtained. Our upscaling approach increases result transparency and reproducibility by employing physical-based data acquisition (hyperspectral imaging) combined with mathematical models (machine learning). The procedure is tested on five drill-core samples with varying training data using random forests, support vector machines and neural network regression models. The obtained mineral abundance maps are further used for the extraction of mineralogical parameters such as mineral association.


2020 ◽  
Author(s):  
Zhanyou Xu ◽  
Andreomar Kurek ◽  
Steven B. Cannon ◽  
Williams D. Beavis

Selection of markers linked to alleles at quantitative trait loci (QTL) for tolerance to Iron Deficiency Chlorosis (IDC) has not been successful. Genomic selection has been advocated for continuous numeric traits such as yield and plant height. For ordinal data types such as IDC, genomic prediction models have not been systematically compared. The objectives of the research reported in this manuscript were to compare the most commonly used genomic prediction method, ridge regression, and its equivalent logistic ridge regression method, with algorithmic modeling methods including random forest, gradient boosting, support vector machine, K-nearest neighbors, Naïve Bayes, and artificial neural network using the usual comparator metric of prediction accuracy. In addition, we compared the methods using metrics of greater importance for decisions about selecting and culling lines for use in variety development and genetic improvement projects. These metrics include specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. We found that Support Vector Machine provided the best specificity for culling IDC susceptible lines, while Random Forest GP models provided the best combined set of decision metrics for retaining IDC tolerant and culling IDC susceptible lines.
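As a rough illustration of the decision metrics named in this abstract, the sketch below computes sensitivity, specificity, precision, and plain accuracy from a binary confusion matrix. It is not the authors' pipeline: collapsing the ordinal IDC score to two classes, treating "susceptible" as the positive class (label 1), and the function name `decision_metrics` are all assumptions made for the example, and the labels are made-up toy data.

```python
def decision_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels.

    Assumption for this sketch: 1 = IDC susceptible (positive class),
    0 = IDC tolerant (negative class).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # susceptible, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # tolerant, retained
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # tolerant, wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # susceptible, missed

    def ratio(num, den):
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy labels, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = decision_metrics(y_true, y_pred)
print(metrics)
```

Which class counts as positive changes the meaning of sensitivity and specificity, so the convention should be fixed before comparing models on these metrics.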


2016 ◽  
Vol 7 (1) ◽  
pp. 58-68 ◽  
Author(s):  
Imen Trabelsi ◽  
Med Salim Bouhlel

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel frequency cepstrum coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstrum coefficients (LPCC), Perceptual Linear Prediction (PLP) and Relative Spectral Perceptual Linear Prediction (Rasta-PLP) features are used to characterize the emotional utterances using a combination of Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler divergence kernel. In this study, the effects of feature type and feature dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Utilizing the proposed features, a recognition rate of 84% has been achieved, which is close to the performance of humans on this database.


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1214 ◽  
Author(s):  
Maria Schmidt ◽  
Henry Loeffler-Wirth ◽  
Hans Binder

Single-cell RNA sequencing has become a standard technique to characterize tissue development. Here, cross-sectional snapshots of the diversity of cell transcriptomes are transformed into (pseudo-)longitudinal trajectories of cell differentiation using computational methods, which are based on similarity measures distinguishing cell phenotypes. Cell development is driven by alterations of transcriptional programs, e.g., by differentiation from stem cells into various tissues or by adapting to micro-environmental requirements. We here complement developmental trajectories in cell-state space by trajectories in gene-state space to more clearly address this latter aspect. Such trajectories can be generated using self-organizing map machine learning. The method transforms multidimensional gene expression patterns into two-dimensional data landscapes, which resemble the metaphoric Waddington epigenetic landscape. Trajectories in this landscape visualize transcriptional programs passed by cells along their developmental paths from stem cells to differentiated tissues. In addition, we generated developmental “vector fields” using RNA-velocities to forecast changes of RNA abundance in the expression landscapes. We applied the method to tissue development of planarians as an illustrative example. Gene-state space trajectories complement our data portrayal approach by (pseudo-)temporal information about changing transcriptional programs of the cells. Future applications can be seen in the fields of tissue and cell differentiation, ageing and tumor progression, and also with other data types such as genome, methylome, and clinical and epidemiological phenotype data.


2011 ◽  
Vol 1 (1) ◽  
pp. 49-60 ◽  
Author(s):  
K. Honda ◽  
A. Notsu ◽  
T. Matsui ◽  
H. Ichihashi

Cluster validation is an important issue in fuzzy clustering research and many validity measures, most of which are motivated by intuitive justification considering geometrical features, have been developed. This paper proposes a new validation approach, which evaluates the validity degree of cluster partitions from the view point of the optimality of objective functions in FCM-type clustering. This approach makes it possible to evaluate the validity degree of robust cluster partitions, in which geometrical features are not available because of their possibilistic natures.


Author(s):  
Ryo Inokuchi ◽  
Sadaaki Miyamoto

In this paper, we discuss fuzzy clustering algorithms for discrete data. The data space is represented as a statistical manifold of the multinomial distribution, so the Euclidean distance is not adequate in this setting. The geodesic distance on the multinomial manifold can be derived analytically, but it is difficult to use it as a metric directly. We propose fuzzy c-means algorithms using other metrics: the Kullback-Leibler divergence and the Hellinger distance, instead of the Euclidean distance. These two metrics are regarded as approximations of the geodesic distance.
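The two dissimilarities named in this abstract have simple closed forms for discrete distributions, and the sketch below plugs them into the textbook fuzzy c-means membership update. This is only an illustration under stated assumptions, not the authors' algorithm: the function names, the fuzzifier `m = 2`, the toy probability vectors, and the choice to use each dissimilarity directly in place of the squared Euclidean term are all assumptions of the example, and the center update is omitted.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions.

    Terms with p_i = 0 contribute 0. Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hellinger_distance(p, q):
    """Hellinger distance between discrete distributions; lies in [0, 1]."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * s)


def fcm_memberships(x, centers, dissimilarity, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_i proportional to d_i^(-1/(m-1)).

    `dissimilarity` stands in for the squared Euclidean distance of
    classical fuzzy c-means (an assumption of this sketch).
    """
    d = [max(dissimilarity(x, c), eps) for c in centers]  # avoid division by zero
    w = [di ** (-1.0 / (m - 1.0)) for di in d]
    total = sum(w)
    return [wi / total for wi in w]


# Toy multinomial parameter vectors, invented for illustration.
x = [0.7, 0.2, 0.1]
centers = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]

u_kl = fcm_memberships(x, centers, kl_divergence)
u_h = fcm_memberships(x, centers, hellinger_distance)
print(u_kl, u_h)
```

Note that the Kullback-Leibler divergence is asymmetric, so the argument order (datum first, center second here) is itself a modeling choice.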

