Unsupervised methods | ScienceGate

Image segmentation is a cost-effective way to obtain information about the sizes and structural composition of agricultural parcels in an area. To obtain such information accurately, the parameters of the segmentation algorithm should be optimized using supervised or unsupervised methods. Because reference data are difficult to obtain, unsupervised methods are indispensable. In this study, we evaluated an existing unsupervised evaluation metric that minimizes a global score (GS), computed by summing the intra-segment uniformity and the inter-segment dissimilarity of a segmentation output. We modified this metric and proposed a new metric that uses the absolute difference to compute the GS. We compared the proposed metric with the existing metric in two optimization approaches based on the Multiresolution Segmentation (MRS) algorithm, with the goal of optimally delineating agricultural parcels from Sentinel-2 images in Lower Saxony, Germany. The first approach searches for the optimal scale while keeping shape and compactness constant; the second uses Bayesian optimization to optimize the three main parameters of the MRS algorithm. Using reference data on agricultural parcels, we evaluated the optimal segmentation result of each optimization approach by calculating the quality rate, over-segmentation, and under-segmentation. For both approaches, our proposed metric outperformed the existing metric in different agricultural landscapes. The optimal segmentations identified by the proposed metric were less under-segmented than those identified by the existing metric. A comparison of the optimal segmentation results obtained in this study with existing benchmark results generated via supervised optimization showed that the unsupervised Bayesian optimization approach based on our proposed metric can potentially be used as an alternative to supervised optimization, particularly in geographic regions where reference data are unavailable or where an automated evaluation system is sought.
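The abstract does not give the formulas for either metric, so the Python sketch below only illustrates the general idea of choosing a segmentation by minimizing a global score. Everything specific in it is an assumption made for illustration: the min-max normalization, the reading of "absolute difference" as |uniformity - dissimilarity|, the function names, and the sample values. It should not be read as the authors' implementation.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] across all candidate segmentations (assumed step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def global_scores(uniformity, dissimilarity, combine="sum"):
    """Return one global score per candidate segmentation.

    uniformity    -- intra-segment term per candidate (lower is assumed better)
    dissimilarity -- inter-segment term per candidate (lower is assumed better)
    combine       -- "sum" for the summed score, "absdiff" for a hypothetical
                     absolute-difference variant
    """
    u = min_max_normalize(uniformity)
    d = min_max_normalize(dissimilarity)
    if combine == "sum":
        return [a + b for a, b in zip(u, d)]
    if combine == "absdiff":
        return [abs(a - b) for a, b in zip(u, d)]
    raise ValueError(f"unknown combine mode: {combine!r}")


def best_candidate(scales, uniformity, dissimilarity, combine="sum"):
    """Pick the scale parameter whose segmentation has the smallest global score."""
    scores = global_scores(uniformity, dissimilarity, combine)
    idx = min(range(len(scores)), key=scores.__getitem__)
    return scales[idx], scores[idx]


if __name__ == "__main__":
    # Made-up values for four candidate scale parameters.
    scales = [20, 40, 60, 80]
    uniformity = [1.0, 2.0, 4.0, 5.0]
    dissimilarity = [1.0, 0.5, 0.25, 0.0]
    for mode in ("sum", "absdiff"):
        print(mode, best_candidate(scales, uniformity, dissimilarity, mode))
```

In a scale-only search, `scales` would hold the candidate scale values; in a multi-parameter search, the same score could serve as the objective that an optimizer minimizes.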

Revisiting Supervised and Unsupervised Methods for Effort-Aware Cross-Project Defect Prediction

IEEE Transactions on Software Engineering, 2020, pp. 1-1, DOI 10.1109/tse.2020.3001739

Author(s): Chao Ni, Xin Xia, David Lo, Xiang Chen, Qing Gu

Keyword(s): Defect Prediction, Unsupervised Methods, Cross Project

Unsupervised methods for Software Defect Prediction

Proceedings of the Tenth International Symposium on Information and Communication Technology - SoICT 2019, 2019, DOI 10.1145/3368926.3369711

Author(s): Duy-An Ha, Ting-Hsuan Chen, Shyan-Ming Yuan

Keyword(s): Defect Prediction, Software Defect Prediction, Software Defect, Unsupervised Methods

An Application of Neural and Probabilistic Unsupervised Methods to Environmental Factor Analysis of Multi-spectral Images

Image Analysis and Processing – ICIAP 2005 - Lecture Notes in Computer Science, 2005, pp. 1190-1197, DOI 10.1007/11553595_146. Cited by ~1

Author(s): Luca Pugliese, Silvia Scarpetta, Anna Esposito, Maria Marinaro

Keyword(s): Factor Analysis, Environmental Factor, Unsupervised Methods

Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration

Briefings in Bioinformatics, 2019, Vol 21 (6), pp. 2011-2030, DOI 10.1093/bib/bbz138. Cited by ~6

Author(s): Morgane Pierre-Jean, Jean-François Deleuze, Edith Le Floch, Florence Mauger

Keyword(s): Correlation Analysis, Variable Selection, Canonical Correlation Analysis, Canonical Correlation, Molecular Data, Heterogeneous Data, Data Availability, Omics Data, Generalized Canonical Correlation Analysis, Unsupervised Methods

Abstract: Recent advances in next-generation sequencing (NGS), microarrays and mass spectrometry for omics data production have enabled the generation and collection of different modalities of high-dimensional molecular data. Integrating multiple omics datasets is a statistical challenge because of the limited number of individuals, the high number of variables and the heterogeneity of the datasets to integrate. Recently, many tools have been developed to address the integration of omics data, including canonical correlation analysis, matrix factorization and SM. These commonly used techniques aim to analyze two or more types of omics data simultaneously. In this article, we compare a panel of 13 unsupervised methods based on these different approaches to integrate various types of multi-omics datasets: iClusterPlus, regularized generalized canonical correlation analysis, sparse generalized canonical correlation analysis, multiple co-inertia analysis (MCIA), integrative-NMF (intNMF), SNF, MoCluster, mixKernel, CIMLR, LRAcluster, ConsensusClustering, PINSPlus and multi-omics factor analysis (MOFA). We evaluate the ability of the methods to recover the subgroups and the variables that drive the clustering on eight simulated benchmarks. MOFA does not provide any results on these benchmarks. For clustering, SNF, MoCluster, CIMLR, LRAcluster, ConsensusClustering and intNMF provide the best results. For variable selection, MoCluster outperforms the others. However, the performance of the methods seems to depend on the heterogeneity of the datasets (especially for MCIA, intNMF and iClusterPlus). Finally, we apply the methods to three real studies with heterogeneous data and various phenotypes. We conclude that MoCluster is the best method to analyze these omics data. Availability: An R package named CrIMMix is available on GitHub at https://github.com/CNRGH/crimmix to reproduce all the results of this article.
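The abstract does not say which score is used to judge how well a method recovers the simulated subgroups. As an illustration only, the Python sketch below computes the adjusted Rand index, a common agreement measure between a known grouping and a clustering result. The choice of this index, the function name, and the toy labels are assumptions made here, and the code is unrelated to the CrIMMix package.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Chance-corrected agreement between two labelings of the same samples.

    Returns 1.0 for identical partitions (up to label renaming) and values
    near 0.0 for partitions that agree no more than expected by chance.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("both labelings must cover the same samples")
    n = len(true_labels)
    if n < 2:
        return 1.0  # no sample pairs to disagree on

    # Count sample pairs that fall together in each contingency-table cell,
    # in each true group, and in each predicted cluster.
    cells = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())

    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every sample in one group
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy example: simulated subgroups versus one method's cluster assignment.
    simulated = [0, 0, 1, 1, 2, 2]
    clustered = [2, 2, 0, 0, 1, 1]  # same partition, different label names
    print(adjusted_rand_index(simulated, clustered))
```

Because the index ignores label names, it can compare methods whose cluster numbering is arbitrary.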
