RECURSIVE HIERARCHICAL CLUSTERING FOR HYPERSPECTRAL IMAGES

Author(s):  
S. May

Abstract. Partition-based clustering techniques are widely used in data mining and also to analyze hyperspectral images. Unsupervised clustering depends only on the data, without any external knowledge, and produces a complete partition of the image into many classes. Sparse labeled samples can then be used to label each cluster, which simplifies the supervised step. Each clustering algorithm has its own advantages and drawbacks (initialization, training complexity). In this paper we propose a recursive hierarchical clustering built on standard clustering strategies such as K-Means or Fuzzy C-Means. The recursive hierarchical approach reduces the algorithmic complexity, making it possible to process a large number of input pixels and to produce a clustering with a high number of clusters. Moreover, a classical question for hyperspectral images concerns their high dimensionality and the distance that should be used. Classical clustering algorithms usually rely on the Euclidean distance between samples and centroids. We propose to use the spectral angle distance instead and evaluate its performance: it better fits the pixel spectra and is less sensitive to illumination changes or spectral variability inside a semantic class. Several scenes are processed with this method to demonstrate its potential.
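
As a rough illustration of these two ideas, the Python sketch below replaces the Euclidean distance with the spectral angle in a plain K-Means loop and then re-applies that loop recursively inside each cluster to build a deep partition cheaply. It is a minimal sketch under assumed conventions, not the authors' implementation; all function names are illustrative.

```python
import numpy as np

def spectral_angle(pixels, centroids):
    """Spectral angle (radians) between every pixel spectrum and every centroid.
    pixels: (n, b) array; centroids: (k, b) array; returns an (n, k) array of angles."""
    dot = pixels @ centroids.T
    norms = np.linalg.norm(pixels, axis=1)[:, None] * np.linalg.norm(centroids, axis=1)[None, :]
    cos = np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.arccos(cos)

def kmeans_sam(pixels, k, n_iter=50, seed=0):
    """K-Means-style loop whose assignment step uses the spectral angle instead of the Euclidean distance."""
    rng = np.random.default_rng(seed)
    pixels = np.asarray(pixels, dtype=float)
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(n_iter):
        labels = spectral_angle(pixels, centroids).argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

def recursive_clustering(pixels, k, depth):
    """Recursively re-apply kmeans_sam inside each cluster to build a deep partition cheaply."""
    pixels = np.asarray(pixels, dtype=float)
    labels = kmeans_sam(pixels, k)
    if depth == 1:
        return labels
    out = np.zeros(len(pixels), dtype=int)
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx) > k:
            sub = recursive_clustering(pixels[idx], k, depth - 1)
        else:
            sub = np.zeros(len(idx), dtype=int)
        out[idx] = j * k ** (depth - 1) + sub
    return out
```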

Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value against a mean threshold. This paper incorporates many parameters for finding similarity between words. In order to identify disambiguated words, sense identification is performed for the adjectives and a comparison is made. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a clear improvement, reaching 91.2%.
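
The two similarity measures named above are available off the shelf: NLTK exposes the Jiang-Conrath measure over WordNet, and a term-frequency cosine can be computed directly. The sketch below is a hedged illustration of that combination, assuming the WordNet corpus and the Brown information-content file have been downloaded via nltk.download; the paper's exact weighting and thresholding scheme is not reproduced.

```python
import math
from collections import Counter
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')   # information content used by Jiang-Conrath

def jcn_word_similarity(w1, w2):
    """Best Jiang-Conrath similarity over all noun senses of the two words (0 if none comparable)."""
    best = 0.0
    for s1 in wn.synsets(w1, pos=wn.NOUN):
        for s2 in wn.synsets(w2, pos=wn.NOUN):
            try:
                best = max(best, s1.jcn_similarity(s2, brown_ic))
            except Exception:
                pass
    return best

def cosine_similarity(sent1, sent2):
    """Plain term-frequency cosine between two sentences."""
    v1, v2 = Counter(sent1.lower().split()), Counter(sent2.lower().split())
    num = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    den = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(sum(c * c for c in v2.values()))
    return num / den if den else 0.0

print(jcn_word_similarity("car", "bus"))
print(cosine_similarity("the cat sat on the mat", "a cat sat on a mat"))
```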


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its good clustering effect and data expression ability. However, little research focuses on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill this gap, we extend the data type of clustering to hesitant fuzzy linguistic information. A hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boolean matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, providing decision support for the executive judge in determining the focus of investigation and control. A clustering example verifies the algorithms' effectiveness in the context of hesitant fuzzy linguistic decision information.
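
As a toy illustration of the agglomerative variant, the sketch below represents each HFLTS as a set of indices into a seven-term linguistic scale, uses a deliberately simple index-based distance (the paper's own HFLTS distance is more elaborate), and feeds the resulting distance matrix to SciPy's agglomerative linkage. All data values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Linguistic scale S = {s0, ..., s6}; an HFLTS is a set of consecutive term indices on that scale.
def hflts_distance(h1, h2, g=6):
    """Illustrative distance: difference of the average term index, normalised by the scale size."""
    return abs(np.mean(sorted(h1)) - np.mean(sorted(h2))) / g

cases = [{2, 3}, {3, 4}, {0, 1}, {5, 6}, {4, 5}, {1, 2}]   # hypothetical case assessments
n = len(cases)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = hflts_distance(cases[i], cases[j])

Z = linkage(squareform(dist), method="average")    # agglomerative hierarchical clustering
print(fcluster(Z, t=2, criterion="maxclust"))      # cut the dendrogram into two clusters
```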


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One successful and pioneering clustering algorithm is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the attribute with fewer values and the leaf node with more objects, which can lead to undesirable clustering results. To overcome these shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets from the UCI repository show that the IMMR algorithm outperforms MMR in clustering categorical data.
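
For readers unfamiliar with the rough-set machinery behind MMR, the sketch below shows the attribute-selection step only: the mean roughness of one categorical attribute with respect to another, and the minimum-minimum rule for picking the splitting attribute. It is a simplified, assumed reading of the standard MMR formulation with a hypothetical toy table, not the IMMR improvement contributed by the paper.

```python
from collections import defaultdict

def partitions(rows, attr):
    """Equivalence classes of object indices induced by one categorical attribute."""
    blocks = defaultdict(set)
    for idx, row in enumerate(rows):
        blocks[row[attr]].add(idx)
    return list(blocks.values())

def mean_roughness(rows, a_i, a_j):
    """Mean rough-set roughness of attribute a_i with respect to a_j (1 - |lower| / |upper|)."""
    values = []
    blocks_j = partitions(rows, a_j)
    for X in partitions(rows, a_i):
        lower = sum(len(Y) for Y in blocks_j if Y <= X)   # lower approximation of X
        upper = sum(len(Y) for Y in blocks_j if Y & X)    # upper approximation of X
        values.append(1.0 - lower / upper)
    return sum(values) / len(values)

def mmr_splitting_attribute(rows, attrs):
    """MMR's selection rule: the attribute whose minimum mean roughness w.r.t. the others is smallest."""
    return min(attrs, key=lambda a: min(mean_roughness(rows, a, b) for b in attrs if b != a))

# Hypothetical categorical table: each row is a dict of attribute -> value.
animals = [
    {"legs": "4", "habitat": "land",  "diet": "herbivore"},
    {"legs": "4", "habitat": "land",  "diet": "carnivore"},
    {"legs": "2", "habitat": "air",   "diet": "omnivore"},
    {"legs": "0", "habitat": "water", "diet": "carnivore"},
]
print(mmr_splitting_attribute(animals, ["legs", "habitat", "diet"]))
```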


2021 ◽  
Vol 19 ◽  
pp. 310-320
Author(s):  
Suboh Alkhushayni ◽  
Taeyoung Choi ◽  
Du’a Alzaleq

This work aims to expand knowledge in the area of data analysis through both persistent homology and representations of directed graphs. Specifically, we investigated how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms and methods. Additionally, the Wine data set, available in RStudio, was analyzed using several clustering algorithms: hierarchical clustering, K-Means clustering, and PAM clustering. The goal of the analysis was to find out which clustering method is appropriate for a given numerical data set. By testing the data, we tried to determine the optimal clustering algorithm among the K-Means, PAM, and random forest methods. By comparing each model's accuracy against the cultivar labels, we concluded that K-Means is the most helpful when working with numerical variables. On the other hand, PAM clustering and Gower distance with random forest are the most beneficial approaches when working with categorical variables. These tests can determine the optimal number of clusters for a given data set when the proper analysis is done. The method can be applied to several domains, such as clinical and business settings. For example, in a clinical setting, patients can be grouped by a common disease, required therapy, and other factors. In the business domain, groups can be formed based on marginal profit, marginal cost, or other economic indicators.
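
A minimal Python counterpart of the numerical-variable comparison is sketched below: it loads the same Wine data (via scikit-learn rather than R), runs K-Means and agglomerative (Ward) clustering, and scores each partition against the cultivar labels with the adjusted Rand index. PAM and the Gower/random-forest pipeline from the study are not reproduced here.

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, cultivar = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)     # put the numerical features on a common scale

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative (Ward)": AgglomerativeClustering(n_clusters=3),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    print(f"{name:22s} ARI vs. cultivar = {adjusted_rand_score(cultivar, labels):.3f}")
```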


2017 ◽  
Vol 9 (2) ◽  
pp. 195-213
Author(s):  
Richárd Forster ◽  
Ágnes Fülöp

Abstract. The reconstruction and analysis of measured data play an important role in high-energy particle physics research, leading to new results in both experimental and theoretical physics. This requires algorithmic improvements and high computing capacity. Clustering algorithms make it possible to characterize the jet structure more accurately. A more granular parallelization of the kt cluster algorithm was explored by combining it with the hierarchical clustering methods used in network evaluation. The kt method reveals the evolution of particles produced in high-energy nucleus-nucleus collisions. The hierarchical clustering algorithms work on graphs, so the particle information used by the standard kt algorithm was first transformed into an appropriate graph representing the network of particles. Testing was done using data samples from the ALICE offline library, which contains the modules required to simulate the ALICE detector, a dedicated Pb-Pb detector. The proposed algorithm was compared to the FastJet toolkit's standard longitudinally invariant kt implementation. Parallelizing the standard non-optimized version of this algorithm on the available CPU architecture proved to be 1.6 times faster than the standard implementation, while the solution proposed in this paper achieved a 12 times faster computing performance and is scalable enough to run efficiently on GPUs.
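
For reference, the sketch below implements the textbook sequential longitudinally invariant kt recombination (the plain O(N^3) version that FastJet optimizes), with a simple pt-weighted recombination scheme. It is meant only to make the distance measures d_ij and d_iB concrete; it is not the graph-based GPU parallelization proposed in the paper.

```python
import numpy as np

def kt_cluster(particles, R=1.0):
    """Sequential longitudinally invariant kt clustering (plain O(N^3) reference version).
    particles: list of (pt, rapidity, phi) tuples; returns a list of jets in the same format."""
    parts = [list(p) for p in particles]
    jets = []
    while parts:
        # beam distances d_iB = pt_i^2
        best, pair = min((p[0] ** 2, (i, None)) for i, p in enumerate(parts))
        # pairwise distances d_ij = min(pt_i, pt_j)^2 * dR_ij^2 / R^2
        for i in range(len(parts)):
            for j in range(i + 1, len(parts)):
                dphi = abs(parts[i][2] - parts[j][2])
                dphi = min(dphi, 2 * np.pi - dphi)
                dR2 = (parts[i][1] - parts[j][1]) ** 2 + dphi ** 2
                dij = min(parts[i][0], parts[j][0]) ** 2 * dR2 / R ** 2
                if dij < best:
                    best, pair = dij, (i, j)
        i, j = pair
        if j is None:
            jets.append(tuple(parts.pop(i)))         # smallest distance is to the beam: final jet
        else:
            pt1, y1, phi1 = parts[i]
            pt2, y2, phi2 = parts[j]
            pt = pt1 + pt2                           # pt-weighted recombination
            y = (pt1 * y1 + pt2 * y2) / pt
            phi = (pt1 * phi1 + pt2 * phi2) / pt     # naive phi average (ignores 2*pi wrap-around)
            parts[j] = [pt, y, phi]
            parts.pop(i)
    return jets

print(kt_cluster([(50.0, 0.1, 0.2), (45.0, 0.15, 0.25), (20.0, -1.0, 3.0)]))
```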


Author(s):  
Naohiko Kinoshita ◽  
Yasunori Endo ◽  
Akira Sugawara ◽  
...  

Clustering is a representative form of unsupervised classification. Many researchers have proposed clustering algorithms based on mathematical models: methods we call model-based clustering. Clustering techniques are very useful for determining data structures, but model-based clustering is difficult to use for analyzing data correctly because we cannot select a suitable method unless we know the data structure at least partially. The new clustering algorithm we propose introduces soft computing techniques such as fuzzy reasoning, in what we call linguistic-based clustering, whose behavior does not depend on the data structure. We verify the method's effectiveness through numerical examples.


2013 ◽  
Vol 401-403 ◽  
pp. 1440-1443 ◽  
Author(s):  
Tie Feng Zhang ◽  
Fei Lv ◽  
Rong Gu

The choice of distance is an important issue in power load pattern extraction using clustering techniques, so it is necessary to determine how different distances influence the clustering of load curves. In this paper several distances are used in the k-means algorithm for clustering load curves and their influence on the clustering results is analyzed, so that a suitable distance for the k-means algorithm can be identified. An example with the load curves of 147 electricity customers shows that different distances lead to different clustering results under the same clustering algorithm. The comparison results indicate that the choice of distance is an important issue in power load pattern extraction using clustering techniques and that a suitable distance may improve the accuracy of the mining algorithm.
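
A compact way to run this kind of comparison is to make the distance a parameter of the assignment step, as in the sketch below (an illustrative implementation on random data, not the paper's). Any metric accepted by SciPy's cdist can be plugged in; note that for non-Euclidean metrics the cluster mean is only an approximate centroid update.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kmeans_with_metric(curves, k, metric="euclidean", n_iter=100, seed=0):
    """k-means whose assignment step uses any distance supported by cdist
    ('euclidean', 'cityblock', 'cosine', 'correlation', ...), so that different
    distances can be compared on the same set of daily load curves."""
    rng = np.random.default_rng(seed)
    curves = np.asarray(curves, dtype=float)
    centroids = curves[rng.choice(len(curves), size=k, replace=False)].copy()
    labels = np.zeros(len(curves), dtype=int)
    for _ in range(n_iter):
        labels = cdist(curves, centroids, metric=metric).argmin(axis=1)
        new = np.array([curves[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Example: hypothetical 24-point daily load curves, clustered with two different distances.
curves = np.random.default_rng(1).random((147, 24))
for metric in ("euclidean", "correlation"):
    labels, _ = kmeans_with_metric(curves, k=4, metric=metric)
    print(metric, np.bincount(labels))
```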


2018 ◽  
Vol 2 (4) ◽  
Author(s):  
Pengfei Zhang ◽  
Hwee-Pink Tan ◽  
Gaoxi Xiao

Motivated by recent developments in Wireless Sensor Networks (WSNs), we present distributed clustering algorithms for maximizing the lifetime of WSNs, i.e., the duration until the first node dies. We study the joint problem of prolonging network lifetime by introducing clustering techniques and energy-harvesting (EH) nodes. First, we propose a distributed clustering algorithm for maximizing the lifetime of a clustered WSN that includes EH nodes serving as relay nodes for cluster heads (CHs). Second, graph-based and LP-based EH-CH matching algorithms are proposed, which serve as benchmark algorithms. Extensive simulation results show that the proposed algorithms can achieve optimal or suboptimal solutions efficiently.
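
The matching sub-problem can be pictured as an assignment between EH relay candidates and cluster heads. The sketch below is a hypothetical, much-simplified benchmark: it assigns EH nodes to CHs by minimizing a squared-distance proxy for relaying cost with a Hungarian-style assignment. The paper's actual graph-based and LP-based formulations, and their lifetime objective, are more involved.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical layout: positions of energy-harvesting (EH) relay candidates and cluster heads (CHs).
rng = np.random.default_rng(1)
eh_nodes = rng.uniform(0, 100, size=(6, 2))        # 6 EH relay candidates (x, y)
cluster_heads = rng.uniform(0, 100, size=(4, 2))   # 4 cluster heads (x, y)

# cost[i, j]: squared distance as a crude proxy for the energy cost of EH node i relaying for CH j
cost = ((eh_nodes[:, None, :] - cluster_heads[None, :, :]) ** 2).sum(axis=2)

rows, cols = linear_sum_assignment(cost)           # minimum-cost one-to-one EH-to-CH matching
for i, j in zip(rows, cols):
    print(f"EH node {i} relays for cluster head {j} (cost {cost[i, j]:.1f})")
```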


2010 ◽  
Vol 44-47 ◽  
pp. 3897-3901
Author(s):  
Hsiang Chuan Liu ◽  
Yen Kuei Yu ◽  
Jeng Ming Yih ◽  
Chin Chun Chen

Fuzzy clustering algorithms based on the Euclidean distance function can only detect spherical structural clusters. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters by employing the Mahalanobis distance in the objective function; however, both of them need to add constraints on the Mahalanobis distance. In this paper, the authors' improved Fuzzy C-Means algorithm based on a common Mahalanobis distance (FCM-CM) is used to identify the mastery concepts in linear algebra, and its performance is compared with four other partition algorithms: FCM-M, GG, GK, and FCM. The results show that FCM-CM performs better than the others.
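
The sketch below is an assumed, stripped-down reading of the "common Mahalanobis distance" idea: a standard Fuzzy C-Means loop in which the squared distances use a single shared Mahalanobis metric (here simply the inverse of the pooled data covariance). The actual FCM-CM objective and its covariance estimation differ in detail.

```python
import numpy as np

def fcm_common_mahalanobis(X, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-Means whose distances use one shared Mahalanobis metric
    (the inverse of the pooled data covariance) instead of the Euclidean distance."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False) + 1e-6 * np.eye(d))
    U = rng.dirichlet(np.ones(c), size=n)                      # n x c fuzzy membership matrix
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]           # membership-weighted cluster centres
        diff = X[:, None, :] - centers[None, :, :]             # (n, c, d)
        d2 = np.einsum('ncd,de,nce->nc', diff, inv_cov, diff)  # squared Mahalanobis distances
        d2 = np.maximum(d2, 1e-12)
        ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1))
        U = 1.0 / ratio.sum(axis=2)                            # standard FCM membership update
    return U, centers

# Example on random 2-D data: three fuzzy clusters, crisp labels via argmax.
X = np.random.default_rng(2).normal(size=(90, 2))
U, centers = fcm_common_mahalanobis(X, c=3)
print(U.argmax(axis=1)[:10], centers.shape)
```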


2016 ◽  
Vol 43 (1) ◽  
pp. 54-74 ◽  
Author(s):  
Baojun Ma ◽  
Hua Yuan ◽  
Ye Wu

Clustering is a powerful unsupervised tool for sentiment analysis of text. However, the clustering results may be affected by any step of the clustering process, such as the data pre-processing strategy, the term weighting method in the vector space model, and the clustering algorithm. This paper presents the results of an experimental study of some common clustering techniques with respect to the task of sentiment analysis. In contrast to previous studies, we investigate the combined effects of these factors through a series of comprehensive experiments. The experimental results indicate that, first, K-means-type clustering algorithms show clear advantages on balanced review datasets while performing rather poorly on unbalanced datasets in terms of clustering accuracy. Second, the more recently designed weighting models are better than the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, an adjective and adverb extraction strategy offers clear improvements in clustering performance, while stemming and stopword removal have a negative influence on sentiment clustering. These results should be valuable for both the study and the use of clustering methods in online review sentiment analysis.
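
The basic pipeline evaluated in such studies can be assembled in a few lines, as in the hedged sketch below: a TF-IDF-weighted vector space model over a tiny, hypothetical set of reviews, clustered with K-means into a positive and a negative group. The specific weighting models, datasets, and pre-processing variants compared in the paper are not reproduced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reviews = [
    "great phone, excellent battery and very fast",      # hypothetical positive reviews
    "awesome value, absolutely love the camera",
    "terrible screen, slow and extremely disappointing",  # hypothetical negative reviews
    "awful battery life, very poor build quality",
]

# Vector space model with a sub-linear TF-IDF weighting; stopword removal is left off here,
# since the study reports it can hurt sentiment clustering.
tfidf = TfidfVectorizer(sublinear_tf=True)
X = tfidf.fit_transform(reviews)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # ideally the positive and negative reviews end up in separate clusters
```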

