RECURSIVE HIERARCHICAL CLUSTERING FOR HYPERSPECTRAL IMAGES

Author(s):  
S. May

Abstract. Partition-based clustering techniques are widely used in data mining and also to analyze hyperspectral images. Unsupervised clustering depends only on the data, without any external knowledge, and produces a complete partition of the image into many classes. Sparse labeled samples can then be used to label each cluster, which simplifies the supervised step. Each clustering algorithm has its own advantages and drawbacks (initialization, training complexity). In this paper we propose a recursive hierarchical clustering built on standard clustering strategies such as K-Means or Fuzzy C-Means. The recursive hierarchical approach reduces the algorithmic complexity, making it possible to process a large number of input pixels and to produce a clustering with a high number of clusters. Moreover, a classical question for hyperspectral images concerns their high dimensionality and the distance that should be used. Classical clustering algorithms usually rely on the Euclidean distance between samples and centroids. We propose to use the spectral angle distance instead and evaluate its performance: it better fits the pixel spectra and is less sensitive to illumination changes or spectral variability inside a semantic class. Several scenes are processed with this method to demonstrate its potential.
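
As a rough illustration of these two ideas, the Python sketch below replaces the Euclidean distance with the spectral angle in a plain K-Means loop and then re-applies that loop recursively inside each cluster to build a deep partition cheaply. It is a minimal sketch under assumed conventions, not the authors' implementation; all function names are illustrative.

```python
import numpy as np

def spectral_angle(pixels, centroids):
    """Spectral angle (radians) between every pixel spectrum and every centroid.
    pixels: (n, b) array; centroids: (k, b) array; returns an (n, k) array of angles."""
    dot = pixels @ centroids.T
    norms = np.linalg.norm(pixels, axis=1)[:, None] * np.linalg.norm(centroids, axis=1)[None, :]
    cos = np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.arccos(cos)

def kmeans_sam(pixels, k, n_iter=50, seed=0):
    """K-Means-style loop whose assignment step uses the spectral angle instead of the Euclidean distance."""
    rng = np.random.default_rng(seed)
    pixels = np.asarray(pixels, dtype=float)
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(n_iter):
        labels = spectral_angle(pixels, centroids).argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

def recursive_clustering(pixels, k, depth):
    """Recursively re-apply kmeans_sam inside each cluster to build a deep partition cheaply."""
    pixels = np.asarray(pixels, dtype=float)
    labels = kmeans_sam(pixels, k)
    if depth == 1:
        return labels
    out = np.zeros(len(pixels), dtype=int)
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx) > k:
            sub = recursive_clustering(pixels[idx], k, depth - 1)
        else:
            sub = np.zeros(len(idx), dtype=int)
        out[idx] = j * k ** (depth - 1) + sub
    return out
```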

Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value against a mean threshold. This paper incorporates many parameters for finding similarity between words. In order to identify disambiguated words, sense identification is performed for the adjectives and a comparison is made. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a clear improvement, reaching 91.2%.
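
The two similarity measures named above are available off the shelf: NLTK exposes the Jiang-Conrath measure over WordNet, and a term-frequency cosine can be computed directly. The sketch below is a hedged illustration of that combination, assuming the WordNet corpus and the Brown information-content file have been downloaded via nltk.download; the paper's exact weighting and thresholding scheme is not reproduced.

```python
import math
from collections import Counter
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')   # information content used by Jiang-Conrath

def jcn_word_similarity(w1, w2):
    """Best Jiang-Conrath similarity over all noun senses of the two words (0 if none comparable)."""
    best = 0.0
    for s1 in wn.synsets(w1, pos=wn.NOUN):
        for s2 in wn.synsets(w2, pos=wn.NOUN):
            try:
                best = max(best, s1.jcn_similarity(s2, brown_ic))
            except Exception:
                pass
    return best

def cosine_similarity(sent1, sent2):
    """Plain term-frequency cosine between two sentences."""
    v1, v2 = Counter(sent1.lower().split()), Counter(sent2.lower().split())
    num = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    den = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(sum(c * c for c in v2.values()))
    return num / den if den else 0.0

print(jcn_word_similarity("car", "bus"))
print(cosine_similarity("the cat sat on the mat", "a cat sat on a mat"))
```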


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its good clustering effect and data expression ability. However, little research focuses on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill this gap, we extend the data type of clustering to hesitant fuzzy linguistic information. A hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boolean matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, providing decision support for the executive judge in determining the focus of investigation and control. A clustering example verifies the algorithms' effectiveness in the context of hesitant fuzzy linguistic decision information.
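
As a toy illustration of the agglomerative variant, the sketch below represents each HFLTS as a set of indices into a seven-term linguistic scale, uses a deliberately simple index-based distance (the paper's own HFLTS distance is more elaborate), and feeds the resulting distance matrix to SciPy's agglomerative linkage. All data values are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Linguistic scale S = {s0, ..., s6}; an HFLTS is a set of consecutive term indices on that scale.
def hflts_distance(h1, h2, g=6):
    """Illustrative distance: difference of the average term index, normalised by the scale size."""
    return abs(np.mean(sorted(h1)) - np.mean(sorted(h2))) / g

cases = [{2, 3}, {3, 4}, {0, 1}, {5, 6}, {4, 5}, {1, 2}]   # hypothetical case assessments
n = len(cases)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = hflts_distance(cases[i], cases[j])

Z = linkage(squareform(dist), method="average")    # agglomerative hierarchical clustering
print(fcluster(Z, t=2, criterion="maxclust"))      # cut the dendrogram into two clusters
```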


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One successful and pioneering clustering algorithm is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the attribute with fewer values and the leaf node with more objects, which can lead to undesirable clustering results. To overcome these shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets from the UCI repository show that the IMMR algorithm outperforms MMR in clustering categorical data.
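
For readers unfamiliar with the rough-set machinery behind MMR, the sketch below shows the attribute-selection step only: the mean roughness of one categorical attribute with respect to another, and the minimum-minimum rule for picking the splitting attribute. It is a simplified, assumed reading of the standard MMR formulation with a hypothetical toy table, not the IMMR improvement contributed by the paper.

```python
from collections import defaultdict

def partitions(rows, attr):
    """Equivalence classes of object indices induced by one categorical attribute."""
    blocks = defaultdict(set)
    for idx, row in enumerate(rows):
        blocks[row[attr]].add(idx)
    return list(blocks.values())

def mean_roughness(rows, a_i, a_j):
    """Mean rough-set roughness of attribute a_i with respect to a_j (1 - |lower| / |upper|)."""
    values = []
    blocks_j = partitions(rows, a_j)
    for X in partitions(rows, a_i):
        lower = sum(len(Y) for Y in blocks_j if Y <= X)   # lower approximation of X
        upper = sum(len(Y) for Y in blocks_j if Y & X)    # upper approximation of X
        values.append(1.0 - lower / upper)
    return sum(values) / len(values)

def mmr_splitting_attribute(rows, attrs):
    """MMR's selection rule: the attribute whose minimum mean roughness w.r.t. the others is smallest."""
    return min(attrs, key=lambda a: min(mean_roughness(rows, a, b) for b in attrs if b != a))

# Hypothetical categorical table: each row is a dict of attribute -> value.
animals = [
    {"legs": "4", "habitat": "land",  "diet": "herbivore"},
    {"legs": "4", "habitat": "land",  "diet": "carnivore"},
    {"legs": "2", "habitat": "air",   "diet": "omnivore"},
    {"legs": "0", "habitat": "water", "diet": "carnivore"},
]
print(mmr_splitting_attribute(animals, ["legs", "habitat", "diet"]))
```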


2021 ◽  
Vol 19 ◽  
pp. 310-320
Author(s):  
Suboh Alkhushayni ◽  
Taeyoung Choi ◽  
Du’a Alzaleq

This work aims to expand knowledge in the area of data analysis through both persistent homology and representations of directed graphs. Specifically, we investigated how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms and methods. Additionally, the Wine data set, available in RStudio, was analyzed using several clustering algorithms: hierarchical clustering, K-Means clustering, and PAM clustering. The goal of the analysis was to find out which clustering method is appropriate for a given numerical data set. By testing the data, we tried to determine the optimal clustering algorithm among the K-Means, PAM, and random forest methods. By comparing each model's accuracy against the cultivar labels, we concluded that K-Means is the most helpful when working with numerical variables. On the other hand, PAM clustering and Gower distance with random forest are the most beneficial approaches when working with categorical variables. These tests can determine the optimal number of clusters for a given data set when the proper analysis is done. The method can be applied to several domains, such as clinical and business settings. For example, in a clinical setting, patients can be grouped by a common disease, required therapy, and other factors. In the business domain, groups can be formed based on marginal profit, marginal cost, or other economic indicators.
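
A minimal Python counterpart of the numerical-variable comparison is sketched below: it loads the same Wine data (via scikit-learn rather than R), runs K-Means and agglomerative (Ward) clustering, and scores each partition against the cultivar labels with the adjusted Rand index. PAM and the Gower/random-forest pipeline from the study are not reproduced here.

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, cultivar = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)     # put the numerical features on a common scale

models = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative (Ward)": AgglomerativeClustering(n_clusters=3),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    print(f"{name:22s} ARI vs. cultivar = {adjusted_rand_score(cultivar, labels):.3f}")
```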


2017 ◽  
Vol 9 (2) ◽  
pp. 195-213
Author(s):  
Richárd Forster ◽  
Ágnes Fülöp

Abstract. The reconstruction and analysis of measured data play an important role in high-energy particle physics research, leading to new results in both experimental and theoretical physics. This requires algorithmic improvements and high computing capacity. Clustering algorithms make it possible to characterize the jet structure more accurately. A more granular parallelization of the kt cluster algorithm was explored by combining it with the hierarchical clustering methods used in network evaluation. The kt method reveals the evolution of particles produced in high-energy nucleus-nucleus collisions. The hierarchical clustering algorithms work on graphs, so the particle information used by the standard kt algorithm was first transformed into an appropriate graph representing the network of particles. Testing was done using data samples from the ALICE offline library, which contains the modules required to simulate the ALICE detector, a dedicated Pb-Pb detector. The proposed algorithm was compared to the FastJet toolkit's standard longitudinally invariant kt implementation. Parallelizing the standard non-optimized version of this algorithm on the available CPU architecture proved to be 1.6 times faster than the standard implementation, while the solution proposed in this paper achieved a 12 times faster computing performance and is scalable enough to run efficiently on GPUs.
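
For reference, the sketch below implements the textbook sequential longitudinally invariant kt recombination (the plain O(N^3) version that FastJet optimizes), with a simple pt-weighted recombination scheme. It is meant only to make the distance measures d_ij and d_iB concrete; it is not the graph-based GPU parallelization proposed in the paper.

```python
import numpy as np

def kt_cluster(particles, R=1.0):
    """Sequential longitudinally invariant kt clustering (plain O(N^3) reference version).
    particles: list of (pt, rapidity, phi) tuples; returns a list of jets in the same format."""
    parts = [list(p) for p in particles]
    jets = []
    while parts:
        # beam distances d_iB = pt_i^2
        best, pair = min((p[0] ** 2, (i, None)) for i, p in enumerate(parts))
        # pairwise distances d_ij = min(pt_i, pt_j)^2 * dR_ij^2 / R^2
        for i in range(len(parts)):
            for j in range(i + 1, len(parts)):
                dphi = abs(parts[i][2] - parts[j][2])
                dphi = min(dphi, 2 * np.pi - dphi)
                dR2 = (parts[i][1] - parts[j][1]) ** 2 + dphi ** 2
                dij = min(parts[i][0], parts[j][0]) ** 2 * dR2 / R ** 2
                if dij < best:
                    best, pair = dij, (i, j)
        i, j = pair
        if j is None:
            jets.append(tuple(parts.pop(i)))         # smallest distance is to the beam: final jet
        else:
            pt1, y1, phi1 = parts[i]
            pt2, y2, phi2 = parts[j]
            pt = pt1 + pt2                           # pt-weighted recombination
            y = (pt1 * y1 + pt2 * y2) / pt
            phi = (pt1 * phi1 + pt2 * phi2) / pt     # naive phi average (ignores 2*pi wrap-around)
            parts[j] = [pt, y, phi]
            parts.pop(i)
    return jets

print(kt_cluster([(50.0, 0.1, 0.2), (45.0, 0.15, 0.25), (20.0, -1.0, 3.0)]))
```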


Author(s):  
Naohiko Kinoshita ◽  
Yasunori Endo ◽  
Akira Sugawara ◽  
...  

Clustering is a representative form of unsupervised classification. Many researchers have proposed clustering algorithms based on mathematical models: methods we call model-based clustering. Clustering techniques are very useful for determining data structures, but model-based clustering is difficult to use for analyzing data correctly because we cannot select a suitable method unless we know the data structure at least partially. The new clustering algorithm we propose introduces soft computing techniques such as fuzzy reasoning, in what we call linguistic-based clustering, whose behavior does not depend on the data structure. We verify the method's effectiveness through numerical examples.


2013 ◽  
Vol 401-403 ◽  
pp. 1440-1443 ◽  
Author(s):  
Tie Feng Zhang ◽  
Fei Lv ◽  
Rong Gu

The choice of distance is an important issue in power load pattern extraction using clustering techniques, so it is necessary to determine how different distances influence the clustering of load curves. In this paper several distances are used in the k-means algorithm for clustering load curves and their influence on the clustering results is analyzed, so that a suitable distance for the k-means algorithm can be identified. An example with the load curves of 147 electricity customers shows that different distances lead to different clustering results under the same clustering algorithm. The comparison results indicate that the choice of distance is an important issue in power load pattern extraction using clustering techniques and that a suitable distance may improve the accuracy of the mining algorithm.
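
A compact way to run this kind of comparison is to make the distance a parameter of the assignment step, as in the sketch below (an illustrative implementation on random data, not the paper's). Any metric accepted by SciPy's cdist can be plugged in; note that for non-Euclidean metrics the cluster mean is only an approximate centroid update.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kmeans_with_metric(curves, k, metric="euclidean", n_iter=100, seed=0):
    """k-means whose assignment step uses any distance supported by cdist
    ('euclidean', 'cityblock', 'cosine', 'correlation', ...), so that different
    distances can be compared on the same set of daily load curves."""
    rng = np.random.default_rng(seed)
    curves = np.asarray(curves, dtype=float)
    centroids = curves[rng.choice(len(curves), size=k, replace=False)].copy()
    labels = np.zeros(len(curves), dtype=int)
    for _ in range(n_iter):
        labels = cdist(curves, centroids, metric=metric).argmin(axis=1)
        new = np.array([curves[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Example: hypothetical 24-point daily load curves, clustered with two different distances.
curves = np.random.default_rng(1).random((147, 24))
for metric in ("euclidean", "correlation"):
    labels, _ = kmeans_with_metric(curves, k=4, metric=metric)
    print(metric, np.bincount(labels))
```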


2018 ◽  
Vol 2 (4) ◽  
Author(s):  
Pengfei Zhang ◽  
Hwee-Pink Tan ◽  
Gaoxi Xiao

Motivated by recent developments in Wireless Sensor Networks (WSNs), we present distributed clustering algorithms for maximizing the lifetime of WSNs, i.e., the duration until the first node dies. We study the joint problem of prolonging network lifetime by introducing clustering techniques and energy-harvesting (EH) nodes. First, we propose a distributed clustering algorithm for maximizing the lifetime of a clustered WSN that includes EH nodes serving as relay nodes for cluster heads (CHs). Second, graph-based and LP-based EH-CH matching algorithms are proposed, which serve as benchmark algorithms. Extensive simulation results show that the proposed algorithms can achieve optimal or suboptimal solutions efficiently.
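
The matching sub-problem can be pictured as an assignment between EH relay candidates and cluster heads. The sketch below is a hypothetical, much-simplified benchmark: it assigns EH nodes to CHs by minimizing a squared-distance proxy for relaying cost with a Hungarian-style assignment. The paper's actual graph-based and LP-based formulations, and their lifetime objective, are more involved.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical layout: positions of energy-harvesting (EH) relay candidates and cluster heads (CHs).
rng = np.random.default_rng(1)
eh_nodes = rng.uniform(0, 100, size=(6, 2))        # 6 EH relay candidates (x, y)
cluster_heads = rng.uniform(0, 100, size=(4, 2))   # 4 cluster heads (x, y)

# cost[i, j]: squared distance as a crude proxy for the energy cost of EH node i relaying for CH j
cost = ((eh_nodes[:, None, :] - cluster_heads[None, :, :]) ** 2).sum(axis=2)

rows, cols = linear_sum_assignment(cost)           # minimum-cost one-to-one EH-to-CH matching
for i, j in zip(rows, cols):
    print(f"EH node {i} relays for cluster head {j} (cost {cost[i, j]:.1f})")
```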


2010 ◽  
Vol 44-47 ◽  
pp. 3897-3901
Author(s):  
Hsiang Chuan Liu ◽  
Yen Kuei Yu ◽  
Jeng Ming Yih ◽  
Chin Chun Chen

Fuzzy clustering algorithms based on the Euclidean distance function can only detect spherical structural clusters. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters by employing the Mahalanobis distance in the objective function; however, both of them need to add constraints on the Mahalanobis distance. In this paper, the authors' improved Fuzzy C-Means algorithm based on a common Mahalanobis distance (FCM-CM) is used to identify the mastery concepts in linear algebra, and its performance is compared with four other partition algorithms: FCM-M, GG, GK, and FCM. The results show that FCM-CM performs better than the others.
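
The sketch below is an assumed, stripped-down reading of the "common Mahalanobis distance" idea: a standard Fuzzy C-Means loop in which the squared distances use a single shared Mahalanobis metric (here simply the inverse of the pooled data covariance). The actual FCM-CM objective and its covariance estimation differ in detail.

```python
import numpy as np

def fcm_common_mahalanobis(X, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-Means whose distances use one shared Mahalanobis metric
    (the inverse of the pooled data covariance) instead of the Euclidean distance."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False) + 1e-6 * np.eye(d))
    U = rng.dirichlet(np.ones(c), size=n)                      # n x c fuzzy membership matrix
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]           # membership-weighted cluster centres
        diff = X[:, None, :] - centers[None, :, :]             # (n, c, d)
        d2 = np.einsum('ncd,de,nce->nc', diff, inv_cov, diff)  # squared Mahalanobis distances
        d2 = np.maximum(d2, 1e-12)
        ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1))
        U = 1.0 / ratio.sum(axis=2)                            # standard FCM membership update
    return U, centers

# Example on random 2-D data: three fuzzy clusters, crisp labels via argmax.
X = np.random.default_rng(2).normal(size=(90, 2))
U, centers = fcm_common_mahalanobis(X, c=3)
print(U.argmax(axis=1)[:10], centers.shape)
```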


2016 ◽  
Vol 43 (1) ◽  
pp. 54-74 ◽  
Author(s):  
Baojun Ma ◽  
Hua Yuan ◽  
Ye Wu

Clustering is a powerful unsupervised tool for sentiment analysis of text. However, the clustering results may be affected by any step of the clustering process, such as the data pre-processing strategy, the term weighting method in the vector space model, and the clustering algorithm. This paper presents the results of an experimental study of some common clustering techniques with respect to the task of sentiment analysis. In contrast to previous studies, we investigate the combined effects of these factors through a series of comprehensive experiments. The experimental results indicate that, first, K-means-type clustering algorithms show clear advantages on balanced review datasets while performing rather poorly on unbalanced datasets in terms of clustering accuracy. Second, the more recently designed weighting models are better than the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, an adjective and adverb extraction strategy offers clear improvements in clustering performance, while stemming and stopword removal have a negative influence on sentiment clustering. These results should be valuable for both the study and the use of clustering methods in online review sentiment analysis.
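
The basic pipeline evaluated in such studies can be assembled in a few lines, as in the hedged sketch below: a TF-IDF-weighted vector space model over a tiny, hypothetical set of reviews, clustered with K-means into a positive and a negative group. The specific weighting models, datasets, and pre-processing variants compared in the paper are not reproduced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reviews = [
    "great phone, excellent battery and very fast",      # hypothetical positive reviews
    "awesome value, absolutely love the camera",
    "terrible screen, slow and extremely disappointing",  # hypothetical negative reviews
    "awful battery life, very poor build quality",
]

# Vector space model with a sub-linear TF-IDF weighting; stopword removal is left off here,
# since the study reports it can hurt sentiment clustering.
tfidf = TfidfVectorizer(sublinear_tf=True)
X = tfidf.fit_transform(reviews)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # ideally the positive and negative reviews end up in separate clusters
```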

