Handling WSD using Hierarchical Clustering Algorithm with sentences

Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information about the "right answer" is provided for any of the objects. Many clustering algorithms have been proposed for different applications, and sentence clustering is among the most effective techniques. In this work, a hierarchical clustering algorithm is applied at multiple levels to improve accuracy. A POS tagger and the Porter stemmer are used for tagging and stemming, and the WordNet dictionary is utilized to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value against a mean threshold. This paper incorporates many parameters for finding similarity between words. To identify the disambiguated words, sense identification is performed for adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our approach shows substantial improvement, achieving an accuracy of 91.2%.
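The Jiang-Conrath measure needs WordNet information content, but the cosine component the abstract pairs it with can be sketched with plain bag-of-words vectors. A minimal illustration (the function name and tokenization by whitespace are assumptions, not the authors' implementation):

```python
import math
from collections import Counter

def cosine_similarity(sent_a, sent_b):
    """Cosine similarity between two sentences over bag-of-words counts."""
    a, b = Counter(sent_a.lower().split()), Counter(sent_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

In practice the tokens would first pass through the POS tagger and Porter stemmer mentioned above before the vectors are built.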

Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its strong clustering performance and data expression ability. However, little research has focused on clustering hesitant fuzzy linguistic term sets (HFLTSs). To fill this gap, we extend the data type of clustering to hesitant fuzzy linguistic information and propose a hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed algorithms are applied in the field of judicial execution, providing decision support for the executive judge in determining the focus of investigation and control. A clustering example verifies the algorithms' effectiveness in the context of hesitant fuzzy linguistic decision information.
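The HFLTS distance itself is domain-specific, but the agglomerative skeleton the paper extends can be sketched generically. The version below works over plain numbers with single linkage; in the paper's setting, a hesitant fuzzy linguistic distance would replace the numeric distance (a sketch under that assumption, not the authors' algorithm):

```python
def agglomerative(points, k):
    """Single-linkage agglomerative clustering: start with singletons and
    repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters
```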


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the attribute with fewer values and the leaf node with more objects, which can lead to undesirable clustering results. To overcome these shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.
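The rough-set quantity MMR minimizes can be sketched from its standard definition: for each equivalence class of the splitting attribute, roughness is 1 minus the ratio of lower to upper approximation sizes with respect to another attribute, averaged over classes and attributes. A minimal sketch (function names and the exact averaging are illustrative, not the paper's code):

```python
def equivalence_classes(rows, attr):
    """Partition row indices by their value on `attr`."""
    classes = {}
    for idx, row in enumerate(rows):
        classes.setdefault(row[attr], set()).add(idx)
    return list(classes.values())

def mean_roughness(rows, attr, other_attrs):
    """Mean rough-set roughness of `attr` w.r.t. each other attribute;
    lower values mean a crisper partition, which is what MMR selects on."""
    total = 0.0
    for other in other_attrs:
        other_classes = equivalence_classes(rows, other)
        attr_classes = equivalence_classes(rows, attr)
        rough = 0.0
        for x in attr_classes:
            lower = sum(len(c) for c in other_classes if c <= x)   # fully inside x
            upper = sum(len(c) for c in other_classes if c & x)    # overlapping x
            rough += 1 - lower / upper
        total += rough / len(attr_classes)
    return total / len(other_attrs)
```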


Author(s):  
SEUNG-JOON OH ◽  
JAE-YEARN KIM

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data that have an inherent sequential nature. However, few existing clustering algorithms consider this sequentiality. In this paper, we study how to cluster such sequence datasets. We propose a new similarity measure between two sequences: subsets of a sequence are considered, and the more subsets two sequences have in common, the more similar they are. In addition, we propose a hierarchical clustering algorithm and an efficient method for measuring similarity. Using a splice dataset and synthetic datasets, we show that the quality of clusters generated by our approach is better than that of clusters produced by traditional clustering algorithms.
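The abstract does not specify which subsets are counted, so as an illustrative stand-in, the sketch below scores two sequences by the Jaccard overlap of their length-k contiguous subsequences (an assumption, not the paper's exact measure):

```python
def kgram_similarity(seq_a, seq_b, k=2):
    """Similarity of two sequences as the Jaccard overlap of their
    length-k contiguous subsequences: more shared pieces, more similar."""
    def grams(s):
        return {tuple(s[i:i + k]) for i in range(len(s) - k + 1)}
    ga, gb = grams(seq_a), grams(seq_b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

Such a measure respects order (unlike a bag-of-items view), which is the point the paper makes about sequentiality.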


2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Yaohui Liu ◽  
Dong Liu ◽  
Fang Yu ◽  
Zhengming Ma

Clustering is widely used in data analysis, and density-based methods have developed rapidly over the past decade. Although state-of-the-art density peak clustering algorithms are efficient and can detect clusters of arbitrary shape, they remain essentially centroid-based methods. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, extended core regions are identified as initial clusters. A new local density metric is then defined to calculate the density of each object, and density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is assigned to one of the initial clusters or marked as noise. Experiments on synthetic and real data sets show that RNN-LDH outperforms current clustering methods based on density peaks or reverse nearest neighbors.
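The core primitive here, the reverse nearest neighbor set, can be shown in a few lines. The sketch is over 1-D points for brevity (a toy illustration of the primitive, not the RNN-LDH algorithm itself):

```python
def nearest_neighbor(points, i):
    """Index of the point closest to points[i] (excluding itself)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: abs(points[i] - points[j]))

def reverse_nearest_neighbors(points, i):
    """Indices of points whose nearest neighbor is point i.  Points with
    many reverse nearest neighbors sit in dense core regions, which is
    what RNN-based density clustering exploits."""
    return [j for j in range(len(points))
            if j != i and nearest_neighbor(points, j) == i]
```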


2019 ◽  
Vol 30 (08) ◽  
pp. 1950062
Author(s):  
Ping Guo ◽  
Wenjie Jiang ◽  
Yuchi Liu

Membrane computing, also known as P systems, is a distributed and parallel computation framework. Hierarchical clustering is one of the most basic and widely applied clustering algorithms. In this paper, the combination of membrane computing and hierarchical clustering is studied. A cell-like hierarchical clustering P system with priority evolution rules and promoters is designed by exploiting the maximum parallelism of membrane computing. The feasibility and effectiveness of the designed P system are verified through examples.


2019 ◽  
Vol 28 (04) ◽  
pp. 1950065 ◽  
Author(s):  
Wei Zhang ◽  
Gongxuan Zhang ◽  
Xiaohui Chen ◽  
Yueqi Liu ◽  
Xiumin Zhou ◽  
...  

Hierarchical clustering is a classical method that provides a hierarchical representation for data analysis. In practical applications, however, it is difficult to handle massive datasets because of the method's high computational complexity. To overcome this challenge, this paper presents a novel distributed storage and computation hierarchical clustering algorithm (DHC), which has a lower time complexity than standard hierarchical clustering algorithms and is suitable for massive datasets. It has two main advantages. First, it can store a massive dataset exceeding the main memory space by using distributed storage nodes. Second, it can efficiently perform nearest-neighbor search in parallel by using distributed computation at each node. Extensive experiments validate the effectiveness of the DHC algorithm: it is 10 times faster than the standard hierarchical clustering algorithm, making it an effective and flexible distributed hierarchical clustering algorithm for massive datasets.
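The second advantage follows a classic map-reduce shape: every node searches only its own partition, and a final reduce picks the global winner. A toy single-process sketch, with each chunk standing in for one storage/compute node (names and 1-D data are illustrative, not the DHC implementation):

```python
def distributed_nn(query, chunks):
    """Nearest neighbor of `query` across partitioned data: each chunk
    (node) returns its local best, then a reduce step picks the global
    minimum -- only the per-node winners cross the network."""
    local_bests = [min(chunk, key=lambda p: abs(p - query))
                   for chunk in chunks if chunk]
    return min(local_bests, key=lambda p: abs(p - query))
```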


2015 ◽  
Vol 09 (03) ◽  
pp. 307-331 ◽  
Author(s):  
Wei Zhang ◽  
Gongxuan Zhang ◽  
Yongli Wang ◽  
Zhaomeng Zhu ◽  
Tao Li

Nearest neighbor search is a key technique in hierarchical clustering, and its computational complexity determines the performance of the hierarchical clustering algorithm. The time complexity of standard agglomerative hierarchical clustering is O(n³), while that of more advanced hierarchical clustering algorithms (such as the nearest-neighbor chain, SLINK, and CLINK) is O(n²). This paper presents a new nearest neighbor search method called nearest neighbor boundary (NNB), which first divides a large dataset into independent subsets and then finds the nearest neighbor of each point within its subset. When NNB is used, the time complexity of hierarchical clustering can be reduced to O(n log₂ n). Based on NNB, we propose a fast hierarchical clustering algorithm called nearest-neighbor boundary clustering (NBC), which can be adapted to parallel and distributed computing frameworks. Experimental results demonstrate that our algorithm is practical for large datasets.
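The idea of restricting nearest-neighbor search to a small local region, rather than all n points, is easiest to see in 1-D: after sorting, a point's nearest neighbor must be one of its two sorted neighbors. A toy illustration of that boundary intuition (not the NNB subdivision scheme itself):

```python
def nn_via_sorted_boundary(points):
    """Map each index to its nearest neighbor's index.  After sorting,
    only the two adjacent points in sorted order need checking -- a toy
    version of restricting NN search to a local boundary region."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    nn = {}
    for rank, i in enumerate(order):
        cands = [order[r] for r in (rank - 1, rank + 1) if 0 <= r < len(order)]
        nn[i] = min(cands, key=lambda j: abs(points[i] - points[j]))
    return nn
```

Sorting costs O(n log n) and each lookup is O(1), versus O(n) per point for a brute-force scan.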


2014 ◽  
Vol 599-601 ◽  
pp. 1749-1752
Author(s):  
Chun Xia Jin ◽  
Hui Zhang ◽  
Qiu Chan Bai

Analysis of text features shows that co-occurring words in a document express topic information strongly and accurately. This paper therefore proposes a text clustering algorithm based on word co-occurrence and association-rule mining. The method uses association-rule mining to extract the word co-occurrences that express the topic information of each document. These co-occurrences are used to build the document model and a co-occurrence-based word similarity measure, and a hierarchical clustering algorithm over the co-occurrences then realizes the text clustering. Experimental results show that the proposed method improves the efficiency and accuracy of text clustering compared with other algorithms.
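The mining step amounts to finding frequent itemsets of size two: word pairs that co-occur in enough documents. A minimal sketch of that extraction (support threshold and whitespace tokenization are illustrative assumptions):

```python
from collections import Counter
from itertools import combinations

def frequent_cooccurrences(documents, min_support=2):
    """Word pairs co-occurring in at least `min_support` documents --
    the size-2 frequent itemsets used as topic features."""
    counts = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))   # one vote per document
        counts.update(combinations(words, 2))
    return {pair for pair, c in counts.items() if c >= min_support}
```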


Author(s):  
Ana Belén Ramos-Guajardo

A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals, which can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test and a p-value matrix is finally obtained. The suggested clustering algorithm considers such a matrix, where each p-value can be seen at the same time as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed, and the approach is applied to two real-life situations.
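Once the p-value matrix is in hand, a toy version of the grouping step is simple: merge items whose pairwise test gives no evidence of different expectations (p-value above a significance level). This transitive-merging sketch is an illustration of the idea only, not the author's iterative algorithm or its stopping criterion:

```python
def pvalue_clusters(p_matrix, alpha=0.05):
    """Assign cluster labels by merging any pair whose p-value exceeds
    `alpha` (no evidence their expectations differ)."""
    n = len(p_matrix)
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if p_matrix[i][j] > alpha:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels
```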
