A top-down information theoretic word clustering algorithm for phrase recognition

2014 ◽  
Vol 275 ◽  
pp. 213-225 ◽  
Author(s):  
Yu-Chieh Wu
Author(s):  
Herry Sujaini

Extended Word Similarity Based (EWSB) Clustering is a word clustering algorithm based on word similarity values computed from a corpus. One of the benefits of clustering with this algorithm is improved translation quality in statistical machine translation. Previous research showed that the EWSB algorithm could improve an Indonesian-English translator, where the algorithm was applied to Indonesian as the target language. This paper discusses the results of research applying the EWSB algorithm to an Indonesian-to-Minang statistical machine translator, where the algorithm is applied to Minang as the target language. The results show that the EWSB algorithm is quite effective when Minang is the target language, and that it can improve translation accuracy by 6.36%.
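The abstract does not reproduce EWSB's exact procedure, so the following Python sketch only illustrates the general idea of similarity-based word clustering: words are represented by co-occurrence vectors computed from a corpus and then clustered greedily by cosine similarity. The function names, window size, and similarity threshold are all hypothetical, not taken from the paper.

```python
from collections import defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Build a co-occurrence count vector for each word in the corpus."""
    vectors = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][sent[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    num = sum(u[k] * v.get(k, 0) for k in u)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def similarity_clusters(sentences, threshold=0.5):
    """Greedy similarity clustering: attach each word to the first cluster
    whose representative is similar enough, else start a new cluster."""
    vectors = cooccurrence_vectors(sentences)
    clusters = []  # list of (representative_word, member_list)
    for w in vectors:
        for rep, members in clusters:
            if cosine(vectors[w], vectors[rep]) >= threshold:
                members.append(w)
                break
        else:
            clusters.append((w, [w]))
    return [members for _, members in clusters]
```

In an SMT pipeline, cluster identifiers like these would replace or supplement target-language words so that sparse word statistics are shared across a cluster.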


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle the uncertainty in clustering categorical data. However, MMR tends to choose attributes with fewer values and leaf nodes with more objects, which can lead to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets from the UCI repository show that the IMMR algorithm outperforms MMR in clustering categorical data.
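The core quantity in MMR is the rough-set roughness of one attribute's equivalence classes with respect to another attribute's partition. As a hedged sketch (the splitting and stopping details of MMR and IMMR are not in the abstract, and all function names here are hypothetical), the attribute-selection step can be written as:

```python
def partition(rows, attr):
    """Group row indices into equivalence classes by one attribute's value."""
    classes = {}
    for idx, row in enumerate(rows):
        classes.setdefault(row[attr], set()).add(idx)
    return list(classes.values())

def roughness(X, classes):
    """Rough-set roughness of a set X w.r.t. a partition:
    1 - |lower approximation| / |upper approximation|."""
    lower = set().union(*[c for c in classes if c <= X])
    upper = set().union(*[c for c in classes if c & X])
    return 1.0 - len(lower) / len(upper)

def mean_roughness(rows, a, b):
    """Mean roughness of attribute a's classes w.r.t. attribute b's partition."""
    classes_b = partition(rows, b)
    vals = [roughness(X, classes_b) for X in partition(rows, a)]
    return sum(vals) / len(vals)

def mmr_attribute(rows, attrs):
    """Min-Min Roughness selection: for each attribute take the minimum
    mean roughness over all other attributes, then pick the attribute
    achieving the overall minimum."""
    best, best_attr = float("inf"), None
    for a in attrs:
        m = min(mean_roughness(rows, a, b) for b in attrs if b != a)
        if m < best:
            best, best_attr = m, a
    return best_attr
```

Lower roughness means the attribute's classes are more crisply determined by the other attributes, which is why MMR splits on the minimum.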


2011 ◽  
Vol 2 (1) ◽  
pp. 42-48 ◽  
Author(s):  
P. C. W. Davies

Genes store heritable information, but actual gene expression often depends on many so-called epigenetic factors, both physical and chemical, external to DNA. Epigenetic changes can be both reversible and heritable. The genome is associated with a physical object (DNA) with a specific location, whereas the epigenome is a global, systemic entity. Furthermore, genomic information is tied to specific coded molecular sequences stored in DNA. Although epigenomic information can be associated with certain non-DNA molecular sequences, it mostly is not. Therefore, there does not seem to be a stored ‘epigenetic programme’ in the information-theoretic sense. Instead, epigenomic control is—to a large extent—an emergent self-organizing phenomenon, and the real-time operation of the epigenetic ‘project’ lies in the realm of nonlinear bifurcations, interlocking feedback loops, distributed networks, top-down causation and other concepts familiar from complex systems theory. Lying at the heart of vital eukaryotic processes are chromatin structure, organization and dynamics. Epigenetics provides striking examples of how bottom-up genetic and top-down epigenetic causation intermingle. The fundamental question then arises of how causal efficacy should be attributed to biological information. A proposal is made to implement explicit downward causation by coupling information directly to the dynamics of chromatin, thus permitting the coevolution of dynamical laws and states, and opening up a new sector of dynamical systems theory that promises to display rich self-organizing and self-complexifying behaviour.


2019 ◽  
Vol 28 (1) ◽  
pp. 15-30 ◽  
Author(s):  
Rakesh Patra ◽  
Sujan Kumar Saha

Abstract In this paper, we present a novel word clustering technique to capture contextual similarity among words. Related word clustering techniques in the literature rely on word statistics collected from a fixed and small word window. For example, the Brown clustering algorithm is based on bigram statistics of the words. However, in sequential labeling tasks such as named entity recognition (NER), longer-context words also carry valuable information. To capture this longer-context information, we propose a new word clustering algorithm that uses parse information of the sentences and a nonfixed word window. This proposed clustering algorithm, named variable window clustering, performs better than Brown clustering in our experiments. Additionally, to use two different clustering techniques simultaneously in a classifier, we propose a cluster merging technique that performs an output-level merging of two sets of clusters. To test the effectiveness of the approaches, we use two different NER data sets, namely, Hindi and BioCreative II Gene Mention Recognition. A baseline NER system is developed using a conditional random fields classifier, and then the clusters from the individual techniques as well as the merged technique are incorporated to improve the classifier. Experimental results demonstrate that the cluster merging technique is quite promising.
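The abstract does not specify the exact merging criterion, but one natural reading of "output-level merging of two sets of clusters" is a partition intersection: two words end up in the same merged cluster only if both clusterings agree on them. The sketch below assumes that interpretation; the function name and input format (word-to-cluster-id maps) are hypothetical.

```python
def merge_clusterings(c1, c2):
    """Output-level merge of two clusterings, each given as a
    word -> cluster-id map: two words share a merged cluster only if
    they co-occur in the same cluster under BOTH clusterings
    (i.e. the intersection of the two partitions)."""
    merged = {}
    for word in c1.keys() & c2.keys():          # words clustered by both
        merged.setdefault((c1[word], c2[word]), []).append(word)
    return list(merged.values())
```

The merged cluster identifiers (the `(id1, id2)` pairs) could then be fed to a CRF as features, alongside or instead of the two original cluster features.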


2014 ◽  
Vol 26 (9) ◽  
pp. 2074-2101 ◽  
Author(s):  
Hideitsu Hino ◽  
Noboru Murata

Clustering is a representative unsupervised learning task and one of the important approaches in exploratory data analysis. By its very nature, clustering without strong assumptions on the data distribution is desirable. Information-theoretic clustering is a class of clustering methods that optimize information-theoretic quantities such as entropy and mutual information. These quantities can be estimated in a nonparametric manner, and information-theoretic clustering algorithms are capable of capturing various intrinsic data structures. It is also possible to estimate information-theoretic quantities using a data set with a sampling weight for each datum. Assuming the data set is sampled from a certain cluster and assigning different sampling weights depending on the clusters, the cluster-conditional information-theoretic quantities are estimated. In this letter, a simple iterative clustering algorithm is proposed based on a nonparametric estimator of the log likelihood for weighted data sets. The clustering algorithm is also derived from the principle of conditional entropy minimization with maximum entropy regularization. The proposed algorithm does not contain a tuning parameter. The algorithm is experimentally shown to be comparable to, or to outperform, conventional nonparametric clustering methods.
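The letter's actual estimator is not given in the abstract, so the following is only a minimal sketch of the general scheme it describes: soft cluster weights are attached to each datum, a weighted leave-one-out Gaussian KDE estimates each cluster-conditional log likelihood, and assignments are iterated toward higher likelihood (which lowers the conditional entropy of the assignment). The bandwidth parameter and all names here are assumptions; the proposed algorithm itself is stated to be tuning-parameter free.

```python
import numpy as np

def kde_loglik(X, weights, bandwidth):
    """Log likelihood of each point under a weighted Gaussian KDE,
    leaving each point out of its own density estimate."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    np.fill_diagonal(K, 0.0)                  # leave-one-out
    dens = K @ weights / max(weights.sum(), 1e-12)
    return np.log(dens + 1e-12)

def info_clustering(X, n_clusters=2, bandwidth=0.5, iters=50, seed=0):
    """Iterative sketch: update soft assignments toward the cluster under
    which each point's weighted-KDE log likelihood (plus log prior) is
    highest, then read off hard labels."""
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(n_clusters), size=len(X))   # soft weights
    for _ in range(iters):
        logp = np.stack([kde_loglik(X, W[:, c], bandwidth)
                         for c in range(n_clusters)], axis=1)
        logp += np.log(W.mean(0) + 1e-12)                 # cluster priors
        W_new = np.exp(logp - logp.max(1, keepdims=True))
        W_new /= W_new.sum(1, keepdims=True)
        if np.allclose(W_new, W):
            break
        W = W_new
    return W.argmax(1)
```

The maximum-entropy regularization in the letter is what keeps such soft assignments from collapsing to a single cluster; this sketch omits it and so is only indicative of the iteration structure.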

