Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models

2016 ◽  
Vol 4 ◽  
pp. 245-257 ◽  
Author(s):  
Karl Stratos ◽  
Michael Collins ◽  
Daniel Hsu

We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem. These HMMs, which we call anchor HMMs, assume that each tag is associated with at least one word that can have no other tag, which is a relatively benign condition for POS tagging (e.g., “the” is a word that appears only under the determiner tag). We exploit this assumption and extend the non-negative matrix factorization framework of Arora et al. (2013) to design a consistent estimator for anchor HMMs. In experiments, our algorithm is competitive with strong baselines such as the clustering method of Brown et al. (1992) and the log-linear model of Berg-Kirkpatrick et al. (2010). Furthermore, it produces an interpretable model in which hidden states are automatically lexicalized by words.
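For intuition, the anchor-finding step can be sketched in a few lines: given a row-normalized word-context co-occurrence matrix, anchor candidates are the rows lying farthest from the span of the anchors already chosen, following the framework of Arora et al. (2013). The sketch below is our own illustration, not the authors' code; the full estimator goes on to recover emission and transition parameters from the chosen anchors.

```python
import numpy as np

def find_anchors(Q, k):
    """Greedy anchor selection: repeatedly pick the word whose
    context-distribution row lies farthest from the span of the
    rows already chosen.  Simplified illustration only."""
    Q = Q / Q.sum(axis=1, keepdims=True)        # rows: context distributions
    anchors = [int(np.argmax(np.linalg.norm(Q, axis=1)))]
    for _ in range(k - 1):
        A = Q[anchors].T                        # basis for current anchor span
        P = A @ np.linalg.pinv(A)               # projector onto that span
        residual = Q - (P @ Q.T).T              # distance of each row from span
        anchors.append(int(np.argmax(np.linalg.norm(residual, axis=1))))
    return anchors
```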

Author(s):  
Adrian Bărbulescu ◽  
Daniel I. Morariu

Abstract: In this paper, we present a wide range of models based on less adaptive and adaptive approaches for a PoS tagging system. The parameters of the adaptive approach are based on the n-gram order of the hidden Markov model, evaluated for bigrams and trigrams, and on three different decoding methods: forward, backward, and bidirectional. We used the Brown Corpus for both the training and the testing phase. The bidirectional trigram model almost reaches state-of-the-art accuracy but is held back by its decoding time, while the backward trigram achieves nearly the same results with a much better decoding time. From these results we conclude that decoding performs better when it evaluates the sentence from the last word to the first, and although the backward trigram model is very good, we still recommend the bidirectional trigram model when good precision on real data is required.
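As context for the decoding comparison above, the bigram variant reduces to standard Viterbi decoding over tag sequences; the trigram and backward variants change only the state definition and the direction of the recursion. A minimal sketch (ours, not the authors' implementation):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Standard Viterbi decoding for a bigram HMM tagger.
    pi: initial tag probabilities (K,); A: tag-transition matrix (K, K);
    B: emission probabilities (K, V); obs: word-index sequence."""
    T, K = len(obs), len(pi)
    delta = np.zeros((T, K))                  # best log-score ending in each tag
    psi = np.zeros((T, K), dtype=int)         # backpointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)        # (prev tag, cur tag)
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(K)] + np.log(B[:, obs[t]])
    tags = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):             # trace backpointers
        tags.append(int(psi[t, tags[-1]]))
    return tags[::-1]
```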


2017 ◽  
Vol 9 (1) ◽  
pp. 24
Author(s):  
Hiroshi Morimoto

Cold exposure is often said to trigger cerebral infarctions and ischemic heart disease. This association between weather and human health has attracted considerable interest and has been explored using standard statistical techniques such as regression models. Meteorological factors, such as temperature, are governed by background systems, notably weather patterns. It is therefore reasonable to posit that the incidence of these diseases is likewise influenced by a background system. The aim of this paper was to identify and construct these respective background systems. Possible background states, or "hidden states", behind the incidence of the diseases were derived using the EM and Viterbi algorithms within the framework of hidden Markov models (HMMs). A self-organizing map (SOM) enabled identification of weather patterns, considered as background states behind the meteorological factors. Comparing the two sets of background states, the hidden states behind the incidence of the diseases were identified with six weather patterns. This finding provides new evidence of the links between weather and human health, shedding light on the association between changes in the weather and the onset of disease.
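The EM-then-Viterbi workflow described here is straightforward to reproduce with off-the-shelf tools. Below is a minimal sketch using the hmmlearn package, with six hidden states to mirror the six weather patterns; the incidence series is a simulated placeholder, since the study's data are not reproduced here.

```python
import numpy as np
from hmmlearn import hmm

# Hypothetical (n_days, 1) array of daily incidence counts -- a stand-in
# for the study's data, used only to illustrate the workflow.
daily_counts = np.random.poisson(lam=3.0, size=(365, 1)).astype(float)

model = hmm.GaussianHMM(n_components=6, covariance_type="diag",
                        n_iter=200, random_state=0)
model.fit(daily_counts)                       # Baum-Welch (EM) training
hidden_states = model.predict(daily_counts)   # Viterbi decoding
```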


Author(s):  
Kwan Yi ◽  
Jamshid Beheshti

The Hidden Markov model (HMM) has been successfully used for speech recognition, part-of-speech tagging, and pattern recognition. In this study, we apply the HMM to automatically categorize digital documents into a standard library classification scheme. In the proposed framework, an HMM-based system is viewed as a model that generates a list of words, and each document is seen as …
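Although the abstract is truncated here, the framework it describes, with an HMM as a generator of word lists, suggests classification by model likelihood: score a document under one HMM per library class and pick the best-scoring class. The sketch below is one plausible reading under that assumption, not the authors' system.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Forward algorithm in log space: log-probability that an HMM with
    initial distribution pi, transitions A, emissions B generates obs."""
    alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        alpha = np.logaddexp.reduce(alpha[:, None] + np.log(A), axis=0) \
                + np.log(B[:, o])
    return np.logaddexp.reduce(alpha)

def classify(obs, class_hmms):
    """Assign a document (word-index sequence) to the library class whose
    HMM scores it highest; class_hmms maps class -> (pi, A, B).  The
    trained per-class models are assumptions, not from the paper."""
    return max(class_hmms, key=lambda c: log_likelihood(obs, *class_hmms[c]))
```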


2018 ◽  
Vol 16 (05) ◽  
pp. 1850019 ◽  
Author(s):  
Ioannis A. Tamposis ◽  
Margarita C. Theodoropoulou ◽  
Konstantinos D. Tsirigos ◽  
Pantelis G. Bagos

Hidden Markov Models (HMMs) are probabilistic models widely used in computational molecular biology. However, the standard assumption that the observed symbol depends only on the current state may not be sufficient for some biological problems. In order to overcome the limitations of the first-order HMM, a number of extensions have been proposed in the literature to incorporate past information in HMMs, conditioning either on the hidden states, or on the observations, or both. Here, we implement a simple extension of the standard HMM in which the current observed symbol (amino acid residue) depends both on the current state and on a series of previously observed symbols. The major advantage of the method is its simplicity of implementation, achieved by properly transforming the observation sequence using an extended alphabet. Thus, it can utilize all the available algorithms for the training and decoding of HMMs. We investigated several encoding schemes and performed tests on a number of important biological problems previously studied by our team (prediction of transmembrane proteins and prediction of signal peptides). The evaluation shows that, when enough data are available, performance increased by 1.8%–8.2%, and existing prediction methods may improve using this approach. The methods for which the improvement was significant (PRED-TMBB2, PRED-TAT and HMM-TM) are available as web servers freely accessible to academic users at www.compgen.org/tools/.
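The key transformation is easy to picture: each observed symbol is replaced by a composite symbol bundling the current residue with its predecessors, so a standard first-order HMM over the extended alphabet effectively conditions emissions on past observations. A minimal sketch, with a function name and padding symbol of our own choosing:

```python
def extend_alphabet(seq, order=1, pad="^"):
    """Transform an observation sequence so each new symbol encodes the
    current residue together with its `order` predecessors; an ordinary
    HMM over these composite symbols then emits conditioned on the
    current state *and* the preceding observations."""
    padded = [pad] * order + list(seq)
    return [tuple(padded[i:i + order + 1]) for i in range(len(seq))]

# e.g. extend_alphabet("MKT", order=1) -> [('^','M'), ('M','K'), ('K','T')]
```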


2008 ◽  
Vol 14 (2) ◽  
pp. 223-251 ◽  
Author(s):  
ROY BAR-HAIM ◽  
KHALIL SIMA'AN ◽  
YOAD WINTER

Abstract: Words in Semitic texts often consist of a concatenation of word segments, each corresponding to a part-of-speech (POS) category. Semitic words may be ambiguous with regard to their segmentation as well as to the POS tags assigned to each segment. When designing POS taggers for Semitic languages, a major architectural decision concerns the choice of the atomic input tokens (terminal symbols). If the tokenization is at the word level, the output tags must be complex, and represent both the segmentation of the word and the POS tag assigned to each word segment. If the tokenization is at the segment level, the input itself must encode the different alternative segmentations of the words, while the output consists of standard POS tags. Comparing these two alternatives is not trivial, as the choice between them may have global effects on the grammatical model. Moreover, intermediate levels of tokenization between these two extremes are conceivable, and, as we aim to show, beneficial. To the best of our knowledge, the problem of tokenization for POS tagging of Semitic languages has not been addressed before in full generality. In this paper, we study this problem for the purpose of POS tagging of Modern Hebrew texts. After extensive error analysis of the two simple tokenization models, we propose a novel, linguistically motivated, intermediate tokenization model that gives better performance for Hebrew over the two initial architectures. Our study is based on the well-known hidden Markov models (HMMs). We start out from a manually devised morphological analyzer and a very small annotated corpus, and describe how to adapt an HMM-based POS tagger for both tokenization architectures. We present an effective technique for smoothing the lexical probabilities using an untagged corpus, and a novel transformation for casting the segment-level tagger in terms of a standard, word-level HMM implementation. The results obtained using our model are on par with the best published results on Modern Standard Arabic, despite the much smaller annotated corpus available for Modern Hebrew.
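To make the word-level tokenization option concrete: under that architecture each word receives a single complex tag encoding both its segmentation and the POS of every segment. A toy illustration (the separator, tag set, and transliteration are ours, not the paper's):

```python
def complex_tag(segments):
    """Collapse a segment-level analysis of one word into a single
    word-level tag recording the segmentation plus per-segment POS --
    the 'complex tag' alternative discussed above."""
    return "+".join(f"{seg}/{pos}" for seg, pos in segments)

# Hypothetical Hebrew word analyzed as preposition + noun:
print(complex_tag([("b", "IN"), ("bit", "NN")]))   # -> "b/IN+bit/NN"
```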


2018 ◽  
Author(s):  
Regev Schweiger ◽  
Yaniv Erlich ◽  
Shai Carmi

Motivation: Hidden Markov models (HMMs) are powerful tools for modeling processes along the genome. In a standard genomic HMM, observations are drawn, at each genomic position, from a distribution whose parameters depend on a hidden state; the hidden states evolve along the genome as a Markov chain. Often, the hidden state is the Cartesian product of multiple processes, each evolving independently along the genome. Inference in these so-called Factorial HMMs has a naïve running time that scales as the square of the number of possible states, which by itself increases exponentially with the number of subchains; such a running time scaling is impractical for many applications. While faster algorithms exist, there is no available implementation suitable for developing bioinformatics applications.
Results: We developed FactorialHMM, a Python package for fast exact inference in Factorial HMMs. Our package allows simulating either directly from the model or from the posterior distribution of states given the observations. Additionally, we allow the inference of all key quantities related to HMMs: (1) the (Viterbi) sequence of states with the highest posterior probability; (2) the likelihood of the data; and (3) the posterior probability (given all observations) of the marginal and pairwise state probabilities. The running time and space requirement of all procedures is linearithmic in the number of possible states. Our package is highly modular, providing the user with maximal flexibility for developing downstream applications.
Availability: https://github.com/regevs/factorialhmm
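To see why naive inference is impractical: with M independent subchains of K states each, the combined chain has K**M states, and the naive transition matrix is the Kronecker product of the subchain matrices, so each forward step costs on the order of (K**M)**2 operations. The sketch below only illustrates this blow-up; it is not FactorialHMM's API, which is documented at the repository linked above.

```python
import numpy as np

# M subchains of K states each: random row-stochastic transition matrices.
K, M = 4, 5
subchains = [np.random.dirichlet(np.ones(K), size=K) for _ in range(M)]

# The naive combined transition matrix is their Kronecker product --
# exactly what fast factorial-HMM inference avoids materializing.
T = subchains[0]
for A in subchains[1:]:
    T = np.kron(T, A)

print(T.shape)   # (1024, 1024) = (K**M, K**M)
```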


2013 ◽  
Vol 8 (2) ◽  
Author(s):  
Kathryn Widhiyanti ◽  
Agus Harjoko

This research conducts part-of-speech tagging (POS tagging) for text in the Indonesian language, supporting other natural language processing tasks such as the parsing of Indonesian text. POS tagging is the automated process of labelling the word class of each word in a sentence (Jurafsky and Martin, 2000). The central issue is how to obtain accurate word-class labels at the sentence level. We propose a method that combines a hidden Markov model with a rule-based method. The expected outcome of this research is better labelling accuracy than that achieved by using the hidden Markov model alone. The labels produced by the hidden Markov model are refined by validating them against rules composed automatically from the corpus used. In experiments on several test documents, the hidden Markov model produced a highest accuracy of 100% for text identical to that in the corpus. For text different from the referenced corpus but using words present in the corpus, the highest accuracy was 92.2%.
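The combination described, an HMM pass followed by rule-based correction, can be sketched simply: the HMM proposes a tag sequence and contextual rules overwrite tags that match. The rule format below, mapping (previous tag, word) to a tag, is an illustrative guess rather than the paper's actual rule templates.

```python
def refine_tags(words, hmm_tags, rules):
    """Post-correct HMM tag output with simple contextual rules, in the
    spirit of the HMM-plus-rule-based combination described above."""
    tags = list(hmm_tags)
    for i, word in enumerate(words):
        prev = tags[i - 1] if i > 0 else "<s>"
        if (prev, word) in rules:
            tags[i] = rules[(prev, word)]
    return tags

words = ["saya", "makan", "nasi"]               # "I eat rice"
hmm_tags = ["PRP", "NN", "NN"]                  # hypothetical HMM output
rules = {("PRP", "makan"): "VB"}                # hypothetical rule
print(refine_tags(words, hmm_tags, rules))      # ['PRP', 'VB', 'NN']
```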


2019 ◽  
Vol 19 (1) ◽  
pp. 93-99 ◽  
Author(s):  
Maxim A. Kuznetsov ◽  
Ivan V. Oseledets

Abstract: We propose a new algorithm for spectral learning of Hidden Markov Models (HMMs). In contrast to the standard approach, we do not estimate the parameters of the HMM directly but construct an estimate of the joint probability distribution. The idea is based on representing the joint probability distribution as an N-th-order tensor with low ranks in the tensor train (TT) format. Using the TT format, we obtain an approximation by minimizing the Frobenius distance between the empirical joint probability distribution and tensors with low TT-ranks, subject to normalization constraints on the core tensors. We propose an algorithm for solving the optimization problem based on the alternating least squares (ALS) approach and develop a fast version for sparse tensors. The order d of the tensor is a parameter of our algorithm. We compared the performance of our algorithm with the existing algorithm of Hsu, Kakade, and Zhang (2009) and found ours to be much more robust when the number of hidden states is overestimated.
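For readers unfamiliar with the TT format: an order-d tensor is represented as a product of d low-rank three-way cores. The sketch below builds such a representation by sequential truncated SVDs (the classical TT-SVD construction); the paper instead fits the cores by ALS under normalization constraints, so this is background illustration only.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose an order-d tensor into tensor-train cores of shape
    (r_prev, n, r) via sequential truncated SVDs (TT-SVD)."""
    dims = tensor.shape
    cores, r_prev, mat = [], 1, tensor
    for n in dims[:-1]:
        mat = mat.reshape(r_prev * n, -1)             # unfold current mode
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))                     # truncate the rank
        cores.append(U[:, :r].reshape(r_prev, n, r))
        mat = s[:r, None] * Vt[:r]                    # carry remainder forward
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores
```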

