Maximum likelihood training of neural networks

Author(s): H. Gish
2010, Vol. 161 (21), pp. 2795-2807
Author(s): Hsu-Kun Wu, Jer-Guang Hsieh, Yih-Lon Lin, Jyh-Horng Jeng

2020, Vol. 37 (12), pp. 3632-3641
Author(s): Alina F Leuchtenberger, Stephen M Crotty, Tamara Drucks, Heiko A Schmidt, Sebastian Burgstaller-Muehlbacher, ...

Abstract Maximum likelihood and maximum parsimony are two key methods for phylogenetic tree reconstruction. Under certain conditions, each method can perform better or worse than the other, resulting in unresolved or disputed phylogenies. We show that a neural network can distinguish between four-taxon alignments that were evolved under conditions susceptible to either long-branch attraction or long-branch repulsion. When likelihood and parsimony methods are discordant, the neural network can provide insight as to which tree reconstruction method is best suited to the alignment. When applied to the contentious case of Strepsiptera evolution, our method shows robust support for the current scientific view, that is, it places Strepsiptera with beetles, distant from flies.
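A minimal sketch of this setup (my own illustration, not the authors' code): a gap-free four-taxon DNA alignment is summarized by its site-pattern frequency spectrum, and a small feed-forward network is trained to separate alignments evolved under long-branch-attraction (Felsenstein-zone) conditions from long-branch-repulsion (Farris-zone) conditions. The network shape and the scikit-learn setup are assumptions, not the paper's architecture.

```python
# Sketch: classify four-taxon alignments by site-pattern frequencies.
from itertools import product

import numpy as np
from sklearn.neural_network import MLPClassifier

BASES = "ACGT"
# Every possible site pattern over 4 taxa: 4**4 = 256 features.
PATTERN_INDEX = {p: i for i, p in enumerate(product(BASES, repeat=4))}

def pattern_spectrum(alignment):
    """Map a 4-row alignment (four equal-length strings) to a normalized
    256-dimensional vector of site-pattern frequencies."""
    counts = np.zeros(len(PATTERN_INDEX))
    for site in zip(*alignment):          # one column of the alignment
        counts[PATTERN_INDEX[site]] += 1
    return counts / counts.sum()

# X_train: spectra of simulated alignments; y_train: 0 for alignments
# evolved under Felsenstein-zone conditions, 1 for Farris-zone conditions.
# clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
# clf.fit(X_train, y_train)
# clf.predict(pattern_spectrum(alignment).reshape(1, -1))
```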


1999, Vol. 11 (2), pp. 541-563
Author(s): Anders Krogh, Søren Kamaric Riis

A general framework for hybrids of hidden Markov models (HMMs) and neural networks (NNs) called hidden neural networks (HNNs) is described. The article begins by reviewing standard HMMs and estimation by conditional maximum likelihood, which is used by the HNN. In the HNN, the usual HMM probability parameters are replaced by the outputs of state-specific neural networks. As opposed to many other hybrids, the HNN is normalized globally and therefore has a valid probabilistic interpretation. All parameters in the HNN are estimated simultaneously according to the discriminative conditional maximum likelihood criterion. The HNN can be viewed as an undirected probabilistic independence network (a graphical model), where the neural networks provide a compact representation of the clique functions. An evaluation of the HNN on the task of recognizing broad phoneme classes in the TIMIT database shows clear performance gains compared to standard HMMs tested on the same task.
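To make the construction concrete, here is a minimal sketch, in my own notation rather than the authors', of the two ingredients described above: a state-specific network supplies an unnormalized match score in place of each state's emission probabilities, the model is normalized globally by the forward algorithm, and training minimizes the conditional maximum likelihood loss, -log P(states | observations). Structurally this is a neural conditional random field; the architecture and dimensions are assumptions.

```python
# Sketch of the HNN idea under the stated assumptions: per-state networks
# replace emission probabilities, normalization is global over all state
# paths, and training uses the discriminative CML criterion.
import torch
import torch.nn as nn

class HiddenNeuralNetwork(nn.Module):
    def __init__(self, n_states, obs_dim, hidden=32):
        super().__init__()
        # One small network per state yields that state's (unnormalized)
        # match score for each observation frame.
        self.state_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                          nn.Linear(hidden, 1))
            for _ in range(n_states))
        self.trans = nn.Parameter(torch.zeros(n_states, n_states))

    def frame_scores(self, obs):          # obs: (T, obs_dim)
        return torch.cat([net(obs) for net in self.state_nets], dim=1)

    def log_partition(self, emit):        # forward algorithm in log space
        alpha = emit[0]
        for t in range(1, emit.shape[0]):
            alpha = emit[t] + torch.logsumexp(
                alpha.unsqueeze(1) + self.trans, dim=0)
        return torch.logsumexp(alpha, dim=0)

    def cml_loss(self, obs, states):      # -log P(states | obs)
        emit = self.frame_scores(obs)
        T = obs.shape[0]
        path_score = (emit[torch.arange(T), states].sum()
                      + self.trans[states[:-1], states[1:]].sum())
        return self.log_partition(emit) - path_score
```

Because the normalization runs over all state paths rather than locally per state, the probabilistic interpretation survives even though the network outputs are unnormalized scores, which is the property the abstract emphasizes.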


2002
Author(s): Lisa M. Kinnard, Shih-Chung B. Lo, Paul C. Wang, Matthew T. Freedman, Mohammed F. Chouikha

2019, Vol. 490 (1), pp. 371-384
Author(s): Aristide Doussot, Evan Eames, Benoit Semelin

Abstract Within the next few years, the Square Kilometre Array (SKA) or one of its pathfinders will hopefully detect the 21-cm signal fluctuations from the Epoch of Reionization (EoR). The goal will then be to accurately constrain the underlying astrophysical parameters, which is currently done mainly with Bayesian inference. Recently, neural networks have been trained to perform inverse modelling and, ideally, predict the maximum-likelihood values of the model parameters. We build on this approach, improving the accuracy of the predictions with several supervised learning methods: neural networks, kernel regressions, and ridge regressions. Based on a large training set of 21-cm power spectra, we compare the performance of these methods. When using a noise-free signal generated by the model itself as input, we improve on previous neural network accuracy by one order of magnitude and, using a local ridge kernel regression, we gain another factor of a few. We thus reach an accuracy on the reconstructed maximum-likelihood parameter values of a few per cent of the 1σ confidence level due to SKA thermal noise (as estimated with Bayesian inference). For an input signal affected by SKA-like thermal noise but constrained to yield the same maximum-likelihood parameter values as the noise-free signal, our neural network exhibits an error within half of the 1σ confidence level due to the SKA thermal noise. This accuracy improves to 10 per cent of the 1σ level when using the local ridge kernel. We are thus reaching a performance level where supervised learning methods are a viable alternative for determining the maximum-likelihood parameter values.
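As an illustration of this inverse-modelling setup, the sketch below trains a kernel ridge regression from flattened 21-cm power spectra to parameter values. The random stand-in data, the RBF kernel, and the scikit-learn pipeline are all assumptions; they take the place of the paper's simulated training set and its local ridge kernel method.

```python
# Hypothetical sketch of supervised inverse modelling: map 21-cm power
# spectra to astrophysical parameter values with kernel ridge regression.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 64))   # stand-in for flattened P(k, z) spectra
y = rng.uniform(size=(500, 3))      # stand-in for 3 astro parameters

# Log-spectra tame the dynamic range before standardization; the kernel
# hyperparameters are selected by cross-validated grid search.
model = make_pipeline(
    StandardScaler(),
    GridSearchCV(KernelRidge(kernel="rbf"),
                 {"alpha": [1e-3, 1e-2, 1e-1],
                  "gamma": [1e-3, 1e-2, 1e-1]}))
model.fit(np.log(X), y)
predicted_params = model.predict(np.log(X[:5]))
```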


Author(s): P. A. Gutiérrez, C. Hervás, F. J. Martínez-Estudillo, M. Carbonero

Multi-class pattern recognition has a wide range of applications, including handwritten digit recognition (Chiang, 1998), speech tagging and recognition (Athanaselis, Bakamidis, Dologlou, Cowie, Douglas-Cowie & Cox, 2005), bioinformatics (Mahony, Benos, Smith & Golden, 2006) and text categorization (Massey, 2003). This chapter presents a comprehensive and competitive study of multi-class neural learning that combines different elements, such as multilogistic regression, neural networks and evolutionary algorithms. The logistic regression (LR) model has been used in statistics for many years and has recently been the object of extensive study in the machine learning community. Although logistic regression is a simple and useful procedure, it poses problems when applied to real classification problems, where we frequently cannot make the stringent assumption of additive and purely linear effects of the covariates. One technique to overcome these difficulties is to augment or replace the input vector with new variables, basis functions, which are transformations of the input variables, and then to use linear models in this new space of derived input features. Methods such as sigmoidal feed-forward neural networks (Bishop, 1995), generalized additive models (Hastie & Tibshirani, 1990), and PolyMARS (Kooperberg, Bose & Stone, 1997), a hybrid of Multivariate Adaptive Regression Splines (MARS) (Friedman, 1991) specifically designed to handle classification problems, can all be seen as different nonlinear basis function models. The major drawback of these approaches is deciding on the type and the optimal number of the corresponding basis functions.

Logistic regression models are usually fit by maximum likelihood, with the Newton-Raphson algorithm as the traditional way to compute the maximum likelihood estimates of the parameters; a sketch is given below. The algorithm typically converges, since the log-likelihood is concave, but its computation becomes prohibitive when the number of variables is large. Product unit neural networks (PUNNs), introduced by Durbin and Rumelhart (1989), are an alternative to standard sigmoidal neural networks and are based on multiplicative nodes instead of additive ones.
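For concreteness, here is a minimal sketch of the Newton-Raphson iteration, iteratively reweighted least squares, for maximum likelihood fitting of a binary logistic regression, together with a product unit for contrast with an additive node; the function names are mine, not the chapter's.

```python
# Newton-Raphson (IRLS) for maximum likelihood binary logistic regression;
# an illustrative sketch, not code from the chapter.
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """X: (n, d) design matrix (include a ones column for the intercept);
    y: (n,) labels in {0, 1}. Returns the ML coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # Newton weights
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # observed information
        step = np.linalg.solve(hess, grad)    # one Newton step
        beta += step
        if np.abs(step).max() < tol:          # concave log-likelihood, so
            break                             # convergence is typical
    return beta

def product_unit(x, w):
    """A multiplicative node (Durbin & Rumelhart, 1989): prod_i x_i**w_i,
    in contrast to an additive node's sum_i w_i * x_i. Assumes x > 0."""
    return np.prod(np.power(x, w))
```

Each Newton step solves a d × d linear system assembled from the full design matrix, which is exactly the cost that becomes prohibitive when the number of variables is large.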

