The Relative Inefficiency of Sequence Weights Approaches in Determining a Nucleotide Position Weight Matrix

Author(s):  
Lee A Newberg ◽  
Lee Ann McCue ◽  
Charles E Lawrence

Sequence-weights approaches to constructing a position weight matrix of nucleotides from aligned inputs are popular, but little effort has been expended to measure their quality. We derive optimal sequence weights that minimize the sum of the variances of the estimators of base frequency parameters for sequences related by a phylogenetic tree. Using these, we find that approaches based upon sequence weights can perform very poorly in comparison to a theoretically optimal maximum-likelihood method in the inference of the parameters of a position weight matrix. Specifically, we find that among a collection of primate sequences, even an optimal sequence-weights approach is only 51% as efficient as the maximum-likelihood approach in inferences of base frequency parameters. We also show how to employ the variance estimators to obtain a greedy ordering of species for sequencing. Application of this ordering for the weighted estimators to a primate collection yields a curve with a long plateau that is not observed with maximum-likelihood estimators. This plateau indicates that the use of weighted estimators on these data seriously limits the utility of obtaining the sequences of more than two or three additional species.
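As a minimal illustration of the sequence-weights idea (not the authors' optimal, tree-derived weighting), a weighted base-frequency estimate at one alignment column can be sketched as:

```python
def weighted_base_frequencies(column, weights):
    """Weighted estimate of base frequencies at one alignment column.

    column  -- one base per sequence, e.g. ['A', 'A', 'G']
    weights -- non-negative per-sequence weights summing to 1
    """
    freqs = dict.fromkeys("ACGT", 0.0)
    for base, w in zip(column, weights):
        freqs[base] += w
    return freqs

# Hypothetical example: two closely related sequences share their weight,
# so the 'A' they both carry is not double-counted.
print(weighted_base_frequencies(["A", "A", "G"], [0.25, 0.25, 0.5]))
# {'A': 0.5, 'C': 0.0, 'G': 0.5, 'T': 0.0}
```

Down-weighting correlated sequences reduces redundancy, but, as the abstract argues, even the best choice of weights can remain far less efficient than full maximum likelihood on the tree.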

Symmetry ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 1509
Author(s):  
Guillermo Martínez-Flórez ◽  
Artur J. Lemonte ◽  
Hugo S. Salinas

The univariate power-normal distribution is quite useful for modeling many types of real data. On the other hand, multivariate extensions of this univariate distribution are not common in the statistical literature, particularly skewed extensions that can, for example, be bimodal. In this paper, we extend the univariate power-normal distribution to the multivariate setup. Structural properties of the new multivariate distributions are established. We consider the maximum likelihood method to estimate the unknown parameters, and the observed and expected Fisher information matrices are also derived. Monte Carlo simulation results indicate that the maximum likelihood approach is quite effective for estimating the model parameters. An empirical application of the proposed multivariate distribution to real data is provided for illustrative purposes.
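For readers unfamiliar with the univariate power-normal distribution, its standard density is f(x; α) = α φ(x) Φ(x)^(α−1), with φ and Φ the standard normal pdf and cdf. A minimal sketch of maximum-likelihood estimation of α, by crude grid search rather than the numerical optimization and Fisher-information machinery the paper uses:

```python
import math

def phi(x):  # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):  # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def pn_loglik(alpha, data):
    """Log-likelihood of the power-normal density
    f(x; alpha) = alpha * phi(x) * Phi(x)**(alpha - 1)."""
    return sum(math.log(alpha) + math.log(phi(x)) + (alpha - 1) * math.log(Phi(x))
               for x in data)

# Hypothetical sample; the grid stands in for a proper optimizer.
data = [0.3, -0.1, 0.8, 1.2, 0.0, -0.5]
alphas = [0.1 * k for k in range(1, 51)]
alpha_hat = max(alphas, key=lambda a: pn_loglik(a, data))
```

Note that α = 1 recovers the standard normal, so the likelihood at α = 1 is a natural baseline when judging the fitted skewness.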


Genetics ◽  
2000 ◽  
Vol 155 (2) ◽  
pp. 981-987 ◽  
Author(s):  
Nicolas Galtier ◽  
Frantz Depaulis ◽  
Nicholas H Barton

Abstract A coalescence-based maximum-likelihood method is presented that aims to (i) detect diversity-reducing events in the recent history of a population and (ii) distinguish between demographic (e.g., bottlenecks) and selective causes (selective sweep) of a recent reduction of genetic variability. The former goal is achieved by taking account of the distortion in the shape of gene genealogies generated by diversity-reducing events: gene trees tend to be more star-like than under the standard coalescent. The latter issue is addressed by comparing patterns between loci: demographic events apply to the whole genome whereas selective events affect distinct regions of the genome to a varying extent. The maximum-likelihood approach allows one to estimate the time and strength of diversity-reducing events and to choose among competing hypotheses. An application to sequence data from an African population of Drosophila melanogaster shows that the bottleneck hypothesis is unlikely and that one or several selective sweeps probably occurred in the recent history of this population.
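The distortion this method exploits can be illustrated with the standard neutral coalescent, under which the interval T_k during which k lineages remain is exponential with mean 2/(k(k−1)) (time in units of 2N generations). A short sketch, illustrative only, since the paper's likelihood is computed over full gene genealogies:

```python
def expected_coalescent_intervals(n):
    """Expected interval lengths E[T_k] = 2 / (k*(k-1)) for k = n, ..., 2."""
    return {k: 2.0 / (k * (k - 1)) for k in range(n, 1, -1)}

def expected_total_branch_length(n):
    """Expected total tree length: sum over intervals of k * E[T_k]."""
    return sum(k * t for k, t in expected_coalescent_intervals(n).items())

# Under the standard coalescent the oldest interval (k = 2) dominates, giving
# long internal branches; a post-sweep, star-like genealogy instead puts almost
# all branch length near the tips, which is the signal the method detects.
print(expected_total_branch_length(4))  # 2 * (1 + 1/2 + 1/3) ~= 3.667
```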


1999 ◽  
Vol 50 (4) ◽  
pp. 307 ◽  
Author(s):  
You-Gan Wang

A simple stochastic model of a fish population subject to natural and fishing mortalities is described. The fishing effort is assumed to vary over different periods but to be constant within each period. A maximum-likelihood approach is developed for estimating natural mortality (M) and the catchability coefficient (q) simultaneously from catch-and-effort data. If there is not enough contrast in the data to provide reliable estimates of both M and q, as is often the case in practice, the method can be used to obtain the best possible values of q for a range of possible values of M. These techniques are illustrated with tiger prawn (Penaeus semisulcatus) data from the Northern Prawn Fishery of Australia.
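A sketch of the kind of catch equation involved (an assumed Baranov form; the paper builds a full stochastic likelihood). With fishing mortality F_i = qE_i and total mortality Z_i = M + F_i, the expected catch from abundance N_i is C_i = (F_i/Z_i)(1 − e^(−Z_i))N_i, and N_{i+1} = N_i e^(−Z_i). Profiling over fixed M, as the abstract suggests when the data cannot separate M and q:

```python
import math

def predicted_catches(q, M, N0, efforts):
    """Baranov catch equation projected over periods of constant effort."""
    N, catches = N0, []
    for E in efforts:
        F = q * E
        Z = M + F
        catches.append((F / Z) * (1.0 - math.exp(-Z)) * N)
        N *= math.exp(-Z)
    return catches

def best_q_for_fixed_M(M, N0, efforts, observed, q_grid):
    """Least-squares profile over q for a fixed natural mortality M
    (a stand-in for the paper's maximum-likelihood criterion)."""
    def ssq(q):
        return sum((c - o) ** 2
                   for c, o in zip(predicted_catches(q, M, N0, efforts), observed))
    return min(q_grid, key=ssq)

# Hypothetical data: with M fixed at its true value, q = 0.02 is recovered.
efforts = [10.0, 20.0, 15.0]
obs = predicted_catches(0.02, 0.2, 1000.0, efforts)
q_hat = best_q_for_fixed_M(0.2, 1000.0, efforts, obs,
                           [k / 1000 for k in range(5, 105, 5)])
```

Repeating the profile across a grid of M values gives the "best possible q for each plausible M" table the abstract describes.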


2014 ◽  
Vol 17 - 2014 - Special... ◽ 
Author(s):  
Fabien Campillo ◽  
Dominique Hervé ◽  
Angelo Raherinirina ◽  
Rivo Rakotozafy

We present a Markov model of a land-use dynamic along a forest corridor of Madagascar. A first approach, by maximum likelihood, leads to a model with an absorbing state. We study the quasi-stationary distribution of the model and the distribution of the hitting time of the absorbing state. According to experts, a transition not present in the data must nevertheless be added to the model; this is not possible with the maximum likelihood method, so we make use of a Bayesian approach. We use a Markov chain Monte Carlo method to infer the transition matrix, which in this case admits an invariant distribution. Finally, we analyze the two identified dynamics.
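The maximum-likelihood estimate of a Markov transition matrix from observed transition counts is simply each row normalized by its total; a state never observed transitioning away then comes out absorbing, which is exactly the situation the authors correct with a Bayesian prior. A minimal sketch (hypothetical counts, not the Madagascar data):

```python
def mle_transition_matrix(counts):
    """Row-normalize observed transition counts: P[i][j] = n_ij / n_i."""
    return [[c / sum(row) for c in row] for row in counts]

def absorbing_states(P, tol=1e-12):
    """States i with P[i][i] = 1, i.e. never observed leaving."""
    return [i for i, row in enumerate(P) if abs(row[i] - 1.0) < tol]

# Hypothetical land-use counts over 3 states; state 2 was never seen
# to transition away, so the MLE makes it absorbing.
counts = [[8, 2, 0],
          [1, 6, 3],
          [0, 0, 5]]
P = mle_transition_matrix(counts)
print(absorbing_states(P))  # [2]
```

A Dirichlet prior on each row (the usual Bayesian treatment) puts positive probability on the expert-mandated transition even when its observed count is zero, which is why the Bayesian chain admits an invariant distribution.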


2002 ◽  
Vol 59 (6) ◽  
pp. 976-986 ◽  
Author(s):  
Geoff M Laslett ◽  
J Paige Eveson ◽  
Tom Polacheck

We describe a novel maximum likelihood method for fitting general growth curves to tag–recapture data. The growth model allows for the asymptotic length and the time to tagging to vary from individual to individual, with other parameters being fixed. Unlike the Fabens approach, we do not take differences to fit the parameters, but instead model the joint density of the release and recapture lengths. We simulate data to examine the bias and precision of the estimated parameters obtained using our fitting method. We include simulations for which the time to tagging model is incorrect, but find that the growth curve is usually still fitted with small bias. Furthermore, we introduce a new growth curve that allows for different growth rates for juveniles and adults. The new growth curve needs sufficient data coverage before and after the transition from juvenile to adult for all parameters to be estimated precisely. We illustrate the method on real data by fitting this new growth curve to southern bluefin tuna tag–recapture data.
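A sketch of the growth-curve setup (a plain von Bertalanffy curve with an individual-specific asymptotic length; the paper's actual model also randomizes the time to tagging and fits the joint density of release and recapture lengths rather than their Fabens difference):

```python
import math

def von_bertalanffy(age, L_inf, k, t0=0.0):
    """Length at a given age under von Bertalanffy growth."""
    return L_inf * (1.0 - math.exp(-k * (age - t0)))

def release_recapture_lengths(age_at_release, time_at_liberty, L_inf, k):
    """The pair of lengths whose joint density the method models, for one
    individual's asymptotic length L_inf (random across individuals)."""
    return (von_bertalanffy(age_at_release, L_inf, k),
            von_bertalanffy(age_at_release + time_at_liberty, L_inf, k))

# Hypothetical individual: L_inf = 180 cm, k = 0.2/yr, tagged at age 3,
# recaptured 2 years later.
l1, l2 = release_recapture_lengths(3.0, 2.0, 180.0, 0.2)
```

Because both lengths share the same individual L_inf, modeling them jointly (rather than differencing, as in Fabens) retains the information that the release length itself carries about that individual's growth.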


Author(s):  
Anggis Sagitarisman ◽  
Aceng Komarudin Mutaqin

Abstract Car manufacturers in Indonesia need to determine reasonable warranty costs that burden neither companies nor consumers. Several statistical approaches have been developed to analyze warranty costs. One of them is the Gertsbakh-Kordonsky method, which reduces the two-dimensional warranty problem to one dimension. In this research, we apply the Gertsbakh-Kordonsky method to estimate the warranty cost for car type A in company XYZ. The one-dimensional data are tested using the Kolmogorov-Smirnov test to determine their distribution, and the distribution parameters are estimated using the maximum likelihood method. Three approaches are used to estimate the parameters of the distribution; they differ in how mileage is calculated for units that make no claim within the warranty period. In the application, we use claim data for car type A. Data exploration indicates that failures of car type A are mostly due to the age of the vehicle. The Kolmogorov-Smirnov test shows that the most appropriate distribution for the claim data is the three-parameter Weibull. The Gertsbakh-Kordonsky estimate puts the warranty cost for car type A at around 3.54% of the selling price of the car without warranty, i.e. around Rp. 4,248,000 per unit.
Keywords: warranty costs; the Gertsbakh-Kordonsky method; maximum likelihood estimation; Kolmogorov-Smirnov test.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Guanglei Xu ◽  
William S. Oates

Abstract Restricted Boltzmann Machines (RBMs) have been proposed for developing neural networks for a variety of unsupervised machine learning applications such as image recognition, drug discovery, and materials design. The Boltzmann probability distribution is used as a model to identify network parameters by optimizing the likelihood of predicting an output given hidden states trained on available data. Training such networks often requires sampling over a large probability space that must be approximated during gradient-based optimization. Quantum annealing has been proposed as a means to search this space more efficiently, and this has been experimentally investigated on D-Wave hardware. The D-Wave implementation requires selecting an effective inverse temperature or hyperparameter (β) within the Boltzmann distribution, which can strongly influence optimization. Here, we show how this parameter can be estimated as a hyperparameter applied to D-Wave hardware during neural network training, by maximizing the likelihood or minimizing the Shannon entropy. We find that both methods improve the training of RBMs, validated experimentally on D-Wave hardware with an image recognition problem. Neural network image reconstruction errors are evaluated using Bayesian uncertainty analysis, which shows more than an order of magnitude lower image reconstruction error using the maximum likelihood method than manually optimizing the hyperparameter. The maximum likelihood method is also shown to outperform minimizing the Shannon entropy for image reconstruction.
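The β-estimation step can be sketched in isolation: given the energies of sampled states and their observed counts from the hardware, the maximum-likelihood β is the one making the Boltzmann distribution exp(−βE)/Z best explain the samples. A toy sketch with a grid search and a hypothetical three-state system (the paper works with D-Wave samples inside RBM training):

```python
import math

def boltzmann_probs(energies, beta):
    """Boltzmann distribution p_i = exp(-beta * E_i) / Z over a state list."""
    w = [math.exp(-beta * e) for e in energies]
    Z = sum(w)
    return [x / Z for x in w]

def log_likelihood(beta, energies, counts):
    """Log-likelihood of observed state counts under Boltzmann(beta)."""
    p = boltzmann_probs(energies, beta)
    return sum(n * math.log(pi) for n, pi in zip(counts, p))

# Hypothetical counts roughly consistent with beta near 1.
energies = [0.0, 1.0, 2.0]
counts = [665, 245, 90]
betas = [0.05 * k for k in range(1, 61)]
beta_hat = max(betas, key=lambda b: log_likelihood(b, energies, counts))
```

Because the Boltzmann family is an exponential family, the maximizing β is the one matching the model's mean energy to the sample's mean energy, which is why the likelihood here has a single, well-behaved peak.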


Author(s):  
Vijitashwa Pandey ◽  
Deborah Thurston

Design for disassembly and reuse focuses on developing methods to minimize the difficulty of disassembly for maintenance or reuse. These methods can gain substantially if the relationship between component attributes (material mix, ease of disassembly, etc.) and the likelihood of reuse or disposal is understood. For products already in the marketplace, a feedback approach that evaluates the willingness of manufacturers or customers (decision makers) to reuse a component can reveal how the attributes of a component affect reuse decisions. This paper introduces some metrics and combines them with ones proposed in the literature into a measure that captures the overall value of a decision made by the decision makers. The premise is that the decision makers choose the decision that has the maximum value. Four decisions are considered regarding a component's fate after recovery, ranging from direct reuse to disposal. A method along the lines of discrete choice theory is utilized, using maximum likelihood estimates to determine the parameters that define the value function. The maximum likelihood method can take as input actual decisions made by the decision makers to assess the value function. This function can then be used to determine the likelihood that the component takes a certain path (one of the four decisions) given its attributes, which can facilitate long-range planning and also help determine ways in which reuse decisions can be influenced.
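A minimal sketch of the discrete-choice machinery described above, using a standard multinomial logit over the four end-of-life decisions (the attribute names and coefficient values are hypothetical; the paper estimates the value-function parameters by maximum likelihood from observed decisions):

```python
import math

DECISIONS = ["direct_reuse", "remanufacture", "recycle", "dispose"]

def choice_probabilities(attributes, coefficients):
    """Multinomial logit: P(decision d) is proportional to exp(v_d), where
    v_d is a linear value function of the component's attributes."""
    values = [sum(c * a for c, a in zip(coefficients[d], attributes))
              for d in DECISIONS]
    m = max(values)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in values]
    Z = sum(w)
    return dict(zip(DECISIONS, (x / Z for x in w)))

# Hypothetical attributes: (ease of disassembly, residual value, material purity)
coefficients = {"direct_reuse":  [0.8, 1.5, 0.2],
                "remanufacture": [0.5, 0.9, 0.4],
                "recycle":       [0.1, 0.2, 1.0],
                "dispose":       [0.0, 0.0, 0.0]}
p = choice_probabilities([0.7, 0.9, 0.3], coefficients)
```

Fitting the coefficients then amounts to maximizing the product of these probabilities over the decisions actually observed, after which the fitted model predicts the path a newly recovered component is likely to take.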

