An Asymmetric Distribution with Heavy Tails and Its Expectation–Maximization (EM) Algorithm Implementation

Symmetry, 2019, Vol 11 (9), pp. 1150
Author(s): Neveka M. Olmos, Osvaldo Venegas, Yolanda M. Gómez, Yuri A. Iriarte

In this paper we introduce a new distribution constructed as the quotient of two independent random variables, distributed respectively as a half-normal distribution and a power of an exponential distribution with parameter 2. The result is a distribution with greater kurtosis than the well-known half-normal and slashed half-normal distributions. We study the general density function of this distribution, some of its properties, its moments, and its coefficients of asymmetry and kurtosis. We develop the expectation–maximization algorithm and present a simulation study. We compute the moment and maximum likelihood estimators and present three illustrations with real data sets to show the flexibility of the new model.
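The quotient construction can be made concrete with a short Python sketch. It assumes the stochastic representation X = Z / Y^(1/q), with Z half-normal with scale sigma and Y exponential with rate 2; both the exponent 1/q and the reading of "parameter 2" as a rate are assumptions, since the abstract does not spell them out. By independence, E[X^r] = E[Z^r] E[Y^(-r/q)], and E[Y^(-s)] = rate^s Γ(1 - s) for s < 1, so the fourth moment (and hence the kurtosis) exists only for q > 4. All function names are hypothetical.

```python
import math
import random

def halfnormal_raw_moment(r, sigma=1.0):
    # E[Z^r] for Z = |N(0, sigma^2)|
    return sigma ** r * 2 ** (r / 2) * math.gamma((r + 1) / 2) / math.sqrt(math.pi)

def inv_power_exp_moment(r, q, rate=2.0):
    # E[Y^(-r/q)] for Y ~ Exponential(rate); finite only when r < q
    s = r / q
    if s >= 1:
        return math.inf
    return rate ** s * math.gamma(1 - s)

def kurtosis_from_raw(m1, m2, m3, m4):
    # Standardized fourth central moment from the first four raw moments
    var = m2 - m1 ** 2
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return c4 / var ** 2

def quotient_kurtosis(q, sigma=1.0, rate=2.0):
    # Kurtosis of X = Z / Y^(1/q) under the assumed representation
    if q <= 4:
        return math.inf  # the fourth moment does not exist
    moments = [halfnormal_raw_moment(r, sigma) * inv_power_exp_moment(r, q, rate)
               for r in range(1, 5)]
    return kurtosis_from_raw(*moments)

def halfnormal_kurtosis():
    return kurtosis_from_raw(*(halfnormal_raw_moment(r) for r in range(1, 5)))

def sample_quotient(n, q, sigma=1.0, rate=2.0, seed=0):
    # Draws from the assumed representation X = |N(0, sigma^2)| / Y^(1/q)
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = rng.expovariate(rate)
        if y > 0.0:  # skip the vanishingly rare exact zero
            out.append(abs(rng.gauss(0.0, sigma)) / y ** (1.0 / q))
    return out
```

For example, `quotient_kurtosis(10)` is about 4.7, against about 3.87 for the half-normal, and the value approaches the half-normal kurtosis as q grows. Because X is a scaled copy of the standard case, neither sigma nor the rate changes the kurtosis.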

Mathematics, 2020, Vol 8 (7), pp. 1116
Author(s): Francisco A. Segovia, Yolanda M. Gómez, Osvaldo Venegas, Héctor W. Gómez

In this paper we introduce a distribution that extends the power Maxwell distribution. The new distribution is constructed as the quotient of two independent random variables, distributed respectively as a power Maxwell distribution and a function of the uniform distribution on (0,1). The result is a distribution with greater kurtosis than the power Maxwell. We study the general density of this distribution and some of its properties, moments, and asymmetry and kurtosis coefficients. Maximum likelihood and moment estimators are studied. We also develop the expectation–maximization algorithm, carry out a simulation study, and present two applications to real data.
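A similar sketch applies to this slash-type quotient. Two assumptions are made: the "function of the uniform distribution" is taken to be U^(1/q), as in the classical slash construction, and the plain Maxwell distribution (a scaled chi variable with 3 degrees of freedom) stands in for the power Maxwell, whose parameterization the abstract does not give. Since E[U^(-r/q)] = q/(q - r) for r < q, each raw moment of the base variable is simply inflated by that factor. Function names are hypothetical.

```python
import math
import random

def maxwell_raw_moment(r, scale=1.0):
    # E[T^r] for a Maxwell variable T = scale * chi with 3 degrees of freedom
    return scale ** r * 2 ** (r / 2) * math.gamma((3 + r) / 2) / math.gamma(1.5)

def slash_factor(r, q):
    # E[U^(-r/q)] for U ~ Uniform(0, 1); finite only when r < q
    return q / (q - r) if r < q else math.inf

def kurtosis_from_raw(m1, m2, m3, m4):
    # Standardized fourth central moment from the first four raw moments
    var = m2 - m1 ** 2
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return c4 / var ** 2

def maxwell_kurtosis():
    return kurtosis_from_raw(*(maxwell_raw_moment(r) for r in range(1, 5)))

def slashed_maxwell_kurtosis(q):
    # Kurtosis of T / U^(1/q); scale cancels, so only q matters
    if q <= 4:
        return math.inf  # the fourth moment does not exist
    return kurtosis_from_raw(*(maxwell_raw_moment(r) * slash_factor(r, q)
                               for r in range(1, 5)))

def sample_slashed_maxwell(n, q, scale=1.0, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # Maxwell draw: Euclidean norm of three independent standard normals
        t = scale * math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
        u = 1.0 - rng.random()  # in (0, 1], so no division by zero
        out.append(t / u ** (1.0 / q))
    return out
```

With q = 10, the kurtosis rises from about 3.11 for the Maxwell base to about 3.84.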


Symmetry, 2021, Vol 13 (7), pp. 1226
Author(s): Inmaculada Barranco-Chamorro, Yuri A. Iriarte, Yolanda M. Gómez, Juan M. Astorga, Héctor W. Gómez

Specifying a proper statistical model for asymmetric lifetime data with high kurtosis is an open problem. In this paper, the three-parameter modified slashed generalized Rayleigh family of distributions is proposed. Its structural properties are studied: stochastic representation, probability density function, hazard rate function, moments, and parameter estimation via maximum likelihood. Among the merits of the proposal, it includes as particular cases a plethora of lifetime models able to accommodate heavy tails, such as those based on the Rayleigh, Maxwell, half-normal and chi-square distributions, among others. A simulation study and applications to real data sets are included to illustrate the use of the results.
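The hazard rate h(x) = f(x)/S(x) listed among the structural properties can be evaluated numerically for any lifetime density. The sketch below is a generic approach (composite Simpson's rule for the survival function S), not the closed forms derived for the proposed family; it is checked against the Rayleigh distribution, whose hazard rate is x/σ². The integration window `width` is an assumption: heavy-tailed members, such as slashed ones, need a much wider window or a closed-form survival function.

```python
import math

def survival(pdf, x, width=50.0, n=20000):
    # S(x) = integral of pdf over [x, x + width] by composite Simpson's rule;
    # assumes the probability mass beyond x + width is negligible
    if n % 2:
        n += 1
    h = width / n
    total = pdf(x) + pdf(x + width)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(x + i * h)
    return total * h / 3

def hazard(pdf, x, **kw):
    # Hazard rate h(x) = f(x) / S(x)
    return pdf(x) / survival(pdf, x, **kw)

def rayleigh_pdf(x, sigma=1.0):
    # Rayleigh density; its exact hazard rate is x / sigma^2
    return x / sigma ** 2 * math.exp(-x * x / (2 * sigma ** 2)) if x >= 0 else 0.0
```

For the Rayleigh case, `hazard(rayleigh_pdf, 1.5)` reproduces the exact value 1.5 to several decimal places, a useful sanity check before applying the same routine to a new density.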


2019, pp. 1471082X1989093
Author(s): Antonio Punzo, Cristina Tortora

The multivariate contaminated normal (MCN) distribution represents a simple heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptically contoured scatters in the presence of mild outliers (also referred to as ‘bad’ points herein) and to detect bad points automatically. The price of these advantages is two additional parameters: the proportion of good observations and the degree of contamination. However, in a multivariate setting, having only one proportion of good observations and only one degree of contamination may be limiting. To overcome this limitation, we propose a multiple scaled contaminated normal (MSCN) distribution. Among its parameters is an orthogonal matrix Γ. In the space spanned by the vectors (principal components) of Γ, there is a proportion of good observations and a degree of contamination for each component. Moreover, each observation has a posterior probability of being good with respect to each principal component. Thanks to this probability, the method provides directional robust estimates of the parameters of the nested MN and automatic directional detection of bad points. The term ‘directional’ is added to specify that the method works separately for each principal component. Mixtures of MSCN distributions are also proposed, and an expectation-maximization algorithm is used for parameter estimation. Real and simulated data are considered to show the usefulness of our mixture with respect to well-established mixtures of symmetric distributions with heavy tails.
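The directional posterior probabilities can be sketched as follows, assuming the MSCN density factorizes, in the rotated coordinates y = Γ'(x - μ), into a product of univariate contaminated normals, each with its own proportion of good points α_h and degree of contamination η_h (a variance inflation factor). Because Γ is orthogonal, this change of coordinates has unit Jacobian. Function and argument names are hypothetical, and the parameters are assumed to be already estimated, for instance by EM.

```python
import math

def normal_pdf(y, var):
    # Univariate N(0, var) density
    return math.exp(-0.5 * y * y / var) / math.sqrt(2 * math.pi * var)

def _project(x, mu, direction):
    # Coordinate of x - mu along one unit direction (a column of Gamma)
    return sum(g * (xi - mi) for g, xi, mi in zip(direction, x, mu))

def directional_good_probs(x, mu, gamma, lam, alpha, eta):
    """Posterior probability that x is 'good' along each principal direction.

    gamma: orthonormal direction vectors; lam: variance along each direction;
    alpha: proportion of good points; eta: contamination factor (> 1).
    """
    probs = []
    for g, l, a, e in zip(gamma, lam, alpha, eta):
        y = _project(x, mu, g)
        good = a * normal_pdf(y, l)
        bad = (1 - a) * normal_pdf(y, e * l)
        probs.append(good / (good + bad))
    return probs

def mscn_density(x, mu, gamma, lam, alpha, eta):
    # Product over directions of univariate contaminated normal densities
    dens = 1.0
    for g, l, a, e in zip(gamma, lam, alpha, eta):
        y = _project(x, mu, g)
        dens *= a * normal_pdf(y, l) + (1 - a) * normal_pdf(y, e * l)
    return dens
```

For a point lying far out along the first direction only, the first probability drops close to 0 (the point is flagged as bad in that direction), while the probability for the second direction stays high.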


2020, Vol 7 (2), pp. 214
Author(s): Yane Laheroi Nainel, Efori Buulolo, Ikwan Lubis

Data mining, often also called knowledge discovery in databases (KDD), is an activity that includes the collection and use of historical data to find regularities, patterns or relationships in large data sets. The output of data mining can be used to improve future decision making. PT Pyridam Farma Tbk is a multinational company that produces pharmaceuticals. The problems that often occur at PT Pyridam Farma are estimation problems, such as goods running out of stock at distributors, labor shortages, and raw materials running out at the factory. Another problem is that the company does not yet have a system to predict annual drug sales, so an algorithm is needed; here, the Expectation Maximization algorithm is used. On this basis, a data mining application was built for estimating drug sales based on the effect of brand image, with the Expectation Maximization algorithm supporting the estimation or prediction of sales targets for the coming period. The algorithm was tested using SPSS and MySQL software. The results show that the application can help PT Pyridam Farma predict drug sales more easily by using SPSS software.


2017, Vol 42 (3), pp. 179-191
Author(s): Bor-Chen Kuo, Chun-Hua Chen, Jimmy de la Torre

At present, most existing cognitive diagnosis models (CDMs) are designed to identify the presence or absence of either skills or misconceptions, but not both. This article proposes a CDM that can be used to identify simultaneously which skills and misconceptions students possess. In addition, it proposes the use of the expectation-maximization algorithm to estimate the model parameters. A simulation study is conducted to evaluate the viability of the proposed model and algorithm. Real data are analyzed to demonstrate the applicability of the proposed model and to compare it with existing CDMs. Furthermore, a real data–based simulation study is conducted to determine how the correct classification rates in the context of the proposed model can be improved. Issues related to the proposed model and future research are discussed.
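The abstract does not specify the proposed model, so the sketch below illustrates EM estimation for the standard DINA model, a common skills-only CDM, as a stand-in rather than the article's method. Each latent class is an attribute profile; the E-step computes each examinee's posterior over profiles, and the M-step updates the class proportions and each item's guessing (g) and slipping (s) parameters in closed form from expected counts. The data generator and all names are hypothetical.

```python
import itertools
import random

def eta(profile, q_row):
    # Ideal response: 1 if the profile masters every attribute the item requires
    return int(all(a >= q for a, q in zip(profile, q_row)))

def simulate_dina(n, Q, g=0.2, s=0.1, seed=0):
    # Hypothetical data: each attribute mastered independently with probability 0.5
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        prof = [int(rng.random() < 0.5) for _ in Q[0]]
        X.append([int(rng.random() < (1 - s if eta(prof, q) else g)) for q in Q])
    return X

def dina_em(X, Q, n_iter=500, tol=1e-6):
    """EM for the standard DINA model: returns profiles, class priors, g and s."""
    J = len(Q)
    profiles = list(itertools.product([0, 1], repeat=len(Q[0])))
    C = len(profiles)
    E = [[eta(p, q) for q in Q] for p in profiles]
    prior = [1.0 / C] * C
    g, s = [0.2] * J, [0.2] * J
    for _ in range(n_iter):
        # E-step: posterior probability of each attribute profile per examinee
        post = []
        for x in X:
            w = []
            for c in range(C):
                like = prior[c]
                for j in range(J):
                    p1 = 1 - s[j] if E[c][j] else g[j]
                    like *= p1 if x[j] else 1 - p1
                w.append(like)
            tot = sum(w)
            post.append([v / tot for v in w])
        # M-step: closed-form updates from expected counts
        new_prior = [sum(pi[c] for pi in post) / len(X) for c in range(C)]
        new_g, new_s = [], []
        for j in range(J):
            n0 = r0 = n1 = r1 = 0.0
            for x, pi in zip(X, post):
                for c in range(C):
                    if E[c][j]:
                        n1 += pi[c]
                        r1 += pi[c] * x[j]
                    else:
                        n0 += pi[c]
                        r0 += pi[c] * x[j]
            new_g.append(r0 / n0)
            new_s.append(1 - r1 / n1)
        change = max(abs(a - b) for a, b in zip(new_g + new_s, g + s))
        prior, g, s = new_prior, new_g, new_s
        if change < tol:
            break
    return profiles, prior, g, s
```

Because classes are tied to attribute profiles through the Q-matrix, there is no label switching, which keeps the M-step a simple ratio of expected counts.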


2020, Vol 2020, pp. 1-13
Author(s): Nada A. Alqahtani, Zakiah I. Kalantan

Data scientists use various machine learning algorithms to discover patterns in large data sets that can lead to actionable insights. In general, high-dimensional data are reduced by obtaining a set of principal components that highlight similarities and differences. In this work, we model the reduced data with a bivariate mixture model, learning with a bivariate Gaussian mixture model. We discuss a heuristic for detecting important components by choosing the initial values of the location parameters with two different techniques: cluster means obtained from k-means and hierarchical clustering, and the default values in the “mixtools” R package. The parameters of the model are obtained via an expectation maximization algorithm. Criteria from a Bayesian point of view are evaluated for both techniques, demonstrating that both are efficient with respect to computational capacity. The effectiveness of the discussed techniques is demonstrated through a simulation study and using real data sets from different fields.
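The workflow described here (initial location values from cluster means, EM for a bivariate Gaussian mixture, then a Bayesian criterion) can be sketched in plain Python. This is a minimal illustration, not the “mixtools” implementation: initial means come from a small k-means routine (hierarchical clustering is not shown), covariances start at the identity, and BIC serves as one example of a Bayesian criterion, since the abstract does not name one. All names are hypothetical.

```python
import math
import random

def mvn2_pdf(x, mu, S):
    # Bivariate normal density; the 2x2 inverse is written out explicitly
    a, b, c, d = S[0][0], S[0][1], S[1][0], S[1][1]
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def kmeans_means(X, k, n_iter=50, seed=0):
    # Lloyd's algorithm; the final cluster means serve as EM starting values
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in X:
            j = min(range(k), key=lambda g: (x[0] - centers[g][0]) ** 2 + (x[1] - centers[g][1]) ** 2)
            groups[j].append(x)
        centers = [[sum(p[0] for p in grp) / len(grp), sum(p[1] for p in grp) / len(grp)]
                   if grp else centers[g] for g, grp in enumerate(groups)]
    return centers

def log_likelihood(X, weights, means, covs):
    return sum(math.log(sum(w * mvn2_pdf(x, m, S) for w, m, S in zip(weights, means, covs)))
               for x in X)

def gmm2_em(X, init_means, n_iter=500, tol=1e-8):
    G, n = len(init_means), len(X)
    weights = [1.0 / G] * G
    means = [list(m) for m in init_means]
    covs = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(G)]
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in X:
            w = [weights[g] * mvn2_pdf(x, means[g], covs[g]) for g in range(G)]
            tot = sum(w)
            resp.append([v / tot for v in w])
        # M-step: weighted proportions, means and covariance matrices
        for g in range(G):
            r = [row[g] for row in resp]
            ng = sum(r)
            mx = sum(ri * x[0] for ri, x in zip(r, X)) / ng
            my = sum(ri * x[1] for ri, x in zip(r, X)) / ng
            sxx = sum(ri * (x[0] - mx) ** 2 for ri, x in zip(r, X)) / ng
            syy = sum(ri * (x[1] - my) ** 2 for ri, x in zip(r, X)) / ng
            sxy = sum(ri * (x[0] - mx) * (x[1] - my) for ri, x in zip(r, X)) / ng
            weights[g], means[g], covs[g] = ng / n, [mx, my], [[sxx, sxy], [sxy, syy]]
        ll = log_likelihood(X, weights, means, covs)
        if ll - prev < tol:
            break
        prev = ll
    return weights, means, covs

def bic(X, weights, means, covs):
    # Each component has 2 mean and 3 covariance parameters, plus G - 1 free weights
    p = 6 * len(weights) - 1
    return -2 * log_likelihood(X, weights, means, covs) + p * math.log(len(X))
```

Comparing `bic` across starting strategies (or across numbers of components) mirrors the kind of comparison the abstract describes; lower BIC is preferred.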


2005, Vol 128 (3), pp. 479-483
Author(s): Hani Hamdan, Gérard Govaert

In this paper, we present a new and original mixture model approach for acoustic emission (AE) data clustering. AE techniques have been used in a variety of applications in industrial plants. These techniques can provide the most sophisticated monitoring test and can generally be performed with the plant/pressure equipment operating under several conditions. Since the AE clusters may present several constraints (different proportions, volumes, orientations, and shapes), we propose to base the AE cluster analysis on Gaussian mixture models, which are a powerful approach in such situations. Furthermore, the diagonal Gaussian mixture model seems well adapted to the detection and monitoring of defect classes, since the welds of cylindrical pressure equipment are elongated horizontally and vertically (cluster shapes elongated along the axes). The EM (Expectation-Maximization) algorithm applied to a diagonal Gaussian mixture model provides a satisfactory solution, but the real-time constraints imposed in our problem make the application of this algorithm impossible when the number of points becomes too large. The solution that we propose is to use the CEM (Classification Expectation-Maximization) algorithm, which converges faster and generates comparable solutions in terms of the resulting partition. The practical results on real data are very satisfactory from the experts' point of view.
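The difference between EM and CEM lies in the extra classification step. The sketch below is a minimal CEM for a Gaussian mixture with diagonal covariance matrices, the model adopted in the abstract: each iteration assigns every point to its most probable class and then re-estimates proportions, means and per-axis variances from that hard partition, stopping once the partition no longer changes. The variance floor and the handling of empty classes are assumptions added for numerical safety, and the function names are hypothetical.

```python
import math

def log_diag_normal(x, mu, var):
    # Log density of a Gaussian with diagonal covariance diag(var)
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))

def cem_diagonal(X, init_means, n_iter=100, var_floor=1e-6):
    """Classification EM (CEM) for a diagonal Gaussian mixture."""
    K, d, n = len(init_means), len(X[0]), len(X)
    means = [list(m) for m in init_means]
    variances = [[1.0] * d for _ in range(K)]
    props = [1.0 / K] * K
    labels = None
    for _ in range(n_iter):
        # E-step + C-step: the class maximizing pi_k * f_k(x) also maximizes
        # the posterior probability, so no normalization is needed
        new_labels = [max(range(K), key=lambda k: math.log(props[k])
                          + log_diag_normal(x, means[k], variances[k]))
                      for x in X]
        if new_labels == labels:
            break  # the partition is stable, so CEM has converged
        labels = new_labels
        # M-step on the hard partition
        for k in range(K):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if not members:
                continue  # keep the previous parameters of an empty class
            nk = len(members)
            props[k] = nk / n
            means[k] = [sum(x[j] for x in members) / nk for j in range(d)]
            variances[k] = [max(var_floor, sum((x[j] - means[k][j]) ** 2 for x in members) / nk)
                            for j in range(d)]
        total = sum(props)
        props = [p / total for p in props]
    return labels, props, means, variances
```

Working on a hard partition replaces the weighted sums of EM with plain per-class averages, which is the source of the speed-up that matters under real-time constraints.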


2020, Vol 18 (2)
Author(s): Ružica Škurla Babić, Maja Ozmec-Ban, Jasmin Bajić

Airline revenue management systems are used to calculate booking limits on each fare class to maximize expected revenue for all future flight departures. Their performance depends critically on the forecasting module, which uses historical data to project future demand. Those data are censored, or constrained, by the imposed booking limits and do not represent true demand, since rejected requests are not recorded. Eight unconstraining methods that transform the censored data into more accurate estimates of actual historical demand are analyzed and their accuracy is compared; they range from naive methods, such as discarding all censored observations, to complex ones, such as the Expectation Maximization algorithm and the Projection Detruncation algorithm. The methods are evaluated on simulated data sets generated by the ICE V2.0 software: first, data sets representing true demand were produced; then the aircraft capacity was reduced and EMSRb booking limits were calculated for every booking class. These limits constrained the original demand data at various points of the booking process, yielding the corresponding censored data sets. The unconstraining methods were applied to the censored observations, and the resulting unconstrained data were compared with the actual demand data to evaluate each method's performance.
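The EM approach to unconstraining can be sketched for the common textbook case in which demand is assumed normal with unknown mean and standard deviation; the abstract does not state the distributional assumption. For a censored observation, true demand is only known to be at least the recorded bookings b, so the E-step replaces it with moments of the normal truncated below at b: with z = (b - μ)/σ and λ = φ(z)/(1 - Φ(z)), E[D] = μ + σλ and E[D²] = μ² + σ² + σλ(b + μ). The M-step recomputes μ and σ from the completed sums. The simulator and all names are hypothetical.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_sf(z):
    # Standard normal survival function 1 - Phi(z)
    return 0.5 * math.erfc(z / math.sqrt(2))

def em_unconstrain(obs, censored, n_iter=500, tol=1e-7):
    """EM estimate of normal demand (mu, sigma) from partly censored bookings.

    obs[i] is the recorded bookings; censored[i] is True when the booking
    limit was reached, so true demand is only known to be >= obs[i].
    """
    n = len(obs)
    mu = sum(obs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in obs) / n) or 1.0
    for _ in range(n_iter):
        s1 = s2 = 0.0
        for x, c in zip(obs, censored):
            if c:
                # E-step: moments of N(mu, sigma^2) truncated below at x
                z = (x - mu) / sigma
                lam = norm_pdf(z) / norm_sf(z)  # inverse Mills ratio
                s1 += mu + sigma * lam
                s2 += mu * mu + sigma * sigma + sigma * lam * (x + mu)
            else:
                s1 += x
                s2 += x * x
        # M-step: mean and variance from the completed sufficient statistics
        new_mu = s1 / n
        new_sigma = math.sqrt(max(s2 / n - new_mu ** 2, 1e-12))
        done = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma

def simulate_capped_demand(n, mu, sigma, limit, seed=0):
    # Hypothetical test data: normal demand capped by a booking limit
    rng = random.Random(seed)
    demand = [rng.gauss(mu, sigma) for _ in range(n)]
    return [min(d, limit) for d in demand], [d >= limit for d in demand]
```

With demand drawn from N(50, 10²) and capped at 55, the expected naive mean of the recorded bookings is about 48, biased downward, while the EM estimates land near the true mean and standard deviation.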


2021
Author(s): Kehinde Olobatuyi

Like many machine learning models, cluster-weighted models (CWMs) can suffer in both accuracy and speed on high-dimensional data, which led previous works to a parsimonious technique for reducing the effect of the “curse of dimensionality” on mixture models. In this work, we review the background of CWMs. We further show that the parsimonious technique is not sufficient for mixture models to perform well on huge high-dimensional data. We discuss a heuristic for detecting the hidden components by choosing the initial values of the location parameters using the default values in the “FlexCWM” R package. We introduce a dimensionality reduction technique called t-distributed stochastic neighbor embedding (TSNE) to enhance the parsimonious CWMs in high-dimensional space. CWMs were originally designed for regression; for classification purposes, all multi-class variables are transformed logarithmically with some added noise. The parameters of the model are obtained via an expectation maximization algorithm. The effectiveness of the discussed technique is demonstrated using real data sets from different fields.
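The structure that the EM algorithm exploits in a cluster-weighted model can be shown with a linear Gaussian CWM with a single covariate, where component g contributes π_g N(y; β0_g + β1_g x, σ²_g) N(x; μ_g, τ²_g). The sketch evaluates this joint density, the E-step posterior over components, and the prediction E[y | x]; it does not reproduce the “FlexCWM” package, the TSNE step, or parameter estimation, and the dictionary keys are hypothetical names.

```python
import math

def npdf(v, mean, var):
    # Univariate normal density
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def cwm_components(x, y, params):
    """Per-component terms pi_g * p(y | x, g) * p(x | g) of a linear Gaussian CWM.

    params: list of dicts with keys pi, b0, b1, s2 (regression of y on x)
    and mu, t2 (marginal distribution of x within the component).
    """
    return [p["pi"] * npdf(y, p["b0"] + p["b1"] * x, p["s2"]) * npdf(x, p["mu"], p["t2"])
            for p in params]

def cwm_density(x, y, params):
    # Joint density p(x, y) as a sum over components
    return sum(cwm_components(x, y, params))

def cwm_posterior(x, y, params):
    # E-step quantity: probability that (x, y) belongs to each component
    terms = cwm_components(x, y, params)
    tot = sum(terms)
    return [t / tot for t in terms]

def cwm_predict(x, params):
    # E[y | x]: component regressions weighted by p(g | x), using only the x-marginals
    w = [p["pi"] * npdf(x, p["mu"], p["t2"]) for p in params]
    tot = sum(w)
    return sum(wi / tot * (p["b0"] + p["b1"] * x) for wi, p in zip(w, params))
```

Because the covariate enters through its own within-component distribution, the same weights p(g | x) that drive prediction also show why high-dimensional covariates strain CWMs: the x-marginal must be estimated for every component.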

