Model-based clustering via new parsimonious mixtures of heavy-tailed distributions

Author(s):  
Salvatore D. Tomarchio ◽  
Luca Bagnato ◽  
Antonio Punzo
2013 ◽  
Vol 11 (03) ◽  
pp. 1341007 ◽  
Author(s):  
ALBERTO COZZINI ◽  
AJAY JASRA ◽  
GIOVANNI MONTANA

Cluster analysis of biological samples using gene expression measurements is a common task that aids the discovery of heterogeneous biological sub-populations having distinct mRNA profiles. Several model-based clustering algorithms have been proposed in which the distribution of gene expression values within each sub-group is assumed to be Gaussian. In the presence of noise and extreme observations, a mixture of Gaussian densities may over-fit and overestimate the true number of clusters. Moreover, commonly used model-based clustering algorithms do not generally provide a mechanism to quantify the relative contribution of each gene to the final partitioning of the data. We propose a penalized mixture of Student's t distributions for model-based clustering and gene ranking. Together with a resampling procedure, the proposed approach provides a means for ranking genes according to their contributions to the clustering process. Experimental results show that the algorithm performs well compared with traditional Gaussian mixtures in the presence of outliers and longer-tailed distributions. The algorithm also identifies the true informative genes with high sensitivity and achieves improved model selection. An illustrative application to breast cancer data is also presented, which confirms established tumor sub-classes.
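As a hedged illustration of why Student's t components resist outliers (this is not the authors' penalized mixture or their gene-ranking procedure), the sketch below fits a single univariate t component with fixed degrees of freedom by EM. The function name `fit_t_location_scale`, the choice ν = 3, and the median/MAD starting values are assumptions made for illustration. The key step is the E-step weight u = (ν + 1)/(ν + δ), where δ is the squared standardized distance from the centre: distant observations receive small weights, so an outlier moves the location estimate far less than it moves the sample mean.

```python
import numpy as np


def fit_t_location_scale(x, nu=3.0, n_iter=500, tol=1e-10):
    """EM for one univariate Student's t component with fixed degrees of freedom nu.

    E-step: each point gets a latent weight u_i = (nu + 1) / (nu + delta_i), where
    delta_i = (x_i - mu)^2 / sigma2 is its squared standardized distance.
    M-step: weighted mean and weighted variance using those weights.
    Returns (mu, sigma2, u), where u holds the weights from the last E-step.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Robust starting values (an assumption): median and a MAD-based scale.
    mu = np.median(x)
    sigma2 = (1.4826 * np.median(np.abs(x - mu))) ** 2
    if sigma2 == 0.0:
        sigma2 = np.var(x) if np.var(x) > 0 else 1.0
    for _ in range(n_iter):
        delta = (x - mu) ** 2 / sigma2
        u = (nu + 1.0) / (nu + delta)  # far-away points get small weights
        mu_new = np.sum(u * x) / np.sum(u)
        sigma2_new = np.sum(u * (x - mu_new) ** 2) / n
        done = abs(mu_new - mu) < tol and abs(sigma2_new - sigma2) < tol
        mu, sigma2 = mu_new, sigma2_new
        if done:
            break
    return mu, sigma2, u


# A tight cluster around 0 plus one extreme observation.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 50.0])
mu_t, sigma2_t, weights = fit_t_location_scale(x)
print("sample mean:", x.mean())
print("t location :", mu_t)
print("weights    :", np.round(weights, 4))
```

Here the sample mean is pulled to about 8.33 by the single value 50, while the t location stays near 0 because that point's weight is close to zero. In a full mixture, the same weights would be computed per component and combined with the component responsibilities; the paper's penalty would further modify the M-step, which this sketch omits.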


Metrika ◽  
2021 ◽  
Author(s):  
Rocío Maehara ◽  
Heleno Bolfarine ◽  
Filidor Vilca ◽  
N. Balakrishnan

Author(s):  
Charles Bouveyron ◽  
Gilles Celeux ◽  
T. Brendan Murphy ◽  
Adrian E. Raftery

Author(s):  
Stefan Thurner ◽  
Rudolf Hanel ◽  
Peter Klimek

Phenomena, systems, and processes are rarely purely deterministic, but contain stochastic, probabilistic, or random components. For that reason, a probabilistic description of most phenomena is necessary. Probability theory provides us with the tools for this task. Here, we provide a crash course on the most important notions of probability and random processes, such as odds, probability, expectation, variance, and so on. We describe the most elementary stochastic event, the trial, and develop the notion of urn models. We discuss basic facts about random variables and the elementary operations that can be performed on them. We learn how to compose simple stochastic processes from elementary stochastic events, and discuss random processes as temporal sequences of trials, such as Bernoulli and Markov processes. We touch upon the basic logic of Bayesian reasoning. We discuss a number of classical distribution functions, including power laws and other fat- or heavy-tailed distributions.
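The contrast between independent trials and processes with memory can be made concrete with a short simulation. This is a generic sketch, not code from the chapter; the function names, the success probability 0.3, and the transition matrix are arbitrary choices. A Bernoulli process draws each trial independently, while a Markov chain draws each step from the row of the transition matrix P indexed by the current state; its long-run state frequencies approach the stationary distribution π satisfying πP = π.

```python
import numpy as np


def bernoulli_process(p, n, rng):
    """n independent trials, each a success (1) with probability p."""
    return (rng.random(n) < p).astype(int)


def markov_chain(P, x0, n, rng):
    """Simulate n steps of a finite Markov chain with transition matrix P.

    The next state depends only on the current one: it is drawn from row P[state].
    """
    P = np.asarray(P, dtype=float)
    cdf = np.cumsum(P, axis=1)
    k = P.shape[0]
    states = np.empty(n, dtype=int)
    states[0] = x0
    draws = rng.random(n)
    for t in range(1, n):
        # Inverse-CDF sampling from the row of the current state.
        nxt = np.searchsorted(cdf[states[t - 1]], draws[t], side="right")
        states[t] = min(nxt, k - 1)  # guard against rounding in the cumulative sum
    return states


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(np.asarray(P, dtype=float).T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


rng = np.random.default_rng(0)
coin = bernoulli_process(0.3, 20_000, rng)
P = [[0.9, 0.1],
     [0.5, 0.5]]
chain = markov_chain(P, 0, 20_000, rng)

print("Bernoulli success rate:", coin.mean())       # close to 0.3
print("fraction of time in state 1:", chain.mean())  # close to 1/6
print("stationary distribution:", stationary_distribution(P))  # approximately [5/6, 1/6]
```

For this P, π solves 0.1 π₀ = 0.5 π₁, giving π = (5/6, 1/6), which the simulated fraction of time spent in state 1 should approximate.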


Entropy ◽  
2021 ◽  
Vol 23 (1) ◽  
pp. 70
Author(s):  
Mei Ling Huang ◽  
Xiang Raney-Yan

Estimating high quantiles of heavy-tailed distributions has many important applications. Heavy-tailed distributions are theoretically difficult to study because they often have infinite moments, and existing methods for constructing confidence intervals (CIs) for high quantiles suffer from bias. This paper proposes a new estimator for high quantiles based on the geometric mean. The new estimator has good asymptotic properties and also provides a computational algorithm for estimating CIs of high quantiles. It avoids these difficulties, improves efficiency, and reduces bias. The efficiencies and biases of the new estimator are compared with those of existing estimators, and the theoretical results are confirmed through Monte Carlo simulations. Finally, applications to two real-world examples are provided.
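The abstract does not spell out the proposed estimator, so the sketch below shows only a standard related fact, not the authors' method: for a Pareto distribution with known scale x_m, the maximum-likelihood estimate of the tail index α depends on the data only through the geometric mean G, and the geometric mean stays finite even when α ≤ 1 and the arithmetic mean is infinite. Plugging α̂ into the Pareto quantile function gives a simple high-quantile estimate. The function names, the Pareto model, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def pareto_alpha_from_geometric_mean(x, x_m):
    """MLE of the Pareto tail index alpha when the scale x_m is known.

    If X ~ Pareto(x_m, alpha), log(X / x_m) is exponential with rate alpha, so
    alpha_hat = 1 / (log G - log x_m), where G is the sample geometric mean.
    """
    log_g = np.mean(np.log(np.asarray(x, dtype=float)))  # log of the geometric mean
    return 1.0 / (log_g - np.log(x_m))


def pareto_quantile(p, x_m, alpha):
    """Quantile function Q(p) = x_m * (1 - p) ** (-1 / alpha)."""
    return x_m * (1.0 - p) ** (-1.0 / alpha)


rng = np.random.default_rng(1)
x_m, alpha = 1.0, 0.8  # alpha <= 1: the arithmetic mean is infinite
# numpy's pareto() samples the Lomax form; adding 1 and scaling by x_m gives classical Pareto.
sample = x_m * (1.0 + rng.pareto(alpha, size=50_000))

alpha_hat = pareto_alpha_from_geometric_mean(sample, x_m)
q_true = pareto_quantile(0.999, x_m, alpha)
q_hat = pareto_quantile(0.999, x_m, alpha_hat)
print("alpha_hat:", alpha_hat)
print("true 99.9% quantile:", q_true, " plug-in estimate:", q_hat)
```

With α = 0.8 the true 99.9% quantile is 0.001^(-1.25) = 10^3.75, roughly 5623. This plug-in approach relies entirely on the Pareto model being correct, which is one reason more general estimators such as the one proposed in the paper are of interest.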


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 56
Author(s):  
Haoyu Niu ◽  
Jiamin Wei ◽  
YangQuan Chen

The Stochastic Configuration Network (SCN) is a powerful tool for regression and classification. Choosing an appropriate architecture for a neural network, so that the trained model both learns and generalizes well, has traditionally been challenging. Unlike other randomized learning algorithms for single-hidden-layer feed-forward neural networks, such as Randomized Radial Basis Function (RBF) Networks and the Random Vector Functional-Link (RVFL) network, the SCN randomly assigns the input weights and biases of the hidden nodes under a supervisory mechanism. Because these hidden-layer parameters are drawn from a uniform distribution, it is hypothesized that an optimal form of randomness exists. Heavy-tailed distributions have been shown to provide optimal randomness when searching for targets in an unknown environment. Therefore, in this research, the authors used heavy-tailed distributions to randomly initialize the weights and biases to see whether the resulting SCN models could outperform the original SCN. Heavy-tailed distributions such as the Lévy, Cauchy, and Weibull distributions were used. Since some mixture distributions also show heavy-tailed properties, mixtures of Gaussian and Laplace distributions were studied as well. Experimental results showed improved performance for SCN with heavy-tailed distributions. For regression, SCN-Lévy, SCN-Mixture, SCN-Cauchy, and SCN-Weibull used fewer hidden nodes to achieve performance similar to that of SCN. For classification, SCN-Mixture, SCN-Lévy, and SCN-Cauchy achieved test accuracies of 91.5%, 91.7%, and 92.4%, respectively, all higher than the test accuracy of the original SCN.
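A minimal sketch of the initialization idea follows. It only swaps the distribution used to draw hidden-layer weights and biases and then fits output weights by least squares; it does not implement the SCN supervisory inequality used to accept candidate nodes. The function names, the scale of 1.0, the random signs attached to the one-sided Lévy and Weibull draws, the Weibull shape 0.5 (a Weibull tail is heavier than exponential only for shape < 1), the 50/50 Gaussian-Laplace mixture, the tanh activation, and the toy regression target are all illustrative assumptions rather than settings reported in the paper.

```python
import numpy as np


def sample_params(dist, shape, scale, rng):
    """Draw hidden-layer weights or biases from the named distribution.

    The random sign on the one-sided Levy and Weibull draws, the Weibull shape,
    and the 50/50 Gaussian-Laplace mixture are illustrative assumptions.
    """
    if dist == "uniform":  # baseline: uniform draws, as in the original SCN
        return rng.uniform(-scale, scale, size=shape)
    if dist == "cauchy":
        return scale * rng.standard_cauchy(size=shape)
    if dist == "levy":
        # If Z ~ N(0, 1), then 1 / Z**2 follows a standard Levy distribution.
        sign = rng.choice([-1.0, 1.0], size=shape)
        return sign * scale / rng.standard_normal(size=shape) ** 2
    if dist == "weibull":
        sign = rng.choice([-1.0, 1.0], size=shape)
        return sign * scale * rng.weibull(0.5, size=shape)
    if dist == "mixture":
        use_laplace = rng.random(shape) < 0.5
        gauss = rng.normal(0.0, scale, size=shape)
        lap = rng.laplace(0.0, scale, size=shape)
        return np.where(use_laplace, lap, gauss)
    raise ValueError(f"unknown distribution: {dist}")


def hidden_features(X, W, b):
    """Hidden-layer outputs; tanh keeps extreme pre-activations bounded in [-1, 1]."""
    return np.tanh(X @ W + b)


rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))                     # 200 samples, 5 inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)  # toy regression target

for dist in ["uniform", "cauchy", "levy", "weibull", "mixture"]:
    W = sample_params(dist, (5, 30), 1.0, rng)
    b = sample_params(dist, (30,), 1.0, rng)
    H = hidden_features(X, W, b)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights by least squares
    rmse = np.sqrt(np.mean((H @ beta - y) ** 2))
    print(f"{dist:8s} training RMSE: {rmse:.3f}")
```

Training error on a toy problem says nothing about the generalization results reported in the abstract; the sketch only shows where each distribution enters the construction of the hidden layer.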

