Model-based clustering via new parsimonious mixtures of heavy-tailed distributions

A heteroscedastic measurement error model based on skew and heavy-tailed distributions with known error variances

Journal of Statistical Computation and Simulation ◽

10.1080/00949655.2018.1452925 ◽

2018 ◽

Vol 88 (11) ◽

pp. 2185-2200 ◽

Cited By ~ 7

Author(s):

Lorena Cáceres Tomaya ◽

Mário de Castro

Keyword(s):

Measurement Error ◽

Error Model ◽

Measurement Error Model ◽

Model Based ◽

Heavy Tailed Distributions ◽

Heavy Tailed

Model-Based Clustering with Gene Ranking Using Penalized Mixtures of Heavy-Tailed Distributions

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720013410072 ◽

2013 ◽

Vol 11 (03) ◽

pp. 1341007 ◽

Cited By ~ 3

Author(s):

Alberto Cozzini ◽

Ajay Jasra ◽

Giovanni Montana

Keyword(s):

Gene Expression ◽

Clustering Algorithms ◽

High Sensitivity ◽

Gene Ranking ◽

Model Based Clustering ◽

Cancer Data ◽

Model Based ◽

Relative Contribution ◽

Heavy Tailed ◽

Improved Model

Cluster analysis of biological samples using gene expression measurements is a common task that aids the discovery of heterogeneous biological sub-populations with distinct mRNA profiles. Several model-based clustering algorithms have been proposed in which the distribution of gene expression values within each sub-group is assumed to be Gaussian. In the presence of noise and extreme observations, a mixture of Gaussian densities may over-fit and overestimate the true number of clusters. Moreover, commonly used model-based clustering algorithms do not generally provide a mechanism to quantify the relative contribution of each gene to the final partitioning of the data. We propose a penalized mixture of Student's t distributions for model-based clustering and gene ranking. Together with a resampling procedure, the proposed approach provides a means of ranking genes according to their contributions to the clustering process. Experimental results show that the algorithm performs well compared with traditional Gaussian mixtures in the presence of outliers and longer-tailed distributions. The algorithm also identifies the true informative genes with high sensitivity and achieves improved model selection. An illustrative application to breast cancer data, which confirms established tumor sub-classes, is also presented.
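To illustrate why a Student's t component is less sensitive to extreme observations than a Gaussian one, here is a minimal sketch of the standard EM updates for the location and scale of a single univariate t distribution with fixed degrees of freedom. It is not the penalized mixture proposed in the abstract. The function name `t_location_scale`, the choice `nu=3`, the starting values, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def t_location_scale(x, nu=3.0, n_iter=100):
    """EM for the location and scale of a univariate Student's t (nu fixed).

    Hypothetical helper for illustration; assumes the data are not all
    identical, so the starting scale is strictly positive.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    # Robust starting scale from the median absolute deviation.
    sigma2 = (1.4826 * np.median(np.abs(x - mu))) ** 2
    for _ in range(n_iter):
        # E-step: weights shrink toward zero for points far from mu.
        u = (nu + 1.0) / (nu + (x - mu) ** 2 / sigma2)
        # M-step: weighted mean, then weighted variance around the new mean.
        mu = np.sum(u * x) / np.sum(u)
        sigma2 = np.sum(u * (x - mu) ** 2) / x.size
    return mu, sigma2, u

# Toy data: five well-behaved points and one extreme observation.
toy = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 50.0])
mu_t, sigma2_t, weights = t_location_scale(toy)
gaussian_mean = toy.mean()  # the Gaussian MLE of location is the plain mean
```

In this toy example the plain mean is pulled toward the extreme point, while the t-based location stays near the bulk of the data because the extreme point receives a very small weight.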

A robust Birnbaum–Saunders regression model based on asymmetric heavy-tailed distributions

Metrika ◽

10.1007/s00184-021-00815-4 ◽

2021 ◽

Author(s):

Rocío Maehara ◽

Heleno Bolfarine ◽

Filidor Vilca ◽

N. Balakrishnan

Keyword(s):

Regression Model ◽

Model Based ◽

Heavy Tailed Distributions ◽

Heavy Tailed

Model-Based Clustering and Classification for Data Science

10.1017/9781108644181 ◽

2019 ◽

Cited By ~ 17

Author(s):

Charles Bouveyron ◽

Gilles Celeux ◽

T. Brendan Murphy ◽

Adrian E. Raftery

Keyword(s):

Data Science ◽

Model Based Clustering ◽

Model Based ◽

Clustering And Classification

Heavy-Tailed Distributions, GARCH Model and the Stock Market Returns in South Korea

SSRN Electronic Journal ◽

10.2139/ssrn.3014472 ◽

2017 ◽

Author(s):

Yoon Hong ◽

Ji-chul Lee ◽

Ding Guoping

Keyword(s):

Stock Market ◽

South Korea ◽

Garch Model ◽

Market Returns ◽

Stock Market Returns ◽

Heavy Tailed Distributions ◽

Heavy Tailed

Probability and Random Processes

10.1093/oso/9780198821939.003.0002 ◽

2018 ◽

Author(s):

Stefan Thurner ◽

Rudolf Hanel ◽

Peter Klimek

Keyword(s):

Random Processes ◽

Distribution Functions ◽

Basic Logic ◽

Heavy Tailed Distributions ◽

Random Components ◽

Classical Distribution ◽

Heavy Tailed ◽

Basic Facts ◽

Stochastic Events ◽

Stochastic Event

Phenomena, systems, and processes are rarely purely deterministic, but contain stochastic, probabilistic, or random components. For that reason, a probabilistic description of most phenomena is necessary. Probability theory provides us with the tools for this task. Here, we provide a crash course on the most important notions of probability and random processes, such as odds, probability, expectation, variance, and so on. We describe the most elementary stochastic event, the trial, and develop the notion of urn models. We discuss basic facts about random variables and the elementary operations that can be performed on them. We learn how to compose simple stochastic processes from elementary stochastic events, and discuss random processes as temporal sequences of trials, such as Bernoulli and Markov processes. We touch upon the basic logic of Bayesian reasoning. We discuss a number of classical distribution functions, including power laws and other fat- or heavy-tailed distributions.

Semiparametric Mixed-Effects Ordinary Differential Equation Models with Heavy-Tailed Distributions

Journal of Agricultural Biological and Environmental Statistics ◽

10.1007/s13253-021-00446-2 ◽

2021 ◽

Author(s):

Baisen Liu ◽

Liangliang Wang ◽

Yunlong Nie ◽

Jiguo Cao

Keyword(s):

Differential Equation ◽

Ordinary Differential Equation ◽

Mixed Effects ◽

Heavy Tailed Distributions ◽

Differential Equation Models ◽

Heavy Tailed

A Method for Confidence Intervals of High Quantiles

Entropy ◽

10.3390/e23010070 ◽

2021 ◽

Vol 23 (1) ◽

pp. 70

Author(s):

Mei Ling Huang ◽

Xiang Raney-Yan

Keyword(s):

Confidence Intervals ◽

Computational Algorithm ◽

Geometric Mean ◽

Asymptotic Properties ◽

Quantile Estimation ◽

Heavy Tailed Distributions ◽

High Quantiles ◽

High Quantile ◽

Heavy Tailed ◽

Infinite Moments

High quantile estimation for heavy-tailed distributions has many important applications. Heavy-tailed distributions are theoretically difficult to study because they often have infinite moments. Existing methods for constructing confidence intervals (CIs) of high quantiles also suffer from bias. This paper proposes a new estimator for high quantiles based on the geometric mean. The new estimator has good asymptotic properties and provides a computational algorithm for estimating confidence intervals of high quantiles. The new estimator avoids these difficulties, improves efficiency, and reduces bias. The efficiencies and biases of the new estimator are compared with those of existing estimators. The theoretical results are confirmed through Monte Carlo simulations. Finally, applications to two real-world examples are provided.
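As background on what a high quantile estimator looks like, the sketch below implements the classical Hill tail-index estimator and the Weissman extrapolation built on it. These are standard existing estimators, not the geometric-mean estimator proposed in the abstract. The function names, the choice of `k`, and the simulated Pareto sample are assumptions made for illustration.

```python
import numpy as np

def hill_tail_index(x, k):
    """Hill estimate of the tail index from the k largest observations."""
    xs = np.sort(np.asarray(x, dtype=float))
    threshold = xs[-k - 1]  # the (k+1)-th largest value, X_(n-k)
    # Mean log-excess of the k largest values over the threshold.
    gamma = np.mean(np.log(xs[-k:] / threshold))
    return gamma, threshold

def weissman_quantile(x, k, p):
    """Weissman estimate of the quantile exceeded with probability p."""
    gamma, threshold = hill_tail_index(x, k)
    n = len(x)
    return threshold * (k / (n * p)) ** gamma

# Simulated Pareto sample with P(X > x) = x**(-alpha) for x >= 1.
rng = np.random.default_rng(0)
alpha = 2.0
sample = (1.0 - rng.random(20000)) ** (-1.0 / alpha)

gamma_hat, _ = hill_tail_index(sample, k=500)
q_hat = weissman_quantile(sample, k=500, p=0.001)
q_true = 0.001 ** (-1.0 / alpha)  # exact quantile for this Pareto law
```

The result depends on the number of upper order statistics `k`, which trades bias against variance; the value 500 here is arbitrary.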

Optimal Randomness for Stochastic Configuration Network (SCN) with Heavy-Tailed Distributions

Entropy ◽

10.3390/e23010056 ◽

2020 ◽

Vol 23 (1) ◽

pp. 56

Author(s):

Haoyu Niu ◽

Jiamin Wei ◽

YangQuan Chen

Keyword(s):

Research Work ◽

Cauchy Distribution ◽

Classification Model ◽

Test Accuracy ◽

Functional Link ◽

Heavy Tailed Distributions ◽

Heavy Tailed ◽

Improved Performance ◽

Hidden Layer ◽

Hidden Nodes

The Stochastic Configuration Network (SCN) has a powerful capability for regression and classification analysis. Traditionally, it is quite challenging to determine an appropriate architecture for a neural network so that the trained model achieves excellent performance in both learning and generalization. In contrast to known randomized learning algorithms for single-hidden-layer feed-forward neural networks, such as Randomized Radial Basis Function (RBF) Networks and Random Vector Functional-link (RVFL) networks, the SCN randomly assigns the input weights and biases of the hidden nodes under a supervisory mechanism. Since the parameters in the hidden layers are randomly generated from a uniform distribution, there is, hypothetically, an optimal randomness. Heavy-tailed distributions have shown optimal randomness when searching for targets in an unknown environment. Therefore, in this research, the authors used heavy-tailed distributions to randomly initialize weights and biases to see whether the new SCN models can achieve better performance than the original SCN. Heavy-tailed distributions, such as the Lévy, Cauchy, and Weibull distributions, were used. Since some mixed distributions show heavy-tailed properties, mixed Gaussian and Laplace distributions were also studied. Experimental results showed improved performance for SCN with heavy-tailed distributions. For the regression model, SCN-Lévy, SCN-Mixture, SCN-Cauchy, and SCN-Weibull used fewer hidden nodes to achieve performance similar to that of the original SCN. For the classification model, SCN-Mixture, SCN-Lévy, and SCN-Cauchy achieved test accuracies of 91.5%, 91.7%, and 92.4%, respectively, all higher than the test accuracy of the original SCN.
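The idea of swapping the uniform initializer for a heavy-tailed one can be sketched as below. This only shows how candidate hidden-node weights might be drawn from each family with NumPy; it does not implement the SCN supervisory mechanism. The helper name `sample_weights`, the `scale` parameter, and the Weibull shape of 0.5 are assumptions, and the abstract does not state the parameterizations the authors used.

```python
import numpy as np

def sample_weights(dist, shape, rng, scale=1.0):
    """Draw candidate hidden-node weights from the named distribution.

    Hypothetical helper for illustration only.
    """
    if dist == "uniform":
        return rng.uniform(-scale, scale, size=shape)
    if dist == "cauchy":
        return scale * rng.standard_cauchy(size=shape)
    if dist == "levy":
        # If Z ~ N(0, 1), then scale / Z**2 follows a Levy(0, scale) law.
        z = rng.standard_normal(size=shape)
        return scale / z ** 2
    if dist == "weibull":
        # A shape parameter below 1 gives a heavier-than-exponential tail.
        return scale * rng.weibull(0.5, size=shape)
    raise ValueError(f"unknown distribution: {dist}")

rng = np.random.default_rng(42)
shape = (1000, 4)  # e.g. 1000 candidate nodes with 4 inputs each
draws = {name: sample_weights(name, shape, rng)
         for name in ("uniform", "cauchy", "levy", "weibull")}
```

Note that the Lévy and Weibull draws are strictly positive, so a practical initializer would probably also randomize the sign; that step is left out here because the abstract does not describe it.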

Model-based Clustering and Prediction with Mixed Measurements involving Surrogate Classifiers

Statistics in Biopharmaceutical Research ◽

10.1080/19466315.2020.1863257 ◽

2020 ◽

pp. 1-30

Author(s):

Hua Shen ◽

Alexander R. de Leon

Keyword(s):

Model Based Clustering ◽

Model Based