MODELLING INSURANCE LOSSES USING CONTAMINATED GENERALISED BETA TYPE-II DISTRIBUTION

The four-parameter distribution family, the generalised beta type-II (GB2), also known as the transformed beta distribution, has been proposed for modelling insurance losses. As special cases, this family nests many distributions with light and heavy tails, including the lognormal, gamma, Weibull, Burr and generalised gamma distributions. This paper extends the GB2 family to the contaminated GB2 family, which offers many flexible features, including bimodality and a wide range of skewness and kurtosis. Properties of the contaminated distribution are derived and evaluated in a simulation study, and the suitability of the contaminated GB2 distribution for actuarial purposes is demonstrated through two real loss data sets. Analysis of tail quantiles for the data suggests large differences in extreme quantile estimates for different loss distribution assumptions, showing that the selection of appropriate distributions has a significant impact for insurance companies.
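
As a rough illustration of the idea, the sketch below evaluates the standard GB2 density and a two-component "contaminated" mixture of it. The abstract does not give the paper's exact parameterisation, so the mixture form is an assumption made here for illustration: both components share the shape parameters, and the contaminating component has its scale inflated by a hypothetical factor `k` and receives a hypothetical weight `delta`.

```python
import math


def _log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def gb2_pdf(x, a, b, p, q):
    """Standard GB2 density: a x^(ap-1) / (b^(ap) B(p,q) (1 + (x/b)^a)^(p+q))."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    z = a * (math.log(x) - math.log(b))  # log of (x/b)^a
    log_pdf = math.log(a) + p * z - math.log(x) - log_beta - (p + q) * _log1pexp(z)
    return math.exp(log_pdf)


def contaminated_gb2_pdf(x, a, b, p, q, delta, k):
    """Assumed mixture form (not taken from the paper):
    (1 - delta) * GB2(a, b, p, q) + delta * GB2(a, k*b, p, q)."""
    return (1.0 - delta) * gb2_pdf(x, a, b, p, q) + delta * gb2_pdf(x, a, k * b, p, q)


if __name__ == "__main__":
    # With a = b = p = q = 1 the GB2 density reduces to 1 / (1 + x)^2.
    print(gb2_pdf(1.0, 1.0, 1.0, 1.0, 1.0))
    print(contaminated_gb2_pdf(2.0, 1.0, 1.0, 1.0, 1.0, delta=0.5, k=2.0))
```

Working on the log scale keeps the density stable for large losses, which matters when the interest is in extreme tail quantiles.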

Use of a General Dose–Response Model for Rockfish Fecundity–Length Relationships

Canadian Journal of Fisheries and Aquatic Sciences ◽ 10.1139/f90-134 ◽ 1990 ◽ Vol 47 (6) ◽ pp. 1148-1156 ◽ Cited By ~ 1

Author(s): Laura J. Richards ◽ Jon T. Schnute

Keyword(s): Dose Response ◽ Regression Models ◽ Data Sets ◽ Response Model ◽ Exact Inference ◽ Special Cases ◽ Wide Range ◽ Length Data ◽ The Relationship ◽ General Method

In this paper we describe a general method for determining the relationship between fecundity and another fish attribute, such as size or age. Our methods include linear and logarithmic regression models as special cases and are applicable to a wide range of situations. The model we propose is based on the univariate form of the Schnute–Jensen dose–response model. However, we extend the Schnute–Jensen analysis by describing exact inference regions obtained from likelihood contours, to which we assign nominal probability levels. We also provide a method for obtaining an inference band for the predicted curve. We examine the issue of model adequacy as it relates to fecundity–length data from two rockfish (Sebastes) species. We show that the extra complexity of our model is justified, as none of the traditional regression models are appropriate for all three of our data sets. Further, we use inference bands to distinguish fecundity–length relationships for quillback rockfish (S. maliger) from two areas, but we are unable to distinguish one of these relationships from a similar relationship for copper rockfish (S. caurinus).

Robust Bayesian Regularized Estimation Based on t Regression Model

Journal of Probability and Statistics ◽ 10.1155/2015/989412 ◽ 2015 ◽ Vol 2015 ◽ pp. 1-9

Author(s): Zean Li ◽ Weihua Zhao

Keyword(s): Gibbs Sampler ◽ Heavy Tails ◽ Real Data ◽ Adaptive Lasso ◽ Data Sets ◽ Simulation Studies ◽ Model Framework ◽ Bayesian Hierarchical ◽ Coefficient Estimation ◽ Gamma Distributions

The t distribution is a useful extension of the normal distribution, which can be used for statistical modeling of data sets with heavy tails, and provides robust estimation. In this paper, in view of the advantages of Bayesian analysis, we propose a new robust coefficient estimation and variable selection method based on Bayesian adaptive Lasso t regression. A Gibbs sampler is developed based on the Bayesian hierarchical model framework, where we treat the t distribution as a mixture of normal and gamma distributions and put different penalization parameters on different regression coefficients. We also consider Bayesian t regression with adaptive group Lasso and obtain the Gibbs sampler from the posterior distributions. Both simulation studies and a real data example show that our method performs well compared with other existing methods when the error distribution has heavy tails and/or outliers.
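
The normal and gamma mixture mentioned in the abstract can be illustrated with a small forward simulation. This is only a sketch of the well-known scale-mixture representation of the t distribution, not the authors' Gibbs sampler; the function name and its arguments are hypothetical.

```python
import math
import random


def sample_t_via_mixture(nu, mu=0.0, sigma=1.0, n=1, rng=None):
    """Draw n values from a location-scale t distribution with nu degrees of freedom.

    Representation used: w ~ Gamma(shape=nu/2, rate=nu/2), then
    y | w ~ Normal(mu, sigma^2 / w). Marginally, y follows a t distribution.
    """
    rng = rng or random.Random()
    draws = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale); rate nu/2 means scale 2/nu.
        w = rng.gammavariate(nu / 2.0, 2.0 / nu)
        z = rng.gauss(0.0, 1.0)
        draws.append(mu + sigma * z / math.sqrt(w))
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    ys = sample_t_via_mixture(nu=10.0, n=100_000, rng=rng)
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
    # The theoretical variance of a t distribution with nu = 10 is nu / (nu - 2) = 1.25.
    print(mean, var)
```

Conditional on the latent weights `w`, the model is Gaussian, which is what makes a Gibbs sampler convenient for this kind of hierarchy.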

Interval Estimation in Lifetime Distributions Using Progressively Type II Censored Data

International Journal of Reliability Quality and Safety Engineering ◽ 10.1142/s0218539321500443 ◽ 2021

Author(s): Ayman Baklizi

Keyword(s): Closed Form ◽ Confidence Intervals ◽ Censored Data ◽ Interval Estimation ◽ Real Data ◽ Data Sets ◽ Type II ◽ Lifetime Distributions ◽ Special Cases ◽ Type II Censored Data

In this paper, we developed a method for constructing confidence intervals for the parameters of lifetime distributions based on progressively type II censored data. The method produces closed form expressions for the bounds of the confidence intervals for several special cases of parameters and lifetime distributions. Closed form approximations are derived for the intervals for the parameters of the location or scale families of distributions. The method is illustrated with several examples and analyses of real data sets are included to illustrate the application of the method.

mtDNAcombine: tools to combine sequences from multiple studies

BMC Bioinformatics ◽ 10.1186/s12859-021-04048-0 ◽ 2021 ◽ Vol 22 (1)

Author(s): Eleanor F. Miller ◽ Andrea Manica

Keyword(s): Sequence Data ◽ Data Extraction ◽ Bayesian Skyline Plot ◽ Model Organisms ◽ Data Sets ◽ Data Handling ◽ Online Database ◽ Genetic Studies ◽ Wide Range ◽ Existing Data

Background: Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms, classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species' demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling. Results: Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions: There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.

Classification of jujube defects in small data sets based on transfer learning

Neural Computing and Applications ◽ 10.1007/s00521-021-05715-2 ◽ 2021

Author(s): Jianping Ju ◽ Hong Zheng ◽ Xiaohang Xu ◽ Zhongyuan Guo ◽ Zhaohui Zheng ◽ ...

Keyword(s): Transfer Learning ◽ Loss Function ◽ Training Model ◽ Parameter Distribution ◽ Test Accuracy ◽ Small Data ◽ Data Sets ◽ Data Set ◽ Small Data Sets

Although convolutional neural networks have achieved success in image classification, challenges remain in agricultural product quality sorting, such as machine vision-based jujube defect detection. The performance of jujube defect detection mainly depends on the feature extraction and the classifier used. Due to the diversity of the jujube materials and the variability of the testing environment, the traditional method of manually extracting features often fails to meet the requirements of practical application. In this paper, a jujube sorting model for small data sets, based on a convolutional neural network and transfer learning, is proposed to meet the actual demand of jujube defect detection. First, the original images collected from an actual jujube sorting production line were pre-processed, and the data were augmented to establish a data set of five categories of jujube defects. The original CNN model is then improved by embedding the SE module and using the triplet loss function and the center loss function to replace the softmax loss function. Finally, a deep model pre-trained on the ImageNet image data set was trained on the jujube defects data set, so that the parameters of the pre-trained model could fit the parameter distribution of the jujube defect images, and the parameter distribution was transferred to the jujube defects data set to complete the transfer of the model and realize the detection and classification of jujube defects. The classification results are visualized with heatmaps and analysed through classification accuracy and confusion matrices against the comparison models. The experimental results show that the SE-ResNet50-CL model improves the fine-grained classification of jujube defects, and the test accuracy reaches 94.15%. The model has good stability and high recognition accuracy in complex environments.

Survival and Reliability Analysis with an Epsilon-Positive Family of Distributions with Applications

Symmetry ◽ 10.3390/sym13050908 ◽ 2021 ◽ Vol 13 (5) ◽ pp. 908

Author(s): Perla Celis ◽ Rolando de la Cruz ◽ Claudio Fuentes ◽ Héctor W. Gómez

Keyword(s): Survival Data ◽ Maximum Likelihood Estimates ◽ New Class ◽ New Family ◽ Gamma Distributions ◽ Special Cases ◽ Log Normal ◽ Positive Support ◽ Positive Distributions ◽ Family Of Distributions

We introduce a new class of distributions called the epsilon–positive family, which can be viewed as a generalization of the distributions with positive support. The construction of the epsilon–positive family is motivated by the ideas behind the generation of skew distributions using symmetric kernels. This new class of distributions has as special cases the exponential, Weibull, log–normal, log–logistic and gamma distributions, and it provides an alternative for analyzing reliability and survival data. An interesting feature of the epsilon–positive family is that it can be viewed as a finite scale mixture of positive distributions, facilitating the derivation and implementation of EM–type algorithms to obtain maximum likelihood estimates (MLE) with (un)censored data. We illustrate the flexibility of this family to analyze censored and uncensored data using two real examples. One of them was previously discussed in the literature; the second one consists of a new application to model recidivism data of a group of inmates released from Chilean prisons during 2007. The results show that this new family of distributions fits the data better than some common alternatives such as the exponential distribution.

Kumaraswamy Generalized Power Lomax Distribution and Its Applications

Stats ◽ 10.3390/stats4010003 ◽ 2021 ◽ Vol 4 (1) ◽ pp. 28-45

Author(s): Vasili B.V. Nagarjuna ◽ R. Vishnu Vardhan ◽ Christophe Chesneau

Keyword(s): Hazard Rate ◽ Real Data ◽ Rate Function ◽ Maximum Likelihood Estimates ◽ Parameter Estimates ◽ Parameter Distribution ◽ Data Sets ◽ Lomax Distribution ◽ Entropy Measures ◽ Modeling Behavior

In this paper, a new five-parameter distribution is proposed using the functionalities of the Kumaraswamy generalized family of distributions and the features of the power Lomax distribution. It is named the Kumaraswamy generalized power Lomax distribution. In a first approach, we derive its main probability and reliability functions, with a visualization of its modeling behavior by considering different parameter combinations. As a prime quality, the corresponding hazard rate function is very flexible; it possesses decreasing, increasing and inverted (upside-down) bathtub shapes. Decreasing-increasing-decreasing shapes are also observed. Some important characteristics of the Kumaraswamy generalized power Lomax distribution are derived, including moments, entropy measures and order statistics. The second approach is statistical. The maximum likelihood estimates of the parameters are described and a brief simulation study shows their effectiveness. Two real data sets are taken to show how the proposed distribution can be applied concretely; parameter estimates are obtained and fitting comparisons are performed with other well-established Lomax-based distributions. The Kumaraswamy generalized power Lomax distribution turns out to be the best, capturing fine details in the structure of the data considered.
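
To make the construction concrete, the sketch below composes the usual Kumaraswamy-G transform, F(x) = 1 - (1 - G(x)^a)^b, with the commonly used power Lomax baseline, G(x) = 1 - (lam / (lam + x^beta))^alpha. The abstract does not state these formulas, so both the functional forms and the parameter names (`a`, `b`, `alpha`, `beta`, `lam`) are assumptions taken from the standard literature rather than from the paper itself.

```python
def power_lomax_cdf(x, alpha, beta, lam):
    """Assumed baseline CDF: G(x) = 1 - (lam / (lam + x^beta))^alpha for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (lam / (lam + x ** beta)) ** alpha


def kw_power_lomax_cdf(x, a, b, alpha, beta, lam):
    """Kumaraswamy-G transform of the baseline: F(x) = 1 - (1 - G(x)^a)^b."""
    g = power_lomax_cdf(x, alpha, beta, lam)
    return 1.0 - (1.0 - g ** a) ** b


def kw_power_lomax_quantile(u, a, b, alpha, beta, lam):
    """Invert F in closed form for 0 < u < 1 (useful for simulation)."""
    g = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)  # baseline probability level
    return (lam * ((1.0 - g) ** (-1.0 / alpha) - 1.0)) ** (1.0 / beta)


if __name__ == "__main__":
    params = dict(a=2.0, b=1.5, alpha=3.0, beta=0.8, lam=2.0)
    x = kw_power_lomax_quantile(0.3, **params)
    # Round trip: the CDF evaluated at the 0.3 quantile should return about 0.3.
    print(x, kw_power_lomax_cdf(x, **params))
```

Because the quantile function is available in closed form under these assumptions, random variates can be generated by applying it to uniform draws.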

MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽ 10.1186/s12859-021-04288-0 ◽ 2021 ◽ Vol 22 (1)

Author(s): Yance Feng ◽ Lei M. Li

Keyword(s): Biological Significance ◽ Housekeeping Genes ◽ R Package ◽ Data Sets ◽ Statistical Regression ◽ RNA Seq ◽ Least Trimmed Squares ◽ Standard Data ◽ Wide Range ◽ Multiple References

Background: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. Results: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. The pairwise intermediates are then integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The assessment of normalization quality emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by single-cell data on the cell cycle. MUREN is implemented as an R package. The code, under the GPL-3 license, is available on GitHub: github.com/hippo-yf/MUREN and on conda: anaconda.org/hippo-yf/r-muren. Conclusions: MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.

A Generalized Rayleigh Family of Distributions Based on the Modified Slash Model

Symmetry ◽ 10.3390/sym13071226 ◽ 2021 ◽ Vol 13 (7) ◽ pp. 1226

Author(s): Inmaculada Barranco-Chamorro ◽ Yuri A. Iriarte ◽ Yolanda M. Gómez ◽ Juan M. Astorga ◽ Héctor W. Gómez

Keyword(s): Heavy Tails ◽ Real Data ◽ Rate Function ◽ Data Sets ◽ Estimation Of Parameters ◽ Stochastic Representation ◽ Likelihood Methods ◽ Chi Square ◽ Maximum Likelihood Methods ◽ Family Of Distributions

Specifying a proper statistical model to represent asymmetric lifetime data with high kurtosis is an open problem. In this paper, the three-parameter, modified, slashed, generalized Rayleigh family of distributions is proposed. Its structural properties are studied: stochastic representation, probability density function, hazard rate function, moments and estimation of parameters via maximum likelihood methods. As merits of our proposal, we highlight as particular cases a plethora of lifetime models, such as Rayleigh, Maxwell, half-normal and chi-square, among others, which are able to accommodate heavy tails. A simulation study and applications to real data sets are included to illustrate the use of our results.

A Visual and VAE Based Hierarchical Indoor Localization Method

Sensors ◽ 10.3390/s21103406 ◽ 2021 ◽ Vol 21 (10) ◽ pp. 3406

Author(s): Jie Jiang ◽ Yin Zou ◽ Lidong Chen ◽ Yujie Fang

Keyword(s): Image Retrieval ◽ Indoor Localization ◽ Data Sets ◽ Indoor Environments ◽ Global Features ◽ Data Set ◽ Data Annotation ◽ Wide Range ◽ Annotation Costs ◽ Global And Local

Precise localization and pose estimation in indoor environments are commonly employed in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be solved via visual-based localization using a pre-built 3D model. The increase in searching space associated with large scenes can be overcome by retrieving images in advance and subsequently estimating the pose. The majority of current deep learning-based image retrieval methods require labeled data, which increase data annotation costs and complicate the acquisition of data. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised network variational autoencoder (VAE) with a visual-based Structure-from-Motion (SfM) approach in order to extract global and local features. During the localization process, global features are applied for the image retrieval at the level of the scene map in order to obtain candidate images, and are subsequently used to estimate the pose from 2D-3D matches between query and candidate images. RGB images only are used as the input of the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method can localize images within 0.16 m and 4° in the 7-Scenes data sets and 32.8% within 5 m and 20° in the Baidu data set. Furthermore, our proposed method achieves a higher precision compared to advanced methods.