Bayesian Adjustment for Insurance Misrepresentation in Heavy-Tailed Loss Regression

Risks ◽  
2018 ◽  
Vol 6 (3) ◽  
pp. 83 ◽  
Author(s):  
Michelle Xia

In this paper, we study the problem of misrepresentation under heavy-tailed regression models in the presence of both misrepresented and correctly measured risk factors. Misrepresentation is a type of fraud in which a policy applicant gives a false statement on a risk factor that determines the insurance premium. In the regression context, we introduce heavy-tailed misrepresentation models based on the lognormal, Weibull and Pareto distributions. The proposed models allow insurance modelers to identify risk characteristics associated with the misrepresentation risk by imposing a latent logit model on the prevalence of misrepresentation. We prove theoretical identifiability and implement the models using Bayesian Markov chain Monte Carlo techniques. Model performance is evaluated on both simulated data and real data from the Medical Expenditure Panel Survey. The simulation study confirms the consistency of the Bayesian estimators in large samples, whereas the case study demonstrates the necessity of the proposed models for real applications when the losses exhibit heavy-tailed features.
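As a minimal sketch of the data-generating mechanism the abstract describes (not the paper's actual model or code, and with entirely hypothetical coefficients), the following simulates lognormal losses driven by a binary risk factor that some applicants misreport, where the misreporting probability follows a latent logit model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True (unobserved) binary risk factor, e.g. smoker status.
v_true = rng.binomial(1, 0.3, n)

# A correctly measured covariate.
x = rng.normal(size=n)

# Latent logit model for the prevalence of misrepresentation:
# only applicants with v_true = 1 can falsely report 0.
logit_p = -1.0 + 0.5 * x                      # hypothetical coefficients
p_misrep = 1 / (1 + np.exp(-logit_p))
misreport = rng.binomial(1, p_misrep) * v_true
v_obs = v_true * (1 - misreport)              # observed (possibly false) factor

# Heavy-tailed lognormal loss regression on the TRUE factor.
mu = 1.0 + 0.8 * v_true + 0.3 * x
loss = rng.lognormal(mean=mu, sigma=0.6)
```

Comparing log-loss means grouped by `v_obs` versus `v_true` shows the attenuation that naive analysis suffers; correcting that bias is what the Bayesian adjustment targets.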

Tomography ◽  
2021 ◽  
Vol 7 (3) ◽  
pp. 358-372
Author(s):  
Matthew D. Holbrook ◽  
Darin P. Clark ◽  
Rutulkumar Patel ◽  
Yi Qi ◽  
Alex M. Bassil ◽  
...  

We are developing imaging methods for a co-clinical trial investigating synergy between immunotherapy and radiotherapy. We perform longitudinal micro-computed tomography (micro-CT) of mice to detect lung metastasis after treatment. This work explores deep learning (DL) as a fast approach to automated lung nodule detection. We used data from control mice both with and without primary lung tumors. To augment the number of training sets, we simulated data by inserting real, augmented tumors into micro-CT scans. We employed a convolutional neural network (CNN) trained with four competing types of training data: (1) simulated only, (2) real only, (3) simulated and real, and (4) pretraining on simulated data followed by real data. We evaluated model performance using precision-recall curves as well as receiver operating characteristic (ROC) curves and their area under the curve (AUC). The AUC was nearly identical (0.76–0.77) for all four cases. However, combining real and synthetic data improved precision by 8%. Smaller tumors have lower detection rates than larger ones, with networks trained on real data performing better. Our work suggests that DL is a promising approach for fast and relatively accurate detection of lung tumors in mice.
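The evaluation metrics named in the abstract (precision-recall, ROC, AUC) can be computed with scikit-learn; this sketch uses synthetic detector scores, not the study's data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve, roc_curve

rng = np.random.default_rng(1)

# Synthetic detector output: label 1 = patch contains a nodule.
y_true = rng.binomial(1, 0.2, 500)
scores = np.where(y_true == 1,
                  rng.normal(0.7, 0.2, 500),   # nodules score higher on average
                  rng.normal(0.4, 0.2, 500))

auc = roc_auc_score(y_true, scores)             # area under the ROC curve
prec, rec, thr = precision_recall_curve(y_true, scores)
fpr, tpr, _ = roc_curve(y_true, scores)
```

Near-identical AUCs, as reported here, can still hide precision differences, which is why the paper inspects both curve families.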


2021 ◽  
Author(s):  
Xiaopeng Li ◽  
Jianliang Huang ◽  
Martin Willberg ◽  
Roland Pail ◽  
Cornelis Slobbe ◽  
...  

The theories of downward continuation (DC) have been studied extensively for many decades, during which many different approaches were developed. In real applications, however, researchers often use just one method, perhaps due to resource limitations or time constraints, without a rigorous head-to-head comparison against alternatives. Because different methods perform quite differently under various conditions, comparing results from several methods greatly helps identify potential problems when dramatic differences occur and confirms the correctness of the solutions when results converge, which is extremely important for real applications such as building official national vertical datums. This paper presents exactly such a case study, recording the collective wisdom recently developed within the IAG's study group SC2.4.1. A total of six commonly used DC methods, namely SHA (NGS), LSC (DTU Space), Poisson and ADC (NRCan), RBF (TU Delft), and RLSC (TUM), are applied to both simulated data (combining two sampling strategies with three noise levels) and real data in a Colorado-area test bed. The data are downward continued both to surface points and to the reference ellipsoid. The surface points are directly evaluated against the observed gravity data on the topography. The ellipsoid points are then transformed into geoid heights according to NRCan's Stokes-Helmert scheme and eventually evaluated at the GNSS/leveling benchmarks. In this presentation, we summarize the work done and the results obtained by the aforementioned working group.


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Made Ayu Dwi Octavanny ◽  
I. Nyoman Budiantara ◽  
Heri Kuswanto ◽  
Dyah Putri Rahmawati

The existing literature in nonparametric regression has established models that apply only one estimator to all predictors. This study develops a mixed truncated spline and Fourier series model in nonparametric regression for longitudinal data. The mixed estimator is obtained by solving a two-stage estimation consisting of penalized weighted least squares (PWLS) and weighted least squares (WLS) optimization. To demonstrate the performance of the proposed method, both simulated and real data are provided. The results on the simulated data and the case study show consistent findings.
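To make the "mixed estimator" idea concrete, here is a hedged sketch (not the authors' estimator; it omits the penalty term and within-subject weighting of the full PWLS/WLS scheme) of a design matrix that handles one predictor with a truncated linear spline and another with a Fourier series, fitted by weighted least squares:

```python
import numpy as np

def truncated_linear_spline(t, knots):
    """Truncated linear spline basis: t and (t - k)_+ for each knot k."""
    cols = [t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fourier_basis(x, K):
    """Fourier series basis up to harmonic K (x assumed scaled to [0, 2*pi])."""
    cols = [np.cos(j * x) for j in range(1, K + 1)] + \
           [np.sin(j * x) for j in range(1, K + 1)]
    return np.column_stack(cols)

rng = np.random.default_rng(2)
n = 200
t = rng.uniform(0, 1, n)              # predictor handled by the spline
x = rng.uniform(0, 2 * np.pi, n)      # predictor handled by the Fourier series
y = 2 * np.maximum(t - 0.5, 0) + np.sin(x) + rng.normal(0, 0.1, n)

# Mixed design matrix: intercept + spline part + Fourier part.
X = np.column_stack([np.ones(n),
                     truncated_linear_spline(t, knots=[0.5]),
                     fourier_basis(x, K=2)])

# Weighted least squares; identity weights here, whereas longitudinal data
# would use a within-subject weight matrix.
W = np.eye(n)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

The point of the mixture is visible in the recovered coefficients: the spline column captures the kink at the knot while the Fourier columns capture the periodic component.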


2021 ◽  
Author(s):  
Xia Hua ◽  
Tyara Herdha ◽  
Conrad Burden

How long does speciation take? What are the speciation processes that generated a species group? The answers to these important questions in evolutionary biology lie in the genetic difference not only among species, but also among lineages within each species. With the advance of genome sequencing in non-model organisms and of statistical tools that improve accuracy in inferring evolutionary histories among recently diverged lineages, we now have the lineage-level trees to answer these questions. However, we do not yet have an analytical tool for inferring speciation processes from these trees. What is needed is a model of speciation processes that generates both the trees and the species identities of extant lineages. The model should allow calculation of the probability that certain lineages belong to certain species and have an evolutionary history consistent with the tree. Here we propose such a model and test its performance on both simulated and real data. We show that maximum likelihood estimates of the model are highly accurate and give estimates from real data that generate patterns consistent with observations. We discuss how to extend the model to account for different rates and types of speciation processes across lineages in a species group. By linking evolutionary processes at the lineage level to the species level, the model provides a new phylogenetic approach to studying not just when speciation happened, but how it happened.


Energies ◽  
2020 ◽  
Vol 13 (9) ◽  
pp. 2213 ◽  
Author(s):  
Jun Yuan ◽  
Haowei Wang ◽  
Szu Hui Ng ◽  
Victor Nian

Various mitigation strategies have been proposed to reduce CO2 emissions from ships, which have become a major contributor to global emissions. Fuel consumption under different mitigation strategies can be evaluated from two data sources: real data from actual ship systems and simulated data from simulation models. In practice, uncertainties in the obtained data may have non-negligible impacts on the evaluation of mitigation strategies. In this paper, a Gaussian process metamodel-based approach is proposed to evaluate ship fuel consumption under different mitigation strategies. The proposed method can not only incorporate different data sources but also account for the uncertainties in the data, yielding a more reliable evaluation. A cost-effectiveness analysis based on the fuel consumption prediction is then applied to rank the mitigation strategies under uncertainty. The accuracy and efficiency of the proposed method are illustrated in a chemical tanker case study, and the results indicate that it is critical to consider the uncertainty, as ignoring it can lead to suboptimal decisions. Here, trim optimisation ranks as more effective than draft optimisation when the uncertainty is ignored, but the reverse holds when the uncertainty in the estimates is fully accounted for.
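A minimal sketch of the metamodel idea, assuming a hypothetical cubic speed-fuel law and using scikit-learn's generic GP regressor rather than the paper's own formulation: cheap simulator runs and a few noisy real observations are pooled, and a white-noise kernel term lets predictions carry observation uncertainty.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)

# Hypothetical input: ship speed (knots); output: fuel consumption (t/day).
speed_sim = np.linspace(8, 16, 20)[:, None]       # dense, cheap simulator runs
speed_real = rng.uniform(8, 16, 5)[:, None]       # scarce, noisier real data
fuel = lambda s: 0.02 * s**3                      # assumed cubic law
y_sim = fuel(speed_sim).ravel() + rng.normal(0, 0.5, 20)
y_real = fuel(speed_real).ravel() + rng.normal(0, 2.0, 5)

X = np.vstack([speed_sim, speed_real])
y = np.concatenate([y_sim, y_real])

# WhiteKernel absorbs observation noise, so the posterior carries uncertainty.
gp = GaussianProcessRegressor(RBF(length_scale=3.0) + WhiteKernel(),
                              normalize_y=True)
gp.fit(X, y)
mean, std = gp.predict([[12.0]], return_std=True)
```

Ranking strategies by predicted consumption while propagating `std` is what allows the paper's cost-effectiveness analysis to flip conclusions once uncertainty is accounted for.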


2020 ◽  
Vol 12 (6) ◽  
pp. 2208 ◽  
Author(s):  
Jamie E. Filer ◽  
Justin D. Delorit ◽  
Andrew J. Hoisington ◽  
Steven J. Schuldt

Remote communities such as rural villages, post-disaster housing camps, and military forward operating bases are often located in hostile areas with limited or no access to established infrastructure grids. Operating these communities with conventional assets requires constant resupply, which imposes a significant logistical burden, creates negative environmental impacts, and increases costs. For example, a 2000-member isolated village in northern Canada relying on diesel generators required 8.6 million USD of fuel per year and emitted 8500 tons of carbon dioxide. Remote community planners can mitigate these negative impacts by selecting sustainable technologies that minimize resource consumption and emissions. However, such alternatives often come with higher procurement costs and mobilization requirements. To assist planners with this challenging task, this paper presents a novel infrastructure sustainability assessment model capable of generating optimal tradeoffs between minimizing environmental impacts and minimizing life-cycle costs over the community's anticipated lifespan. Model performance was evaluated using a case study of a hypothetical 500-person remote military base with 864 feasible infrastructure portfolios and 48 procedural portfolios. The case study results demonstrated the model's novel capability to assist planners in identifying optimal combinations of infrastructure alternatives that minimize negative sustainability impacts, leading to remote communities that are more self-sufficient with reduced emissions and costs.


2019 ◽  
Vol 35 (6) ◽  
pp. 1234-1270 ◽  
Author(s):  
Sébastien Fries ◽  
Jean-Michel Zakoian

Noncausal autoregressive models with heavy-tailed errors generate locally explosive processes and therefore provide a convenient framework for modelling bubbles in economic and financial time series. We investigate the probability properties of mixed causal-noncausal autoregressive processes, assuming the errors follow a stable non-Gaussian distribution. Extending the study of the noncausal AR(1) model by Gouriéroux and Zakoian (2017), we show that the conditional distribution in direct time is lighter-tailed than the error distribution, and we emphasize the presence of ARCH effects in a causal representation of the process. Under the assumption that the errors belong to the domain of attraction of a stable distribution, we show that a causal AR representation with non-i.i.d. errors can be consistently estimated by classical least squares. We derive a portmanteau test to check the validity of the estimated AR representation and propose a method based on extreme residuals clustering to determine whether the AR generating process is causal, noncausal, or mixed. An empirical study on simulated and real data illustrates the potential usefulness of the results.
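A hedged illustration of the setting (not the paper's estimation procedure): a noncausal AR(1) with alpha-stable errors depends on future innovations, yet a causal AR coefficient can still be estimated by ordinary least squares, consistent with the consistency result quoted in the abstract. Parameters below are illustrative only.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(4)
n, rho, alpha = 500, 0.7, 1.5        # alpha < 2: heavy-tailed stable errors

# Noncausal AR(1): y_t = sum_{k>=0} rho^k * eps_{t+k} (depends on FUTURE errors).
eps = levy_stable.rvs(alpha, beta=0.0, size=n + 200, random_state=rng)
y = np.array([sum(rho**k * eps[t + k] for k in range(200)) for t in range(n)])

# Least-squares fit of a causal AR(1) representation y_t = phi * y_{t-1} + u_t.
phi_hat = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
```

Despite the infinite variance, `phi_hat` tracks the lag-one dependence of the process; distinguishing the causal from the noncausal direction is what the paper's extreme-residual clustering addresses.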


Metabolites ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 214
Author(s):  
Aneta Sawikowska ◽  
Anna Piasecka ◽  
Piotr Kachlicki ◽  
Paweł Krajewski

Peak overlapping is a common problem in chromatography, mainly in the case of complex biological mixtures, e.g., mixtures of metabolites. Because different compounds with similar chromatographic properties co-elute, peak separation becomes challenging. In this paper, two computational methods of separating peaks, applied for the first time to large chromatographic datasets, are described, compared, and experimentally validated. The methods lead from raw observations to data that can form inputs for statistical analysis. First, in both methods, data are normalized by sample mass, the baseline is removed, retention time alignment is conducted, and peak detection is performed. Then, in the first method, clustering is used to separate overlapping peaks, whereas in the second method, functional principal component analysis (FPCA) is applied for the same purpose. Simulated data and experimental results are used as examples to present both methods and to compare them. Real data were obtained in a study of metabolomic changes in barley (Hordeum vulgare) leaves under drought stress. The results suggest that both methods are suitable for the separation of overlapping peaks, but an additional advantage of FPCA is the possibility of assessing the variability of individual compounds present within the same peaks of different chromatograms.
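To illustrate the preprocessing steps named above (baseline removal and peak detection, not the paper's clustering or FPCA separation), this sketch simulates two co-eluting Gaussian peaks on a flat baseline and detects them with a simple prominence criterion; all peak positions and widths are invented:

```python
import numpy as np
from scipy.signal import find_peaks

t = np.linspace(0, 10, 1000)   # retention time axis (arbitrary units)
gauss = lambda mu, sig, a: a * np.exp(-(t - mu)**2 / (2 * sig**2))

# Two co-eluting compounds produce one partially overlapping peak region.
signal = gauss(4.8, 0.3, 1.0) + gauss(5.6, 0.3, 0.8) + 0.05   # + flat baseline

signal = signal - signal.min()                 # crude baseline removal
peaks, props = find_peaks(signal, prominence=0.05)
```

When peaks overlap more strongly than this, the single-maximum case arises and simple detection fails; that regime is exactly where the clustering and FPCA approaches of the paper are needed.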


2021 ◽  
Vol 10 (7) ◽  
pp. 435
Author(s):  
Yongbo Wang ◽  
Nanshan Zheng ◽  
Zhengfu Bian

Since pairwise registration is a necessary step for the seamless fusion of point clouds from neighboring stations, a closed-form solution to planar feature-based registration of LiDAR (Light Detection and Ranging) point clouds is proposed in this paper. Based on the Plücker coordinate-based representation of linear features in three-dimensional space, a quad tuple-based representation of planar features is introduced, which makes it possible to directly determine the difference between any two planar features. Dual quaternions are employed to represent spatial transformation and operations between dual quaternions and the quad tuple-based representation of planar features are given, with which an error norm is constructed. Based on L2-norm-minimization, detailed derivations of the proposed solution are explained step by step. Two experiments were designed in which simulated data and real data were both used to verify the correctness and the feasibility of the proposed solution. With the simulated data, the calculated registration results were consistent with the pre-established parameters, which verifies the correctness of the presented solution. With the real data, the calculated registration results were consistent with the results calculated by iterative methods. Conclusions can be drawn from the two experiments: (1) The proposed solution does not require any initial estimates of the unknown parameters in advance, which assures the stability and robustness of the solution; (2) Using dual quaternions to represent spatial transformation greatly reduces the additional constraints in the estimation process.
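The quad-tuple idea can be sketched without the dual-quaternion machinery: represent a plane as a unit normal plus offset, (n, d) with n·x = d, note how it moves under a rigid transform, and take the norm of the tuple difference as a residual. This is only an illustration of the representation, with a made-up transform, not the paper's closed-form solver.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def transform_plane(n, d, R, t):
    """Rigid motion of a plane n.x = d (unit normal n): n -> Rn, d -> d + (Rn).t"""
    n2 = R.apply(n)
    return n2, d + n2 @ t

def plane_residual(n_s, d_s, n_d, d_d, R, t):
    """Quad-tuple difference norm between a transformed source plane and a target plane."""
    n_pred, d_pred = transform_plane(n_s, d_s, R, t)
    return np.linalg.norm(np.r_[n_d - n_pred, d_d - d_pred])

# Hypothetical ground-truth registration between two scan stations.
R = Rotation.from_euler("xyz", [5, -3, 10], degrees=True)
t = np.array([1.0, -0.5, 2.0])

# One simulated corresponding planar feature.
n_src = np.array([0.0, 0.0, 1.0])
d_src = 4.0
n_dst, d_dst = transform_plane(n_src, d_src, R, t)

err_true = plane_residual(n_src, d_src, n_dst, d_dst, R, t)             # ~0
err_id = plane_residual(n_src, d_src, n_dst, d_dst,
                        Rotation.identity(), np.zeros(3))               # large
```

Minimizing the sum of such residuals over all plane correspondences is the L2-norm problem the paper solves in closed form via dual quaternions.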


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose novel gene- and pathway-level approaches for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, with a Lasso penalization linking all independent data sets. This method, called joint-sgPLS, convincingly detects signal at both the variable and the group level. Results: Our method has the advantage of proposing a globally readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to benchmark methods on simulated data and give an example of application to real data, with the aim of highlighting common susceptibility variants to breast and thyroid cancers. Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and sets of observations. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and an explicit a priori architecture in other application fields.

