Statistical tests for latent class in censored data due to detection limit

2019 ◽  
Vol 29 (8) ◽  
pp. 2179-2197
Author(s):  
Hua He ◽  
Wan Tang ◽  
Tanika Kelly ◽  
Shengxu Li ◽  
Jiang He

Measures of substance concentration in urine, serum or other biological matrices often have an assay limit of detection. When concentration levels fall below the limit, the exact measures cannot be obtained. Instead, the measures are censored: only the partial information that the levels are under the limit is known. Assuming the concentration levels come from a single population and follow a normal distribution, either directly or after some transformation, Tobit regression models, also called censored normal regression models, are the standard approach for analyzing such data. However, in practice, the data often exhibit more censored observations than would be expected under the Tobit regression models. One common cause is heterogeneity of the study population, arising from a latent group of subjects who lack the substance measured. For such subjects, the measurements will always be under the limit. If a censored normal regression model is appropriate for modeling the subjects with the substance, the whole population follows a mixture of a censored normal regression model and a degenerate distribution for the latent class. While there are some studies on such mixture models, the fundamental question of whether such mixture modeling is necessary, i.e. whether such a latent class exists, has not yet been studied. In this paper, three tests, the Wald test, the likelihood ratio test and the score test, are developed for testing the existence of such a latent class. Simulation studies are conducted to evaluate the performance of the tests, and two real data examples are employed to illustrate the tests.
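
The mixture described in this abstract can be illustrated with a minimal sketch of its log-likelihood. This is not the authors' code: the function names, the use of `None` to mark a censored measurement, and the constant mean `mu` (standing in for a full regression on covariates) are all assumptions made for illustration. Here `p` is the proportion of the latent class whose measurements always fall under the limit; setting `p = 0` recovers the ordinary censored normal (Tobit) likelihood.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_logpdf(y, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_loglik(values, limit, mu, sigma, p):
    """Log-likelihood of a censored normal mixed with a latent class.

    values: measurements, with None marking a value censored below `limit`
            (hypothetical encoding chosen for this sketch).
    p:      proportion of the latent class that lacks the substance.
    """
    if not 0.0 <= p < 1.0 or sigma <= 0.0:
        raise ValueError("need 0 <= p < 1 and sigma > 0")
    # Probability that a subject WITH the substance falls under the limit.
    p_below = norm_cdf((limit - mu) / sigma)
    total = 0.0
    for y in values:
        if y is None:
            # Censored: either in the latent class, or a normal draw under the limit.
            total += math.log(p + (1.0 - p) * p_below)
        else:
            # Observed: must come from the normal component.
            total += math.log(1.0 - p) + norm_logpdf(y, mu, sigma)
    return total
```

A likelihood ratio test of the kind the abstract mentions would compare this function maximized with `p` free against the maximum with `p` fixed at 0; because `p = 0` lies on the boundary of the parameter space, the reference distribution needs care and is not addressed by this sketch.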

Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 173
Author(s):  
Ayman Alzaatreh ◽  
Mohammad Aljarrah ◽  
Ayanna Almagambetova ◽  
Nazgul Zakiyeva

The traditional linear regression model that assumes normal residuals is applied extensively in engineering and science. However, the normality assumption on the model residuals often does not hold. This drawback can be overcome by using a generalized normal regression model that assumes a non-normal response. In this paper, we propose regression models based on generalizations of the normal distribution. The proposed regression models can be used effectively in modeling data with a highly skewed response. Furthermore, we study in some detail the structural properties of the proposed generalizations of the normal distribution. The maximum likelihood method is used for estimating the parameters of the proposed models. The performance of the maximum likelihood estimators in estimating the distributional parameters is assessed through a small simulation study. Applications to two real datasets are given to illustrate the flexibility and the usefulness of the proposed distributions and their regression models.


2019 ◽  
Vol 11 (01n02) ◽  
pp. 1950003
Author(s):  
Fábio Prataviera ◽  
Gauss M. Cordeiro ◽  
Edwin M. M. Ortega ◽  
Adriano K. Suzuki

In several applications, the distribution of the data is frequently unimodal, asymmetric or bimodal. The regression models commonly used for applications to data with real support are the normal, skew normal, beta normal and gamma normal, among others. We define a new regression model based on the odd log-logistic geometric normal distribution for modeling asymmetric or bimodal data with support in [Formula: see text], which generalizes some known regression models including the widely known heteroscedastic linear regression. We adopt the maximum likelihood method for estimating the model parameters and define diagnostic measures to detect influential observations. For some parameter settings, sample sizes and different systematic structures, various simulations are performed to verify the adequacy of the estimators of the model parameters. The empirical distribution of the quantile residuals is investigated and compared with the standard normal distribution. We prove empirically the usefulness of the proposed models by means of three applications to real data.


2019 ◽  
Vol 11 (16) ◽  
pp. 1895 ◽  
Author(s):  
Agapiou ◽  
Sarris

The integration of different remote sensing datasets acquired from optical and radar sensors can improve the overall performance and detection rate for mapping sub-surface archaeological remains. However, data fusion remains a challenge for archaeological prospection studies, since remote sensors are based on different instrument principles and operate at different wavelengths. Recent studies have demonstrated that some fusion modelling can be achieved under ideal measurement conditions (e.g., simultaneous measurements on haze-free days) using advanced regression models, such as nonlinear Bayesian Neural Networks. This paper aims to go a step further and investigate the impact of noise on regression models between datasets obtained from ground-penetrating radar (GPR) and portable field spectroradiometers. Initially, the GPR measurements provided three depth slices of 20 cm thickness, from 0.00 m to 0.60 m below the ground surface, while ground spectral signatures acquired from the spectroradiometer were processed to calculate 13 multispectral and 53 hyperspectral indices. Then, various levels of Gaussian random noise, ranging from 0.1 to 0.5 of a normal distribution with mean 0 and variance 1, were added to both the GPR and the spectral signature datasets. Afterward, Bayesian Neural Network regression fitting was applied between the radar (GPR) and the optical (spectral signatures) datasets. Different regression model strategies were implemented and are presented in the paper. The overall results show that fusion with a noise level of up to 0.2 of the normal distribution does not dramatically degrade the regression model between the radar and optical datasets (compared to the non-noisy data). Finally, anomalies appearing as strong reflectors in the GPR measurements continue to provide an obvious contrast even with noisy regression modelling.
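
The noise step in this abstract can be read in more than one way. The sketch below assumes the simplest reading: each noise level scales a draw from a standard normal distribution (mean 0, variance 1) that is then added to the measurement. The function name, the seed handling and the sample values are hypothetical, and the paper may have applied the noise differently.

```python
import random


def add_gaussian_noise(values, level, seed=0):
    """Return values + level * e, where e is drawn from N(0, 1).

    `level` is the noise fraction (0.1 to 0.5 in the abstract); reading it
    as a multiplier on standard normal draws is an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the perturbation is reproducible
    return [v + level * rng.gauss(0.0, 1.0) for v in values]


# Hypothetical normalized measurements, perturbed at each noise level.
signal = [0.2, 0.4, 0.6, 0.8]
noisy_by_level = {
    level: add_gaussian_noise(signal, level)
    for level in (0.1, 0.2, 0.3, 0.4, 0.5)
}
```

Fitting the same regression model to each entry of `noisy_by_level` and comparing the fit against the noise-free case is one way to reproduce the kind of comparison the abstract reports.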


2018 ◽  
Vol 31 (8) ◽  
pp. 1423-1453
Author(s):  
David F. Warner ◽  
Scott A. Adams ◽  
Raeda K. Anderson

Objectives: To examine how social role configurations (SRCs)—combinations of the quality of spousal, family, and friend relationships—moderate the association between functional limitations (FLs) and loneliness among married and unmarried older adults and whether this differs by gender. Method: Longitudinal data from the National Social Life, Health, and Aging Project on married ( n = 945) and unmarried ( n = 443) older adults (aged 57-85 years). Latent class analysis was used to identify SRCs. Tobit regression models examined the associations between FLs, SRCs, and loneliness. Results: Nine SRCs were identified. The effectiveness of SRCs for coping with FLs did not differ by marital status despite higher loneliness among the unmarried. Only for women with FLs did SRCs characterized by negativity/strain exacerbate loneliness. For men with FLs, SRCs characterized by excess positivity/support were problematic. Discussion: These findings underscore the importance of considering how SRCs provide resources for coping with FLs that have gendered implications.


Author(s):  
M. V. Machado ◽  
A. M. G. Tommaselli ◽  
V. M. Tachibana ◽  
R. P. Martins-Neto ◽  
M. B. Campos

Vegetation mapping requires information about trees and underlying vegetation to ensure proper management of the urban and forest environments. This information can be obtained using remote sensors. For instance, lightweight systems composed of Unmanned Aerial Vehicles (UAVs) as a platform, low-cost laser units and the recent miniaturized navigation sensors (positioning and orientation) have become a very feasible and flexible alternative. Low-cost UAV-ALS systems usually provide centimetric accuracy in altimetry, according to flight data configuration and quality of observations. This paper presents a feasibility study of a lightweight ALS system on board a UAV to estimate the diameters at breast height (DBH) of urban trees using LiDAR data and a linear regression model. A mathematical model relating the crown diameter and height of the tree to the DBH was developed based on linear regression with a stepwise method. The stepwise linear regression method enables the addition and the removal of predictor variables through statistical tests. The tree samples were separated into two classes (A and B), according to the diametric distribution. These sample classes were used to define two linear regression models. The regression models that best fit the samples achieved an adjusted R² value above 94% for classes A and B, which demonstrates the closeness between the samples and the developed mathematical models. The quality control of the proposed regression models was performed by comparing the estimated DBH values with directly measured (reference) values. The DBH of the trees was estimated with an average discrepancy of 8.7 cm.


Author(s):  
Jason Anderson ◽  
Salvador Hernandez

Studies investigating crash rates by roadway classification are few and far between and even more rare if extended to focus on heavy vehicles. This study explored and compared two advanced econometric methods—random-parameter Tobit regression and latent class Tobit regression—to determine contributing factors for heavy-vehicle crashes per million vehicle miles traveled while accounting for the unobserved heterogeneity present in crash data. The increasing crash rates in Idaho, crash proportion by roadway classification, and available data made an ideal case study. Empirical results show that although the random-parameter Tobit regression model provides better insight into heavy-vehicle crash rates than the fixed-parameter approach, the latent class Tobit regression model is the preferred methodology for the given data set. Traffic volumes, roadway characteristics, and traffic control devices were among the variables found to be statistically significant. Results from this study provide an alternate framework to account for heterogeneity while identifying key factors by roadway classification that influence heavy-vehicle crash rates. The illustrated framework and analysis by roadway classification can provide guidance to transportation agencies and policy makers and prompt future studies to include a latent class analysis, analysis by road classification, or both.


Author(s):  
C. M. Gatwiri ◽  
M. M. Muraya ◽  
L. K. Gitonga

There is growing interest among the public in demography since demographic change has become the subject of political debates in many countries. Statistics on demography are used to support policy-making and to monitor demographic behaviour from political, economic, social and cultural perspectives. Most studies have used descriptive statistics to study demographic characteristics. Moreover, most of these studies investigate the effect of one characteristic at a time. Therefore, there is a need for more robust statistical methods, such as predictive models, in demographic studies. The objective of this study was to predict the effect of demographic characteristics on parity using a Poisson regression model. Secondary data on parity, age, marital status and education level were collected from Chuka and Embu hospital maternal units from 2013 to 2017. The data were analysed using R statistical software. Three Poisson regression models (PRMs) were fitted. The likelihood ratio tests of all the Poisson regression models had p-values < 0.05, indicating that all the models were statistically significant. The deviance test and the Akaike Information Criterion (AIC) were used to assess the fit of the Poisson regression models. The overall Poisson model had a residual deviance of 184.23, the lowest among the fitted PRMs, suggesting that it was the best fit. The PRM with both education and marital status as predictors had the lowest AIC value, 2078.620, indicating that it was the best fitted model. The dispersion test indicated that the PRM was not over-dispersed, supporting the model as a good fit to the data. The improved model can be used in prediction of population growth rates.
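
The fit statistics named in this abstract can be sketched from their textbook definitions. The study itself used R; the Python below is only an illustration, the function names are hypothetical, and the parity counts are invented for the example rather than taken from the hospital data. The fitted means are supplied directly (an intercept-only Poisson model, whose maximum likelihood fit is the sample mean) so that no model-fitting library is needed.

```python
import math


def poisson_loglik(counts, means):
    """Poisson log-likelihood of observed counts given fitted means."""
    return sum(
        y * math.log(m) - m - math.lgamma(y + 1)
        for y, m in zip(counts, means)
    )


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * loglik


def pearson_dispersion(counts, means, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values well above 1 suggest over-dispersion relative to the Poisson model.
    """
    chi2 = sum((y - m) ** 2 / m for y, m in zip(counts, means))
    return chi2 / (len(counts) - n_params)


# Invented parity counts; intercept-only fit, so every fitted mean is the sample mean.
parity = [0, 1, 2, 3, 1, 2]
fitted = [sum(parity) / len(parity)] * len(parity)

model_aic = aic(poisson_loglik(parity, fitted), n_params=1)
dispersion = pearson_dispersion(parity, fitted, n_params=1)
```

Comparing `model_aic` across candidate models, each with its own fitted means and parameter count, mirrors the model selection step the abstract describes.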


2014 ◽  
Vol 60 (01) ◽  
pp. 19-25 ◽  
Author(s):  
Natalija Nakov ◽  
Jasmina Tonic-Ribarska ◽  
Aneta Dimitrovska ◽  
Rumenka Petkovska

The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: One sample rank test and Wilcoxon signed rank test for two independent groups of samples. The results obtained with One sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because only slight differences between the errors (expressed as relative residuals, RR) were obtained. Estimation of the significance of the differences in the RR was achieved using Wilcoxon signed rank test, where linear and quadratic regression models were treated as two independent groups. The application of this simple nonparametric statistical test provides statistical confirmation of the choice of an adequate regression model.


Author(s):  
Sang Nguyen Minh

This study uses the DEA (Data Envelopment Analysis) method to estimate the technical efficiency index of 34 Vietnamese commercial banks in the period 2007-2015, and then analyzes the impact of income diversification on the operational efficiency of these banks through a censored regression model, the Tobit regression model. The results indicate that income diversification had positive effects on the operational efficiency of Vietnamese commercial banks in the research period. Based on these results, some policy recommendations are given to enhance the operational efficiency of Vietnam’s commercial banking system.


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 31
Author(s):  
Mariusz Specht

Positioning systems are used to determine position coordinates in navigation (air, land and marine). The accuracy of an object’s position is described by the position error, and a statistical analysis can determine its measures, which usually include: Root Mean Square (RMS), twice the Distance Root Mean Square (2DRMS), Circular Error Probable (CEP) and Spherical Error Probable (SEP). It is commonly assumed in navigation that position errors are random and that their distributions are consistent with the normal distribution. This assumption is based on the popularity of the Gauss distribution in science, the simplicity of calculating RMS values for 68% and 95% probabilities, as well as the intuitive perception of randomness in the statistics which this distribution reflects. It should be noted, however, that the necessary conditions for a random variable to be normally distributed include the independence of measurements and identical conditions of their realisation, which is not the case in the iterative method of determining successive positions, the filtration of coordinates or the dependence of the position error on meteorological conditions. In the preface to this publication, examples are provided which indicate that position errors in some navigation systems may not be consistent with the normal distribution. The subsequent section describes basic statistical tests for assessing the fit between the empirical and theoretical distributions (Anderson-Darling, chi-square and Kolmogorov-Smirnov). Next, statistical tests of the position error distributions of very long Differential Global Positioning System (DGPS) and European Geostationary Navigation Overlay Service (EGNOS) campaigns from different years (2006 and 2014) were performed, with the number of measurements per session being 900,000 fixes. In addition, the paper discusses selected statistical distributions that fit the empirical measurement results better than the normal distribution. Research has shown that the normal distribution is not the optimal statistical distribution to describe position errors of navigation systems. The distributions that describe navigation positioning system errors more accurately include: beta, gamma, logistic and lognormal distributions.
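
Two of the horizontal accuracy measures named in this abstract can be computed directly from a sample of position errors, without assuming any distribution. The sketch below is an illustration only: the function names and the representation of errors as (east, north) offsets from the true position are assumptions, and the empirical CEP is taken as the sample median of the radial errors, i.e. the radius that contains half of the fixes.

```python
import math
import statistics


def radial_errors(errors):
    """Horizontal distance of each fix from the true position."""
    return [math.hypot(d_east, d_north) for d_east, d_north in errors]


def drms(errors):
    """Distance Root Mean Square: square root of the mean squared radial error."""
    radii = radial_errors(errors)
    return math.sqrt(sum(r * r for r in radii) / len(radii))


def two_drms(errors):
    """Twice the Distance Root Mean Square (2DRMS)."""
    return 2.0 * drms(errors)


def empirical_cep(errors):
    """Empirical Circular Error Probable: median radial error."""
    return statistics.median(radial_errors(errors))


# Hypothetical (east, north) position errors in metres.
sample = [(3.0, 4.0), (0.0, 0.0), (6.0, 8.0)]
print(drms(sample), two_drms(sample), empirical_cep(sample))
```

Because these estimates come straight from the sample, they remain meaningful when the errors are not normally distributed, which is the situation the abstract argues is common. Closed-form conversions between such measures, by contrast, typically rest on the normality assumption.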

