variable selection procedure
Recently Published Documents



PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254112
Author(s):  
Faisal Maqbool Zahid ◽  
Shahla Faisal ◽  
Christian Heumann

Multiple imputation (MI) is always challenging in high-dimensional settings. An imputation model with a selected subset of predictors can be incompatible with the analysis model, leading to inconsistent and biased estimates. Although compatibility may not be achievable in such cases, one can obtain consistent and unbiased estimates using a semi-compatible imputation model. We propose to relax the lasso penalty in order to select a large set of variables (at most n). The substantive model, which also uses some formal variable selection procedure in high-dimensional structures, is then expected to be nested in this imputation model. The resulting imputation model will be semi-compatible with high probability. The likelihood estimates can be unstable and can face convergence issues as the number of variables becomes nearly as large as the sample size. To address these issues, we further propose to use a ridge penalty for obtaining the posterior distribution of the parameters based on the observed data. The proposed technique is compared with standard MI software and MI techniques available for high-dimensional data in simulation studies and on a real-life dataset. Our results show the superiority of the proposed approach over existing MI approaches while addressing the compatibility issue.


2021 ◽  
Vol 69 (1) ◽  
pp. 7-13
Author(s):  
Md Abdus Salam Akanda ◽  
Most Sonia Khatun ◽  
AHM Musfiqur Rahman Nabeen

Underweight and overweight problems have serious consequences for the health status of women in Bangladesh. The objective of this study is to find the important factors that may influence a woman's being underweight, overweight, or obese. A multinomial logistic regression model is fitted for this purpose. A stepwise variable selection procedure is used to select covariates for the model. Information on 15,323 ever-married, non-pregnant women is extracted from the Bangladesh Demographic and Health Survey 2014 data. Seven covariates (region, living place, wealth index, respondent's marital status, current working status, education, and current age) are finally selected for the model from the thirteen variables initially considered. The results of the study demonstrate that women who live in the Sylhet region or in a rural area, are widowed or divorced, have less education, or are younger are more likely to become underweight. Conversely, women who live in the Khulna region or in an urban area, are married, are not working, have more than 10 years of schooling, or are aged 35-49 are at higher risk of overweight or obesity. Thus, the Government of Bangladesh should take proper initiatives to address the underweight and overweight problems of women, considering the findings of this study. Dhaka Univ. J. Sci. 69(1): 7-13, 2021 (January)
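
The abstract above does not say which stepwise criterion was used. As a rough illustration of the general idea only, the sketch below runs greedy forward selection with ordinary least squares standing in for the multinomial logistic model. The function names and the `min_gain` stopping rule are hypothetical choices made for this example, not the study's method.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, y, cols):
    """Residual sum of squares of a least-squares fit (with intercept) on the given columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2 for d, t in zip(design, y))


def forward_select(rows, y, min_gain=0.05):
    """Add the column that lowers RSS most; stop when the relative gain drops below min_gain.

    min_gain is an assumed stopping rule for illustration only.
    """
    selected, remaining = [], list(range(len(rows[0])))
    current = rss(rows, y, selected)
    while remaining:
        best = min(remaining, key=lambda c: rss(rows, y, selected + [c]))
        new = rss(rows, y, selected + [best])
        if current - new < min_gain * current:
            break
        selected.append(best)
        remaining.remove(best)
        current = new
    return selected
```

The normal equations are solved directly, so this sketch assumes the candidate columns are not collinear. A real analysis would more likely use an information criterion or a likelihood-ratio test on the fitted logistic model.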


Author(s):  
Liadira Kusuma Widya ◽  
Chin-Yu Hsu ◽  
Hsiao-Yun Lee ◽  
Lalu Muhamad Jaelani ◽  
Shih-Chun Candice Lung ◽  
...  

Because of fast-paced industrialization, urbanization, and population growth in Indonesia, there are serious health issues in the country resulting from air pollution. This study uses geospatial modelling technologies, namely land-use regression (LUR), geographically weighted regression (GWR), and geographic and temporal weighted regression (GTWR) models, to assess variations in particulate matter (PM10) and nitrogen dioxide (NO2) concentrations in Surabaya City, Indonesia. This is the first study to model the spatiotemporal variability of air pollution concentrations in Surabaya City. To develop the prediction models, air pollution data collected from seven monitoring stations from 2010 to 2018 were used as dependent variables, while land-use/land cover allocations within a 250 m to 5000 m circular buffer range surrounding the monitoring stations were collected as independent variables. A supervised stepwise variable selection procedure was applied to identify the important predictor variables for developing the LUR, GWR, and GTWR models. The developed LUR, GWR, and GTWR models accounted for 49%, 50%, and 51% of PM10 variations and 46%, 47%, and 48% of NO2 variations, respectively. The GTWR model performed better (R² = 0.51 for PM10 and 0.48 for NO2) than the LUR and GWR models (R² = 0.49–0.50 for PM10 and 0.46–0.47 for NO2). In the PM10 model, four predictor variables (public facility, industry and warehousing, paddy field, and normalized difference vegetation index (NDVI)) were selected during the variable selection procedure. Meanwhile, paddy field, residential area, rainfall, and temperature played important roles in explaining NO2 variations. Because of biomass burning issues in South Asia, the paddy field, which has a positive correlation with PM10 and NO2, was selected as a predictor. By using long-term monitoring data to establish prediction models, this approach may better depict PM10 and NO2 concentration variations within areas across Asia.


2020 ◽  
Vol 12 (10) ◽  
pp. 1637 ◽  
Author(s):  
Le Bienfaiteur T. Sagang ◽  
Pierre Ploton ◽  
Bonaventure Sonké ◽  
Hervé Poilvé ◽  
Pierre Couteron ◽  
...  

Precise accounting of carbon stocks and fluxes in tropical vegetation using remote sensing approaches remains a challenging exercise, as both signal saturation and ground sampling limitations contribute to inaccurate extrapolations. Airborne LiDAR Scanning (ALS) data can be used as an intermediate level to radically increase sampling and enhance model calibration. Here we tested the potential of using ALS data for upscaling vegetation aboveground biomass (AGB) from field plots to a forest-savanna transitional landscape in the Guineo–Congolian region in Cameroon, using either a design-based approach or a model-based approach leveraging multispectral satellite imagery. Two sets of reference data were used: (1) AGB values collected from 62 plots of 0.16 ha each, distributed in both forests and savannas; and (2) an AGB map generated from ALS data. In the model-based approach, we trained Random Forest models using predictors from recent sensors of varying spectral and spatial resolutions (Spot 6/7, Landsat 8, and Sentinel 2), along with biophysical predictors derived after pre-processing in the Overland processing chain, following a forward variable selection procedure with spatial 4-fold cross-validation. The models calibrated with field plots led to a systematic overestimation in AGB density estimates and a root mean squared prediction error (RMSPE) of up to 65 Mg ha⁻¹ (90%), whereas calibration with ALS led to low bias and a drop of ~30% in RMSPE (down to 43 Mg ha⁻¹, 58%), with little effect of the satellite sensor used. Decomposing bias along the AGB density range, we show that multispectral images can (in some specific cases) be used for unbiased prediction at landscape scale on the basis of ALS-calibrated statistical models. However, our results also confirm that, whatever the spectral indices used and attention paid to sensor quality and pre-processing, the signal is not sufficient to warrant accurate pixelwise predictions because of the large relative RMSPE, especially above 200–250 t/ha. The design-based approach, in which average AGB density values were attributed to mapped land cover classes, proved to be a simple and reliable alternative (for landscape- to region-level estimations) when trained with dense ALS samples.


2020 ◽  
Vol 3 (1) ◽  
pp. 66-80 ◽  
Author(s):  
Sierra A. Bainter ◽  
Thomas G. McCauley ◽  
Tor Wager ◽  
Elizabeth A. Reynolds Losin

Frequently, researchers in psychology are faced with the challenge of narrowing down a large set of predictors to a smaller subset. There are a variety of ways to do this, but commonly it is done by choosing predictors with the strongest bivariate correlations with the outcome. However, when predictors are correlated, bivariate relationships may not translate into multivariate relationships. Further, any attempts to control for multiple testing are likely to result in extremely low power. Here we introduce a Bayesian variable-selection procedure frequently used in other disciplines, stochastic search variable selection (SSVS). We apply this technique to choosing the best set of predictors of the perceived unpleasantness of an experimental pain stimulus from among a large group of sociocultural, psychological, and neurobiological (functional MRI) individual-difference measures. Using SSVS provides information about which variables predict the outcome, controlling for uncertainty in the other variables of the model. This approach yields new, useful information to guide the choice of relevant predictors. We have provided Web-based open-source software for performing SSVS and visualizing the results.
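
To make the idea behind SSVS concrete, the sketch below computes one ingredient of a common normal-mixture spike-and-slab formulation: the conditional probability that a predictor is "included", given the current value of its coefficient. The function name, the default prior inclusion probability, and the two standard deviations are assumptions chosen for illustration. The software described in the abstract may use a different prior or parameterization.

```python
import math


def inclusion_probability(beta, prior_incl=0.5, spike_sd=0.1, slab_sd=1.0):
    """P(included | beta) when beta ~ N(0, slab_sd^2) if included, else N(0, spike_sd^2).

    Computed through the log-odds to avoid underflow of the normal densities.
    Assumes spike_sd < slab_sd and 0 < prior_incl < 1 (illustrative defaults).
    """
    log_prior_odds = math.log(prior_incl / (1.0 - prior_incl))
    # Log ratio of the slab density to the spike density evaluated at beta.
    log_density_ratio = math.log(spike_sd / slab_sd) + 0.5 * beta ** 2 * (
        1.0 / spike_sd ** 2 - 1.0 / slab_sd ** 2
    )
    return 1.0 / (1.0 + math.exp(-(log_prior_odds + log_density_ratio)))
```

In a full sampler this probability would be used to redraw each inclusion indicator in turn, alternating with draws of the coefficients. Averaging the indicators over iterations gives the marginal inclusion probabilities that SSVS reports.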


2019 ◽  
Vol 29 (8) ◽  
pp. 2238-2249
Author(s):  
Daiane A Zuanetti ◽  
Júlia M Pavan Soler ◽  
José E Krieger ◽  
Luis A Milan

QTL mapping is an important tool for identifying regions in chromosomes that are relevant to explaining a response of interest. It is a special case of the regression model in which an unknown number of missing (non-observable) covariates are involved, leading to a complex variable selection procedure. Although several methods have been proposed to identify QTLs and to estimate parameters in the associated model, minimal attention has been devoted to the adequacy of the estimated model. In this paper, we present an overview of a few methods for residual and diagnostic analysis in the context of Bayesian regression modeling and adapt them to work with QTL mapping. The motivation of this study is to identify QTLs associated with the blood pressure of F2 rats and to check the adequacy of the fitted model.


2019 ◽  
Author(s):  
Zarina Vakhitova ◽  
Rob Mawby ◽  
Clair Alston-Knox

Crime can have a significant and long-lasting effect on its victims. While the literature on victim impact from traditional types of crime like robbery or assault is well established, little published research examines the impact of online crime such as cyber abuse. The current paper examines victim impact and self-protective behaviours following victimization from different types of cyber abuse. Using data from a large sample of American adults (N = 1,463), we identified the factors predictive of higher victim impact and adoption of self-protective behaviours, modelling the data using a Bayesian variable selection procedure implemented via a stochastic search algorithm in AutoStat®. Our findings suggest that, controlling for socio-demographic characteristics such as age, gender, race and employment, different types of cyber abuse are important explanations of both victim impact and self-protective behaviours following cyber abuse victimization. Findings from this study contribute to our understanding of cyber abuse as a broad crime category and of the mechanism of adoption of self-protective behaviours following victimization, and help inform policy responses to the needs of cyber abuse victims.


2019 ◽  
Vol 9 (2) ◽  
pp. 271-288 ◽  
Author(s):  
Jiajie Chen ◽  
Anthony Hou ◽  
Thomas Y Hou

In many applications, we need to study a linear regression model that consists of a response variable and a large number of potential explanatory variables, and determine which variables are truly associated with the response. In Foygel Barber & Candès (2015, Ann. Statist., 43, 2055–2085), the authors introduced a new variable selection procedure called the knockoff filter to control the false discovery rate (FDR) and proved that this method achieves exact FDR control. In this paper, we propose a prototype knockoff filter for group selection by extending the Reid–Tibshirani (2016, Biostatistics, 17, 364–376) prototype method. Our prototype knockoff filter improves the computational efficiency and statistical power of the Reid–Tibshirani prototype method when it is applied for group selection. In some cases when the group features are spanned by one or a few hidden factors, we demonstrate that the Principal Component Analysis (PCA) prototype knockoff filter outperforms the Dai–Foygel Barber (2016, 33rd International Conference on Machine Learning (ICML 2016)) group knockoff filter. We present several numerical experiments to compare our prototype knockoff filter with the Reid–Tibshirani prototype method and the group knockoff filter. We have also conducted some analysis of the knockoff filter. Our analysis reveals that some knockoff path method statistics, including the Lasso path statistic, may lead to loss of power for certain design matrices and a specially designed response even if their signal strengths are still relatively strong.
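
The selection step of the knockoff filter can be sketched in a few lines once the feature statistics W_j are available. The sketch assumes the W_j have already been computed (for example from a Lasso path on the original and knockoff features, which is not shown here). It then applies the data-dependent threshold of Barber and Candès. The function names are illustrative, and the group and prototype extensions proposed in the abstract are not covered.

```python
def knockoff_threshold(w, q, plus=True):
    """Smallest t among the nonzero |W_j| whose estimated false discovery proportion is <= q.

    plus=True adds 1 to the numerator (the "knockoff+" variant);
    plus=False gives the original, slightly less conservative threshold.
    Returns infinity when no candidate threshold qualifies.
    """
    offset = 1 if plus else 0
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        positives = sum(1 for x in w if x >= t)   # features that would be selected
        if (offset + negatives) / max(positives, 1) <= q:
            return t
    return float("inf")


def knockoff_select(w, q, plus=True):
    """Indices of the features whose statistic reaches the threshold."""
    t = knockoff_threshold(w, q, plus)
    return [j for j, x in enumerate(w) if x >= t]
```

Large positive W_j count as evidence for a feature, while the count of large negative W_j serves as an estimate of how many of the positives are false. This is why the threshold depends only on the signs and magnitudes of the statistics.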

