Genetic and Psychosocial Predictors of Aggression: Variable Selection and Model Building With Component-Wise Gradient Boosting

Author(s):  
Robert Suchting ◽  
Joshua L. Gowin ◽  
Charles E. Green ◽  
Consuelo Walss-Bass ◽  
Scott D. Lane
2020 ◽  
Vol 41 (S1) ◽  
pp. s521-s522
Author(s):  
Debarka Sengupta ◽  
Vaibhav Singh ◽  
Seema Singh ◽  
Dinesh Tewari ◽  
Mudit Kapoor ◽  
...  

Background: The rising trend of antibiotic resistance imposes a heavy clinical and economic burden on healthcare (US$55 billion), with an estimated 23,000 deaths annually in the United States as well as increased length of stay and morbidity. Machine-learning–based methods have recently been used to leverage patients' clinical histories and demographic information to predict antimicrobial resistance. We developed a machine-learning model ensemble designed to maximize the accuracy of such a drug-sensitivity versus drug-resistance classification system relative to existing best-practice methods. Methods: We first performed a comprehensive analysis of the associations between infecting bacterial species and patient factors, including patient demographics, comorbidities, and certain healthcare-specific features. We leveraged the predictable nature of these complex associations to infer patient-specific antibiotic sensitivities. Various base learners, including k-NN (k-nearest neighbors) and gradient boosting machine (GBM), were used to train an ensemble model for confident prediction of antimicrobial susceptibilities. Base-learner selection and model performance evaluation were performed carefully using a variety of standard metrics, namely accuracy, precision, recall, F1 score, and Cohen κ. Results: We validated performance on the MIMIC-III database, which harbors deidentified clinical data from 53,423 distinct patient admissions to the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts, between 2001 and 2012. From ~11,000 positive cultures, we used 4 major specimen types (urine, sputum, blood, and pus swab) to evaluate model performance. Figure 1 shows the receiver operating characteristic (ROC) curves obtained for bloodstream infection cases after model building and prediction on a 70:30 split of the data.
We obtained area under the curve (AUC) values of 0.88, 0.92, 0.92, and 0.94 for urine, sputum, blood, and pus swab samples, respectively. Figure 2 shows the comparative performance of our proposed method and some off-the-shelf classification algorithms. Conclusions: Highly accurate, patient-specific predictive antibiogram (PSPA) data can significantly aid clinicians in recommending antibiotics in the ICU, thereby accelerating patient recovery and curbing antimicrobial resistance. Funding: This study was supported by Circle of Life Healthcare Pvt. Ltd. Disclosures: None
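The abstract evaluates models with accuracy, precision, recall, F1 score, Cohen κ and ROC AUC. As a minimal sketch of how these quantities are defined (not the authors' evaluation code), the Python below computes them from scratch for a binary label, assuming 1 = resistant and 0 = susceptible; the function names, the 0.5 threshold and the toy labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (assumed coding: 1 = resistant, 0 = susceptible)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and Cohen's kappa from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Cohen's kappa: observed agreement corrected for the agreement expected by chance
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and model scores for eight cultures.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The threshold-based metrics depend on the chosen cut-off, whereas the AUC is computed directly from the scores (here via the Mann-Whitney pairwise formulation), which is why both kinds of metric are usually reported together.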


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
David E. Booth ◽  
Venugopal Gopalakrishna-Remani ◽  
Matthew L. Cooper ◽  
Fiona R. Green ◽  
Margaret P. Rayman

Abstract. We begin by arguing that stepwise logistic regression, an algorithm often used to discover and apply disease risk factors, is unstable. We then argue that other available algorithms, such as the lasso and gradient boosting, are much more stable and reliable. Next, we propose a protocol for discovering and using risk factors based on lasso or boosting variable selection, and we illustrate the protocol with a set of prostate cancer data, showing that it recovers known risk factors. Finally, we use the protocol to identify new and important SNP-based risk factors for prostate cancer and to seek evidence for or against the hypothesis that selenium has an anticancer function in prostate cancer. We find that the anticancer effect may depend on SNP-SNP interactions and, in particular, on which alleles are present.
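To illustrate how the lasso performs variable selection, here is a minimal sketch of lasso fitting by cyclic coordinate descent. For brevity it assumes a squared-error loss on centered data, whereas a risk-factor analysis like the one described would typically use a logistic (binomial) loss; the function names, the orthogonal toy design and the penalty values are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the response y are already centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # equals y - X b while b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            # (1/n) x_j^T (residual with feature j's own contribution added back)
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical toy design: three mutually orthogonal, centered covariates; y depends on x0 only.
x0 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x1 = [1.0, -2.0, 0.0, 2.0, -1.0]
x2 = [2.0, -1.0, -2.0, -1.0, 2.0]
X = [list(row) for row in zip(x0, x1, x2)]
y = [3.0 * v for v in x0]

for lam in (1.0, 7.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
```

With lam = 1 only x0 receives a nonzero coefficient (shrunk from its least-squares value of 3 to 2.5); with lam = 7 the penalty exceeds its correlation with y and x0 is dropped as well. The soft-thresholding step is what sets coefficients exactly to zero, which is the selection mechanism the protocol relies on.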


2020 ◽  
Vol 36 (4) ◽  
pp. 1189-1198
Author(s):  
Nureni Olawale Adeboye ◽  
Olawale Victor Abimbola

Machine learning is a branch of artificial intelligence that helps machines learn from observational data without being explicitly programmed, and its methods have proven very useful for medical diagnosis and the early detection of diseases. According to the World Health Organization, 12 million deaths occur annually due to heart-related diseases, so their early detection and treatment are of great interest. This research aims to improve the timely prediction of cardiovascular diseases in suspected patients by comparing the efficiency of two boosting algorithms with four (4) other single classifiers on official cardiovascular data. The best model was selected based on 5 different evaluation metrics. AdaBoost (adaptive boosting) outperformed all other algorithms with a classification accuracy of 74.2%, closely followed by gradient boosting. However, gradient boosting was chosen because it trains faster than AdaBoost and achieved slightly higher precision (74.9% versus 74.7% for AdaBoost). Thus, the boosting algorithms were better predictors than the single classifiers, with age, systolic blood pressure, weight, cholesterol, height, and diastolic blood pressure as the major contributing factors in the model.
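The comparison above hinges on AdaBoost. As a minimal sketch of discrete AdaBoost with one-feature decision stumps (the abstract does not state which base learners or software were used, so this is an assumption, not the authors' setup), the Python below reweights misclassified points after each round and combines the stumps by a weighted vote. Labels are assumed to be in {-1, +1}; the function names and toy data are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Return the stump (feature, threshold, polarity, weighted error) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            for polarity in (1, -1):
                err = 0.0
                for row, label, wi in zip(X, y, w):
                    pred = polarity if row[j] <= thr else -polarity
                    if pred != label:
                        err += wi
                if best is None or err < best[3]:
                    best = (j, thr, polarity, err)
    return best


def stump_predict(stump, row):
    j, thr, polarity, _ = stump
    return polarity if row[j] <= thr else -polarity


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[3], 1e-10)  # guard against division by zero for a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # up-weight misclassified points, down-weight correct ones, then renormalise
        w = [wi * math.exp(-alpha * label * stump_predict(stump, row)) for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D toy data that no single stump can classify perfectly.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([adaboost_predict(model, row) for row in X])
```

A single stump misclassifies two of the six points here, while the three-round weighted vote fits all of them, which illustrates how boosting combines weak learners into a stronger classifier.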


SOIL ◽  
2017 ◽  
Vol 3 (4) ◽  
pp. 191-210 ◽  
Author(s):  
Madlene Nussbaum ◽  
Lorenz Walthert ◽  
Marielle Fraefel ◽  
Lucie Greiner ◽  
Andreas Papritz

Abstract. High-resolution maps of soil properties are a prerequisite for assessing soil threats and soil functions and for fostering the sustainable use of soil resources. For many regions in the world, accurate maps of soil properties are missing, but often sparsely sampled (legacy) soil data are available. Soil property data (response) can then be related by digital soil mapping (DSM) to spatially exhaustive environmental data that describe soil-forming factors (covariates) to create spatially continuous maps. With airborne and space-borne remote sensing and multi-scale terrain analysis, large sets of covariates have become common. Building parsimonious models amenable to pedological interpretation is then a challenging task. We propose a new boosted geoadditive modelling framework (geoGAM) for DSM. The geoGAM models smooth non-linear relations between responses and single covariates and combines these model terms additively. Residual spatial autocorrelation is captured by a smooth function of spatial coordinates, and non-stationary effects are included through interactions between covariates and smooth spatial functions. The core of fully automated model building for geoGAM is component-wise gradient boosting. We illustrate the application of the geoGAM framework by using soil data from the Canton of Zurich, Switzerland. We modelled effective cation exchange capacity (ECEC) in forest topsoils as a continuous response. For agricultural land we predicted the presence of waterlogged horizons in given soil depths as binary and drainage classes as ordinal responses. For the latter we used a proportional odds geoGAM, taking the ordering of the response properly into account. The fitted geoGAMs contained only a few covariates (7 to 17) selected from large sets (333 covariates for forests, 498 for agricultural land). Model sparsity allowed for covariate interpretation through partial effects plots. Prediction intervals were computed by model-based bootstrapping for ECEC. 
The predictive performance of the fitted geoGAM, tested with independent validation data and specific skill scores for continuous, binary and ordinal responses, compared well with other studies that modelled similar soil properties. Skill score (SS) values of 0.23 to 0.53 (with SS = 1 for perfect predictions and SS = 0 for zero explained variance) were achieved depending on the response and type of score. GeoGAM combines efficient model building from large sets of covariates with effects that are easy to interpret and therefore likely raises the acceptance of DSM products by end-users.
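Component-wise gradient boosting, named above as the core of geoGAM model building, fits one base learner per covariate at each iteration and updates only the best-fitting one, which is what keeps the final model sparse. The Python below is a minimal sketch of that idea under simplifying assumptions: squared-error loss, simple linear base learners (geoGAM itself uses smooth and spatial terms), centered covariates, and a fixed number of iterations, whereas in practice the stopping iteration would be tuned. `componentwise_l2_boost`, `nu` and the toy data are hypothetical.

```python
def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2 boosting with one simple linear base learner per covariate.

    Assumes the columns of X are centered; the offset is the mean of y.
    Returns the offset, the coefficient vector and the sequence of selected covariates.
    """
    n, p = len(X), len(X[0])
    offset = sum(y) / n
    fitted = [offset] * n
    coef = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    path = []
    for _ in range(n_iter):
        # negative gradient of the squared-error loss = current residuals
        u = [y[i] - fitted[i] for i in range(n)]
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j in range(p):
            xu = sum(X[i][j] * u[i] for i in range(n))
            beta = xu / col_sq[j]
            gain = xu * beta  # reduction in residual sum of squares from this base learner
            if best_j is None or gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        # update only the best-fitting component, shrunk by the step length nu
        coef[best_j] += nu * best_beta
        for i in range(n):
            fitted[i] += nu * best_beta * X[i][best_j]
        path.append(best_j)
    return offset, coef, path


# Hypothetical toy data: x1 is orthogonal to x0, and y depends on x0 only.
x0 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x1 = [1.0, -2.0, 0.0, 2.0, -1.0]
X = [list(row) for row in zip(x0, x1)]
y = [5.0 + 2.0 * v for v in x0]
offset, coef, path = componentwise_l2_boost(X, y, n_iter=100, nu=0.1)
print(offset, coef, sorted(set(path)))
```

Because x1 is orthogonal to the residuals at every step, it is never selected and its coefficient stays exactly zero, while the coefficient of x0 approaches 2 as 2(1 - 0.9^100). Stopping early leaves unselected covariates out of the model altogether.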


2017 ◽  
Author(s):  
Madlene Nussbaum ◽  
Lorenz Walthert ◽  
Marielle Fraefel ◽  
Lucie Greiner ◽  
Andreas Papritz

Abstract. High-resolution maps of soil properties are a prerequisite for assessing soil threats and soil functions and for fostering the sustainable use of soil resources. For many regions in the world, precise maps of soil properties are missing, but often sparsely sampled and discontinuous (legacy) soil data are available. Soil property data (response) can then be related by digital soil mapping (DSM) to spatially exhaustive environmental data that describe soil-forming factors (covariates) to create spatially continuous maps. With air- and spaceborne remote sensing data and multi-scale terrain analysis, large sets of covariates have become common. Building parsimonious models, amenable to pedological interpretation, is then a challenging task. We propose a new boosted geoadditive modelling framework (geoGAM) for DSM. A geoGAM models smooth nonlinear relations between responses and single covariates and combines these model terms additively. Residual spatial autocorrelation is captured by a smooth function of spatial coordinates, and nonstationary effects are included by interactions between covariates and smooth spatial functions. The core of fully automated model building for geoGAM is componentwise gradient boosting. We illustrate the application of the geoGAM framework by using soil data from the Canton of Zurich, Switzerland. We modelled effective cation exchange capacity (ECEC) in forest topsoils as a continuous response. For agricultural land we predicted the presence of waterlogged horizons in given soil depth layers as binary and drainage classes as ordinal responses. For the latter we used a proportional odds geoGAM, taking the ordering of the response properly into account. The fitted geoGAMs contained only a few covariates (7 to 17) selected from large sets (333 covariates for forests, 498 for agricultural land). Model sparsity allowed the covariates to be interpreted through partial effects plots. Prediction intervals were computed by model-based bootstrapping for ECEC. 
The predictive performance of the fitted geoGAMs, tested with independent validation data and specific skill scores (SS) for continuous, binary and ordinal responses, compared well with other studies that modelled similar soil properties. SS values of 0.23 to 0.53 (with SS = 1 for perfect predictions and SS = 0 for zero explained variance) were achieved depending on the response and type of score. geoGAM combines efficient model building from large sets of covariates with ease of effect interpretation and therefore likely raises the acceptance of DSM products by end-users.


1996 ◽  
Vol 172 ◽  
pp. 447-450 ◽  
Author(s):  
M. L. Bougeard ◽  
J.-F. Bange ◽  
M. Mahfouz ◽  
A. Bec-Borsenberger

In order to evaluate a possible rotation between the Hipparcos and the dynamical reference frames, preliminary Hipparcos data on minor planets are analysed. The solution is very sensitive to correlations induced by the short observation interval. Several statistical methods are applied to assess the sources of ill-conditioning. A procedure for variable selection and model building is given.
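The sensitivity to a short observation interval can be illustrated with a toy model, which is an assumption for illustration and not the authors' formulation: a single rotation angle that drifts linearly, eps(t) = eps0 + omega * t, with t measured from an arbitrary reference epoch. In least squares, the correlation between the estimates of eps0 and omega and the condition number of the normal matrix depend only on the observation times. The function names and time grids below are hypothetical.

```python
import math


def normal_matrix_entries(times):
    """Entries n, s1, s2 of X^T X for design rows [1, t] (toy model eps0 + omega * t)."""
    n = float(len(times))
    s1 = sum(times)
    s2 = sum(t * t for t in times)
    return n, s1, s2


def estimate_correlation(times):
    """Correlation between the least-squares estimates of eps0 and omega.

    Cov(b) is proportional to (X^T X)^-1 = [[s2, -s1], [-s1, n]] / det,
    so the correlation is -s1 / sqrt(n * s2).
    """
    n, s1, s2 = normal_matrix_entries(times)
    return -s1 / math.sqrt(n * s2)


def condition_number(times):
    """Largest over smallest eigenvalue of the symmetric 2x2 matrix X^T X."""
    n, s1, s2 = normal_matrix_entries(times)
    half_trace = (n + s2) / 2.0
    radius = math.sqrt(((n - s2) / 2.0) ** 2 + s1 ** 2)
    lam_max = half_trace + radius
    lam_min = (n * s2 - s1 ** 2) / lam_max  # det / lam_max avoids cancellation
    return lam_max / lam_min


# Hypothetical observation times, in arbitrary units relative to the reference epoch.
long_span = [float(k - 5) for k in range(11)]       # 10 units, centered on the epoch
short_span = [-8.0 + 0.05 * k for k in range(11)]   # 0.5 units, far from the epoch

for label, times in (("long", long_span), ("short", short_span)):
    print(label, round(abs(estimate_correlation(times)), 4), round(condition_number(times), 1))
```

For the long span centered on the epoch the two estimates are uncorrelated and the condition number is 10; for the short span offset from the epoch the correlation exceeds 0.999 and the condition number grows by several orders of magnitude, so the offset and the rate become nearly indistinguishable.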


2017 ◽  
Vol 28 (3) ◽  
pp. 673-687 ◽  
Author(s):  
Janek Thomas ◽  
Andreas Mayr ◽  
Bernd Bischl ◽  
Matthias Schmid ◽  
Adam Smith ◽  
...  

2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Janek Thomas ◽  
Tobias Hepp ◽  
Andreas Mayr ◽  
Bernd Bischl

We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool for fitting a statistical model while simultaneously performing variable selection. A drawback of this fitting procedure is the need for multiple model fits on slightly altered data (e.g., cross-validation or bootstrap samples) to find the optimal number of boosting iterations and prevent overfitting. In our proposed approach, we augment the data set with randomly permuted versions of the true variables, so-called shadow variables, and stop the stepwise fitting as soon as such a variable would be added to the model. This allows variable selection in a single model fit without requiring further parameter tuning. We show that our probing approach can compete with state-of-the-art selection methods such as stability selection in a high-dimensional classification benchmark, and we apply it to three gene expression data sets.
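The probing idea can be sketched on top of component-wise L2 boosting. This is a minimal, self-contained illustration assuming a squared-error loss, one linear base learner per covariate, one permuted shadow per covariate, step length nu = 0.1 and a hard cap max_iter as a safeguard; `probing_selection` and its parameters are hypothetical names, not the authors' software.

```python
import random


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def probing_selection(X, y, nu=0.1, max_iter=200, seed=1):
    """Select covariates by component-wise L2 boosting with shadow variables.

    Each shadow is a random permutation of one real covariate. Boosting stops as soon
    as a shadow would be selected (or after max_iter iterations, as a safeguard); the
    real covariates chosen before that point are returned.
    """
    rng = random.Random(seed)
    p = len(X[0])
    real_cols = [[row[j] for row in X] for j in range(p)]
    shadow_cols = []
    for col in real_cols:
        shadow = col[:]
        rng.shuffle(shadow)  # breaks any association with y, keeps the marginal distribution
        shadow_cols.append(shadow)
    cols = [_center(c) for c in real_cols + shadow_cols]  # indices >= p are shadows
    col_sq = [sum(v * v for v in c) for c in cols]
    residual = _center(y)  # negative gradient of squared-error loss after fitting the mean
    selected = []
    for _ in range(max_iter):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j, c in enumerate(cols):
            if col_sq[j] == 0.0:
                continue  # skip constant columns
            xu = sum(a * b for a, b in zip(c, residual))
            beta = xu / col_sq[j]
            gain = xu * beta  # drop in residual sum of squares for this base learner
            if best_j is None or gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None or best_j >= p:
            break  # a shadow variable would enter the model: stop here
        if best_j not in selected:
            selected.append(best_j)
        residual = [r - nu * best_beta * v for r, v in zip(residual, cols[best_j])]
    return sorted(selected)


# Hypothetical toy data: x0 is informative, x1 is pure noise.
noise_rng = random.Random(0)
n = 20
x0 = [float(i) for i in range(n)]
x1 = [noise_rng.uniform(-1.0, 1.0) for _ in range(n)]
X = [[a, b] for a, b in zip(x0, x1)]
y = [3.0 * a for a in x0]
print(probing_selection(X, y))
```

Because y here is an exact linear function of x0, the residual stays proportional to x0, so neither the noise covariate nor any shadow ever fits it best and only covariate 0 is returned. On noisier data the loop would typically end at the first iteration in which a shadow variable fits the residual best, which replaces the cross-validated choice of the stopping iteration.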

