Development of nomograms to assess the risk of clinical outcome

A methodology is presented for developing nomograms to assess and stratify the risk of a clinical outcome, using a simulated (virtual) data set and the R software environment. The virtual data set included numeric and factor input variables (variable types as defined in the R documentation) and an outcome variable. For quantitative variables, descriptive statistics were calculated for each level of the outcome variable; for factor variables, mosaic plots were constructed. A logistic regression model was used to describe the association between the input variables and the outcome. The bootstrap method was applied to validate the model and evaluate its performance. The calculated validity indicators showed acceptable discriminatory ability of the predictive model, and the calibration analysis showed that the model's calibration curve lay close to the ideal calibration curve. Based on the logistic regression coefficients, a nomogram was constructed and used to calculate the risk of the outcome for each subject (patient). The presented technique is shown to stratify patients effectively by the risk of an adverse outcome, so that diagnostic and treatment tactics can be adjusted accordingly. A nomogram greatly simplifies risk assessment and can be used in paper form as a supplement to the patient examination protocol. The article includes R code with explanations.
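A nomogram rescales each term beta_j * x_j of a fitted logistic model onto a common points axis and maps the total points back to a predicted probability. The minimal Python sketch below illustrates that arithmetic only; it is not the article's R code. The coefficients, predictor names and ranges are hypothetical, and the scaling rule (the predictor whose beta_j * x_j spans the widest range is mapped onto 0 to 100 points) is an assumption that follows the convention commonly used by nomogram software.

```python
import math

# Hypothetical fitted model: logit(p) = INTERCEPT + sum(beta_j * x_j).
# The coefficients and predictor ranges below are illustrative assumptions only.
INTERCEPT = -6.0
COEFS = {"age": 0.05, "sbp": 0.02, "smoker": 0.8}
RANGES = {"age": (30, 90), "sbp": (100, 200), "smoker": (0, 1)}
MAX_POINTS = 100.0


def _reference(name):
    """Predictor value that contributes 0 points (the lowest-risk end of its range)."""
    lo, hi = RANGES[name]
    return lo if COEFS[name] >= 0 else hi


# Points per unit of linear predictor: the predictor whose beta * x spans the
# widest range is mapped onto 0..MAX_POINTS; all others share the same scale.
SCALE = MAX_POINTS / max(abs(COEFS[n]) * (RANGES[n][1] - RANGES[n][0]) for n in COEFS)


def points(name, value):
    """Points contributed by one predictor value."""
    return SCALE * COEFS[name] * (value - _reference(name))


def total_points(patient):
    """Sum of the points read off each predictor axis."""
    return sum(points(n, patient[n]) for n in COEFS)


def risk_from_points(total):
    """Convert total points back to the linear predictor, then to a probability."""
    lp_ref = INTERCEPT + sum(COEFS[n] * _reference(n) for n in COEFS)
    lp = lp_ref + total / SCALE
    return 1.0 / (1.0 + math.exp(-lp))


def risk_direct(patient):
    """Risk computed straight from the logistic model, for comparison."""
    lp = INTERCEPT + sum(COEFS[n] * patient[n] for n in COEFS)
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"age": 60, "sbp": 150, "smoker": 1}
print(round(total_points(patient), 1), round(risk_from_points(total_points(patient)), 3))
```

Reading a paper nomogram performs the same two steps graphically: sum the points, then look up the risk that corresponds to the total.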

Predicting voluntary turnover through human resources database analysis

Management Research Review, 2018, Vol 41 (1), pp. 96-112. DOI: 10.1108/mrr-04-2017-0098

Author(s): Evy Rombaut, Marie-Anne Guerry

Keyword(s): Logistic Regression; Decision Tree; Human Resources; Real Life; Model Performance; Voluntary Turnover; Private Company; Data Set; Content Type; Individual Level

Purpose: This paper asks whether the data available in the human resources (HR) system can yield reliable turnover predictions without supplementary survey information. Design/methodology/approach: A decision tree approach and a logistic regression model for analysing turnover are introduced. The methodology is illustrated on a real-life data set from a Belgian branch of a private company, and model performance is evaluated by the area under the ROC curve (AUC). Findings: Data in the personnel system do lead to valuable predictions of turnover. Practical implications: The approach brings the determinants of voluntary turnover to the surface, and the results yield useful information for HR departments. Whereas logistic regression gives a turnover probability for each individual, the decision tree identifies employee groups at risk of turnover. With this data-set-based approach, any company can immediately assess its own turnover risk. Originality/value: A data-driven approach to investigating turnover had not been studied before.
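The AUC used here to evaluate the models equals the probability that a randomly chosen employee who left receives a higher predicted turnover score than a randomly chosen employee who stayed, with ties counted as one half. A minimal Python sketch of that definition follows; the scores and labels in the usage line are hypothetical, not data from the study.

```python
from itertools import product


def auc_pairwise(scores, labels):
    """AUC as the probability that a random positive case outranks a random negative.

    labels: 1 = event (e.g. voluntary turnover), 0 = no event.
    Ties count as half a concordant pair. O(n_pos * n_neg), fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted turnover probabilities and observed outcomes.
scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc_pairwise(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of leavers from stayers.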

An empirical investigation of alternative semi-supervised segmentation methodologies

South African Journal of Science, 2019, Vol 115 (3/4). DOI: 10.17159/sajs.2019/5359

Author(s): Douw G. Breed, Tanja Verster

Keyword(s): Logistic Regression; Model Performance; Predictive Modelling; Data Sets; Validation Data; Data Set; Supervised Segmentation; Improved Performance; Validation Set; Combination Approach

Segmentation of data to enhance predictive modelling is a well-established practice in the banking industry. Unsupervised and supervised segmentation are the two main types, and examples of improved predictive performance exist for both. However, each focuses on a single aspect (either target separation or the distribution of the independent variables), and combining them may deliver better results. This combination approach is called semi-supervised segmentation. Our objective was to explore four new semi-supervised segmentation techniques that may offer alternative strengths. We applied these techniques to six data sets from different domains and compared the model performance achieved. The original semi-supervised segmentation technique was the best for two of the data sets (as measured by the improvement in validation-set Gini), but other techniques performed better on the remaining four. Significance: We propose four newly developed semi-supervised segmentation techniques that can be used as additional tools for segmenting data before fitting a logistic regression. In all comparisons, semi-supervised segmentation before fitting a logistic regression improved modelling performance (as measured by the Gini coefficient on the validation data set) compared with an unsegmented logistic regression.
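In credit-scoring practice the Gini coefficient of a binary classifier is usually computed as 2 * AUC - 1, so a validation-set Gini comparison is equivalent to an AUC comparison. The sketch below assumes that definition (the abstract does not spell it out) and computes the AUC from the Mann-Whitney U statistic with average ranks for ties; the example data are hypothetical.

```python
def auc_rank(scores, labels):
    """AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def gini(scores, labels):
    """Gini coefficient, assuming the usual credit-scoring definition 2 * AUC - 1."""
    return 2 * auc_rank(scores, labels) - 1


# Hypothetical validation-set scores for one model.
val_scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
val_labels = [1, 1, 0, 1, 0, 0]
print(gini(val_scores, val_labels))
```

The "improvement in validation-set Gini" would then be the Gini of the segmented model minus that of the unsegmented one, both evaluated on the same validation data.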

Mahalanobis Distance and Its Application for Detecting Multivariate Outliers

Facta Universitatis Series Mathematics and Informatics, 2019, pp. 583. DOI: 10.22190/fumi1903583g

Author(s): Hamid Ghorbani

Keyword(s): Mahalanobis Distance; Multivariate Data; Real Data; Statistical Computing; Software Environment; R Software; Multivariate Statistical; Data Set; Multivariate Outliers; Outliers Detection

While statisticians frequently apply outlier detection methods when analyzing univariate data, identifying outliers in multivariate data poses challenges that univariate data do not. In this paper, after a brief review of some tools for detecting univariate outliers, the Mahalanobis distance, a well-known multivariate statistical distance, and its ability to detect multivariate outliers are discussed. As an application, the univariate and multivariate outliers of a real data set are detected using the R software environment for statistical computing.
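The squared Mahalanobis distance of observation x_i is (x_i - mean)^T S^-1 (x_i - mean), where S is the sample covariance matrix; under approximate multivariate normality, points whose squared distance exceeds a chi-square quantile with p degrees of freedom are commonly flagged. The NumPy sketch below assumes p = 2 and a 0.975 quantile (both choices are illustrative, not taken from the paper). In R, `mahalanobis()` from the stats package returns the same squared distances. The data are hypothetical: the last point lies inside the range of each variable separately but breaks their strong correlation.

```python
import math

import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # d_i^2 = (x_i - mean)^T S^{-1} (x_i - mean), computed for all rows at once.
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def chi2_quantile_2df(q):
    """Chi-square quantile with 2 df (that distribution is exponential with mean 2)."""
    return -2.0 * math.log(1.0 - q)


def flag_outliers_2d(X, q=0.975):
    """Flag rows whose squared distance exceeds the chi-square(2) q-quantile."""
    return mahalanobis_sq(X) > chi2_quantile_2df(q)


# Hypothetical bivariate data: y tracks x closely, except for the last point.
X = [[1, 1.5], [2, 1.5], [3, 3.5], [4, 3.5], [5, 5.5],
     [6, 5.5], [7, 7.5], [8, 7.5], [9, 9.5], [10, 9.5],
     [10, 1.0]]
print(np.round(mahalanobis_sq(X), 2))
print(flag_outliers_2d(X))
```

Neither coordinate of the last point is extreme on its own, which is exactly the kind of case univariate screening misses.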

Primer on binary logistic regression

Family Medicine and Community Health, 2021, Vol 9 (Suppl 1), pp. e001290. DOI: 10.1136/fmch-2021-001290

Author(s): Jenine K Harris

Keyword(s): Logistic Regression; Family Medicine; Binary Logistic Regression; Model Fit; Outcome Variable; Binary Logistic Regression Model; Data Set; Medicine Research; Independent Observations; Predictor Model

Family medicine has traditionally prioritised patient care over research. However, recent recommendations to strengthen family medicine include calls to focus more on research, including improving the research methods used in the field. Binary logistic regression is one method frequently used in family medicine research to classify, explain or predict the values of some characteristic, behaviour or outcome. The binary logistic regression model relies on assumptions including independent observations, no perfect multicollinearity and linearity (between continuous predictors and the log-odds of the outcome). The model produces odds ratios (ORs), which indicate an increase, a decrease or no change in the odds of being in one category of the outcome with an increase in the value of the predictor. Model significance quantifies whether the model is better than the baseline value (ie, the percentage of people with the outcome) at explaining or predicting whether the observed cases in the data set have the outcome. One model fit measure is the count-R², the percentage of observations for which the model correctly predicted the outcome variable value. Related to the count-R² are model sensitivity (the percentage of those with the outcome who were correctly predicted to have the outcome) and specificity (the percentage of those without the outcome who were correctly predicted not to have the outcome). Complete model reporting for binary logistic regression includes descriptive statistics, a statement on whether assumptions were checked and met, ORs and CIs for each predictor, overall model significance and overall model fit.
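The fit measures described above can be computed directly from observed outcomes and predicted probabilities. The sketch below assumes a classification threshold of 0.5 (a common default, not stated in the abstract) and uses hypothetical data; it also shows that the OR for a one-unit increase in a predictor is exp(beta) for its logistic coefficient beta.

```python
import math


def classification_summary(y_true, p_hat, threshold=0.5):
    """Count-R^2, sensitivity and specificity from predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in p_hat]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, q in pairs if t == 1 and q == 1)  # true positives
    tn = sum(1 for t, q in pairs if t == 0 and q == 0)  # true negatives
    fn = sum(1 for t, q in pairs if t == 1 and q == 0)  # missed cases
    fp = sum(1 for t, q in pairs if t == 0 and q == 1)  # false alarms
    return {
        "count_r2": (tp + tn) / len(y_true),  # share of all cases predicted correctly
        "sensitivity": tp / (tp + fn),  # share of cases with the outcome caught
        "specificity": tn / (tn + fp),  # share of cases without it ruled out
    }


def odds_ratio(beta):
    """OR for a one-unit increase in a predictor with logistic coefficient beta."""
    return math.exp(beta)


# Hypothetical observed outcomes and model-predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
p_hat = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.55, 0.8]
print(classification_summary(y_true, p_hat))
```

Changing the threshold trades sensitivity against specificity, which is why the threshold should be reported alongside these measures.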

Analysis of Traffic Accident Features and Crash Severity Prediction

International Journal of Cognitive Informatics and Natural Intelligence, 2021, Vol 15 (4). DOI: 10.4018/ijcini.20211001oa37

Keyword(s): Logistic Regression; Random Forest; Recall Performance; Model Performance; Sampling Technique; Performance Measure; Road Accident; Vehicle Crashes; Data Set; Severity Prediction

Vehicle crashes occur because of numerous factors. They lead to loss of life and permanent disability, and they add to the financial burden on both individuals and the nation. According to road accident statistics, India reports 464,910 road accidents every year, which claim 147,913 lives and injure 470,975 people. In this work, a UK data set sourced from Kaggle is used, covering 17 attributes and about 35,000 records from 2015. Because the data set is imbalanced, oversampling is used to balance the classes. Random Forest, Decision Tree, Logistic Regression and Gradient Naïve Bayes algorithms are used to predict accident severity, and the models are evaluated with accuracy, precision, recall and F1-score. Random Forest gave the best results for accuracy, precision and F1-score. For recall, Random Forest performed best for the Fatal class, Decision Tree for Serious and Logistic Regression for Slight.
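The abstract does not say which oversampling method was used. The sketch below shows plain random oversampling as one possible variant (an assumption; SMOTE, which synthesises new minority records instead of duplicating them, is another common choice): rows of each minority class are duplicated at random until every class matches the largest one. The severity labels mirror the abstract's classes, but the counts are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each minority class until every class
    has as many rows as the largest class. Original rows are kept first."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        members = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(members)  # sample with replacement within the class
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


# Hypothetical, heavily imbalanced severity labels.
X = [[i] for i in range(10)]
y = ["Slight"] * 7 + ["Serious"] * 2 + ["Fatal"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling is normally applied to the training split only, so that duplicated records do not also appear in the data used for evaluation.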

Intimate Partner Violence and its Association With Contraceptive Use Among Women in Pakistan

Pakistan Journal of Psychological Research, 2019, Vol 34 (Spring 2019), pp. 157-173. DOI: 10.33824/pjpr.2019.34.1.9

Author(s): Kashif Siddique, Rubeena Zakar, Ra’ana Malik, Naveeda Farhat, Farah Deeba

Keyword(s): Intimate Partner Violence; Logistic Regression; Partner Violence; Contraceptive Use; Binary Logistic Regression; Intimate Partner; Married Women; Reproductive Age; Outcome Variable; Odds Ratios

The aim of this study is to examine the association between intimate partner violence (IPV) and contraceptive use among married women in Pakistan. The analysis used cross-sectional secondary data on all married women of reproductive age (15-49 years) who responded to the domestic violence module (N = 3687) of the 2012-13 Pakistan Demographic and Health Survey. The association between contraceptive use (outcome variable) and IPV was measured by calculating unadjusted and adjusted odds ratios with 95% confidence intervals, using simple and multivariable binary logistic regression, respectively. Of the 3687 women, the majority, 2126 (57.7%), were using contraception in their marital relationship. In total, 1154 (31.3%) women experienced emotional IPV, 1045 (28.3%) experienced physical IPV, and 1402 (38%) experienced both physical and emotional IPV. All types of IPV were significantly associated with contraceptive use: women who reported emotional IPV (AOR 1.44; 95% CI 1.23, 1.67), physical IPV (AOR 1.41; 95% CI 1.20, 1.65), or both emotional and physical IPV (AOR 1.49; 95% CI 1.24, 1.72) were more likely to use contraceptives. The study revealed that women living in violent relationships in Pakistan were more likely to use contraception. Nevertheless, women still need reproductive health services, and the government should take initiatives to promote family planning services and to raise awareness of, and improve access to, contraceptive options for women, in order to reduce unintended or mistimed pregnancies in violent relationships.
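For a single binary exposure, the unadjusted odds ratio from simple logistic regression equals the cross-product ratio of the 2x2 table, and its Wald 95% CI equals the Woolf interval exp(ln OR ± 1.96 * sqrt(1/a + 1/b + 1/c + 1/d)). The sketch below computes that interval; the cell counts are hypothetical. The adjusted ORs (AORs) reported above additionally require a multivariable model and cannot be obtained from a single 2x2 table.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf (log-based) 95% CI from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the Woolf interval is undefined without a correction")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical counts: exposure = reported IPV, outcome = contraceptive use.
print(odds_ratio_ci(20, 10, 10, 20))
```

Because the interval is symmetric on the log scale, the product of its limits equals OR squared, which is a quick check on published intervals.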

Association between Blood Urea Nitrogen-to-creatinine Ratio and Three-Month Outcome in Patients with Acute Ischemic Stroke

Current Neurovascular Research, 2019, Vol 16 (2), pp. 166-172. DOI: 10.2174/1567202616666190412123705

Author(s): Linghui Deng, Changyi Wang, Shi Qiu, Haiyang Bian, Lu Wang, ...

Keyword(s): Logistic Regression; Ischemic Stroke; Clinical Outcome; Acute Ischemic Stroke; Blood Urea Nitrogen; Univariate Analysis; Density Lipoprotein; Hydration Status; Creatinine Ratio; Urea Nitrogen

Background: Hydration status significantly affects the clinical outcome of acute ischemic stroke (AIS) patients. Blood urea nitrogen-to-creatinine ratio (BUN/Cr) is a biomarker of hydration status. However, it is not known whether there is a relationship between BUN/Cr and three-month outcome as assessed by the modified Rankin Scale (mRS) score in AIS patients. Methods: AIS patients admitted to West China Hospital from 2012 to 2016 were prospectively and consecutively enrolled, and baseline data were collected. Poor clinical outcome was defined as three-month mRS > 2. Univariate and multivariate logistic regression analyses were performed to determine the relationship between BUN/Cr and three-month outcome. Confounding factors were identified by univariate analysis. Stratified logistic regression analysis was performed to identify effect modifiers. Results: A total of 1738 patients were included in the study. In unadjusted analysis, BUN/Cr was positively associated with poor three-month outcome (OR 1.02, 95% CI 1.00-1.03, p=0.04). However, after adjusting for potential confounders, the association was no longer significant (p=0.95). An interaction between BUN/Cr and high-density lipoprotein (HDL) was found (p=0.03), with a significant association between BUN/Cr and poor three-month outcome in patients with higher HDL (OR 1.03, 95% CI 1.00-1.07, p=0.04). Conclusion: Elevated BUN/Cr is associated with poor three-month outcome in AIS patients with high HDL levels.
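An interaction means that the odds ratio for one variable depends on the level of another. In a logistic model with a product term, logit(p) = b0 + b_x * x + b_z * z + b_xz * x * z, the OR for a delta-unit increase in x at a fixed value z is exp(delta * (b_x + b_xz * z)). The sketch below evaluates that expression; the coefficient values are hypothetical, and coding HDL as a 0/1 "higher HDL" indicator is an assumption, since the abstract does not report the model's coefficients or how HDL was categorised.

```python
import math


def conditional_or(b_x, b_xz, z, delta=1.0):
    """Odds ratio for a `delta`-unit increase in x at a fixed modifier value z,
    in the model logit(p) = b0 + b_x*x + b_z*z + b_xz*x*z.

    b0 and b_z cancel out of the ratio, so they are not needed here.
    """
    return math.exp(delta * (b_x + b_xz * z))


# Hypothetical coefficients: x = BUN/Cr, z = 1 for higher HDL, 0 otherwise.
b_x, b_xz = 0.001, 0.028
print(conditional_or(b_x, b_xz, z=0), conditional_or(b_x, b_xz, z=1))
```

If an OR is quoted per one-unit increase, the OR for a k-unit increase is that OR raised to the power k; for example, 1.02 per unit corresponds to about 1.22 per 10 units.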

An Efficient Machine Learning Model for Prediction of Acute Myocardial Infarction

Recent Advances in Computer Science and Communications, 2020, Vol 13. DOI: 10.2174/2666255813666200325104317

Author(s): Dhilsath Fathima.M, S. Justin Samuel, R. Hari Haran

Keyword(s): Machine Learning; Myocardial Infarction; Acute Myocardial Infarction; Logistic Regression; Decision Tree; Learning Model; Training Dataset; Data Set; Machine Learning Model; Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model is intended to help medical professionals predict myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the data set, then applies principal component analysis (PCA) to extract optimal features and so enhance classifier performance. After PCA, the reduced features are split into a training set (70%) and a test set (30%). The training set is used to train four popular classifiers (support vector machine, k-nearest neighbor, logistic regression and decision tree), and the test set is used to evaluate the models with a confusion matrix and with classifier accuracy, precision, sensitivity, F1-score and the AUC-ROC curve. Results: Logistic regression achieved higher accuracy than the k-NN, SVM and decision tree classifiers, and PCA performed well as a feature extraction method for enhancing the proposed model. Compared with the other three algorithms, logistic regression had a good mean accuracy and standard deviation of accuracy. The AUC-ROC curves (figures 4 and 5) show that logistic regression achieves a good AUC-ROC score, around 70%, compared with k-NN and the decision tree.
Conclusion: We infer that the proposed machine learning model can act as an optimal decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models. It can predict the presence of acute myocardial infarction from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
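The preprocessing chain described in the Methods (mean imputation, PCA, then a 70/30 split) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: standardising the features before PCA and the helper name `random_split` are assumptions, and the data are randomly generated.

```python
import numpy as np


def mean_impute(X):
    """Replace NaNs in each column with that column's mean (ignoring NaNs)."""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def pca_scores(X, n_components):
    """Project standardised X onto its first principal components via SVD.

    Returns the component scores and the share of variance each one explains.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z @ vt[:n_components].T, explained[:n_components]


def random_split(X, y, train_frac=0.7, seed=0):
    """Hypothetical helper: random split of rows into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(train_frac * len(X)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], X[te], y[tr], y[te]


# Randomly generated stand-in data (not the Framingham data).
rng = np.random.default_rng(1)
data = rng.normal(size=(20, 4))
y = np.arange(20)
scores, explained = pca_scores(mean_impute(data), 2)
X_tr, X_te, y_tr, y_te = random_split(scores, y)
print(scores.shape, X_tr.shape, X_te.shape, np.round(explained, 3))
```

Fitting the imputation means and PCA loadings on the whole data set before splitting, as in the order described above, lets some test-set information reach the training step; a more conservative variant estimates them on the training rows only and applies them to the test rows.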

Is Indoor Air Pollution From Different Fuel Types Associated With the Anemia Status of Pregnant Women in Ethiopia?

Journal of Primary Care & Community Health, 2021, Vol 12, pp. 215013272110343. DOI: 10.1177/21501327211034374

Author(s): Sewitemariam Desalegn Andarge, Abriham Sheferaw Areba, Robel Hussen Kabthymer, Miheret Tesfu Legesse, Girum Gebremeskel Kanno

Keyword(s): Air Pollution; Logistic Regression; Pregnant Women; Indoor Air; Pregnancy Outcomes; Indoor Air Pollution; Adverse Pregnancy Outcomes; Outcome Variable; Multivariable Logistic Regression Analysis; Adverse Pregnancy

Background: Indoor air pollution from different fuel types has been linked with various adverse pregnancy outcomes. The study aimed to assess the link between indoor air pollution from different fuel types and anemia during pregnancy in Ethiopia. Method: We used secondary data from the 2016 Ethiopian Demographic and Health Survey. The anemia status of the pregnant women was the dichotomous outcome variable, and the type of fuel used in the house was classified as high-, medium- or low-polluting. Logistic regression was employed to determine the association between the exposure and outcome variables, and adjusted odds ratios were calculated with 95% confidence intervals. Result: The proportion of anemia among users of low-, medium- and high-polluting fuel types was 13.6%, 46% and 40.9%, respectively. In the multivariable logistic regression analysis, the use of either kerosene or charcoal (AOR 4.6; 95% CI: 1.41-18.35) and being in the third trimester (AOR 1.72; 95% CI: 1.12-2.64) were significant factors associated with the anemia status of pregnant women in Ethiopia. Conclusion: According to our findings, the use of either kerosene or charcoal was associated with anemia during pregnancy in Ethiopia. An urgent intervention is needed to reduce the indoor air pollution that is associated with adverse pregnancy outcomes such as anemia.

Predictors of Anxiety-Induced Sleep Disturbance among in-School Adolescents in Ghana: Evidence from the 2012 Global School-Based Health Survey

Behavioral Sciences, 2021, Vol 11 (2), pp. 20. DOI: 10.3390/bs11020020

Author(s): Bright Opoku Ahinkorah, Richard Gyan Aboagye, Francis Arthur-Holmes, Abdul-Aziz Seidu, James Boadu Frimpong, ...

Keyword(s): Logistic Regression; Sleep Disturbance; Emotional Disorders; Health Survey; Safety Concern; Statistical Significance; Outcome Variable; Multivariable Logistic Regression Analysis; School Based; Multivariable Logistic Regression

(1) Background: Psychological problems of adolescents have become a global health and safety concern. Empirical evidence has shown that adolescents experience diverse mental health conditions (e.g., anxiety, depression, and emotional disorders). However, anxiety-induced sleep disturbance among in-school adolescents has received less research attention, particularly in low- and middle-income countries. This study’s central focus was to examine factors associated with anxiety-induced sleep disturbance among in-school adolescents in Ghana. (2) Methods: Analysis was performed using the 2012 Global School-based Health Survey (GSHS). A sample of 1342 in-school adolescents was included in the analysis. The outcome variable was anxiety-induced sleep disturbance reported during the past 12 months. Frequencies, percentages, chi-square tests and multivariable logistic regression analyses were conducted. Results of the logistic regression analyses were presented as crude and adjusted odds ratios with 95% confidence intervals (CIs), with statistical significance declared at p < 0.05. (3) Results: Adolescents who went hungry were more likely to report anxiety-induced sleep disturbance than those who did not report hunger (aOR = 1.68, CI = 1.10, 2.57). The odds of anxiety-induced sleep disturbance were higher among adolescents who felt lonely than among those who never felt lonely (aOR = 2.82, CI = 1.98, 4.01). Adolescents who had sustained an injury were more likely to have anxiety-induced sleep disturbance (aOR = 1.49, CI = 1.03, 2.14) than those who had no injury. Compared with adolescents who never had suicidal ideation, those who reported suicidal ideation had higher odds of anxiety-induced sleep disturbance (aOR = 1.68, CI = 1.05, 2.71).
(4) Conclusions: In this study, anxiety-induced sleep disturbance among in-school adolescents was significantly influenced by psychosocial determinants such as hunger, loneliness, injury, and suicidal ideation. The findings can help design appropriate interventions through effective strategies (e.g., early school-based screening, cognitive-behavioral therapy, face-to-face counseling services) to reduce psychosocial problems among in-school adolescents in Ghana.
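The chi-square tests mentioned in the Methods compare observed cell counts with the counts expected if the two variables were independent (row total times column total divided by n). The sketch below computes the unweighted Pearson statistic without continuity correction; whether the study applied survey weights is not stated in the abstract, and the counts are hypothetical. For a 2x2 table (1 degree of freedom) the critical value at p < 0.05 is about 3.841.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of counts (no continuity correction, no survey weights)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = felt lonely yes/no, columns = sleep disturbance yes/no.
print(pearson_chi2([[20, 10], [10, 20]]))
```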
