Predicting Methamphetamine Use of Homeless Youths Attending High School: Comparison of Decision Rules and Logistic Regression Classification Algorithms

Objective: To assess the patient-related barriers to access of some virtual healthcare tools among cancer patients in the USA in a population-based cohort. Materials & methods: National Health Interview Survey datasets (2011–2018) were reviewed and adult participants (≥18 years old) with a history of cancer diagnosis and complete information about virtual healthcare utilization (defined by [a] filling a prescription on the internet in the past 12 months and/or [b] communicating with a healthcare provider through email in the past 12 months) were included. Information about video-conferenced phone calls and telephone calls is not available in the National Health Interview Survey datasets; thus, they were not examined in this study. Multivariable logistic regression analysis was used to evaluate factors associated with the utilization of virtual care tools. Results: A total of 25,121 participants were included in the current analysis, including 4499 participants (17.9%) who utilized virtual care in the past 12 months and 20,622 participants (82.1%) who did not. The following factors were associated with less utilization of virtual healthcare tools in multivariable logistic regression: older age (continuous odds ratio [OR] with increasing age: 0.987; 95% CI: 0.984–0.990), African-American race (OR for African American vs white race: 0.608; 95% CI: 0.517–0.715), unmarried status (OR for unmarried compared with married status: 0.689; 95% CI: 0.642–0.739), lower level of education (OR for education ≤high school vs >high school: 0.284; 95% CI: 0.259–0.311), weaker English proficiency (OR for no proficiency vs very good proficiency: 0.224; 95% CI: 0.091–0.552) and lower yearly earnings (OR for earnings <$45,000 vs earnings >$45,000: 0.582; 95% CI: 0.523–0.647).
Conclusion: Older patients, those with African-American race, lower education, lower earnings and weak English proficiency are less likely to access the above studied virtual healthcare tools. Further efforts are needed to tackle disparities in telemedicine access.
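The odds ratios and intervals quoted in this abstract can be related back to logistic regression coefficients. The sketch below assumes the usual Wald construction, OR = exp(β) with a 95% CI of exp(β ± 1.96·SE); the abstract does not say how the intervals were computed, and the function names are illustrative rather than taken from the study.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported(or_value, lower, upper, z=Z_95):
    """Back out (beta, se) from a reported OR and its 95% CI (Wald assumption)."""
    beta = math.log(or_value)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Figures quoted in the abstract: African American vs white race,
# OR 0.608 (95% CI 0.517-0.715).
beta, se = coef_from_reported(0.608, 0.517, 0.715)
or_value, lower, upper = odds_ratio_ci(beta, se)
```

A negative `beta` corresponds to an OR below 1, that is, lower odds of using the virtual care tools. The round trip approximately reproduces the reported bounds because the quoted interval is close to symmetric on the log scale.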

Predicting 30-day Hospital Readmission with Publicly Available Administrative Database

Methods of Information in Medicine ◽

10.3414/me14-02-0017 ◽

2015 ◽

Vol 54 (06) ◽

pp. 560-567 ◽

Cited By ~ 11

Author(s):

K. Zhu ◽

Z. Lou ◽

J. Zhou ◽

N. Ballester ◽

P. Parikh ◽

...

Keyword(s):

Heart Failure ◽

Logistic Regression ◽

Decision Tree ◽

Ad Hoc ◽

Prediction Models ◽

Conditional Logistic Regression ◽

Hospital Readmissions ◽

Decision Rules ◽

Classification Models ◽

Standard Classification

Summary. Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Big Data and Analytics in Healthcare”. Background: Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. Objectives: Explore the use of conditional logistic regression to increase the prediction accuracy. Methods: We analyzed an HCUP statewide in-patient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. Results: The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to improve sensitivity by more than 10% over the standard classification models, which translates to correctly labeling an additional 400–500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. Conclusions: It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
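The stratify-then-fit idea in the Methods (split the cohort by decision rules, then fit a separate logistic regression per stratum) can be sketched as follows. Everything concrete here is an assumption made for illustration: the single rule `chronic_conditions >= 5`, the one-feature model, the toy records, and the plain gradient-descent fit are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def stratum_of(record):
    # Hypothetical decision rule; the paper derives its rules from a decision tree.
    return "high" if record["chronic_conditions"] >= 5 else "low"


def fit_per_stratum(records):
    """Group records by the rule, then fit one model per group."""
    groups = {}
    for r in records:
        groups.setdefault(stratum_of(r), []).append(r)
    return {
        name: fit_logistic([r["x"] for r in rows], [r["y"] for r in rows])
        for name, rows in groups.items()
    }


# Toy data (invented): x raises risk in one stratum and lowers it in the other.
records = (
    [{"chronic_conditions": 6, "x": x, "y": y} for x, y in zip([0, 1, 2, 3], [0, 0, 1, 1])]
    + [{"chronic_conditions": 2, "x": x, "y": y} for x, y in zip([0, 1, 2, 3], [1, 1, 0, 0])]
)
models = fit_per_stratum(records)
pooled_w, pooled_b = fit_logistic([r["x"] for r in records], [r["y"] for r in records])
```

In this toy setup the pooled model learns nothing from `x` because the two strata cancel out, while the per-stratum models recover slopes of opposite sign. That is one way stratification can add predictive power, though the paper's actual gains depend on its data and rules.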

Supervised Learning Applied to Graduation Forecast of Industrial Engineering Students

European Journal of Educational Research ◽

10.12973/eu-jer.11.1.325 ◽

2022 ◽

Vol 11 (1) ◽

pp. 325-337

Author(s):

Natalia Gil ◽

Marcelo Albuquerque ◽

Gabriela de

Keyword(s):

Machine Learning ◽

High School ◽

Logistic Regression ◽

Supervised Learning ◽

Grade Point Average ◽

Engineering Students ◽

Learning Algorithm ◽

Industrial Engineering ◽

Machine Learning Algorithm ◽

Grade Point

The article aims to develop a machine-learning algorithm that can predict students' graduation in the Industrial Engineering course at the Federal University of Amazonas based on their performance data. The methodology uses an information package of 364 students with an admission period between 2007 and 2019, considering characteristics that can directly or indirectly affect each student's graduation: type of high school, number of semesters taken, grade-point average, lockouts, dropouts and course terminations. The data treatment involved the manual removal of several characteristics that did not add value to the output of the algorithm, resulting in a package composed of 2184 instances. The logistic regression, MLP and XGBoost models developed and compared could predict a binary output of graduation or non-graduation for each student, using 30% of the dataset to test and 70% to train, so that it was possible to identify a relationship between the six attributes explored and achieve, with the best model, 94.15% accuracy.
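The 70%/30% train/test protocol and the accuracy figure mentioned in this abstract can be illustrated with a short sketch. It assumes a simple seeded random shuffle; the abstract does not say whether the authors shuffled, stratified, or used a library splitter, and the function names are illustrative.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


rows = list(range(2184))  # stand-ins for the 2184 instances in the abstract
train, test = train_test_split(rows)
```

Fixing the seed keeps the split reproducible, which matters when several models are compared on the same held-out 30%.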

Bullets, Blades, and Being Afraid in Hispanic High Schools: An Exploratory Study of the Presence of Weapons and Fear of Weapon-Associated Victimization Among High School Students in a Border Town

Crime & Delinquency ◽

10.1177/0011128703254916 ◽

2004 ◽

Vol 50 (3) ◽

pp. 372-394 ◽

Cited By ~ 41

Author(s):

Ben Brown ◽

Wm. Reed Benedict

Keyword(s):

High School ◽

Logistic Regression ◽

High School Students ◽

High Schools ◽

Exploratory Study ◽

Policy Implications ◽

Regression Analyses ◽

School Students ◽

Border Town ◽

Language Spoken At Home

This article presents data obtained from a survey of high school students in Brownsville, Texas. Almost half of the students reported having seen other students carry knives at school, roughly 1 in 10 reported having seen other students carry guns at school, and more than 1 in 5 reported being fearful of weapon-associated victimization at school. Logistic regression analyses indicate that age, gender, seeing other students carry weapons, and involvement with student clubs/organizations significantly affect fear of weapon-associated victimization. Using language spoken at home as a measure of acculturation, it was also determined that immigrant juveniles are more fearful of weapon-associated victimization than nonimmigrant juveniles. The theoretical and policy implications of the findings are discussed.

Lazy Classification Algorithms Based on Deterministic and Inhibitory Decision Rules

Inhibitory Rules in Data Analysis - Studies in Computational Intelligence ◽

10.1007/978-3-540-85638-2_8 ◽

2008 ◽

pp. 99-106

Author(s):

Pawel Delimata ◽

Mikhail Ju. Moshkov ◽

Andrzej Skowron ◽

Zbigniew Suraj

Keyword(s):

Decision Rules ◽

Classification Algorithms

0935 Association Between Chronotype, Sleep Duration, Weekend Catch-Up Sleep, and Depression Among Korean High School Students

SLEEP ◽

10.1093/sleep/zsaa056.931 ◽

2020 ◽

Vol 43 (Supplement_1) ◽

pp. A355-A356

Author(s):

K Yang ◽

K Jee Hyun ◽

Y Hwangbo ◽

D Koo ◽

D Kim ◽

...

Keyword(s):

High School ◽

Logistic Regression ◽

High School Students ◽

Sleep Apnea ◽

Sleep Duration ◽

Internet Addiction ◽

School Students ◽

Increased Risk ◽

Sleep Environment ◽

Catch Up

Introduction: The present study aimed to examine the association between chronotype, sleep duration, weekend catch-up sleep (CUS) duration, and depression among Korean high school students. Methods: A total of 8,565 high school students from 15 districts nationwide in South Korea completed an online self-report questionnaire. Depressive mood was assessed using the Korean version of the Beck Depression Inventory (BDI). The following sleep characteristics were assessed: weekday and weekend sleep durations, weekend CUS duration, chronotype, perceived sufficiency of sleep, self-reported snoring and sleep apnea, daytime sleepiness, and sleep environment. Age, sex, body mass index, number of private classes, and proneness to internet addiction were also measured. Logistic regression analysis was conducted to compute odds ratios for the association between depression and sleep characteristics, after controlling for relevant covariates. Results: Depression (BDI ≥ 16) was present in 1,794 students (20.9%). In multivariate logistic regression, late chronotype (odds ratio [OR], 1.71; 95% CI, 1.47-1.99), female sex (OR, 2.24; 95% CI, 1.99-2.53), underweight (OR, 1.27; 95% CI, 1.02-1.57) and obesity (OR, 1.41; 95% CI, 1.13-1.75), weekday sleep duration (OR, 0.86; 95% CI, 0.81-0.91), weekend CUS duration ≥ 2 hours (OR, 0.68; 95% CI, 0.55-0.85), ESS (OR, 1.08; 95% CI, 1.07-1.10), much (OR, 2.15; 95% CI, 1.63-2.84) and insufficient (OR, 1.71; 95% CI, 1.46-2.01) perceived sleep, snoring (OR, 1.27; 95% CI, 1.11-1.46) and witnessed apnea (OR, 2.10; 95% CI, 1.75-2.52), increased internet addiction (OR, 1.06; 95% CI, 1.05-1.06), a high number of private classes (OR, 0.76; 95% CI, 0.60-0.95), and poor sleep environment (OR, 1.86; 95% CI, 1.56-2.21) were associated with depression. Conclusion: Eveningness preference, insufficient weekday sleep duration, short weekend CUS duration, and self-reported snoring and sleep apnea were associated with an increased risk for depression.

Comparison of Data Mining Classification Algorithms Determining the Default Risk

Scientific Programming ◽

10.1155/2019/8706505 ◽

2019 ◽

Vol 2019 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Begüm Çığşar ◽

Deniz Ünal

Keyword(s):

Data Mining ◽

Logistic Regression ◽

Default Risk ◽

Operating Characteristic ◽

Classification Algorithms ◽

Financial Industry ◽

Statistical Institute ◽

Default Risks ◽

Characteristic Area ◽

Statistical Criteria

Big data and its analysis have become a widespread practice in recent times, applicable to multiple industries. Data mining is a technique that is based on statistical applications. This method extracts previously undetermined data items from large quantities of data. The banking and insurance industries use data mining analysis to detect fraud, offer the appropriate credit or insurance solutions to customers, and better understand customer demands. This study aims to identify data mining classification algorithms and use them to predict default risks, avoid possible payment difficulties, and reduce potential problems in extending credit. The data for this study, which contain demographic and socioeconomic characteristics of individuals, were obtained from the Turkish Statistical Institute 2015 survey. Six classification algorithms (Naive Bayes, Bayesian networks, J48, random forest, multilayer perceptron, and logistic regression) were applied to the dataset using WEKA 3.9 data mining software. These algorithms were compared on root mean squared error, receiver operating characteristic area, accuracy, precision, F-measure, and recall. The best algorithm, logistic regression, was then applied to the real dataset to determine the attributes causing the default risk by using odds ratios. The socioeconomic and demographic characteristics of the individuals were examined, and the odds ratio values were used to identify which individuals and characteristics were more likely to default. These results are not only beneficial to the literature but also have a significant influence in the financial industry in terms of the ability to predict customers’ default risk.
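Several of the comparison criteria named in this abstract can be computed directly from labels and predictions. The sketch below uses the standard textbook definitions of accuracy, precision, recall, F-measure, and root mean squared error; it is not WEKA's implementation, and the example labels are invented.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-measure for binary labels (1 = default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def rmse(y_true, probabilities):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, probabilities)) / n)


# Invented example: 3 defaulters, 5 non-defaulters.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Reporting precision, recall, and F-measure alongside accuracy helps when classes are unequal in size, since a classifier can score high accuracy while missing most defaulters.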

Circulating chemokines accurately identify individuals with clinically significant atherosclerotic heart disease

Physiological Genomics ◽

10.1152/physiolgenomics.00104.2007 ◽

2007 ◽

Vol 31 (3) ◽

pp. 402-409 ◽

Cited By ~ 36

Author(s):

Diego Ardigo ◽

Themistocles L. Assimes ◽

Stephen P. Fortmann ◽

Alan S. Go ◽

Mark Hlatky ◽

...

Keyword(s):

Logistic Regression ◽

Heart Disease ◽

Serum Levels ◽

Accurate Method ◽

Response To Therapy ◽

Classification Algorithms ◽

Advance Study ◽

Cross Sectional ◽

Proteomic Approach ◽

Clinically Significant

Serum inflammatory markers correlate with outcome and response to therapy in subjects with cardiovascular disease. However, current individual markers lack specificity for the diagnosis of coronary artery disease (CAD). We hypothesize that a multimarker proteomic approach measuring serum levels of vascular derived inflammatory biomarkers could reveal a “signature of disease” that can serve as a highly accurate method to assess for the presence of coronary atherosclerosis. We simultaneously measured serum levels of seven chemokines [CXCL10 (IP-10), CCL11 (eotaxin), CCL3 (MIP1α), CCL2 (MCP1), CCL8 (MCP2), CCL7 (MCP3), and CCL13 (MCP4)] in 48 subjects with clinically significant CAD (“cases”) and 44 controls from the ADVANCE Study. We applied three classification algorithms to identify the combination of variables that would best predict case-control status and assessed the diagnostic performance of these models with receiver operating characteristic (ROC) curves. The serum levels of six chemokines were significantly higher in cases compared with controls (P < 0.05). All three classification algorithms entered three chemokines in their final model, and only logistic regression selected clinical variables. Logistic regression produced the highest area under the ROC curve of the three algorithms (AUC = 0.95; SE = 0.03), which was markedly better than the AUC for the logistic regression model of traditional risk factors of CAD without (AUC = 0.67; SE = 0.06) or with CRP (AUC = 0.68; SE = 0.06). A combination of serum levels of multiple chemokines identifies subjects with clinically significant atherosclerotic heart disease with a very high degree of accuracy. These results need to be replicated in larger cross-sectional studies and their prognostic value explored.
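The AUC values quoted in this abstract summarize how well a model's scores separate cases from controls. A minimal sketch of one standard way to compute it, the pairwise (Mann-Whitney) formulation, is given below; the scores are invented and the abstract does not specify the software or estimator the authors used.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented model scores for three cases (label 1) and three controls (label 0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The quadratic pairwise loop is fine for a study of this size (48 cases, 44 controls) but would be replaced by a rank-based computation on large datasets.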

A Study on Machine Vision Techniques for the Inspection of Health Personnels’ Protective Suits for the Treatment of Patients in Extreme Isolation

Electronics ◽

10.3390/electronics8070743 ◽

2019 ◽

Vol 8 (7) ◽

pp. 743 ◽

Cited By ~ 1

Author(s):

Alice Stazio ◽

Juan G. Victores ◽

David Estevez ◽

Carlos Balaguer

Keyword(s):

Logistic Regression ◽

Machine Vision ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Classification Algorithms ◽

Adaptive Boosting ◽

Blood Stains ◽

Extreme Gradient Boosting ◽

Vector Machines

The examination of Personal Protective Equipment (PPE) to assure the complete integrity of health personnel in contact with infected patients is one of the most necessary tasks when treating patients affected by infectious diseases, such as Ebola. This work focuses on the study of machine vision techniques for the detection of possible defects on the PPE that could arise after contact with such patients. A preliminary study on the use of image classification algorithms to identify blood stains on PPE after the treatment of an infected patient is presented. To produce training data for these algorithms, a synthetic dataset was generated from a simulated model of a PPE suit with blood stains. The study then proceeded to use images of the PPE with a physical emulation of blood stains, taken by a real prototype. The dataset reveals a great imbalance between positive and negative samples; therefore, all the selected classification algorithms are able to handle this kind of data. Classifiers range from Logistic Regression and Support Vector Machines to bagging and boosting techniques such as Random Forest, Adaptive Boosting, Gradient Boosting and eXtreme Gradient Boosting. All these algorithms were evaluated on accuracy, precision, recall and F1 score; additionally, execution times were considered. The results are promising for all the classifiers, and, in particular, Logistic Regression proved to be the most suitable classification algorithm in terms of F1 score and execution time, considering both datasets.
