Comparative analysis of multiple classification models to improve PM10 prediction performance

Author(s):  
Yong-Jin Jung ◽  
Kyoung-Woo Cho ◽  
Jong-Sung Lee ◽  
Chang-Heon Oh

With the increasing demand for accurate particulate matter prediction, various attempts have been made to improve prediction accuracy by applying machine learning algorithms. However, the characteristics of particulate matter, and the imbalance in how often different concentration ranges occur, make prediction models difficult to train and lead to poor predictions. To address this problem, in this paper we propose multiple classification models that predict particulate matter concentrations by first dividing them into AQI-based classes. We designed multiple classification models using logistic regression, decision tree, SVM and ensemble algorithms. Comparing the performance of the four classification models through confusion matrices confirmed an f-score of 0.82 or higher for all models other than the logistic regression model.
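The setup described above can be sketched as follows. This is a minimal illustration with synthetic data, not the paper's pipeline: the feature set, class breakpoints (loosely AQI-style, in µg/m³) and hyper-parameters are all assumptions. Continuous PM10 values are binned into classes, four classifier families are fit, and macro F-scores are compared.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))  # stand-in meteorological/air-quality features
pm10 = 80 + 40 * X[:, 0] + 15 * X[:, 1] + rng.normal(scale=20, size=2000)

# Assumed AQI-style breakpoints: classes 0..3 (good/moderate/bad/very bad)
y = np.digitize(pm10, [30, 80, 150])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=6, random_state=0),
    "svm": SVC(),
    "ensemble": RandomForestClassifier(n_estimators=100, random_state=0),
}
# Macro F-score treats the four AQI classes equally despite class imbalance
scores = {name: f1_score(y_te, m.fit(X_tr, y_tr).predict(X_te), average="macro")
          for name, m in models.items()}
print(scores)
```

Macro averaging is used here because, as the abstract notes, rare high-concentration classes occur much less often than moderate ones, and a per-class average keeps them from being drowned out.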

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate models for predicting whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize accuracy, and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best- and worst-performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best-performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best-performing algorithm (AUC = 0.774) and K-Nearest Neighbors the worst-performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%; GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%.
Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to that of the best-performing model can be achieved by combining multiple algorithms in an ensemble model.
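The stacking idea described above can be sketched with scikit-learn's `StackingClassifier`. This is an illustration on synthetic data with an assumed set of base learners (three of the algorithm families named in the abstract), not the study's actual ensemble: base models are fit, their out-of-fold predictions feed a logistic-regression meta-learner, and the stack is scored by AUC.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base learners are combined via stacking; the meta-learner is fit on
# cross-validated predictions of the base models (sklearn's default behavior)
stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=0)),
                ("knn", KNeighborsClassifier()),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"stacked AUC = {auc:.3f}")
```

Using cross-validated base-model predictions for the meta-learner (rather than refit predictions) is what keeps the stack from simply memorizing its strongest member.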


2021 ◽  
Vol 11 (13) ◽  
pp. 6225
Author(s):  
Seongkeun Park ◽  
Jieun Byun

Background: Post-prostatectomy incontinence (PPI) is a major complication that can significantly decrease quality of life. Approximately 20% of patients experience consistent PPI as long as 1 year after radical prostatectomy (RP). This study develops a preoperative predictive model and compares its diagnostic performance with conventional tools. Methods: A total of 166 prostate cancer patients who underwent magnetic resonance imaging (MRI) and RP were evaluated. According to the date of the RP, patients were divided into a development cohort (n = 109) and a test cohort (n = 57). Patients were classified as PPI early-recovery or PPI-consistent on the basis of pad usage for incontinence at 3 months after RP. Uni- and multi-variable logistic regression analyses were performed to identify factors associated with early recovery from PPI. Four well-known machine learning algorithms (k-nearest neighbor, decision tree, support-vector machine (SVM), and random forest) and a logistic regression model were used to build prediction models for recovery from PPI using preoperative clinical and imaging data. The performance of the prediction models was assessed internally and externally using sensitivity, specificity, accuracy, and area-under-the-curve (AUC) values; estimated probabilities and the actual proportions of cases of recovery from PPI within 3 months were compared using a chi-squared test. Results: Clinical and imaging findings revealed that age (70.1 years for the PPI early-recovery group vs. 72.8 years for the PPI-consistent group), membranous urethral length (MUL; 15.7 mm vs. 13.9 mm), and obturator internus muscle (18.2 mm vs. 17.5 mm) differed significantly between the PPI early-recovery and PPI-consistent groups (all p-values < 0.05).
Multivariate analysis confirmed that age (odds ratio = 1.07, 95% confidence interval = 1.02–1.14, p-value = 0.007) and MUL (odds ratio = 0.87, 95% confidence interval = 0.80–0.95, p-value = 0.002) were significant independent factors for early recovery. The prediction models using machine learning algorithms showed superior diagnostic performance compared with conventional logistic regression (AUC = 0.59 ± 0.07), especially the SVM model (AUC = 0.65 ± 0.07). Moreover, all models showed good calibration between the estimated probability and the actual observed proportion of cases of recovery from PPI within 3 months. Conclusions: Preoperative clinical data and anatomic features on preoperative MRI can be used to predict early recovery from PPI after RP, and machine learning algorithms provide greater diagnostic accuracy compared with conventional statistical approaches.
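The calibration check mentioned above (estimated probability vs. observed proportion, compared with a chi-squared statistic) can be sketched as a Hosmer-Lemeshow-style test. This is an assumed reading of the abstract's chi-squared comparison, run here on synthetic data with an SVM: predicted risks are split into deciles and observed event counts are compared with the expected counts per decile.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
# probability=True enables Platt-scaled probability estimates for the SVM
p = SVC(probability=True, random_state=1).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Hosmer-Lemeshow-style chi-squared over deciles of predicted risk
edges = np.quantile(p, np.linspace(0, 1, 11))
groups = np.clip(np.digitize(p, edges[1:-1]), 0, 9)
stat = 0.0
for g in range(10):
    mask = groups == g
    if not mask.any():
        continue
    n = mask.sum()
    e = p[mask].sum()      # expected number of events in this decile
    o = y_te[mask].sum()   # observed number of events
    stat += (o - e) ** 2 / (e * (1 - e / n))
p_value = chi2.sf(stat, df=8)  # conventional df = number of groups - 2
print(f"HL chi-squared = {stat:.2f}, p = {p_value:.3f}")
```

A large p-value here is the desired outcome: it means the observed recovery proportions do not deviate significantly from the model's estimated probabilities, i.e. the model is well calibrated.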


Author(s):  
SUNDARAMBAL BALARAMAN

Classification algorithms are very widely used for studying the various categories of data, located in multiple databases, that have real-world implementations. The main purpose of this research work is to assess the efficiency of classification algorithms in the analysis of breast cancer. The mortality rate among women is rising due to the increasing frequency of breast cancer cases. The conventional method of diagnosing breast cancer is time consuming, and hence research is being carried out in multiple directions to address this issue. In this research work, Google Colab, an excellent environment for Python coders, is used as a tool to implement machine learning algorithms for predicting the type of cancer. The performance of the machine learning algorithms is analyzed based on the accuracy obtained from various classification models such as logistic regression, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naïve Bayes, Decision Tree and Random Forest. Experiments show that these classifiers work well for the classification of breast cancers, with accuracy > 90%, and logistic regression stood at the top with an accuracy of 98.5%. Implementation using Google Colab also made the task much easier, without the hours previously spent installing the environment and supporting libraries.
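A comparison like the one described above can be reproduced in a few lines. The abstract does not name its dataset or preprocessing, so this sketch uses scikit-learn's built-in Wisconsin breast-cancer data as a stand-in and default hyper-parameters; the exact accuracies will differ from the paper's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42, stratify=y)

models = {
    "logistic": LogisticRegression(max_iter=5000),
    "knn": KNeighborsClassifier(),
    "svm": SVC(),
    "naive_bayes": GaussianNB(),
    "tree": DecisionTreeClassifier(random_state=42),
    "forest": RandomForestClassifier(random_state=42),
}
accs = {}
for name, model in models.items():
    # Feature scaling matters for the distance- and margin-based models
    clf = make_pipeline(StandardScaler(), model)
    accs[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name:12s} accuracy = {accs[name]:.3f}")
```

Running this in a notebook environment such as Google Colab needs no installation beyond scikit-learn, which matches the abstract's point about avoiding environment setup.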


2021 ◽  
Author(s):  
Bamba Gaye ◽  
Maxime Vignac ◽  
Jesper R. Gådin ◽  
Magalie Ladouceur ◽  
Kenneth Caidahl ◽  
...  

Abstract Objective: We aimed to develop clinical classifiers to identify prevalent ascending aortic dilatation in patients with bicuspid aortic valve (BAV) and tricuspid aortic valve (TAV). Methods: This study included BAV (n=543) and TAV (n=491) patients with aortic valve disease and/or ascending aortic dilatation, but devoid of coronary artery disease, undergoing cardiothoracic surgery. We applied machine learning algorithms and classic logistic regression models, using multiple variable selection methodologies, to identify predictors of high risk of ascending aortic dilatation (an ascending aorta with a diameter above 40 mm). Analyses included comprehensive multidimensional data (i.e., valve morphology, clinical data, family history of cardiovascular diseases, prevalent diseases, demographics, lifestyle and medication). Results: BAV patients were younger (60.4±12.4 years) than TAV patients (70.4±9.1 years) and had a higher frequency of aortic dilatation (45.3% vs. 28.9% for BAV and TAV, respectively; p<0.001). The aneurysm prediction models showed mean AUC values above 0.8 for TAV patients, with the absence of aortic stenosis being the main predictor, followed by diabetes and high-sensitivity C-reactive protein. Using the same clinical measures in BAV patients, our prediction model yielded AUC values between 0.50 and 0.55, which is not useful for predicting aortic dilatation. The classification results were consistent across all machine learning algorithms and classic logistic regression models. Conclusions: Cardiovascular risk profiles appear to be more predictive of aortopathy in TAV patients than in patients with BAV. This adds evidence to the view that BAV- and TAV-associated aortopathy involve different pathways to aneurysm formation, and highlights the need for specific aneurysm prevention strategies in these patients. Further, our results highlight that machine learning approaches do not outperform classical prediction methods in addressing complex interactions and non-linear relations between variables.


2019 ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized. Machine learning may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms, including the commonly used generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and we explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared. We also estimated the relative importance of each predictor variable. The best- and worst-performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. The target variable for the prediction models was whether or not the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts. Results: We found Gradient Boosting to perform best (AUC=0.774) and K-Nearest Neighbors to perform worst (AUC=0.702). The performance of GLM/logistic regression (AUC=0.76) was above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use.
Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider combining multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing a suboptimal algorithm.
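The net reclassification improvement (NRI) analysis used above can be sketched as follows. The abstract does not specify which NRI variant was used, so this illustration assumes the category-free (continuous) NRI of Pencina et al., on synthetic data: it measures the net fraction of events whose predicted risk rises, plus the net fraction of non-events whose predicted risk falls, when switching from the old model (GLM) to the new one (Gradient Boosting).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def continuous_nri(y, p_new, p_old):
    """Category-free NRI: net proportion of events moved up in predicted
    risk plus net proportion of non-events moved down."""
    up, down = p_new > p_old, p_new < p_old
    ev, ne = y == 1, y == 0
    return (up[ev].mean() - down[ev].mean()) + (down[ne].mean() - up[ne].mean())

X, y = make_classification(n_samples=3000, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
p_glm = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
p_gb = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
nri = continuous_nri(y_te, p_gb, p_glm)
print(f"continuous NRI (GB vs GLM) = {nri:.3f}")
```

A positive NRI means the new model tends to reclassify cases in the right direction; the percentages reported in the abstract are NRI values of this general kind expressed as percentages.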


2021 ◽  
Vol 9 ◽  
Author(s):  
Huanhuan Zhao ◽  
Xiaoyu Zhang ◽  
Yang Xu ◽  
Lisheng Gao ◽  
Zuchang Ma ◽  
...  

Hypertension is a widespread chronic disease. Risk prediction of hypertension is an intervention that contributes to the early prevention and management of hypertension, and implementing it requires an effective and easy-to-implement hypertension risk prediction model. This study evaluated and compared the performance of four machine learning algorithms in predicting the risk of hypertension based on easy-to-collect risk factors. A dataset of 29,700 samples collected through physical examinations was used for model training and testing. First, we identified easy-to-collect risk factors of hypertension through univariate logistic regression analysis. Then, based on the selected features, 10-fold cross-validation was used to optimize four models (random forest (RF), CatBoost, an MLP neural network and logistic regression (LR)) and find the best hyper-parameters on the training set. Finally, the performance of the models was evaluated by AUC, accuracy, sensitivity and specificity on the test set. The experimental results showed that the RF model outperformed the other three models, achieving an AUC of 0.92, an accuracy of 0.82, a sensitivity of 0.83 and a specificity of 0.81. In addition, Body Mass Index (BMI), age, family history and waist circumference (WC) were the four primary risk factors of hypertension. These findings reveal that it is feasible to use machine learning algorithms, especially RF, to predict hypertension risk without clinical or genetic data. The technique can provide a non-invasive and economical way to support the prevention and management of hypertension in a large population.
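The tuning step described above (10-fold cross-validation to find the best random-forest hyper-parameters, then evaluation on a held-out test set) can be sketched with `GridSearchCV`. The data and the parameter grid here are illustrative assumptions, not the study's actual choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 10-fold CV over an assumed, deliberately small hyper-parameter grid
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=10, scoring="roc_auc",
)
search.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, search.best_estimator_.predict_proba(X_te)[:, 1])
print(search.best_params_, f"held-out AUC = {auc:.3f}")
```

Keeping the test set out of the cross-validation loop, as above, is what makes the final AUC an honest estimate rather than a tuning artifact.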


Author(s):  
R. Suganya ◽  
Rajaram S. ◽  
Kameswari M.

Currently, thyroid disorders are increasingly common and widespread among women worldwide. In India, seven out of ten women suffer from thyroid problems. Various research studies estimate that about 35% of Indian women are found to have prevalent goiter. It is very necessary to take preventive measures at the early stages, as thyroid disorders can otherwise cause infertility among women. This review discusses various analytics models that are used to handle different types of thyroid problems in women. The chapter analyzes and compares different classification models, covering both machine learning algorithms and deep learning algorithms, for classifying different thyroid problems. Literature from both machine learning and deep learning approaches is considered. This literature review on thyroid problems will help to analyze the causes and characteristics of thyroid disorders. The dataset used to build and validate the algorithms was provided by the UCI machine learning repository.


Author(s):  
Ruchika Malhotra ◽  
Anuradha Chug

Software maintenance is an expensive activity that consumes a major portion of the total project cost. Activities carried out during maintenance include the addition of new features, deletion of obsolete code, correction of errors, etc. Software maintainability is the ease with which these operations can be carried out. If maintainability can be measured in the early phases of software development, it helps in better planning and optimum resource utilization. Measurement of design properties such as coupling, cohesion, etc. in early phases of development often allows us to derive the corresponding maintainability with the help of prediction models. In this paper, we performed a systematic review of the existing studies related to software maintainability from January 1991 to October 2015. In total, 96 primary studies were identified, of which 47 were from journals, 36 from conference proceedings and 13 from other sources. All studies were compiled in structured form and analyzed through numerous perspectives such as the use of design metrics, prediction models, tools, data sources, prediction accuracy, etc. According to the review results, we found that the use of machine learning algorithms in predicting maintainability has increased since 2005. The use of evolutionary algorithms has also begun in related sub-fields since 2010. We observed that design metrics are still the most favored option for capturing the characteristics of any given software before deploying it in a prediction model for determining the corresponding software maintainability. A significant increase in the use of public datasets for building prediction models has also been observed; in this regard, the two public datasets User Interface Management System (UIMS) and Quality Evaluation System (QUES) proposed by Li and Henry are quite popular among researchers.
Although machine learning algorithms are still the most popular methods, we suggest that researchers working in the software maintainability area should experiment with open source datasets and hybrid algorithms. In this regard, more empirical studies also need to be conducted on a large number of datasets so that a generalized theory can be formed. The current paper will be beneficial for practitioners, researchers and developers, as they can use these models and metrics for creating benchmarks and standards. The findings of this extensive review will also be useful for novices in the field of software maintainability, as it not only provides explicit definitions but also lays a foundation for further research by providing a quick link to all important studies in the field. Finally, this study compiles current trends and emerging sub-fields, and identifies various opportunities for future research in the field of software maintainability.


2019 ◽  
Vol 20 (3) ◽  
pp. 177-184 ◽  
Author(s):  
Nantao Zheng ◽  
Kairou Wang ◽  
Weihua Zhan ◽  
Lei Deng

Background: Targeting critical viral-host protein-protein interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in the computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction are surveyed. These methods are categorized based on the features they utilize and on the machine learning algorithms used, including both classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, mainly including sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on the state-of-the-art machine learning algorithms used to build binary prediction models for the classification of virus-host protein pairs, and discuss their strengths, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still considerable room for improvement.

