Building a predictive model to assist in the diagnosis of cervical cancer

Aim: Cervical cancer is still one of the most common gynecologic cancers worldwide. Because cervical cancer is potentially preventable, early detection is the most effective way to reduce its worldwide incidence. Materials and methods: This research presents a novel ensemble technique for predicting cervical cancer risk. Specifically, the authors introduce a voting classifier that aggregates prediction probabilities from multiple machine-learning models: logistic regression, K-nearest neighbor, decision tree, XGBoost and multilayer perceptron. Results: The average accuracy, precision, recall and F1-score of the voting classifier were 96.6%, 97.4%, 95.9% and 96.6%, respectively. The voting algorithm thus achieves consistently high average values across all evaluation metrics, and its F1-score of about 96% indicates that the model is robust. Conclusion: The findings suggest that the probability of having cervical cancer can be accurately predicted using the voting technique.
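
The abstract describes a voting classifier that aggregates prediction probabilities, which is usually called soft voting: each base model outputs P(positive), the probabilities are averaged, and the average is thresholded. The sketch below shows only that aggregation step. Equal weights, the 0.5 threshold, and the per-model probabilities are assumptions for illustration, not values from the study, and the base models themselves are not trained here.

```python
from typing import Optional, Sequence, Tuple


def soft_vote(
    probs: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> Tuple[float, int]:
    """Average per-model P(positive) (soft voting) and threshold it into a 0/1 label."""
    if not probs:
        raise ValueError("need at least one model probability")
    if weights is None:
        weights = [1.0] * len(probs)  # equal weights by default (assumption)
    if len(weights) != len(probs):
        raise ValueError("need exactly one weight per model")
    avg = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return avg, int(avg >= threshold)


# Hypothetical P(high cervical cancer risk) for one patient from each base model.
model_probs = {
    "logistic_regression": 0.62,
    "knn": 0.40,
    "decision_tree": 1.00,
    "xgboost": 0.71,
    "mlp": 0.55,
}
avg, label = soft_vote(list(model_probs.values()))
print(f"averaged probability = {avg:.3f}, predicted label = {label}")
```

Unlike hard (majority) voting, averaging probabilities lets a confident model move the ensemble decision more than an uncertain one.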

Graduate Admission Prediction Using Machine Learning

International Journal of Computers and Communications ◽

10.46300/91013.2020.14.13 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Multilayer Perceptron ◽

Nearest Neighbor ◽

Educational Institutions ◽

Master's Program ◽

Learning Models ◽

K Nearest Neighbor ◽

Machine Learning Models

Student admission is an important problem for educational institutions. This paper applies machine learning models to predict a student's chance of being admitted to a master's program, which helps students know in advance whether they are likely to be accepted. The models are multiple linear regression, k-nearest neighbor, random forest, and Multilayer Perceptron. Experiments show that the Multilayer Perceptron model outperforms the other models.
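
Of the four models named, k-nearest neighbor is the simplest to show in a few lines: the predicted chance of admission is the mean chance of the k most similar past applicants. This is a minimal sketch, not the paper's implementation; the two features (a test score and an undergraduate GPA, both assumed min-max scaled to [0, 1]), Euclidean distance, k = 3, and all data values are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (scaled features, chance of admission); both features assumed min-max scaled to [0, 1]
Example = Tuple[Tuple[float, float], float]


def knn_regress(train: Sequence[Example], query: Sequence[float], k: int = 3) -> float:
    """Predict the chance of admission as the mean target of the k nearest applicants."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    ranked = sorted(train, key=lambda ex: math.dist(ex[0], query))  # Euclidean distance
    return sum(target for _, target in ranked[:k]) / k


# Hypothetical past applicants: ((test score, undergraduate GPA), admission chance)
train = [
    ((0.90, 0.90), 0.92),
    ((0.80, 0.85), 0.85),
    ((0.50, 0.60), 0.62),
    ((0.30, 0.40), 0.48),
    ((0.85, 0.80), 0.80),
]
print(round(knn_regress(train, (0.88, 0.87), k=3), 3))
```

Scaling the features first matters for k-NN: without it, a feature with a large numeric range would dominate the distance.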

Classification of Parkinson’s disease and essential tremor based on balance and gait characteristics from wearable motion sensors via machine learning techniques: a data-driven approach

Journal of NeuroEngineering and Rehabilitation ◽

10.1186/s12984-020-00756-5 ◽

2020 ◽

Vol 17 (1) ◽

Author(s):

Sanghee Moon ◽

Hyun-Je Song ◽

Vibhash D. Sharma ◽

Kelly E. Lyons ◽

Rajesh Pahwa ◽

...

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Motion Sensors ◽

Learning Models ◽

K Nearest Neighbor ◽

Gait Characteristics ◽

Machine Learning Models

Background: Parkinson’s disease (PD) and essential tremor (ET) are movement disorders that can have similar clinical characteristics, including tremor and gait difficulty. These disorders can be misdiagnosed, leading to delays in appropriate treatment. The aim of the study was to determine whether balance and gait variables obtained with wearable inertial motion sensors can be used to differentiate between PD and ET using machine learning. Additionally, we compared the classification performance of several machine learning models. Methods: This retrospective study included balance and gait variables collected during the instrumented stand and walk test from people with PD (n = 524) and with ET (n = 43). The performance of several machine learning techniques, including neural networks, support vector machine, k-nearest neighbor, decision tree, random forest, and gradient boosting, was compared with that of a dummy model and logistic regression using F1-scores. Results: Machine learning models classified PD and ET based on balance and gait characteristics better than the dummy model (F1-score = 0.48) or logistic regression (F1-score = 0.53). The highest F1-score was 0.61 for the neural network, followed by gradient boosting (0.59), random forest (0.56), support vector machine (0.55), decision tree (0.53), and k-nearest neighbor (0.49). Conclusions: This study demonstrated the utility of machine learning models for classifying different movement disorders based on balance and gait characteristics collected from wearable sensors. Future studies using a well-balanced data set are needed to confirm the potential clinical utility of machine learning models in discerning between PD and ET.
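
With 524 PD and only 43 ET participants, accuracy is misleading, which is why F1 is the comparison metric. The abstract does not say how F1 was averaged, so this sketch assumes a macro-averaged F1 over the two classes and uses only the class sizes reported above; it is an illustration, not the study's evaluation code. Under that assumption, a majority-class baseline that always predicts PD reaches about 92% accuracy but a macro F1 of only about 0.48, because its F1 for ET is zero.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable) -> float:
    """One-vs-rest F1 for a single class; defined as 0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption here)."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Class sizes from the abstract; the "dummy" predicts the majority class for everyone.
y_true = ["PD"] * 524 + ["ET"] * 43
y_dummy = ["PD"] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_dummy)) / len(y_true)
print(f"accuracy = {accuracy:.3f}, macro F1 = {macro_f1(y_true, y_dummy):.3f}")
```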

Abstract 13895: Machine Learning-based Prediction Model With Contribute the Optimal Recipient Selection for the Regenerative Therapy in ICM and Non-ICM Patients

Circulation ◽

10.1161/circ.142.suppl_3.13895 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

Yuji Sakashita ◽

Hidetsugu Asanoi ◽

Shigeru Miyagawa ◽

Satoshi Kainuma ◽

Ai Kawamura ◽

...

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Left Ventricular ◽

Left Ventricular Ejection ◽

Regenerative Therapy ◽

Learning Models ◽

K Nearest Neighbor ◽

Ventricular Ejection Fraction ◽

Free Survival ◽

Machine Learning Models

Background: Although transplantation of patch-shaped autologous skeletal muscle-derived cells has been introduced for patients with ischemic cardiomyopathy (ICM) or non-ICM, both responders and non-responders to this treatment exist, and predicting responsiveness is crucial for improving its effectiveness. In this study, we identified the clinical features associated with responders using a machine learning-based model that discriminated between responders and non-responders to this treatment. Methods and Results: We used retrospective databases of 23 ICM patients and 23 non-ICM patients undergoing autologous myoblast patch transplantation to develop machine learning models that discriminate 3-year ventricular assist device (VAD)-free survival. Sixty-nine pre-transplantation clinical features were selected to train the models. During the 3-year follow-up period, there were 4 VAD implantations or deaths in the ICM group and 10 in the non-ICM group. Using these databases, we trained multiple machine learning models and evaluated them with the leave-one-out method. In ICM, k-nearest neighbor showed the best performance, with an accuracy of 95.7% and an AUC of 0.95. The features associated with 3-year VAD-free survival in ICM were NYHA classification, cardiac index, and left ventricular ejection fraction. In non-ICM, k-nearest neighbor also performed best among the trained classifiers, with an accuracy of 95.7% and an AUC of 0.93. The features associated with 3-year VAD-free survival in non-ICM were pulmonary capillary wedge pressure, pulmonary vascular resistance, and albumin. Conclusion: We identified the features associated with 3-year VAD-free survival after autologous myoblast patch transplantation in ICM and non-ICM patients. Focusing on these features may facilitate optimal candidate selection for regenerative therapy in both groups.
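
With only 23 patients per group, the leave-one-out method holds out one patient at a time, trains on the rest, and scores the prediction for the held-out patient. The sketch below pairs leave-one-out with a k-nearest-neighbor classifier; it is a minimal illustration, not the authors' pipeline. The two pre-scaled features, k = 3, Euclidean distance, and all data points are hypothetical (label 1 = VAD or death, 0 = event-free).

```python
import math
from collections import Counter
from typing import Sequence, Tuple

# (hypothetical pre-scaled feature vector, label: 1 = VAD or death, 0 = event-free)
Sample = Tuple[Tuple[float, ...], int]


def knn_predict(train: Sequence[Sample], query: Sequence[float], k: int) -> int:
    """Majority label among the k training patients closest to the query (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data: Sequence[Sample], k: int = 3) -> float:
    """Hold out each patient once, train on the others, and count correct predictions."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = [s for j, s in enumerate(data) if j != i]  # everyone except patient i
        correct += knn_predict(train, features, k) == label
    return correct / len(data)


data = [
    ((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.25), 0), ((0.20, 0.20), 0),
    ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.85, 0.85), 1),
]
print(leave_one_out_accuracy(data, k=3))
```

Leave-one-out uses every patient for evaluation exactly once, which is attractive for small cohorts, although its estimates can have high variance.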

Identification of viruses with the potential to infect human

10.1101/597963 ◽

2019 ◽

Author(s):

Zheng Zhang ◽

Zena Cai ◽

Zhiying Tan ◽

Congyu Lu ◽

Gaihua Zhang ◽

...

Keyword(s):

Nearest Neighbor ◽

Rapid Development ◽

Human Infection ◽

Viral Metagenomics ◽

Global Public Health ◽

Learning Models ◽

K Nearest Neighbor ◽

Mortality And Morbidity ◽

Viral Genomes ◽

Machine Learning Models

Viruses have caused substantial mortality and morbidity in humans and still pose a serious threat to global public health. The set of viruses with human-infection potential is far from fully characterized. Novel viruses are being discovered at an unprecedented pace owing to the rapid development of viral metagenomics. However, a method for rapidly identifying viruses with human-infection potential is still lacking. This study built several machine learning models that discriminate human-infecting viruses from other viruses based on the frequencies of k-mers in viral genomic sequences. The k-nearest neighbor (KNN) model predicted human-infecting viruses with an accuracy of over 90%. Even KNN models built on contigs as short as 1 kb performed comparably to those built on complete viral genomes, suggesting that the models could be used to identify human-infecting viruses from viral metagenomic sequences. This work could aid the discovery of novel human-infecting viruses in metagenomic studies.
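
The approach turns each sequence into a vector of k-mer frequencies and classifies it by its nearest labeled neighbors. The abstract gives neither the k-mer length nor the distance metric, so this sketch assumes 3-mers over ACGT, Euclidean distance, and 3 neighbors; the short reference sequences and their labels are toy data with no biological meaning.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple


def kmer_frequencies(seq: str, k: int = 3) -> List[float]:
    """Relative frequency of every possible ACGT k-mer (4**k entries) in seq."""
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in all_kmers)  # k-mers containing N etc. are ignored
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


def knn_label(reference: Sequence[Tuple[str, str]], query: str, k: int = 3, kmer: int = 3) -> str:
    """Majority label of the k reference sequences whose k-mer profiles are closest."""
    q = kmer_frequencies(query, kmer)
    ranked = sorted(reference, key=lambda item: math.dist(kmer_frequencies(item[0], kmer), q))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Toy labeled references (invented sequences, not real viral genomes).
reference = [
    ("ATATATATATATATAT", "human"),
    ("TATATATATATATATA", "human"),
    ("ATATTATATATATATA", "human"),
    ("GCGCGCGCGCGCGCGC", "other"),
    ("CGCGCGCGCGCGCGCG", "other"),
    ("GCGCCGCGCGCGCGCG", "other"),
]
print(knn_label(reference, "ATATTATAATATATAT"))
```

Because the features are frequencies rather than raw counts, sequences of different lengths (for example, 1 kb contigs versus whole genomes) map onto comparable vectors.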

Computational Intelligence-Based Model for Mortality Rate Prediction in COVID-19 Patients

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126429 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6429

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Malak Aljabri ◽

Sumayh S. Aljameel ◽

Mariam Moataz Aly Kamaleldin ◽

...

Keyword(s):

Mortality Rate ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Detection And Identification ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The World ◽

Detection And Diagnosis

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine learning (ML) and deep learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), and a DL model (six layers with ReLU activation and an output layer with sigmoid activation) to predict the mortality rate in COVID-19 cases. Models were trained on data from confirmed COVID-19 patients in 146 countries. A comparative analysis of the ML and DL models was performed using a reduced feature set. The best results were achieved by the proposed DL model, with an accuracy of 0.97. Experimental results show that the proposed model improves on the baseline study in the literature with the reduced feature set.
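
The DL model is described only as six ReLU layers followed by a sigmoid output. The sketch below shows that data flow in plain Python under stated assumptions: six hidden layers of hypothetical width 8, He-style random initialization, zero biases, and five hypothetical input features. The weights are random and untrained, so the output is not a meaningful risk estimate; training (for example, with binary cross-entropy) is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    scale = math.sqrt(2.0 / n_in)  # He-style scaling for ReLU layers (assumption)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer: one weighted sum plus bias per output unit."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_network(n_features: int, width: int = 8, n_hidden: int = 6, seed: int = 0):
    """Hypothetical architecture: n_hidden ReLU layers of equal width, then 1 sigmoid unit."""
    rng = random.Random(seed)
    sizes = [n_features] + [width] * n_hidden
    hidden = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
    return hidden, init_layer(width, 1, rng)


def predict_death_probability(x: Sequence[float], network) -> float:
    hidden, output = network
    h = list(x)
    for layer in hidden:
        h = [max(0.0, a) for a in dense(h, layer)]  # ReLU activation
    return sigmoid(dense(h, output)[0])


net = build_network(n_features=5)
patient = [0.7, 0.1, 0.3, 0.9, 0.5]  # hypothetical scaled features
print(predict_death_probability(patient, net))
```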

Machine Learning-Based Malicious X.509 Certificates’ Detection

Applied Sciences ◽

10.3390/app11052164 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2164

Author(s):

Jiaxin Li ◽

Zhaoxin Zhang ◽

Changyong Guo

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Ensemble Learning ◽

Traffic Analysis ◽

Learning Models ◽

Detection Model ◽

Analysis Tools ◽

Average Accuracy ◽

Machine Learning Models

X.509 certificates play an important role in encrypting data transmitted between both parties under HTTPS. As X.509 certificates have become widespread, more and more criminals use certificates to keep their communications from being exposed by malicious-traffic analysis tools; phishing sites and malware are good examples. X.509 certificates found in phishing sites or malware are called malicious X.509 certificates. This paper applies different machine learning models, including classical machine learning models, ensemble learning models, and deep learning models, to distinguish malicious certificates from benign ones using Verification for Extraction (VFE), a system we designed and implemented to obtain a rich set of certificate characteristics. The results show that ensemble learning models are the most stable and efficient, with an average accuracy of 95.9%, outperforming many previous works. In addition, we obtained an SVM-based detection model with an accuracy of 98.2%, the highest accuracy achieved. These outcomes indicate that VFE is capable of capturing the essential characteristics of malicious X.509 certificates.

Extra Point Under Review: Machine Learning And The NFL Field Goal

Elements ◽

10.6017/eurj.v12i2.9448 ◽

2016 ◽

Vol 12 (2) ◽

Author(s):

James LeDoux

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Learning Models ◽

The World ◽

Extra Point ◽

Machine Learning Models

The new NFL extra point rule, first implemented in the 2015 season, requires a kicker to attempt his extra point with the ball snapped from the 15-yard line. This change stretches an extra point to the equivalent of a 32-yard field goal attempt, 13 yards longer than under the previous rule. Though a 32-yard attempt is still a chip shot for any professional kicker, many NFL analysts were surprised by the number of extra points that were missed. Should this really have been a surprise, though? Beginning with a replication of a study by Clark et al., this study explores NFL kicking from a statistical perspective, applying econometric and machine learning models to give a deeper view of what exactly makes some field goal attempts more difficult than others. Ultimately, the goal is to go beyond previous research on this topic, providing an improved predictive model of field goal success and a better metric for evaluating placekicker ability.
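
A common econometric starting point for field goal success is a logistic regression of make/miss on kick distance. This sketch assumes exactly that single-predictor model, fitted by batch gradient descent on a standardized distance; it is not the paper's model (or Clark et al.'s), and the kick data are invented for illustration, not real NFL results.

```python
import math
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(distances: Sequence[float], made: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> Callable[[float], float]:
    """Fit P(make) = sigmoid(b0 + b1 * z), where z is the standardized kick distance."""
    n = len(distances)
    mean = sum(distances) / n
    std = (sum((d - mean) ** 2 for d in distances) / n) ** 0.5
    xs = [(d - mean) / std for d in distances]  # standardizing helps gradient descent converge
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, made):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return lambda d: sigmoid(b0 + b1 * (d - mean) / std)


# Invented kicks: distance in yards and outcome (1 = made, 0 = missed).
distances = [20, 22, 25, 28, 30, 32, 35, 38, 40, 43, 45, 48, 50, 52, 55]
made = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
p_make = fit_logistic(distances, made)
print(f"P(make) at 32 yards: {p_make(32):.2f}, at 50 yards: {p_make(50):.2f}")
```

The misses at 38 and 45 yards keep the toy data from being perfectly separable, so the fitted coefficients stay finite; a fuller model would add covariates such as weather or kicker identity.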

Gas Turbine Fault Classification Using Probability Density Estimation

Volume 6: Ceramics; Controls, Diagnostics and Instrumentation; Education; Manufacturing Materials and Metallurgy ◽

10.1115/gt2014-27265 ◽

2014 ◽

Cited By ~ 1

Author(s):

Igor Loboda

Keyword(s):

Probability Density ◽

Gas Turbine ◽

Nearest Neighbor ◽

Probabilistic Neural Network ◽

Testing Procedure ◽

Fault Classification ◽

Confidence Probability ◽

Probability Density Estimation ◽

K Nearest Neighbor ◽

Average Accuracy

Diagnostics is an important aspect of a condition-based maintenance program. To develop an effective gas turbine monitoring system quickly, recommendations on how to optimally design each system algorithm are needed. This paper deals with choosing a proper fault classification technique for gas turbine monitoring systems. Different artificial neural networks are typically employed to classify gas path faults; among them, the Multilayer Perceptron (MLP) is the most widely used. Some comparative studies cited in the introduction show that the MLP and some other techniques yield practically the same classification accuracy on average over all faults. Therefore, criteria beyond average accuracy are required to choose the best technique. Since techniques like the Probabilistic Neural Network (PNN), Parzen Window (PW) and k-Nearest Neighbor (K-NN) provide a confidence probability for every diagnostic decision, the presence of this important property can serve as such a criterion. In these techniques, the confidence probability is computed by estimating a probability density for the patterns of each fault class considered. The present study compares all the mentioned techniques and their variations using as criteria both the average accuracy and the availability of the confidence probability. To compute these criteria for each technique, a special testing procedure simulates numerous diagnosis cycles corresponding to different fault classes and fault severities. In addition to the criteria themselves, the imprecision of the criteria due to the finite number of diagnosis cycles is computed and taken into account when selecting the best technique.
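
The confidence probability described above comes from estimating a density for each fault class and normalizing across classes. The sketch below shows one such technique, a Parzen window with a Gaussian kernel, assuming equal class priors, a hand-picked bandwidth `sigma`, and two hypothetical fault classes with made-up 2-D measurement-deviation patterns; it is not the paper's testing procedure.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]


def parzen_density(x: Sequence[float], samples: Sequence[Point], sigma: float) -> float:
    """Gaussian-kernel (Parzen window) density estimate at x from one class's patterns."""
    d = len(x)
    norm = (2 * math.pi * sigma ** 2) ** (d / 2)
    total = sum(math.exp(-math.dist(x, s) ** 2 / (2 * sigma ** 2)) for s in samples)
    return total / (len(samples) * norm)


def classify_with_confidence(x: Sequence[float], classes: Dict[str, Sequence[Point]],
                             sigma: float = 0.1) -> Tuple[str, float]:
    """Return the most probable fault class and its posterior (the confidence probability)."""
    densities = {label: parzen_density(x, pts, sigma) for label, pts in classes.items()}
    total = sum(densities.values())
    if total == 0.0:
        raise ValueError("pattern is far from every class; increase sigma")
    posteriors = {label: dens / total for label, dens in densities.items()}  # equal priors
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


# Hypothetical fault classes with simulated measurement-deviation patterns.
fault_patterns = {
    "fault_A": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "fault_B": [(1.0, 1.0), (0.9, 1.0), (1.0, 0.9)],
}
print(classify_with_confidence((0.05, 0.05), fault_patterns))
print(classify_with_confidence((0.5, 0.5), fault_patterns))
```

The second call falls exactly between the two classes, so the confidence drops to about 0.5. That is the extra information an MLP's bare class label does not provide.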

Diagnosis of Problems in Truck Ore Transport Operations in Underground Mines Using Various Machine Learning Models and Data Collected by Internet of Things Systems

Minerals ◽

10.3390/min11101128 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1128

Author(s):

Sebeom Park ◽

Dahee Jung ◽

Hoang Nguyen ◽

Yosoon Choi

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

Production Management ◽

Classification And Regression Tree ◽

Underground Mines ◽

Validation Dataset ◽

Support Vector ◽

Learning Models ◽

K Nearest Neighbor ◽

Machine Learning Models

This study proposes a method for diagnosing problems in truck ore transport operations in underground mines using four machine learning models (Gaussian naïve Bayes (GNB), k-nearest neighbor (kNN), support vector machine (SVM), and classification and regression tree (CART)) and data collected by an Internet of Things system. A limestone underground mine with a mine production management system (using a tablet computer and Bluetooth beacons) was selected as the research area, and log data related to truck travel time were collected. The machine learning models were trained and verified using the collected data, and a grid search with 5-fold cross-validation was performed to improve their prediction accuracy. CART achieved the highest accuracy (94.1%) when the leaf and split parameters were set to 1 and 4, respectively. In validation on the validation dataset (1500), CART achieved an accuracy of 94.6%, a precision of 93.5%, and a recall of 95.7%, and its F1 score reached 94.6%. Field application and analysis confirmed that the proposed CART model can be used as a tool for monitoring and diagnosing the status of truck ore transport operations.
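
The tuning step described above is a grid search: every candidate hyperparameter value is scored by 5-fold cross-validation, and the best-scoring value is kept. Implementing CART is beyond a short sketch, so this example swaps in a 1-D k-nearest-neighbor classifier and searches over k; the cross-validation and grid-search loop is the part being illustrated. The travel times, labels, fold seed, and grid are all hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n: int, n_folds: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 with a fixed seed and split them into n_folds near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for f in range(n_folds):
        size = n // n_folds + (1 if f < n % n_folds else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds


def knn_predict(train_x: Sequence[float], train_y: Sequence[str], x: float, k: int) -> str:
    """Stand-in model (not CART): majority label of the k closest travel times."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(xs: Sequence[float], ys: Sequence[str], k: int,
                n_folds: int = 5, seed: int = 0) -> float:
    """Train on n_folds - 1 folds, test on the held-out fold, and pool the accuracy."""
    correct = 0
    for fold in k_fold_indices(len(xs), n_folds, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        correct += sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in fold)
    return correct / len(xs)


def grid_search(xs: Sequence[float], ys: Sequence[str], grid: Sequence[int],
                n_folds: int = 5, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    scores = {k: cv_accuracy(xs, ys, k, n_folds, seed) for k in grid}
    best = max(scores, key=scores.get)  # ties go to the first value in grid order
    return best, scores


# Hypothetical truck travel times in minutes, labeled as normal trips or problem trips.
normal = [10.0, 10.2, 10.4, 10.8, 11.0, 11.2, 11.5, 11.7, 11.9, 12.0]
problem = [18.0, 18.5, 19.0, 19.6, 20.0, 20.4, 21.0, 21.3, 21.8, 22.0]
xs = normal + problem
ys = ["normal"] * len(normal) + ["problem"] * len(problem)
print(grid_search(xs, ys, [1, 3, 5]))
```

For CART, the grid would instead range over tree parameters such as the minimum leaf size and minimum split size, as in the study.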

Execution Assessment of Machine Learning Algorithms for Spam Profile Detection on Instagram

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/561032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 1889-1894

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Tools ◽

Learning Models ◽

K Nearest Neighbor

With every passing second, the social network community is growing rapidly; because of that, attackers have shown keen interest in these kinds of platforms and want to distribute mischievous content on them. With a focus on introducing new sets of characteristics and features for countermeasures, a great deal of research has explored ways to reduce malicious activity on social media networks. This research highlights features for identifying spammers on Instagram, and additional features are presented to improve the performance of different machine learning algorithms. The performance of several machine learning algorithms, namely Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), was evaluated in the machine learning tools RapidMiner and WEKA. The results show that Random Forest (RF) outperformed all other selected machine learning algorithms on both tools, with the best overall results obtained on RapidMiner. These results are useful for researchers who want to build machine learning models to detect spamming activities in social network communities.
