Automated invasive cervical cancer disease detection at early stage through suitable machine learning model

Cervical cancer is a common cancer that affects women all over the world. It is the fourth leading cause of death among women and has no symptoms in its early stages. Cervical cancer cells develop slowly at the cervix. If detected early, this cancer can be treated successfully. Health professionals now face a major challenge in detecting this cancer before it spreads. This study applied various machine learning classification methods to predict cervical cancer from risk factors. The main aim of this research is to describe how the performance of eight common classification algorithms for detecting cervical cancer varies with the selection of different top feature sets from the dataset. The classification algorithms used to predict cervical cancer and support early diagnosis are Multilayer Perceptron (MLP), Random Forest, k-Nearest Neighbor, Decision Tree, Logistic Regression, SVC, Gradient Boosting and AdaBoost. A variety of approaches are used to handle missing values in the dataset. To choose the best features, a combination of feature selection techniques (Chi-square, SelectBest and Random Forest) was used. The performance of the classifiers is evaluated using accuracy, recall, precision and F1-score. On a variety of top feature sets, MLP outperformed the other classification models. Most classification models, on the other hand, achieved their highest accuracy on the top 25 features with a 70:30 dataset split. For each model, the percentage of correctly classified instances is presented, and all results are then discussed. Medical professionals will be able to use the suggested approach to perform research on cervical cancer.
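The four evaluation measures named in this abstract can be illustrated with a minimal sketch for a binary task. The function name `binary_metrics` and the toy labels are assumptions made for illustration; they do not come from the paper, whose data and code are not shown here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    Hypothetical helper for illustration; not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up): 1 = positive finding, 0 = negative finding.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

On an imbalanced screening dataset, accuracy alone can look high even when positive cases are missed, which is one reason recall and F1 are usually reported alongside it.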

Sign language dactyl recognition based on machine learning algorithms

Eastern-European Journal of Enterprise Technologies, doi:10.15587/1729-4061.2021.239253, 2021, Vol 4 (2(112)), pp. 58-72

Author(s): Chingiz Kenshimov, Zholdas Buribayev, Yedilkhan Amirgaliyev, Aisulyu Ataniyazova, Askhat Aitimov

Keyword(s): Machine Learning, Support Vector Machine, Random Forest, Sign Language, Gesture Recognition, Research Work, Gradient Boosting, Support Vector, Extreme Gradient Boosting

In the course of our research work, the American, Russian and Turkish sign languages were analyzed. A program for recognizing the Kazakh dactylic sign language using machine learning methods was implemented. A dataset of 5000 images was formed for each gesture, and gesture recognition algorithms such as Random Forest, Support Vector Machine and Extreme Gradient Boosting were applied. Two data types were combined into one database, which changed the architecture of the system as a whole. The quality of the algorithms was also evaluated. The research was carried out because existing scientific work on systems for recognizing the Kazakh dactyl sign language is insufficient for a complete representation of the language. The Kazakh language has specific letters, and because of the peculiarities of its spelling, problems arise when developing recognition systems for the Kazakh sign language. The results showed that the Support Vector Machine and Extreme Gradient Boosting algorithms are superior in real-time performance, while the Random Forest algorithm has the highest recognition accuracy. The accuracy of the classification algorithms was 98.86 % for Random Forest, 98.68 % for Support Vector Machine and 98.54 % for Extreme Gradient Boosting. The quality evaluation of the classical algorithms also shows high indicators. The practical significance of this work is that gesture recognition with the updated alphabet of the Kazakh language has not previously been studied, and the results can be used by other researchers for further work on recognizing the Kazakh dactyl sign language, as well as by researchers engaged in developing the international sign language.

Sarcasm detection of tweets without #sarcasm: data science approach

Indonesian Journal of Electrical Engineering and Computer Science, doi:10.11591/ijeecs.v23.i2.pp993-1001, 2021, Vol 23 (2), pp. 993

Author(s): Rupali Amit Bagate, R. Suguna

Keyword(s): Machine Learning, Language Processing, Data Science, Short Term Memory, Confusion Matrix, Research Work, Gradient Boosting, Specific Context, Machine Learning Classification, Light Gradient

Identifying sarcasm in text can be challenging. In sarcasm, a negative word can flip the polarity of a positive sentence. Sentences can be classified as sarcastic or non-sarcastic. It is easier to identify sarcasm from facial expression or tone of voice than from plain text. Thus, sarcasm detection using natural language processing is a major challenge when no specific context or clue, such as #sarcasm, is present in a tweet. This research therefore tries to solve this classification problem using various optimized models. The proposed model analyzes whether a given tweet is sarcastic without the presence of the #sarcasm hashtag or any other specific context in the text. To achieve better results, we used different machine learning classification methods along with deep learning embedding techniques. Our optimized model uses a stacking technique that combines the results of logistic regression and a long short-term memory (LSTM) recurrent neural network and feeds them to a light gradient boosting model, which generates better results than existing machine learning and neural network algorithms. The key difference of our research work is that sarcasm detection is done without #sarcasm, which has not been explored much by earlier researchers. The metrics used for evaluation are F1-score and confusion matrix.

Interrogation of Sentiment Perusing with Hash Counting Vectorizer and Term Inverse Frequency Transformer using Machine Learning Classification

International Journal of Recent Technology and Engineering - 2, doi:10.35940/ijrte.d8303.118419, 2019, Vol 8 (4), pp. 3895-3901

Keyword(s): Machine Learning, Random Forest, Customer Satisfaction, Current Trend, Random Forest Classifier, Gradient Boosting, SVM Classifier, Bayes Classifier, Machine Learning Classification, Tree Classifier

With fast-growing technology, businesses are moving towards increasing their profit by interpreting customer satisfaction. Customer satisfaction can be analyzed in many ways, and it is the responsibility of the business to analyze it in order to improve turnover and profit. Currently, customers give their feedback through mobile devices and the internet. Against this background, this paper attempts to analyze the sentiment of customer feedback on movies. The Sentiment Analysis on Movie Review dataset from the Kaggle machine learning repository is used for the implementation. The sentiment classes are predicted as follows. First, the sentiment count for each class is displayed and the top feature words for each sentiment class are extracted from the dataset. Second, the dataset is vectorized with a counting vectorizer and then fitted with the following classifiers: Logistic Regression, Linear SVM, Multinomial Naive Bayes, Gradient Boosting, Gaussian Naive Bayes, Random Forest, Decision Tree and Extra Tree. Third, the dataset is vectorized with a hashing vectorizer and fitted with the same classifiers. Fourth, the dataset is vectorized with a TF-IDF vectorizer and fitted with the same classifiers. Fifth, the dataset is vectorized with a TF-IDF transformer and fitted with the same classifiers. Sixth, the performance of the classifiers is analyzed using Precision, Recall, F-score and Accuracy. The implementation is carried out in Python using the Spyder IPython console in Anaconda Navigator. Experimental results show that sentiment analysis with the random forest classifier is the most effective, with an accuracy of 89% for the counting vectorizer and the TF-IDF transformer, 87% for the hashing vectorizer and 88% for the TF-IDF vectorizer.
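The difference between the three text representations compared in this abstract can be sketched in plain Python. This is an illustrative toy, assuming whitespace tokenization, a CRC32 hash and the textbook weighting idf = log(N / df); the function names and example reviews are hypothetical, and library implementations typically differ in hash function, smoothing and normalization.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def count_vector(doc, vocabulary):
    """Counting representation: one column per known vocabulary term."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]


def hashed_vector(doc, n_buckets=8):
    """Hashing representation: no stored vocabulary, tokens map to buckets."""
    vec = [0] * n_buckets
    for token in tokenize(doc):
        bucket = zlib.crc32(token.encode("utf-8")) % n_buckets
        vec[bucket] += 1  # distinct tokens may collide in the same bucket
    return vec


def tfidf_vectors(docs):
    """TF-IDF representation: raw term count times log(N / document frequency)."""
    vocabulary = sorted({t for d in docs for t in tokenize(d)})
    doc_freq = Counter(t for d in docs for t in set(tokenize(d)))
    idf = {t: math.log(len(docs) / doc_freq[t]) for t in vocabulary}
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))
        vectors.append([counts[t] * idf[t] for t in vocabulary])
    return vocabulary, vectors


reviews = ["good movie", "bad movie", "good good plot"]  # made-up examples
vocab, weighted = tfidf_vectors(reviews)
print(vocab)
print(count_vector(reviews[2], vocab))
print(hashed_vector(reviews[2]))
print(weighted[2])
```

The counting and TF-IDF variants need a fitted vocabulary, while the hashing variant trades that for a fixed-width vector with possible collisions, which may partly explain small accuracy differences between the representations.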

Regulation Modelling and Analysis Using Machine Learning During the Covid-19 Pandemic in Russia

doi:10.3233/shti210610, 2021

Author(s): Egor Trofimov, Oleg Metsker, Georgy Kopanitsa, David Pashoshev

Keyword(s): Public Health, Machine Learning, Random Forest, Gradient Boosting, Classification Models, Random Forest Regression, Criminal Sanctions, In The Beginning, Emergency Measures, Features Importance

Due to the specific circumstances related to the COVID-19 pandemic, many countries have enforced emergency measures such as self-isolation and restriction of movement and assembly, which also directly affect the functioning of their public health and judicial systems. The goal of this study is to assess, using machine learning methods, the efficiency of the criminal sanctions that were introduced in Russia at the beginning of the COVID-19 outbreak. We developed a regression model for the fine handed out, using random forest regression and XGBoost regression, and calculated the feature importance parameters. We also developed classification models for the remission of the penalty and for setting a sentence, using a gradient boosting classifier.

Análisis y comparación de modelos de clasificación de aprendizaje automático aplicado a riesgo crediticio

Revista ECIPeru, doi:10.33017/reveciperu2017.0014/, 2018, pp. 122-127

Keyword(s): Machine Learning, Current Knowledge, Gradient Boosting, Classification Models, Process Automation, Financial Industry, Machine Learning Classification, Nonperforming Loans, Mathematical Algorithms, Credit Granting

Análisis y comparación de modelos de clasificación de aprendizaje automático aplicado a riesgo crediticio (Analysis and comparison of machine learning classification models applied to credit approval). Jorge Brian Alarcón Flores, Jiam Carlos López Malca, Luis Ruiz Saldarriaga, Christian Walter Sarmiento Román. Master's in Informatics with a specialization in Computer Science, Pontificia Universidad Católica del Perú, Lima, Perú. Received 18 November 2017, accepted 26 November 2017. DOI: https://doi.org/10.33017/RevECIPeru2017.0014/

Abstract: The financial industry has become a very competitive sector worldwide. In this context, the credit granting decision is one of the most important processes, and several critical business KPIs depend on its accuracy, such as loan levels, credit recovery levels and nonperforming loan ratios. This key process has historically relied on business experts, who decided whether or not to grant credit based on their experience and on several elements of the applicant's credit behavior. In the last decade, the development of technologies such as AI and machine learning has contributed greatly to automating this process. The main goal of this paper is to analyze several mathematical algorithms based on machine learning and to show which of them give the best results in credit granting predictions, contributing to current knowledge on this issue, giving an objective explanation of the results and suggesting further research to improve the results of existing mathematical algorithms. The experiments determined that the best model was Gradient Boosting, with an accuracy of 83.71%.

Keywords: artificial intelligence, machine learning, credit risk, mathematical models, gradient boosting.

Knee Muscle Force Estimating Model Using Machine Learning Approach

The Computer Journal, doi:10.1093/comjnl/bxaa160, 2020

Author(s): Anurag Sohane, Ravinder Agarwal

Keyword(s): Machine Learning, Random Forest, Muscle Force, Vastus Lateralis, Input Parameter, Research Work, Cost Effective, Coefficient Of Determination, Muscle Forces, Knee Muscle

Various simulation tools and conventional algorithms are used to determine human knee muscle forces during dynamic movement. These may be suitable for clinical use but have drawbacks, such as long computation times, muscle redundancy and limited cost-effectiveness. Recently, there has been interest in developing supervised learning-based prediction models for this computationally demanding process. The present research work develops cost-effective and efficient machine learning (ML) models to predict knee muscle force for clinical interventions from input parameters such as height, mass and angle. A dataset of 500 human musculoskeletal models was used to train and test four different ML models to predict knee muscle force. The dataset was obtained from the AnyBody modeling software using AnyPyTools, with a human musculoskeletal model performing a squatting movement during inverse dynamic analysis. The results show that the random forest model outperforms the other selected models (neural network, generalized linear model and decision tree) in terms of mean square error (MSE), coefficient of determination (R2) and correlation (r). For the test dataset, the MSE of predicted vs. actual muscle forces from the random forest model for Biceps Femoris, Rectus Femoris, Vastus Medialis and Vastus Lateralis is 19.92, 9.06, 5.97 and 5.46, the correlation is 0.94, 0.92, 0.92 and 0.94, and R2 is 0.88, 0.84, 0.84 and 0.89, respectively.
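The three regression measures reported in this abstract can be computed as in the following minimal sketch. The function names and the force values are invented for illustration and are not data from the paper; R2 is computed here as 1 - SS_res / SS_tot, which is one common definition and may differ from the one the authors used.

```python
import math


def mse(actual, predicted):
    """Mean square error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up muscle force values, for illustration only.
actual_force = [10, 20, 30, 40]
predicted_force = [12, 18, 33, 37]
print(mse(actual_force, predicted_force))        # 6.5
print(r_squared(actual_force, predicted_force))
print(pearson_r(actual_force, predicted_force))
```

MSE is expressed in squared units of the target, so it is not comparable across muscles with different force ranges, whereas R2 and r are unitless.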

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing, doi:10.3390/rs13051021, 2021, Vol 13 (5), pp. 1021

Author(s): Hu Ding, Jiaming Na, Shangjing Jiang, Jie Zhu, Kai Liu, ...

Keyword(s): Machine Learning, Random Forest, Loess Plateau, Water Conservation, Nearest Neighbor, Gradient Boosting, K Nearest Neighbor, The Loess Plateau, Object Based, Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
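Of the three classifiers compared in this abstract, k-nearest neighbor is the simplest to write out. The sketch below is a generic majority-vote KNN with Euclidean distance; the function name, the two-dimensional feature values and the class labels are hypothetical and only stand in for the per-object features the study extracted from DEMs and imagery.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors for image objects (for example, two terrain descriptors).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["non-terrace"] * 3 + ["terrace"] * 3

print(knn_predict(points, labels, (0.2, 0.3)))
print(knn_predict(points, labels, (5.5, 5.2)))
```

Because KNN relies on raw distances, features on different scales usually need to be standardized first, which the tree-based classifiers in the comparison do not require.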

Prediction of activity and selectivity profiles of human Carbonic Anhydrase inhibitors using machine learning classification models

Journal of Cheminformatics, doi:10.1186/s13321-021-00499-y, 2021, Vol 13 (1)

Author(s): Annachiara Tinivella, Luca Pinzi, Giulio Rastelli

Keyword(s): Machine Learning, Carbonic Anhydrase, A Priori, Selective Inhibition, Great Promise, Classification Models, Machine Learning Classification, Central Interest, Human Carbonic Anhydrase, In Silico Models

The development of selective inhibitors of the clinically relevant human Carbonic Anhydrase (hCA) isoforms IX and XII has become a major topic in drug research, due to their deregulation in several types of cancer. Indeed, the selective inhibition of these two isoforms, especially with respect to the homeostatic isoform II, holds great promise to develop anticancer drugs with limited side effects. Therefore, the development of in silico models able to predict the activity and selectivity against the desired isoform(s) is of central interest. In this work, we have developed a series of machine learning classification models, trained on high confidence data extracted from ChEMBL, able to predict the activity and selectivity profiles of ligands for human Carbonic Anhydrase isoforms II, IX and XII. The training datasets were built with a procedure that made use of flexible bioactivity thresholds to obtain well-balanced active and inactive classes. We used multiple algorithms and sampling sizes to finally select activity models able to classify active or inactive molecules with excellent performances. Remarkably, the results herein reported turned out to be better than those obtained by models built with the classic approach of selecting an a priori activity threshold. The sequential application of such validated models enables virtual screening to be performed in a fast and more reliable way to predict the activity and selectivity profiles against the investigated isoforms.

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology, doi:10.1186/s12871-021-01343-4, 2021, Vol 21 (1)

Author(s): Jong Ho Kim, Haewon Kim, Ji Su Jang, Sung Mi Hwang, So Young Lim, ...

Keyword(s): Machine Learning, Random Forest, Confidence Interval, Neck Circumference, Difficult Laryngoscopy, Gradient Boosting, Test Set, Equal Distribution, Light Gradient, Extreme Gradient Boosting

Background: Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning, using neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods: Variables for predicting difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 or 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. Five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine) were trained on the training set. The prediction models were validated on the test set. Results: The random forest model performed best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions: Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable and a combination of models.
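The stratified 80/20 split described in the Methods can be sketched as follows. This is a minimal illustration, assuming that stratification means splitting each outcome class separately so both sets keep the same class proportion; the function name `stratified_split`, the seed and the toy labels are hypothetical and not taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with class proportions preserved."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])   # this class's share of the test set
        train_idx.extend(indices[n_test:])  # the rest goes to training
    return sorted(train_idx), sorted(test_idx)


# Toy outcome labels (made up): 1 = difficult laryngoscopy, 0 = not difficult.
labels = [1] * 20 + [0] * 80
train, test = stratified_split(labels)
print(len(train), len(test))           # 80 20
print(sum(labels[i] for i in test))    # 4 positives, matching the 20% prevalence
```

With a rare outcome, a purely random split can leave very few positive cases in the test set, so preserving the class ratio makes metrics such as the area under the precision-recall curve more stable.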

An Extensive Review on Machine Learning and Deep Learning Based Cervical Cancer Diagnosis and Classification Models

Journal of Computational and Theoretical Nanoscience, doi:10.1166/jctn.2020.9437, 2020, Vol 17 (12), pp. 5438-5446

Author(s): C. Suguna, S. P. Balamurugan

Keyword(s): Machine Learning, Cervical Cancer, Deep Learning, Comparative Study, Cancer Diagnosis, Pap Smear, Classification Models, Extensive Review, Diagnosis And Classification, Automated Machine Learning

Cervical cancer is a common and deadly disease among women, and it needs early diagnosis to reduce its prevalence. The Pap smear is a widely employed technique to screen for and diagnose cervical cancer. Since classical manual screening techniques are inefficient at identifying cervical cancer, several research efforts have begun to develop automated machine learning (ML) and deep learning (DL) tools for cervical cancer diagnosis. This paper surveys recent work on cervical cancer diagnosis and classification. Recently presented ML and DL models for cervical cancer diagnosis and classification are reviewed in detail. Segmentation techniques developed for cervical cancer diagnosis are also surveyed. At the end of the survey, a brief comparative study is carried out to identify the significance of the reviewed methods.
