Giving more insight for automatic risk prediction during pregnancy with interpretable machine learning

The maternal mortality rate (MMR) reported in the Indonesian intercensal population survey (SUPAS) is considered high. For pregnancy risk detection, public health centers (puskesmas) use the Poedji Rochjati screening card (KSPR), which covers 20 features. In addition to the KSPR, pregnancy risk monitoring is supported by a pregnancy control card. Because the two cards differ in the number of features they record, they need to be reconciled. Our objectives are to determine the most influential features, to explore the links among features on the KSPR and the pregnancy control card, and to build a machine learning model for predicting pregnancy risk. For the first objective, we use correlation-based feature selection (CFS) and the C5.0 algorithm. The second objective is addressed by taking the union of the features produced by the two techniques. In machine learning experiments on these features, the XGBoost algorithm achieved the highest accuracy, 94%, followed by the random forest, Naïve Bayes, and k-nearest neighbor algorithms at 87%, 66%, and 60%, respectively. Interpretability is provided with SHAP and LIME to give more insight into the classification model. In conclusion, the feature common to both interpretation approaches confirmed that Cesar was dominant in determining pregnancy risk.
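
The union step mentioned in the abstract can be pictured with a small sketch. The feature names below are hypothetical placeholders (the abstract does not list the selected features, apart from naming Cesar as dominant), and `merge_feature_sets` is an illustrative helper, not the authors' code.

```python
# Hypothetical feature names; the abstract does not list the selected features.
cfs_features = {"age", "parity", "cesar", "blood_pressure"}
c50_features = {"cesar", "hemoglobin", "age"}


def merge_feature_sets(*feature_sets):
    """Return the union of the features chosen by each technique, sorted for stable output."""
    merged = set()
    for features in feature_sets:
        merged |= set(features)
    return sorted(merged)


# Features kept by either technique are passed on to the classifiers.
selected = merge_feature_sets(cfs_features, c50_features)
```

A union keeps any feature that either technique considers informative, so the combined set is at least as large as each individual selection.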

Framing Twitter Public Sentiment on Nigerian Government COVID-19 Palliatives Distribution Using Machine Learning

Sustainability ◽

10.3390/su13063497 ◽

2021 ◽

Vol 13 (6) ◽

pp. 3497

Author(s):

Hassan Adamu ◽

Syaheerah Lebai Lutfi ◽

Nurul Hashimah Ahamed Hassain Malim ◽

Rohail Hassan ◽

Assunta Di Vaio ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Primary Objective ◽

Support Vector ◽

Standard English ◽

Emotion Classification ◽

K Nearest Neighbor ◽

The Public ◽

The Government

Sustainable development plays a vital role in information and communication technology. In times of pandemics such as COVID-19, vulnerable people need help to survive. This help includes the distribution of relief packages and materials by the government with the primary objective of lessening the economic and psychological effects on the citizens affected by disasters such as the COVID-19 pandemic. However, there has not been an efficient way to monitor public funds’ accountability and transparency, especially in developing countries such as Nigeria. The understanding of public emotions by the government on distributed palliatives is important as it would indicate the reach and impact of the distribution exercise. Although several studies on English emotion classification have been conducted, these studies are not portable to a wider inclusive Nigerian case. This is because Informal Nigerian English (Pidgin), which Nigerians widely speak, has quite a different vocabulary from Standard English, thus limiting the applicability of emotion classification models trained on Standard English. An Informal Nigerian English (Pidgin English) emotions dataset is constructed, pre-processed, and annotated. The dataset is then used to classify five emotion classes (anger, sadness, joy, fear, and disgust) on the COVID-19 palliatives and relief aid distribution in Nigeria using standard machine learning (ML) algorithms. Six ML algorithms are used in this study, and a comparative analysis of their performance is conducted. The algorithms are Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbor (KNN), and Decision Tree (DT). The conducted experiments reveal that Support Vector Machine outperforms the remaining classifiers with the highest accuracy of 88%.
The “disgust” emotion class surpassed the other emotion classes (sadness, joy, fear, and anger), with the highest count in the classification conducted on the constructed dataset. Additionally, the correlation analysis shows a significant relationship between the emotion classes “Joy” and “Fear”, which implies that the public is excited about the palliatives’ distribution but afraid of inequality and a lack of transparency in the distribution process due to reasons such as corruption. In conclusion, the results of this experiment clearly show that public emotions on the distribution of COVID-19 support and relief aid packages in Nigeria were not satisfactory, considering that negative emotions from the public outnumbered public happiness.

Stress Classification of ECG-Derived HRV Features Extracted from Wearable Devices

10.20944/preprints202103.0644.v1 ◽

2021 ◽

Author(s):

Kayisan Mary Dalmeida ◽

Giovanni Luca Masala

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Wearable Devices ◽

Mental Wellbeing ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Automobile Crashes ◽

Machine Learning Model ◽

Stress Classification

Stress has been identified as one of the major causes of automobile crashes, which lead to high rates of fatalities and injuries each year. Stress can be measured via physiological signals, and this study focuses on the features that common wearable devices can extract, mainly heart rate variability (HRV). The study aims to develop a predictive model that accurately classifies stress levels from ECG-derived HRV features obtained from automobile drivers, testing machine learning methods such as K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Random Forest (RF), and Gradient Boosting (GB). The models with the highest predictive power will then serve as a reference for developing a machine learning model that classifies stress from HRV features measured by wearable devices. We demonstrate that MLP was the best stress classifier, achieving a recall of 80%. The proposed method can also be used in any application where it is important to monitor stress levels, e.g., physical rehabilitation, anxiety relief, or mental wellbeing.
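
To make "HRV features" concrete, the sketch below computes three widely used time-domain measures (SDNN, RMSSD, pNN50) from a list of RR intervals. The abstract does not say which HRV features the study used, so the choice of features, the function name `hrv_time_domain`, and the sample values are assumptions made for illustration.

```python
import math


def hrv_time_domain(rr_ms):
    """Return SDNN, RMSSD and pNN50 for RR intervals given in milliseconds."""
    n = len(rr_ms)
    if n < 2:
        raise ValueError("need at least two RR intervals")
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of the RR intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # Differences between successive RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # RMSSD: root mean square of the successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNN50: percentage of successive differences larger than 50 ms.
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50) / len(diffs)
    return {"sdnn": sdnn, "rmssd": rmssd, "pnn50": pnn50}


# Made-up RR intervals (ms), purely for demonstration.
features = hrv_time_domain([800, 810, 790, 850, 800])
```

A feature dictionary like this, computed per time window, is the kind of input vector a classifier such as an MLP would receive.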

Classification model for accuracy and intrusion detection using machine learning approach

PeerJ Computer Science ◽

10.7717/peerj-cs.437 ◽

2021 ◽

Vol 7 ◽

pp. e437

Author(s):

Arushi Agarwal ◽

Purushottam Sharma ◽

Mohammed Alshehri ◽

Ahmed A. Mohamed ◽

Osama Alfarraj

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Nearest Neighbor ◽

Performance Metrics ◽

Detection System ◽

Confusion Matrix ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor

In today’s cyber world, demand for the internet is increasing day by day, raising concerns about network security. The aim of an Intrusion Detection System (IDS) is to provide defenses against many fast-growing network attacks (e.g., DDoS, ransomware, and botnet attacks) by blocking harmful activities occurring in the network. In this work, three classification algorithms, Naïve Bayes (NB), Support Vector Machine (SVM), and K-nearest neighbor (KNN), were evaluated on the UNSW-NB15 dataset for accuracy and processing time, in order to find the best-suited algorithm for efficiently learning the patterns of suspicious network activity. The data gathered from the feature set comparison was then fed to the IDS to train the system for future intrusion behavior prediction and analysis, using the best-fit algorithm chosen from the three on the basis of the performance metrics obtained. In addition, classification reports (precision, recall, and F1-score) and confusion matrices were generated and compared to finalize the support-validation status found throughout the testing phase of the model used in this approach.
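
The metrics named in the abstract (confusion matrix, precision, recall, F1-score) can be sketched for a two-class case as follows. The labels "attack" and "normal", the function name `binary_report`, and the sample predictions are assumptions for illustration and are not taken from the UNSW-NB15 experiments.

```python
def binary_report(y_true, y_pred, positive="attack"):
    """Confusion-matrix counts plus precision, recall, F1 and accuracy for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "confusion": [[tp, fn], [fp, tn]],  # rows: actual positive, actual negative
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up labels, purely for demonstration.
truth = ["attack", "attack", "attack", "normal", "normal", "normal"]
preds = ["attack", "attack", "normal", "attack", "normal", "normal"]
report = binary_report(truth, preds)
```

For intrusion detection, recall on the attack class is often watched more closely than accuracy, since a missed attack usually costs more than a false alarm.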

Predicting heart failure using a wrapper-based feature selection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v21.i3.pp1530-1539 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1530

Author(s):

Minh Tuan Le ◽

Minh Thanh Vo ◽

Nhat Tan Pham ◽

Son V.T Dao

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Medical Practitioners ◽

Machine Learning Model ◽

Heart Contraction ◽

Artificial Neural Network (ANN)

In the current health system, it is very difficult for medical practitioners and physicians to diagnose the effectiveness of heart contraction. In this research, we proposed a machine learning model to predict heart contraction using an artificial neural network (ANN). We also proposed a novel wrapper-based feature selection method utilizing grey wolf optimization (GWO) to reduce the number of required input attributes. We compared the results achieved by our method with those of several conventional machine learning approaches: support vector machine, decision tree, K-nearest neighbor, naïve Bayes, random forest, and logistic regression. Computational results show that far fewer features are needed and that a higher prediction accuracy, around 87%, can be achieved. This work has the potential to be applied in clinical practice and to become a supporting tool for doctors and physicians.

Minimum Mapping from EMG Signals at Human Elbow and Shoulder Movements into Two DoF Upper-Limb Robot with Machine Learning

Machines ◽

10.3390/machines9030056 ◽

2021 ◽

Vol 9 (3) ◽

pp. 56

Author(s):

Pringgo Widyo Laksono ◽

Takahide Kitamura ◽

Joseph Muguro ◽

Kojiro Matsushita ◽

Minoru Sasaki ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Nearest Neighbor ◽

Energy Operator ◽

Classification Model ◽

Robotic Arm ◽

Support Vector ◽

Classification Models ◽

Elbow Extension ◽

K Nearest Neighbor

This research focuses on the minimum process for classifying three human upper arm movements (elbow extension, shoulder extension, and combined shoulder and elbow extension) from three electromyography (EMG) signals, in order to control a 2-degree-of-freedom (DoF) robotic arm. The proposed minimum process consists of four parts: time division of the data, the Teager–Kaiser energy operator (TKEO), conventional EMG feature extraction (i.e., mean absolute value (MAV), zero crossings (ZC), slope-sign changes (SSC), and waveform length (WL)), and eight major machine learning models (i.e., decision tree (medium and fine), k-Nearest Neighbor (KNN) (weighted and fine), Support Vector Machine (SVM) (cubic and fine Gaussian), and ensemble (bagged trees and subspace KNN)). We then compare and investigate 48 classification models (47 proposed models and 1 conventional model) based on five healthy subjects. The results showed that all the classification models achieved accuracies between 74–98%, and that the processing time was below 40 ms, indicating an acceptable controller delay for robotic arm control. Moreover, we confirmed that the classification model with no time division, with TKEO, and with ensemble (subspace KNN) had the best performance, with accuracy rates at 96.67, recall rates at 99.66, and precision rates at 96.99. In short, the combination of the proposed TKEO and ensemble (subspace KNN) plays an important role in achieving the EMG classification.
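
The TKEO and the four time-domain features listed above follow standard definitions, which can be sketched as below. The discrete TKEO is ψ[n] = x[n]² − x[n−1]·x[n+1]. The function names, the optional `threshold` parameter, and the sample signal are assumptions for illustration; the abstract does not state the thresholds or window lengths the authors used.

```python
def tkeo(x):
    """Teager-Kaiser energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def emg_features(x, threshold=0.0):
    """Mean absolute value, zero crossings, slope-sign changes and waveform length."""
    # MAV: average rectified amplitude.
    mav = sum(abs(v) for v in x) / len(x)
    # WL: cumulative length of the waveform over the window.
    wl = sum(abs(b - a) for a, b in zip(x, x[1:]))
    # ZC: sign changes between neighbours whose amplitude step exceeds the threshold.
    zc = sum(1 for a, b in zip(x, x[1:]) if a * b < 0 and abs(a - b) > threshold)
    # SSC: points where the slope changes sign (local peaks or valleys).
    ssc = sum(
        1 for a, b, c in zip(x, x[1:], x[2:]) if (b - a) * (b - c) > threshold
    )
    return {"mav": mav, "zc": zc, "ssc": ssc, "wl": wl}


# Made-up signal window, purely for demonstration.
window = [1, -2, 3, -1, 2]
energy = tkeo(window)
features = emg_features(window)
```

In a pipeline of the kind described, the TKEO output would typically be computed first and the four features extracted from each conditioned window before classification.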

Comparison of Naïve Bayes, SVM, and k-NN for Aspect-Based Gadget Sentiment Analysis (Perbandingan Naïve Bayes, SVM, dan k-NN untuk Analisis Sentimen Gadget Berbasis Aspek)

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i6.3588 ◽

2021 ◽

Vol 5 (6) ◽

pp. 1120-1126

Author(s):

Jessica Widyadhana Iskandar ◽

Yessica Nataliani

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Brand Image ◽

Naïve Bayes ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor ◽

The Public ◽

Unique Shape ◽

Design Aspect

The Samsung Galaxy Z Flip 3 is one of the gadgets currently popular among the public because of its unique shape and features. YouTube is a social media platform that the public can access and enjoy, and its content includes gadget reviews such as those on the GadgetIn channel. YouTube can therefore indicate whether or not people accept or are interested in this new gadget. This study aims to determine the sentiment of a gadget producer. Based on the analysis and testing carried out on 9,597 YouTube comments about the Samsung Galaxy Z Flip 3, more users gave positive opinions on the design aspect and negative opinions on the price, specifications, and brand image aspects. Using the CRISP-DM model and comparing the Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN) classification methods, the SVM classification model is shown to give the best results. The average accuracy of SVM is 96.43% across four aspects: 94.40% for design, 97.44% for price, 96.22% for specifications, and 97.63% for brand image.

Processing Technique Selection for Steels Based on Mechanical Properties Using Machine Learning Framework

10.21203/rs.3.rs-336843/v1 ◽

2021 ◽

Author(s):

Amitava Choudhury

Keyword(s):

Machine Learning ◽

Mechanical Properties ◽

Nearest Neighbor ◽

Rolling Process ◽

Computational Method ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor ◽

Process Data ◽

Kappa Score

Predicting the process route for materials with a desired set of properties is one of the fundamental issues in materials design. The parameter space is often too large to be bounded since there are too many possibilities. However, in areas with limited theoretical access, artificial learning techniques can be attempted using the available data on the candidate materials. In this study, a computational method is proposed to predict the different process routes of steel, taking composition and desired mechanical properties as inputs. First, historical data from the actual rolling process was collected, cleaned, and integrated. The dataset is then divided into four classes based on the rolling process data. Next, to identify the essential characteristics of the variables, the correlation among the features is calculated. State-of-the-art machine learning methods such as logistic regression, K-nearest neighbor, support vector machine, and random forest are studied to implement the prediction model. To avoid overfitting, k-fold cross-validation is applied to the model, achieving a realistic prediction result with an accuracy of 97%. The F1-score of the classification model is 0.86 and the kappa score is 0.95, which indicates that the model has excellent learning and predictive ability and can precisely forecast steel process routes from the given input parameters.
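
For readers unfamiliar with the kappa score quoted above, the sketch below computes Cohen's kappa, which corrects observed agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The function name `cohen_kappa` and the route labels are hypothetical; the abstract does not give the class names or the underlying predictions.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    if n == 0:
        raise ValueError("need at least one sample")
    # Observed agreement: fraction of samples where prediction matches the label.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # only one class present and always predicted
    return (observed - expected) / (1 - expected)


# Hypothetical process-route labels, purely for demonstration.
truth = ["route_a", "route_a", "route_b", "route_b"]
preds = ["route_a", "route_b", "route_b", "route_b"]
kappa = cohen_kappa(truth, preds)
```

A kappa near 1 means agreement well beyond chance, which is why it is often reported next to accuracy when classes are imbalanced.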

K-Nearest Neighbor with K-Fold Cross Validation and Analytic Hierarchy Process on Data Classification

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v2i1.1204 ◽

2021 ◽

Vol 2 (1) ◽

Author(s):

Zoelkarnain Rinanda Tembusai ◽

Herman Mawengkang ◽

Muhammad Zarlis

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Analytic Hierarchy Process ◽

Cross Validation ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Analytic Hierarchy ◽

Machine Learning Model ◽

Hierarchy Process ◽

Fold Cross Validation

This study analyzes the performance of the k-Nearest Neighbor method, using k-Fold Cross Validation for model evaluation and the Analytic Hierarchy Process for feature selection, in order to obtain the best classification accuracy and machine learning model. The best test result is in fold 3, with an accuracy of 95%. Evaluating the k-Nearest Neighbor model with k-Fold Cross Validation yields a good machine learning model, and the Analytic Hierarchy Process as feature selection also gives optimal results and can reduce the performance of the k-Nearest Neighbor method because it only uses features that have been selected based on their level of importance for decision making.
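
The evaluation scheme described above, k-Nearest Neighbor scored fold by fold, can be sketched in plain Python. This is a minimal illustration under assumptions: Euclidean distance, majority voting, interleaved folds without shuffling, and a made-up two-cluster dataset. It does not include the Analytic Hierarchy Process step and is not the authors' implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, n_folds=3, k=3):
    """Return one accuracy value per fold; fold f holds samples f, f + n_folds, ..."""
    accuracies = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return accuracies


# Made-up, well-separated clusters, purely for demonstration.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10, 10.5)]
y = ["a"] * 6 + ["b"] * 6
fold_scores = k_fold_accuracy(X, y, n_folds=3, k=3)
```

Reporting the per-fold scores, rather than only their mean, is what allows a statement such as "the best result is in fold 3".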

Sentiment Analysis on Corona Virus Pandemic Using Machine Learning Algorithm

JOURNAL OF INFORMATICS AND TELECOMMUNICATION ENGINEERING ◽

10.31289/jite.v4i1.3798 ◽

2020 ◽

Vol 4 (1) ◽

pp. 86-96

Author(s):

Ricky Risnantoyo ◽

Arifin Nugroho ◽

Kresna Mandara

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Learning Algorithm ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor ◽

The Public ◽

Machine Learning Classification ◽

Corona Virus

Corona virus outbreaks in almost all countries of the world have affected not only the health sector but also other sectors such as tourism, finance, and transportation. With corona virus a trending topic on Twitter, this has raised a variety of public sentiments. Twitter was chosen by the public because it can disseminate information in real time and shows market reactions quickly. This research uses public tweets related to "Corona Virus" to see how sentiment polarity arises. Text mining techniques and three machine learning classification algorithms, Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor (K-NN), are used to build a model that classifies tweets as having positive, negative, or neutral polarity. The best test results are produced by the Support Vector Machine (SVM) algorithm, with an accuracy of 76.21%, a precision of 78.04%, and a recall of 71.42%. Keywords: Machine Learning, Corona Virus, Twitter, Sentiment Analysis.

An integrated machine learning model for indoor network optimization to maximize coverage

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i1.pp394-402 ◽

2021 ◽

Vol 24 (1) ◽

pp. 394

Author(s):

Ahmed Wasif Reza ◽

Abdullah Al Rifat ◽

Tanvir Ahmed

Keyword(s):

Machine Learning ◽

Network Optimization ◽

Nearest Neighbor ◽

Learning Model ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Simple Task ◽

Machine Learning Model ◽

Minimum Number

Indoor network optimization is not a simple task due to obstacles, interference, and signal attenuation in the environment. Intense noise can affect the intelligibility of the signal and significantly reduce coverage strength, which results in a poor user experience. Most existing work finds device locations via different mathematical and generic algorithmic approaches, but very little focuses on applying machine learning algorithms. The purpose of this research is to introduce an integrated machine learning model that finds maximum indoor coverage with a minimum number of transmitters. Users in the indoor environment are allocated based on the most reliable signal strength, and the system is also capable of allocating new users. K-means clustering, K-nearest neighbor (KNN), support vector machine (SVM), and Gaussian Naïve Bayes (GNB) are used to provide an optimized solution. KNN, SVM, and GNB obtained a maximum accuracy of 100% in some cases. Among all the algorithms, however, KNN performed best, with an average accuracy of 93.33%. K-fold cross-validation (Kf-CV) was added to validate the experimental simulations and re-evaluate the outcomes of the machine learning models.
