Prediction of Cardiovascular Disease on Different Parameters Using Machine Learning

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset218486 ◽

2021 ◽

pp. 52-61

Author(s):

Manoj D. Patil ◽

Dr. Harsh Mathur

Keyword(s):

Machine Learning ◽

False Negative ◽

Training Model ◽

Machine Learning Algorithms ◽

Proposed Model ◽

Collection Data ◽

Boosting Method ◽

Transformation Methods ◽

Result Analysis ◽

Bagging Method

Cardiovascular diseases (CVDs) are the most common serious diseases affecting human health. Early diagnosis can prevent or mitigate CVDs, which can reduce the death rate. Identifying risk factors using machine learning models is a promising approach. We propose a model that combines different methods to predict heart disease effectively. To supply our training model with precise information, we employed effective data collection, data pre-processing, and data transformation methods. A combined dataset was used (Cleveland, Long Beach VA, Switzerland, Hungarian, and Statlog). Appropriate features are selected using AASSO (Advanced Absolute Shrinkage and Selection Operator) techniques. New hybrid classifiers are developed by integrating the traditional bagging and boosting methods, namely the Decision Tree Bagging Method (DTBM), the Random Forest Bagging Method (RFBM), the K-Nearest Neighbour Bagging Method (KNNBM), the AdaBoost Boosting Method (ABBM), and GBBM. The machine learning algorithms were evaluated by accuracy, sensitivity (SEN), error rate, precision (FRE), and F1 score (F1), along with negative predictive value (NGR), false positive rate, and false negative rate (FNR). The results are shown separately for comparison. Based on the result analysis, our proposed model produced the highest precision and accuracy (99.05 percent) using RFBM with the Relief feature selection method.
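
The evaluation measures named in this abstract (sensitivity, error rate, F1 score, negative predictive value, false positive rate, false negative rate) all derive from the four confusion-matrix counts. The following is a minimal sketch using the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    precision = tp / (tp + fp)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "npv": tn / (tn + fn),     # negative predictive value
        "fpr": fp / (fp + tn),     # false positive rate
        "fnr": fn / (fn + tp),     # false negative rate = 1 - sensitivity
    }


# Hypothetical counts, for illustration only (not results from the paper).
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the false negative rate and the sensitivity always sum to 1, so reporting both is a consistency check rather than independent evidence.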

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of the three runs. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.
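
The three scores reported in this abstract can be computed from paired observations and predictions. The sketch below uses the standard definitions of RMSE, Nash-Sutcliffe efficiency, and Willmott's index of agreement; whether the paper uses exactly these variants is an assumption, and the sample values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - residual / variance


def index_of_agreement(obs, pred):
    """Willmott's index of agreement d, bounded between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1 - residual / potential


# Hypothetical erosion depths in mm/year, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
print(rmse(observed, predicted))
print(nse(observed, predicted))
print(index_of_agreement(observed, predicted))
```

A lower RMSE and higher NSE and d all indicate a better fit, which is why the abstract can rank GBM first on all three at once.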

Prediction of Pest Insect Appearance Using Sensors and Machine Learning

Sensors ◽

10.3390/s21144846 ◽

2021 ◽

Vol 21 (14) ◽

pp. 4846

Author(s):

Dušan Marković ◽

Dejan Vujičić ◽

Snežana Tanasković ◽

Borislav Đorđević ◽

Siniša Ranđić ◽

...

Keyword(s):

Machine Learning ◽

Relative Humidity ◽

Weather Conditions ◽

Daily Basis ◽

Machine Learning Algorithms ◽

Lower Percentage ◽

Timely Manner ◽

Proposed Model ◽

Set Up ◽

Accuracy Of Prediction

The appearance of pest insects can lead to a loss in yield if farmers do not respond in a timely manner to suppress their spread. Occurrences and numbers of insects can be monitored through insect traps, which requires regularly visiting the traps and checking their condition. A more efficient way is to set up sensor devices with a camera at the traps that photograph the traps and forward the images to the Internet, where the pest insect’s appearance is predicted by image analysis. Weather conditions, specifically temperature and relative humidity, are the parameters that affect the appearance of some pests, such as Helicoverpa armigera. This paper presents a machine learning model that can predict the appearance of insects during a season on a daily basis, taking into account the air temperature and relative humidity. Several machine learning algorithms for classification were applied and their accuracy for the prediction of insect occurrence was presented (up to 76.5%). Since the data used for testing were given in chronological order according to the days when the measurement was performed, the existing model was expanded to take into account periods of three and five days. The extended method showed better prediction accuracy and a lower percentage of false detections. For a period of five days, the detection accuracy was 86.3%, while the percentage of false detections was 11%. The proposed machine learning model can help farmers detect the occurrence of pests and save the time and resources needed to check the fields.
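
One way to take multi-day periods into account is to feed a classifier the readings from a sliding window of consecutive days instead of a single day. The sketch below shows that idea only; the abstract does not describe how the authors built their three-day and five-day inputs, so the function, its layout, and the sample readings are all assumptions.

```python
def window_features(temps, humidities, labels, days):
    """Pair each day's label with the readings of the `days` days ending on that day."""
    rows = []
    for end in range(days - 1, len(labels)):
        start = end - days + 1
        # Concatenate the window's temperatures, then its humidities.
        features = temps[start:end + 1] + humidities[start:end + 1]
        rows.append((features, labels[end]))
    return rows


# Hypothetical daily readings (deg C, % relative humidity) and insect sightings.
temps = [21.0, 23.5, 25.0, 24.0, 26.5]
hums = [60, 55, 58, 62, 50]
seen = [0, 0, 1, 1, 0]

for features, label in window_features(temps, hums, seen, days=3):
    print(features, label)
```

Because the rows are in chronological order, any train/test split for such data should also be chronological, so that the model is never tested on days that precede its training days.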

Modified Decision Tree Technique for Ransomware Detection at Runtime through API Calls

Scientific Programming ◽

10.1155/2020/8845833 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Faizan Ullah ◽

Qaisar Javaid ◽

Abdu Salam ◽

Masood Ahmad ◽

Nadeem Sarwar ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Feature Vector ◽

Machine Learning Algorithms ◽

The Novel ◽

Proposed Model ◽

Testing Accuracy ◽

Financial Losses

Ransomware (RW) is a distinctive variety of malware that encrypts files or locks the user’s system, holding the files hostage, which leads to huge financial losses for users. In this article, we propose a new model that extracts novel features from the RW dataset and classifies RW and benign files. The proposed model can detect a large number of RW from various families at runtime and scans the network, registry activities, and file system throughout the execution. An API-call series was reused to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it by applying online machine learning algorithms to predict RW. To validate the effectiveness and scalability, we test 78550 recent RW and benign samples and compare the results with random forest and AdaBoost; the testing accuracy reaches 99.56%.

An Insider Data Leakage Detection Using One-Hot Encoding, Synthetic Minority Oversampling and Machine Learning Techniques

Entropy ◽

10.3390/e23101258 ◽

2021 ◽

Vol 23 (10) ◽

pp. 1258

Author(s):

Taher Al-Shehari ◽

Rakan A. Alsowail

Keyword(s):

Machine Learning ◽

Detection System ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Sensitive Period ◽

Insider Threat ◽

Leakage Detection ◽

Insider Threats ◽

Insider Attack ◽

Proposed Model

Insider threats are malicious acts that can be carried out by an authorized employee within an organization. Insider threats represent a major cybersecurity challenge for private and public organizations, as an insider attack can cause extensive damage to organization assets much more than external attacks. Most existing approaches in the field of insider threat focused on detecting general insider attack scenarios. However, insider attacks can be carried out in different ways, and the most dangerous one is a data leakage attack that can be executed by a malicious insider before his/her leaving an organization. This paper proposes a machine learning-based model for detecting such serious insider threat incidents. The proposed model addresses the possible bias of detection results that can occur due to an inappropriate encoding process by employing the feature scaling and one-hot encoding techniques. Furthermore, the imbalance issue of the utilized dataset is also addressed utilizing the synthetic minority oversampling technique (SMOTE). Well known machine learning algorithms are employed to detect the most accurate classifier that can detect data leakage events executed by malicious insiders during the sensitive period before they leave an organization. We provide a proof of concept for our model by applying it on CMU-CERT Insider Threat Dataset and comparing its performance with the ground truth. The experimental results show that our model detects insider data leakage events with an AUC-ROC value of 0.99, outperforming the existing approaches that are validated on the same dataset. The proposed model provides effective methods to address possible bias and class imbalance issues for the aim of devising an effective insider data leakage detection system.
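
The two pre-processing steps named in this abstract can be illustrated in a few lines. The sketch below shows one-hot encoding of a categorical column and a simplified SMOTE-style oversampler that interpolates between minority samples; it is not the authors' pipeline, the category names and points are hypothetical, and a maintained library implementation would normally be preferred in practice.

```python
import random


def one_hot(values):
    """Encode categorical values as 0/1 vectors, one column per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority sample
    and one of its k nearest minority neighbours (the core idea behind SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical activity types and minority-class feature vectors.
encoded, categories = one_hot(["email", "usb", "email", "web"])
print(categories, encoded)

minority_points = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]]
print(smote_like(minority_points, n_new=4))
```

One-hot encoding avoids implying an order among categories, which an integer code would do; oversampling should be applied to the training split only, so that synthetic points never leak into the test data.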

Machine-Learning-Based External Plagiarism Detecting Methodology From Monolingual Documents

Feature Dimension Reduction for Content-Based Image Identification - Advances in Multimedia and Interactive Technologies ◽

10.4018/978-1-5225-5775-3.ch007 ◽

2018 ◽

pp. 122-139

Author(s):

Saugata Bose ◽

Ritambhra Korpal

Keyword(s):

Machine Learning ◽

Language Processing ◽

Confusion Matrix ◽

False Negative ◽

False Negative Rate ◽

Search Space ◽

Machine Learning Algorithms ◽

C4.5 Decision Tree ◽

N Gram ◽

Four Levels

In this chapter, an initiative is proposed in which natural language processing (NLP) techniques and supervised machine learning algorithms are combined to detect external plagiarism. The major emphasis is on constructing a framework to detect plagiarism in monolingual texts by implementing an n-gram frequency comparison approach. The framework is based on 120 characteristics extracted during the pre-processing steps using a simple NLP approach. Filter metrics are then applied to select the most relevant features, and a supervised classification learning algorithm is used to classify the documents into four levels of plagiarism. A confusion matrix is then built to estimate the false positives and false negatives. Finally, the authors show that a C4.5 decision tree-based classifier is more suitable than naive Bayes in terms of accuracy. The framework achieved 89% accuracy with low false positive and false negative rates, and it shows higher precision and recall than the passage similarity method, the sentence similarity method, and the search space reduction method.
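
An n-gram frequency comparison can be sketched as counting word n-grams in two documents and scoring how closely the counts agree. The example below uses word trigrams and cosine similarity; the chapter does not state which n or which comparison measure it uses, so both choices, along with the sample sentences, are assumptions.

```python
import math
from collections import Counter


def ngram_counts(text, n=3):
    """Count word n-grams in lower-cased, whitespace-tokenised text."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical source and suspicious passages.
source = ngram_counts("the quick brown fox jumps over the lazy dog")
suspect = ngram_counts("a quick brown fox jumps over a sleeping dog")
unrelated = ngram_counts("completely different words appear in this short sentence here")

print(cosine_similarity(source, suspect))
print(cosine_similarity(source, unrelated))
```

Scores like these could serve as input features for a classifier that assigns a plagiarism level, rather than as a verdict on their own.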

Adaptive Ensemble Multi-Agent Based Intrusion Detection Model

Developing Advanced Web Services through P2P Computing and Autonomous Agents - Advances in Web Technologies and Engineering ◽

10.4018/978-1-61520-973-6.ch003 ◽

2010 ◽

pp. 36-48 ◽

Cited By ~ 1

Author(s):

Tarek Helmy

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Algorithms ◽

Agent Based ◽

Detection Model ◽

Detection Analysis ◽

Proposed Model ◽

Multi Agent

A system that monitors the events occurring in a computer system or a network and analyzes them for signs of intrusion is known as an intrusion detection system. The performance of an intrusion detection system can be improved by combining anomaly and misuse analysis. This chapter proposes an ensemble multi-agent-based intrusion detection model. The proposed model combines anomaly, misuse, and host-based detection analysis. The agents in the proposed model use rules to check for intrusions and adopt machine learning algorithms to recognize unknown actions and to update or create new rules automatically. Each agent in the proposed model encapsulates a specific classification technique and gives its belief about any packet event in the network. These agents collaborate to reach a decision about any event, and they are able to generalize and to detect novel attacks. Empirical results indicate that the proposed model is efficient and outperforms other intrusion detection models.
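
A simple way for several agents to reach a joint decision is to average their individual beliefs, optionally weighting agents by how much they are trusted. The sketch below illustrates that idea; the chapter does not specify how its agents combine beliefs, so the weighted average, the threshold, and the sample values are assumptions.

```python
def combine_beliefs(beliefs, weights=None, threshold=0.5):
    """Combine per-agent intrusion beliefs in [0, 1] by weighted average.

    Returns the combined score and whether it reaches the alert threshold.
    """
    if weights is None:
        weights = [1.0] * len(beliefs)
    score = sum(b * w for b, w in zip(beliefs, weights)) / sum(weights)
    return score, score >= threshold


# Hypothetical beliefs from three agents about one packet event.
print(combine_beliefs([0.9, 0.4, 0.7]))
# Giving the sceptical second agent more weight can flip the decision.
print(combine_beliefs([0.9, 0.2, 0.7], weights=[1, 4, 1]))
```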

Keystroke dynamics Based Technique to Enhance the Security in Smart Devices

KIET Journal of Computing and Information Sciences ◽

10.51153/kjcis.v4i1.61 ◽

2021 ◽

Vol 4 (1) ◽

pp. 14

Author(s):

Farman Pirzado ◽

Shahzad Memon ◽

Lachman Das Dhomeja ◽

Awais Ahmed

Keyword(s):

Machine Learning ◽

User Authentication ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Smart Devices ◽

Mobile Banking ◽

Alternative Technique ◽

Keystroke Dynamics ◽

Data Set ◽

Proposed Model

Nowadays, smart devices have become a part of our lives, hold our data, and are used for sensitive transactions like internet banking, mobile banking, etc. Therefore, it is crucial to secure the data in these smart devices from theft or misplacement. The majority of the devices are secured with password/PIN-based user authentication methods, which have already proved to be less secure or easily guessable. An alternative technique for securing smart devices is keystroke dynamics. Keystroke dynamics (KSD) is a behavioral biometric that uses a natural typing pattern, which is unique to every individual and difficult to fake or replicate. This paper proposes a user authentication model based on KSD as an additional security method for increasing the smart devices’ security level. In order to analyze the proposed model, an Android-based application has been implemented for collecting data from fake and genuine users. Six machine learning algorithms have been tested on the collected data set to study their suitability for use in the keystroke dynamics-based authentication model.

Machine Learning Assisted Cervical Cancer Detection

Frontiers in Public Health ◽

10.3389/fpubh.2021.788376 ◽

2021 ◽

Vol 9 ◽

Author(s):

Mavra Mehmood ◽

Muhammad Rizwan ◽

Michal Gregus ml ◽

Sidra Abbas

Keyword(s):

Machine Learning ◽

Cervical Cancer ◽

Mean Squared Error ◽

Medical Center ◽

Pearson Correlation ◽

False Negative ◽

Hybrid Approach ◽

False Negative Rate ◽

Machine Learning Algorithms ◽

Screening Programs

Cervical cancer is the fourth most common cause of cancer death in women around the globe. Cervical cancer is related to human papillomavirus (HPV) infection. Early screening has made cervical cancer a preventable disease, which helps minimize its global burden. In developing countries, women lack access to sufficient screening programs because of the cost of undergoing examination regularly, scarce awareness, and lack of access to medical centers. As a result, the expected risk for an individual patient becomes very high. There are many risk factors relevant to malignant cervical formation. This paper proposes an approach named CervDetect that uses machine learning algorithms to evaluate the risk elements of malignant cervical formation. CervDetect uses Pearson correlation between input variables, as well as with the output variable, to pre-process the data. CervDetect uses the random forest (RF) feature selection technique to select significant features. Finally, CervDetect uses a hybrid approach, combining RF and shallow neural networks, to detect cervical cancer. Results show that CervDetect accurately predicts cervical cancer, outperforms the state-of-the-art studies, and achieves an accuracy of 93.6%, a mean squared error (MSE) of 0.07111, a false-positive rate (FPR) of 6.4%, and a false-negative rate (FNR) of 100%.
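
Pearson correlation with the output variable can be used as a simple pre-processing filter. The sketch below computes the coefficient from its definition and keeps the features whose absolute correlation with the target reaches a threshold; the threshold, the helper names, and the sample columns are hypothetical, and the abstract does not say how CervDetect applies the coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def select_by_target_correlation(features, target, threshold=0.3):
    """Keep the names of features whose |r| with the target is at least threshold."""
    return [
        name for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    ]


# Hypothetical feature columns and binary outcome, for illustration only.
features = {"age": [25, 35, 45, 55, 65], "noise": [1, -1, 0, 1, -1]}
outcome = [0, 0, 1, 1, 1]
print(select_by_target_correlation(features, outcome))
```

The same `pearson` function applied between pairs of input variables would flag near-duplicate features, which is the other use the abstract mentions.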

Model and Algorithms for User Identification by Network Traffic

10.20948/graphicon-2021-3027-1017-1027 ◽

2021 ◽

Author(s):

Vasily Gai ◽

Irina Ephode ◽

Roman Barinov ◽

Igor Polyakov ◽

Vladimir Golubenko ◽

...

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Feature Descriptor ◽

User Identification ◽

Redundant Data ◽

Boosting Method ◽

Object Feature ◽

Traffic Collection

This paper proposes a method of user identification by network traffic. We describe the information model created, as well as the implementation of each of the proposed problem solving stages. During the network traffic collection stage, a method of capturing network packets on the user's device using specialized software is used. The information obtained is further filtered by removing redundant data. During the object feature descriptor construction stage, we extract and describe the characteristics of network sessions from which the behavioral habits of users are derived. Classification of users according to the extracted characteristics of the network sessions is performed using machine learning techniques. When analyzing the test results, the most appropriate machine learning algorithms for solving the problem of user identification by network traffic were proposed, such as: logistic regression, decision trees, SVM with a linear hyperplane and the boosting method. The accuracy of the above methods was more than 95%. The results proved that it is possible to identify a particular user with a sufficiently high accuracy based on the characteristics of the data transmitted through the network, without examining the contents of the transmitted packets. Comparison of the developed model has shown that the proposed model of user identification by network traffic works as effectively as the existing analogues.

Novel Privacy Preserving Non-Invasive Sensing-Based Diagnoses of Pneumonia Disease Leveraging Deep Network Model

Sensors ◽

10.3390/s22020461 ◽

2022 ◽

Vol 22 (2) ◽

pp. 461

Author(s):

Mujeeb Ur Rehman ◽

Arslan Shafique ◽

Kashif Hesham Khan ◽

Sohail Khalid ◽

Abdullah Alhumaidi Alotaibi ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Medical Records ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

X Ray ◽

Non Invasive ◽

Proposed Model ◽

Pneumonia Diagnosis ◽

Better Than

This article presents non-invasive sensing-based diagnoses of pneumonia disease, exploiting a deep learning model to make the technique non-invasive coupled with security preservation. Sensing and securing healthcare and medical images such as X-rays that can be used to diagnose viral diseases such as pneumonia is a challenging task for researchers. In the past few years, patients’ medical records have been shared using various wireless technologies. The wireless transmitted data are prone to attacks, resulting in the misuse of patients’ medical records. Therefore, it is important to secure medical data, which are in the form of images. The proposed work is divided into two sections: in the first section, primary data in the form of images are encrypted using the proposed technique based on chaos and convolution neural network. Furthermore, multiple chaotic maps are incorporated to create a random number generator, and the generated random sequence is used for pixel permutation and substitution. In the second part of the proposed work, a new technique for pneumonia diagnosis using deep learning, in which X-ray images are used as a dataset, is proposed. Several physiological features such as cough, fever, chest pain, flu, low energy, sweating, shaking, chills, shortness of breath, fatigue, loss of appetite, and headache and statistical features such as entropy, correlation, contrast dissimilarity, etc., are extracted from the X-ray images for the pneumonia diagnosis. Moreover, machine learning algorithms such as support vector machines, decision trees, random forests, and naive Bayes are also implemented for the proposed model and compared with the proposed CNN-based model. Furthermore, to improve the CNN-based proposed model, transfer learning and fine tuning are also incorporated. 
It is found that CNN performs better than other machine learning algorithms as the accuracy of the proposed work when using naive Bayes and CNN is 89% and 97%, respectively, which is also greater than the average accuracy of the existing schemes, which is 90%. Further, K-fold analysis and voting techniques are also incorporated to improve the accuracy of the proposed model. Different metrics such as entropy, correlation, contrast, and energy are used to gauge the performance of the proposed encryption technology, while precision, recall, F1 score, and support are used to evaluate the effectiveness of the proposed machine learning-based model for pneumonia diagnosis. The entropy and correlation of the proposed work are 7.999 and 0.0001, respectively, which reflects that the proposed encryption algorithm offers a higher security of the digital data. Moreover, a detailed comparison with the existing work is also made and reveals that both the proposed models work better than the existing work.