Prediction of Spam Email using Machine Learning Classification Algorithm

P Sai Teja

doi:10.22214/ijraset.2021.35226

Using Random Forests on Real-World City Data for Urban Planning in a Visual Semantic Decision Support System

Sensors ◽

10.3390/s19102266 ◽

2019 ◽

Vol 19 (10) ◽

pp. 2266 ◽

Cited By ~ 1

Author(s):

Nikolaos Sideris ◽

Georgios Bardis ◽

Athanasios Voulodimos ◽

Georgios Miaoulis ◽

Djamchid Ghazanfarpour

Keyword(s):

Machine Learning ◽

Urban Planning ◽

Random Forests ◽

Real World ◽

Performance Metrics ◽

World City ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Real World Data

The constantly increasing amount and availability of urban data derived from varying sources leads to an assortment of challenges, including, among others, the consolidation, visualization, and maximal exploitation of these data. A preeminent problem in urban planning is the appropriate choice of location to host a particular activity (either a commercial activity or a public welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach that addresses these challenges using machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to a random forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).
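The six performance metrics listed in this abstract can all be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions, with G-mean taken as the geometric mean of recall (sensitivity) and specificity. The abstract does not spell out its exact formulas or how scores were combined across several land-use classes, so treat this as an illustrative assumption; the counts in the usage line are made up.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    specificity = _safe_div(tn, tn + fp)  # true-negative rate
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)  # sensitivity / true-positive rate
    f_measure = _safe_div(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)  # high only if both classes are recognized
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Hypothetical counts for one class treated as "positive".
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

For a multi-class problem, each class can be scored one-vs-rest and the per-class values averaged; whether macro or weighted averaging was used is not stated in the abstract.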


Identification of Malignancies from Free-Text Histopathology Reports Using a Multi-Model Supervised Machine Learning Approach

Information ◽

10.3390/info11090455 ◽

2020 ◽

Vol 11 (9) ◽

pp. 455

Author(s):

Victor Olago ◽

Mazvita Muchengeti ◽

Elvira Singh ◽

Wenlong C. Chen

Keyword(s):

Machine Learning ◽

Cancer Registries ◽

Sub Saharan Africa ◽

Supervised Machine Learning ◽

Stochastic Gradient Descent ◽

Misclassification Rate ◽

Support Vector ◽

Free Text ◽

K Nearest Neighbor ◽

Adaptive Boosting

We explored various Machine Learning (ML) models to evaluate how each performs in the task of classifying histopathology reports. We trained, optimized, and performed classification with Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Adaptive Boosting (AB), Decision Trees (DT), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), and a Dummy classifier. We started with 60,083 histopathology reports, which were reduced to 60,069 after pre-processing. The F1-scores for SVM, SGD, KNN, RF, DT, LR, AB, and GNB were 97%, 96%, 96%, 96%, 92%, 96%, 84%, and 88%, respectively, while the misclassification rates were 3.31%, 5.25%, 4.39%, 1.75%, 3.5%, 4.26%, 23.9%, and 19.94%, respectively. The approximate run times were 2 h, 20 min, 40 min, 8 h, 40 min, 10 min, 50 min, and 4 min, respectively. RF had the longest run time but the lowest misclassification rate on the labeled data. Our study demonstrated the possibility of applying ML techniques to process free-text pathology reports for cancer incidence reporting by cancer registries in a Sub-Saharan African setting. This is an important consideration for resource-constrained environments, which can leverage ML techniques to reduce workloads and improve the timeliness of cancer statistics reporting.
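This study reports misclassification rates and includes a Dummy classifier as a baseline. A minimal sketch of those two pieces follows: the misclassification rate (the fraction of wrong predictions, i.e. 1 - accuracy) and a majority-class baseline similar in spirit to scikit-learn's `DummyClassifier(strategy="most_frequent")`. This is a stand-alone illustration, not the authors' code, and the class labels and report texts are hypothetical placeholders rather than the categories used in the study.

```python
from collections import Counter


def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label (1 - accuracy)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


class MostFrequentBaseline:
    """Ignores the features and always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


# Hypothetical data: the report texts are irrelevant to this baseline.
X_train = ["report"] * 10
y_train = ["malignant"] * 7 + ["non-malignant"] * 3
X_test = ["report 1", "report 2", "report 3", "report 4"]
y_test = ["malignant", "malignant", "non-malignant", "malignant"]

baseline = MostFrequentBaseline().fit(X_train, y_train)
print(misclassification_rate(y_test, baseline.predict(X_test)))  # 0.25
```

A trained model is only informative if its misclassification rate is clearly below this baseline's, which reflects nothing but the class imbalance.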


Diabetes Prediction Using Machine Learning Techniques

Journal of Intelligent Systems with Applications ◽

10.54856/jiswa.202112183 ◽

2021 ◽

pp. 150-152

Author(s):

Seyma Kiziltas Koc ◽

Mustafa Yeniad

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

High Performance ◽

Nearest Neighbor ◽

Classification Performance ◽

Machine Learning Techniques ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Machine Learning Classification

Technologies used in the healthcare industry are changing rapidly, as technology constantly evolves to improve people's lifestyles. For instance, various technological devices are used for the diagnosis and treatment of diseases, and with developing technology it has been shown that diseases can be diagnosed by computer systems. Machine learning algorithms are frequently used tools because of their high performance in health as well as in many other fields. The aim of this study is to investigate different machine learning classification algorithms that can be used in the diagnosis of diabetes and to carry out comparative analyses according to the metrics used in the literature. Seven classification algorithms from the literature were used: Logistic Regression, K-Nearest Neighbor, Multilayer Perceptron, Random Forest, Decision Trees, Support Vector Machine and Naive Bayes. First, the classification performance of the algorithms was compared on the basis of accuracy, sensitivity, precision, and F1-score. The results showed that the support vector machine algorithm had the highest accuracy, at 78.65%.
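K-Nearest Neighbor, one of the seven algorithms compared here, classifies a sample by a majority vote among the k closest training samples. The sketch below is a from-scratch illustration using Euclidean distance. The two feature values per patient and the labels are hypothetical stand-ins, not the dataset's actual attributes, and the features are assumed to be already scaled to comparable ranges, since KNN is sensitive to feature scale.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in neighbors[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, already-scaled features, e.g. (glucose, BMI).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = ["non-diabetic"] * 3 + ["diabetic"] * 3

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # non-diabetic
print(knn_predict(train_X, train_y, (5.0, 4.9)))  # diabetic
```

An odd k avoids tied votes when there are only two classes.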


Classification model for accuracy and intrusion detection using machine learning approach

PeerJ Computer Science ◽

10.7717/peerj-cs.437 ◽

2021 ◽

Vol 7 ◽

pp. e437

Author(s):

Arushi Agarwal ◽

Purushottam Sharma ◽

Mohammed Alshehri ◽

Ahmed A. Mohamed ◽

Osama Alfarraj

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Nearest Neighbor ◽

Performance Metrics ◽

Detection System ◽

Confusion Matrix ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor

In today’s cyber world, the demand for the internet is increasing day by day, which heightens concerns about network security. The aim of an Intrusion Detection System (IDS) is to provide defenses against many fast-growing network attacks (e.g., DDoS, ransomware, and botnet attacks) by blocking harmful activities occurring in the network system. In this work, three machine learning classification algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and K-nearest neighbor (KNN), were applied to the UNSW-NB15 dataset to measure their accuracy, reduce processing time, and find the best-suited algorithm for efficiently learning the patterns of suspicious network activity. The data gathered from the feature set comparison was then fed into the IDS to train the system for future intrusion behavior prediction and analysis, using the best-fit algorithm chosen from the three based on the performance metrics found. Classification reports (Precision, Recall, and F1-score) and confusion matrices were also generated and compared to finalize the support-validation status found throughout the testing phase of the model used in this approach.
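The classification reports mentioned in this abstract are derived from a confusion matrix. The sketch below builds a confusion matrix (rows are true classes, columns are predicted classes) and computes per-class precision, recall, and F1-score from it. The two labels are a simplification: UNSW-NB15 also distinguishes several attack categories, and the code works unchanged for any list of labels. The predictions are invented for illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows = true class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def classification_report(y_true, y_pred, labels):
    """Per-class (precision, recall, F1) computed from the confusion matrix."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted_as_label = sum(row[i] for row in matrix)  # column total
        actually_label = sum(matrix[i])  # row total
        precision = tp / predicted_as_label if predicted_as_label else 0.0
        recall = tp / actually_label if actually_label else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1)
    return report


labels = ["normal", "attack"]
y_true = ["normal", "normal", "normal", "attack", "attack", "attack"]
y_pred = ["normal", "normal", "attack", "attack", "attack", "normal"]
print(confusion_matrix(y_true, y_pred, labels))  # [[2, 1], [1, 2]]
print(classification_report(y_true, y_pred, labels))
```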


Application of machine learning techniques and empirical mode decomposition for the classification of analog modulated signals

ACTA IMEKO ◽

10.21014/acta_imeko.v9i2.800 ◽

2020 ◽

Vol 9 (2) ◽

pp. 66

Author(s):

Domenico Luca Carnì ◽

Eulalia Balestrieri ◽

Ioan Tudosa ◽

Francesco Lamonaca

Keyword(s):

Machine Learning ◽

Empirical Mode Decomposition ◽

Modulation Frequency ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Adaptive Boosting ◽

Numerical Tests ◽

Mode Decomposition

In this article, an automatic Analog Modulation Classifier based on Empirical mode decomposition and Machine learning approaches (AMC-EM) is proposed. The AMC-EM operates without a priori information and can recognise typical analog modulation schemes: amplitude modulation, phase modulation, frequency modulation, and single sideband modulation. The AMC-EM uses Empirical Mode Decomposition (EMD) to extract features of the signal, which are then classified using Machine Learning (ML). In the design of the AMC-EM, the specific ML technique is selected by comparing, through numerical tests, the performance of (i) the Support Vector Machine (SVM), (ii) the k-nearest neighbor classifier, and (iii) adaptive boosting, since these are commonly used in the field of signal classification. The tests showed that the SVM, specifically the quadratic SVM, achieves the best classification accuracy across different noise intensities superimposed on the signal. To assess the advantages of the proposal, a comparison with other classifiers available in the literature has been undertaken through numerical tests. Finally, the AMC-EM is evaluated experimentally, and the experimental results agree with those of the simulation.
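A "quadratic SVM" usually denotes an SVM with a degree-2 polynomial kernel, K(x, z) = (x · z + c)^2. The abstract does not give the kernel's parameters, so c = 1 here is an assumption, and the two input values merely stand in for EMD-derived features. The sketch shows only the kernel, not a full SVM: it checks that the kernel equals an ordinary dot product in an explicit six-dimensional feature space, which is how the SVM can draw quadratic decision boundaries without ever building that space.

```python
import math


def quadratic_kernel(x, z, c=1.0):
    """Degree-2 polynomial kernel K(x, z) = (x . z + c) ** 2."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** 2


def quadratic_feature_map(x, c=1.0):
    """Explicit map phi for 2-D input such that phi(x) . phi(z) == K(x, z)."""
    x1, x2 = x
    r = math.sqrt(2 * c)
    return [c, r * x1, r * x2, x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2]


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(quadratic_feature_map(x), quadratic_feature_map(z)))
print(quadratic_kernel(x, z), explicit)  # both approximately 4.0 (up to rounding)
```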


A REVIEW ON MACHINE LEARNING TECHNIQUES FOR ADVANCED HEALTH CARE SYSTEMS

June-2020 - International Journal of Engineering Sciences & Research Technology ◽

10.29121/ijesrt.v9.i11.2020.1 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1-7

Keyword(s):

Machine Learning ◽

Health Care ◽

Logistic Regression ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor

Artificial intelligence is the technology that lets a machine mimic the thinking ability of a human being. Machine learning is the subset of AI that makes a machine exhibit human behavior by letting it learn from known data, without the need to program it explicitly. The health care sector has adopted this technology to develop medical procedures, maintain large volumes of patient records, and assist physicians in the prediction, detection, and treatment of diseases, among other uses. In this paper, a comparative study of six supervised machine learning algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), k-Nearest Neighbor (k-NN), and Naive Bayes (NB), is made for the classification and prediction of diseases. The results show that, among the compared supervised learning algorithms, logistic regression performs best, with an accuracy of 81.4%, and k-NN performs worst, with an accuracy of just 69.01%, in the classification and prediction of diseases.
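Logistic Regression, the best performer in this comparison, models the probability of the positive class as the sigmoid of a weighted sum of the features. The sketch below fits the weights by plain batch gradient descent on the mean log-loss. The learning rate, number of epochs, and one-feature toy data are assumptions for illustration; practical work would normally use a library implementation with regularization.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (prediction - label).
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_label(w, b, x, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Hypothetical one-feature data: 1 = disease present, 0 = absent.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print(predict_label(w, b, [-1.5]), predict_label(w, b, [1.5]))  # 0 1
```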


Cardiac Disease Prediction using Supervised Machine Learning Techniques.

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012013 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012013

Author(s):

Chiradeep Gupta ◽

Athina Saha ◽

N V Subba Reddy ◽

U Dinesh Acharya

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Cardiac Disease ◽

Performance Metrics ◽

Confusion Matrix ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Ensemble Techniques ◽

Learning Techniques

Diagnosis of cardiac disease needs to be accurate, precise, and reliable. The number of deaths due to cardiac attacks is increasing rapidly day by day. Practical approaches for earlier diagnosis of cardiac or heart disease are therefore needed to achieve prompt management of the disease. Various supervised machine learning techniques, namely K-Nearest Neighbour, Decision Tree, Logistic Regression, Naïve Bayes, and Support Vector Machine (SVM), are used to predict cardiac disease using a dataset collected from the repository of the University of California, Irvine (UCI). The results show that Logistic Regression outperformed all other supervised classifiers in terms of the performance metrics. According to the confusion matrices of all the models, it is also less risky, since its number of false negatives is low compared with the other models. In addition, ensemble techniques could be applied to improve the accuracy of the classifier. Jupyter Notebook is a well-suited tool for implementing this work in Python, which provides many libraries for accurate and precise work.
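This abstract judges risk by counting false negatives (patients with cardiac disease predicted as healthy) and suggests ensemble techniques for improving accuracy. One simple ensemble is hard majority voting over the predictions of several already-trained classifiers. The sketch below shows both ideas; the three prediction lists are invented for illustration and are not results from the paper, and an odd number of voters is used so that a two-class vote cannot tie.

```python
from collections import Counter


def majority_vote(*prediction_lists):
    """Hard voting: for each sample, return the label predicted by most models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]


def false_negatives(y_true, y_pred, positive=1):
    """Count positive cases (e.g. patients with cardiac disease) predicted as negative."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)


# Invented labels and predictions: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 1]
model_a = [1, 0, 1, 0, 0, 1]
model_b = [1, 1, 0, 0, 1, 1]
model_c = [0, 1, 1, 0, 0, 0]

ensemble = majority_vote(model_a, model_b, model_c)
for name, preds in [("A", model_a), ("B", model_b), ("C", model_c), ("vote", ensemble)]:
    print(name, false_negatives(y_true, preds))
```

Voting only helps when the models make different mistakes; on this toy data the vote happens to correct every error, which should not be expected in general.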


Comparations of Supervised Machine Learning Techniques in Predicting the Classification of the Household’s Welfare Status

Journal Pekommas ◽

10.30818/jpkm.2019.2040105 ◽

2019 ◽

Vol 4 (1) ◽

pp. 43

Author(s):

Nfn Nofriani

Keyword(s):

Machine Learning ◽

Random Forest ◽

Social Assistance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Random Forest Algorithm ◽

K Nearest Neighbor ◽

Learning Techniques

Poverty has been a major problem for most countries around the world, including Indonesia. One approach to eradicating poverty is the equitable distribution of social assistance to target households based on the Integrated Database of social assistance. This study compared several well-known supervised machine learning techniques, namely the Naïve Bayes Classifier, Support Vector Machines, K-Nearest Neighbor Classification, the C4.5 Algorithm, and the Random Forest Algorithm, to predict household welfare status classification, using the Integrated Database as a case study. The main objective of this study was to choose the best supervised machine learning approach for predicting the classification of a household’s welfare status based on attributes in the Integrated Database. The results showed that the Random Forest Algorithm performed best.
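Among the compared techniques, C4.5 grows a decision tree by choosing, at each node, roughly the attribute with the highest gain ratio: the information gain of the split divided by the split information (the entropy of the partition sizes), which penalizes attributes with many distinct values. The sketch below computes gain ratio for categorical attributes only; the attribute names and household records are hypothetical and are not fields from the Integrated Database.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attribute):
    """C4.5-style gain ratio of splitting on a categorical attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical household records and welfare labels.
rows = [
    {"floor": "earth", "roof": "tile"},
    {"floor": "earth", "roof": "zinc"},
    {"floor": "ceramic", "roof": "tile"},
    {"floor": "ceramic", "roof": "zinc"},
]
labels = ["poor", "poor", "not poor", "not poor"]
print(gain_ratio(rows, labels, "floor"))  # 1.0: separates the classes perfectly
print(gain_ratio(rows, labels, "roof"))   # 0.0: carries no class information
```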


Sentiment Analysis on Corona Virus Pandemic Using Machine Learning Algorithm

JOURNAL OF INFORMATICS AND TELECOMMUNICATION ENGINEERING ◽

10.31289/jite.v4i1.3798 ◽

2020 ◽

Vol 4 (1) ◽

pp. 86-96

Author(s):

Ricky Risnantoyo ◽

Arifin Nugroho ◽

Kresna Mandara

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Learning Algorithm ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor ◽

The Public ◽

Machine Learning Classification ◽

Corona Virus

Corona virus outbreaks occurring in almost every country in the world have had an impact not only on the health sector but also on other sectors such as tourism, finance, and transportation. The emergence of the corona virus as a trending topic on Twitter has given rise to a variety of public sentiments. The public uses Twitter because it disseminates information in real time and shows market reactions quickly. This research uses public tweets related to "Corona Virus" to examine how sentiment polarity arises. Text mining techniques and three machine learning classification algorithms, namely Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbor (K-NN), are used to build a tweet classification model that labels sentiment as positive, negative, or neutral. The best test results were achieved by the Support Vector Machine (SVM), with an accuracy of 76.21%, a precision of 78.04%, and a recall of 71.42%.
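Naive Bayes, one of the three classifiers compared in this study, is a common baseline for tweet sentiment. The sketch below is a multinomial Naive Bayes with Laplace (add-one) smoothing that scores each class by its log prior plus the log likelihood of each word. Tokenization here is just lowercasing and whitespace splitting, an assumption far simpler than the text-mining preprocessing the study describes, and the six training tweets are invented.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count class frequencies and per-class word frequencies (multinomial model)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, doc, alpha=1.0):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            smoothed = (word_counts[label][token] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    "stay safe and healthy", "great news vaccine works",
    "scared of the virus", "bad news cases rising",
    "new cases reported today", "update on the virus today",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "great vaccine news"))  # positive
print(predict_naive_bayes(model, "virus cases rising"))  # negative
```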

