Assessment of Acoustic Features and Machine Learning for Parkinson’s Detection

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Moumita Pramanik ◽  
Ratika Pradhan ◽  
Parvati Nandy ◽  
Saeed Mian Qaisar ◽  
Akash Kumar Bhoi

This article presents a machine learning approach for Parkinson’s disease detection. Multiple candidate acoustic signal features of Parkinson’s and control subjects are ascertained. A collaborative feature bank is created through correlated feature selection, Fisher score feature selection, and mutual information-based feature selection schemes. A detection model built on top of the feature bank uses the traditional Naïve Bayes classifier, which proves competitive with the state of the art. The Naïve Bayes detector on the collaborative acoustic features detects the presence of Parkinson’s with a detection accuracy of 78.97% and a precision of 0.926 under hold-out validation. The collaborative feature bank with Naïve Bayes yields distinguishable results compared to many other recently proposed approaches, and the simplicity of Naïve Bayes keeps the system robust and effective throughout the detection process.
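One of the feature-scoring schemes named above, the Fisher score, can be sketched in a few lines. This is a minimal illustration on made-up numbers standing in for acoustic features (jitter, shimmer, HNR), not the paper's dataset or pipeline:

```python
# Toy two-class data standing in for acoustic features (jitter, shimmer, HNR);
# the values are illustrative, not from the paper's dataset.
X = [[0.01, 0.04, 21.0], [0.02, 0.05, 20.0],
     [0.08, 0.11, 12.0], [0.09, 0.13, 11.0]]
y = [0, 0, 1, 1]  # 0 = control, 1 = Parkinson's

def fisher_score(X, y, j):
    """(between-class mean difference)^2 / (sum of class variances) for feature j."""
    a = [x[j] for x, c in zip(X, y) if c == 0]
    b = [x[j] for x, c in zip(X, y) if c == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / len(a)
    vb = sum((v - mb) ** 2 for v in b) / len(b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)

scores = [fisher_score(X, y, j) for j in range(3)]
best = max(range(3), key=lambda j: scores[j])
print(best)  # index of the most class-separating feature
```

A feature bank would keep the features ranked highly by this and the other two selectors, then hand them to the Naïve Bayes detector.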

2020 ◽  
Vol 12 (1) ◽  
pp. 12 ◽  
Author(s):  
You Guo ◽  
Hector Marco-Gisbert ◽  
Paul Keir

A webshell is a command execution environment in the form of web pages. It is often used by attackers as a backdoor tool for web server operations, so accurately detecting webshells is of great significance to web server protection. Most security products detect webshells with feature-matching methods, matching input scripts against pre-built malicious code collections; such methods have a low detection rate for obfuscated webshells. With the help of machine learning algorithms, however, webshells can be detected more efficiently and accurately. In this paper, we propose a new PHP webshell detection model, the NB-Opcode (naïve Bayes and opcode sequence) model, which combines naïve Bayes classifiers with opcode sequences. Experiments and analysis on a large number of samples show that the proposed method effectively detects a range of webshells. Compared with traditional webshell detection methods, this method improves the efficiency and accuracy of webshell detection.
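The opcode-sequence idea can be sketched as a multinomial naive Bayes over opcode tokens. The opcode names and the tiny sample scripts below are made up for illustration; the real model would extract opcodes from PHP with a tool such as VLD and train on a large corpus:

```python
from collections import Counter
import math

# Illustrative opcode sequences; 2 "benign" and 2 "webshell" toy scripts.
benign = [["ECHO", "RETURN"], ["ASSIGN", "ECHO", "RETURN"]]
webshell = [["INIT_FCALL", "SEND_VAL", "DO_ICALL"], ["INCLUDE_OR_EVAL", "DO_ICALL"]]

vocab = {tok for d in benign + webshell for tok in d}

def train(docs):
    counts = Counter(tok for d in docs for tok in d)
    return counts, sum(counts.values())

def log_prob(doc, counts, total):
    # Laplace-smoothed multinomial log-likelihood (uniform class prior).
    return sum(math.log((counts[t] + 1) / (total + len(vocab))) for t in doc)

b_counts, b_total = train(benign)
w_counts, w_total = train(webshell)

def predict(doc):
    return ("webshell"
            if log_prob(doc, w_counts, w_total) > log_prob(doc, b_counts, b_total)
            else "benign")

print(predict(["INCLUDE_OR_EVAL", "SEND_VAL", "DO_ICALL"]))  # webshell
```

Because opcodes survive source-level obfuscation (renamed variables, string tricks), classifying on them sidesteps the weakness of source-text feature matching.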


2021 ◽  
Vol 7 (1) ◽  
pp. 1
Author(s):  
Ripto Sudiyarno ◽  
Arief Setyanto ◽  
Emha Taufiq Luthfi

Intrusion detection systems (IDS) are known as prominent and leading techniques for finding malicious activities on computer networks. Unlike conventional firewalls, an IDS identifies attacks intelligently with analytic approaches such as data mining and machine learning techniques. In the last few decades, ensemble learning has greatly advanced research in machine learning and pattern classification, showing improved performance compared to a single classifier. In this study, an attempt was made to increase the accuracy of an anomaly detection system: first, classification was performed with a single classifier to obtain an accuracy value, which was then compared with the results of ensemble learning combined with feature selection. The use of ensemble learning aims to improve on the best accuracy of the single classifier. The results are obtained from the confusion matrix and evaluated by comparing the values of the two methods. The research obtained a single-classifier (naïve Bayes) accuracy of 77.4% and an ensemble learning accuracy of 96.8%.
Keywords— ensemble learning, NSL-KDD, naïve Bayes, anomaly, feature selection
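The intuition behind the ensemble's gain over the single classifier can be checked with a back-of-envelope simulation: three independent base learners, each 70% accurate, combined by majority vote. The numbers are illustrative, not the paper's NSL-KDD results:

```python
import random

random.seed(0)
TRIALS = 100_000
P = 0.7  # assumed per-learner accuracy

def majority_correct():
    """One test instance: each of 3 learners is right with prob P; vote wins at 2+."""
    votes = sum(random.random() < P for _ in range(3))
    return votes >= 2

ensemble_acc = sum(majority_correct() for _ in range(TRIALS)) / TRIALS
# Analytically: 3 * 0.7^2 * 0.3 + 0.7^3 = 0.784, above the 0.7 of one learner.
print(round(ensemble_acc, 3))
```

Real ensembles rarely have fully independent errors, so gains are smaller than this bound, but the direction matches the 77.4% vs 96.8% result above.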


2020 ◽  
Vol 12 (6) ◽  
pp. 99-116
Author(s):  
Mousa Al-Akhras ◽  
Mohammed Alawairdhi ◽  
Ali Alkoudari ◽  
Samer Atawneh

Internet of things (IoT) adoption has led to several security threats and challenges for society. Regardless of the benefits it has brought, IoT could compromise the security and privacy of individuals and companies at various levels. Denial of Service (DoS) and Distributed DoS (DDoS) attacks, among others, are the most common attack types facing IoT networks. To counter such attacks, companies should implement an efficient classification/detection model, which is not an easy task. This paper proposes a classification model to examine the effectiveness of several machine-learning algorithms, namely Random Forest (RF), k-Nearest Neighbors (KNN), and Naïve Bayes. The machine learning algorithms are used to detect attacks on the UNSW-NB15 benchmark dataset, which contains both normal network traffic and malicious traffic instances. The experimental results reveal that the RF and KNN classifiers give the best performance, with accuracies of 100% (without noise injection) and 99% (with 10% noise filtering), while the Naïve Bayes classifier gives the worst performance, with accuracies of 95.35% and 82.77% without noise and with 10% noise, respectively. Other evaluation metrics, such as precision and recall, also show the effectiveness of the RF and KNN classifiers over Naïve Bayes.
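The noise-injection step used in such robustness experiments amounts to flipping a fraction of the training labels before fitting. A minimal sketch, with a toy binary label vector standing in for UNSW-NB15:

```python
import random

random.seed(42)

def inject_label_noise(labels, rate):
    """Flip `rate` of the binary labels at randomly sampled distinct positions."""
    noisy = labels[:]
    k = int(len(noisy) * rate)
    for i in random.sample(range(len(noisy)), k):
        noisy[i] = 1 - noisy[i]  # flip 0 <-> 1
    return noisy

labels = [0] * 50 + [1] * 50  # toy stand-in: 0 = normal, 1 = attack
noisy = inject_label_noise(labels, 0.10)
changed = sum(a != b for a, b in zip(labels, noisy))
print(changed)  # exactly 10% of 100 labels flipped
```

Comparing each classifier's accuracy on the clean and noisy versions then reveals its noise sensitivity, which is where Naïve Bayes degraded most in the results above.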


2020 ◽  
Vol 39 (5) ◽  
pp. 6205-6216
Author(s):  
Ramazan Algin ◽  
Ali Fuat Alkaya ◽  
Mustafa Agaoglu

Feature selection (FS) has become an essential task in overcoming high-dimensional and complex machine learning problems. FS is a process that reduces the size of a dataset by removing unnecessary and unrelated features from it. This process improves the performance of classification algorithms and reduces evaluation time by enabling the use of small datasets with useful features during classification. FS aims to obtain a minimal feature subset in a problem domain while retaining the accuracy of the original data. In this study, four computational intelligence techniques, namely migrating birds optimization (MBO), simulated annealing (SA), differential evolution (DE), and particle swarm optimization (PSO), are implemented as search algorithms for the FS problem and compared on 17 well-known datasets taken from the UCI machine learning repository, where the dimension of the tackled datasets varies from 4 to 500. This is the first time that MBO has been applied to the FS problem. To judge the quality of the subsets generated by the search algorithms, two different subset evaluation methods are implemented: probabilistic consistency-based FS (PCFS) and correlation-based FS (CFS). Performance comparison of the algorithms is done using three well-known classifiers: k-nearest neighbor, naive Bayes, and decision tree (C4.5). As a benchmark, the accuracy values found by the classifiers on the datasets with all features are used. The experimental results show that the MBO-based filter approach outperforms the other three approaches in terms of accuracy. It is also observed that, as a subset evaluator, CFS outperforms PCFS and, as a classifier, C4.5 obtains better results than k-nearest neighbor and naive Bayes.
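The search-plus-evaluator setup can be sketched with the simplest of the four algorithms, simulated annealing, walking over feature subsets. The merit function below is a hypothetical stand-in for a CFS-style evaluator (it rewards per-feature relevance and penalizes subset size), and the relevance numbers are invented for illustration:

```python
import math, random

random.seed(1)
relevance = [0.9, 0.8, 0.1, 0.05, 0.7]  # hypothetical per-feature relevances

def merit(subset):
    """CFS-flavored merit: total relevance discounted by subset size."""
    if not subset:
        return 0.0
    return sum(relevance[j] for j in subset) / math.sqrt(len(subset))

def neighbor(subset, n):
    j = random.randrange(n)
    return subset ^ {j}  # flip one feature in or out

cur = set(range(5))  # start from the full feature set
best = cur
for step in range(500):
    t = 1.0 / (1 + step)  # cooling schedule
    cand = neighbor(cur, 5)
    delta = merit(cand) - merit(cur)
    if delta > 0 or random.random() < math.exp(delta / t):
        cur = cand  # always accept improvements; sometimes accept worse moves
    if merit(cur) > merit(best):
        best = cur

print(sorted(best))  # the highest-merit subset found
```

MBO, DE, and PSO plug into the same loop shape: they only change how candidate subsets are generated, while PCFS or CFS supplies the score.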


2021 ◽  
Author(s):  
Naiyer Mohammadi LANBARAN ◽  
Ercan Çelik

Abstract Feature selection is one of the key issues in machine learning and statistical pattern recognition. It is important in many fields, such as classification, because datasets in these areas contain many features, many of which are either unused or carry little information. Retaining these features poses no problem in terms of information, but it increases the computational burden of the intended application and causes much useless information to be stored along with the useful data. A challenge for machine learning research arises when there are many possible features but little training data. One approach is first to identify the attributes best suited for prediction and then to classify features by a measure of their dependence. In this study, fuzzy-rough subset evaluation is used to retain features at the core of groups of similar features. Fuzzy-rough set-based feature selection (FS) has been demonstrated to be extremely advantageous at reducing dataset size, but it has various problems that make it unproductive for big datasets. The fuzzy-rough subset evaluation algorithm shows that these techniques greatly decrease dimensionality while preserving classification accuracy. This paper classifies attributes by using fuzzy set similarity measures together with the dependency degree as a relatedness measure. Artificial Neural Network and Naïve Bayes are used as classifiers, and the performance of these techniques is compared by accuracy, precision, recall, and F-measure metrics.
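The dependency degree that the fuzzy-rough approach generalizes comes from classical rough set theory: gamma(P) is the fraction of objects whose equivalence class under attribute subset P is decision-pure. A minimal crisp sketch on a made-up decision table (the fuzzy extension replaces equivalence classes with fuzzy similarity classes):

```python
from collections import defaultdict

# Toy decision table: condition attributes (outlook, windy) -> decision (play).
table = [
    (("sunny", "no"),  "yes"),
    (("sunny", "yes"), "no"),
    (("rain",  "no"),  "yes"),
    (("rain",  "no"),  "yes"),
]

def dependency(attrs):
    """gamma(attrs) = |positive region| / |universe| in crisp rough set terms."""
    classes = defaultdict(list)
    for cond, dec in table:
        key = tuple(cond[i] for i in attrs)
        classes[key].append(dec)
    # An object is in the positive region iff its equivalence class is pure.
    pos = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return pos / len(table)

print(dependency([0]), dependency([0, 1]))  # 0.5 1.0
```

A subset reaching the dependency degree of the full attribute set (here, both attributes reach 1.0) can replace it without losing discriminating power, which is the basis of the dimensionality reduction claimed above.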


2021 ◽  
Vol 10 (1) ◽  
pp. 46
Author(s):  
Maria Yousef ◽  
Prof. Khaled Batiha

These days, heart disease has become one of the major health problems affecting people's lives around the world, and deaths due to heart disease are increasing day by day. Heart disease prediction systems therefore play an important role in prevention, assisting doctors in making the right decision to diagnose heart disease easily. Existing prediction systems suffer from the high dimensionality of the selected features, which increases prediction time and decreases prediction accuracy because of many redundant or irrelevant features. This paper aims to address the dimensionality problem by proposing a new mixed model for heart disease prediction based on the Naïve Bayes method and machine learning classifiers. In this study, we propose a new heart disease prediction model (NB-SKDR) based on the Naïve Bayes algorithm (NB) and several machine learning techniques, including Support Vector Machine, K-Nearest Neighbors, Decision Tree, and Random Forest. The prediction model consists of three main phases: preprocessing, feature selection, and classification. Its main objective is to improve the performance of the prediction system and to find the best subset of features. The proposed approach uses the Naïve Bayes technique, based on Bayes' theorem, to select the best subset of features for the subsequent classification phase, handling the high dimensionality problem by discarding unnecessary features and keeping only the important ones in order to improve the efficiency and accuracy of the classifiers. The method reduces the number of features from 13 to 6 (age, gender, blood pressure, fasting blood sugar, cholesterol, and exercise-induced angina) by determining the dependency between sets of attributes.
Dependent attributes are attributes whose value depends on another attribute in deciding the value of the class attribute. The dependency between attributes is measured by conditional probability, which can be easily computed by Bayes' theorem. In the classification phase, the proposed system uses different classification algorithms (Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN)) as classifiers for predicting whether or not a patient has heart disease. The model is trained and evaluated on the Cleveland Heart Disease database, which contains 13 features and 303 samples. Different algorithms use different rules and produce different representations of knowledge, so the selection of algorithms for our model is based on their performance: we applied and compared the DT, SVM, RF, and KNN classifiers to identify the best-suited algorithm for high-accuracy heart disease prediction. After combining the Naïve Bayes method with each of these classifiers, the combined algorithms were evaluated by performance metrics such as specificity, sensitivity, and accuracy. The experimental results show that, of these four classification models, the combination of the Naïve Bayes feature selection approach with the SVM (RBF kernel) classifier predicts heart disease with the highest accuracy of 98%. Finally, the proposed approach is compared with two other systems whose feature selection steps are based on the Genetic Algorithm (GA) and Principal Component Analysis (PCA) techniques, respectively; the comparison shows that the Naïve Bayes selection approach of the proposed system outperforms the GA and PCA approaches in terms of prediction accuracy.
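The idea of scoring each attribute by how strongly its conditional probabilities shift with the class can be sketched as mutual information between the attribute and the class label; the binary toy data below is invented for illustration and is not the Cleveland dataset:

```python
import math
from collections import Counter

# Toy binary attributes (rows) and class labels; not real patient data.
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 0]

def mutual_info(j):
    """I(X_j; Y) in bits, from empirical joint and marginal probabilities."""
    n = len(y)
    pxy = Counter((x[j], c) for x, c in zip(X, y))
    px = Counter(x[j] for x in X)
    py = Counter(y)
    mi = 0.0
    for (v, c), cnt in pxy.items():
        p = cnt / n
        mi += p * math.log2(p / ((px[v] / n) * (py[c] / n)))
    return mi

scores = [mutual_info(j) for j in range(3)]
top = sorted(range(3), key=lambda j: scores[j], reverse=True)[:2]
print(top)  # indices of the two most class-dependent attributes
```

Keeping only the top-scoring attributes, as the 13-to-6 reduction above does, shrinks the input that the downstream SVM/KNN/DT/RF classifiers must handle.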


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media; the still-high poverty rate and the country's difficult finances are factors behind disapproval of relocating the national capital. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. The goals of this study are to determine the tendency of community responses to the move of the national capital and to perform public-opinion sentiment analysis on the topic with feature selection, the Naive Bayes algorithm, and Support Vector Machine, in order to obtain the highest accuracy value. The sentiment analysis data are Indonesian-language public opinions crawled from Twitter tweets, using the search terms #IbuKotaBaru and #PindahIbuKota. The stages of the research consist of collecting data through Twitter, polarity labeling, and preprocessing, which comprises transform case, cleansing, tokenizing, filtering, and stemming. Feature selection is applied to increase the accuracy value, after which predetermined ratios are used to split the data into testing and training sets. The next step is a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period above, 24.26% positive and 75.74% negative sentiment related to the move to a new capital city were found. Using RapidMiner software, the best accuracy of Naive Bayes with feature selection is at a 9:1 ratio, with an accuracy of 88.24%, while the best accuracy of Support Vector Machine with feature selection is at a 5:5 ratio, with an accuracy of 78.77%.
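The preprocessing stages listed above (transform case, cleansing, tokenizing, filtering) can be sketched as a small pipeline. The stopword list is a tiny illustrative stand-in, and the stemming stage is omitted; a real Indonesian pipeline would typically add a stemmer such as Sastrawi:

```python
import re

STOPWORDS = {"yang", "di", "ke", "dan"}  # tiny illustrative Indonesian stopword list

def preprocess(tweet):
    text = tweet.lower()                               # transform case
    text = re.sub(r"https?://\S+|[@#]\w+", "", text)   # cleansing: URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)              # cleansing: punctuation, digits
    tokens = text.split()                              # tokenizing
    return [t for t in tokens if t not in STOPWORDS]   # filtering (stopword removal)

print(preprocess("Setuju #PindahIbuKota yang baru di Kalimantan! http://t.co/x"))
```

The cleaned token lists are then vectorized and fed to the Naive Bayes and SVM models under the different train/test ratios compared above.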

