Feature Entropy Estimation (FEE) for Malicious IoT Traffic and Detection Using Machine Learning

Identifying anomalous and malicious traffic in Internet of Things (IoT) networks is essential for IoT security. Tracking and blocking unwanted traffic flows requires a framework that identifies attacks accurately, quickly, and with low complexity. Many machine learning (ML) algorithms have proved efficient at detecting intrusions in IoT networks, but they suffer from misclassification when the feature set is inappropriate or contains irrelevant features. This paper presents an in-depth study of these issues. We present a lightweight, low-cost feature selection technique for IoT intrusion detection that combines low complexity and low computational time with high accuracy. The proposed feature selection technique integrates rank-based chi-square, Pearson correlation, and score correlation to extract the relevant features from all features available in the dataset. Feature entropy estimation is then applied to validate the relationship among the extracted features in order to identify malicious traffic in IoT networks. Finally, an extreme gradient boosting ensemble approach classifies the features into the relevant attack types. Simulations were performed on three datasets, NSL-KDD, UNSW-NB15, and CICIDS2017, and results are reported on different test sets. Accuracy was approximately 97.48% on NSL-KDD, 99.96% on UNSW-NB15, and 99.93% on CICIDS2017. A comparison with existing state-of-the-art techniques is also presented.
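
As a rough illustration of the entropy step, the sketch below estimates the Shannon entropy of a single discretized feature from its empirical value frequencies. The abstract does not say which estimator the authors use, so the plug-in estimator, the function name `shannon_entropy`, and the example columns are assumptions made for illustration only.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    # H = sum(-p * log2(p)) over the observed value frequencies
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical discretized feature columns (e.g. protocol type per flow)
constant_feature = ["tcp"] * 8          # carries no information
balanced_feature = ["tcp", "udp"] * 4   # two equally likely values
print(shannon_entropy(constant_feature))
print(shannon_entropy(balanced_feature))
```

A feature whose entropy is close to zero takes nearly the same value on every flow and is unlikely to help separate benign from malicious traffic.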

Opinion classification on a social network by a novel feature selection technique

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v20.i2.pp960-967 ◽

2020 ◽

Vol 20 (2) ◽

pp. 960

Author(s):

Atchara Choompol ◽

Panida Songram ◽

Phattahanaphong Chomphuwiset

Keyword(s):

Feature Selection ◽

Social Network ◽

Association Rules ◽

Information Gain ◽

Feature Selection Method ◽

Tuning Parameter ◽

Computational Time ◽

Chi Square ◽

Feature Selection Technique ◽

Filter Model

Most opinion comments on social networks are short and ambiguous. Opinion classification of such comments is generally difficult because they lack dominant features. A feature extraction technique is therefore necessary to improve both classification accuracy and computational time. This paper proposes an effective feature selection method for opinion classification on a social network. The proposed method selects features based on the concept of a filter model, together with association rules. Support and confidence are used to calculate the weights of features, and the features with high weights are selected for classification. Unlike supports in conventional association rules, supports in the proposed method are normalized to the range 0-1 to remove outlier supports. Moreover, a tuning parameter is used to emphasize the degree of support or confidence. The experimental results show that the proposed method provides high classification efficiency and outperforms Information Gain, Chi-Square, and Gini Index in both computational time and accuracy.
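
The weighting idea can be sketched as follows. The abstract does not give the exact formula, so the min-max normalization, the linear blend controlled by `alpha`, and the name `feature_weights` are all assumptions; the snippet only shows how support and confidence of a rule "term -> class" could be combined into a single weight.

```python
def feature_weights(docs, labels, target, alpha=0.5):
    """Weight each term by normalized support and confidence for `target`.

    docs   : list of sets of terms (one set per comment)
    labels : list of class labels, aligned with docs
    alpha  : hypothetical tuning parameter; 1.0 uses only support,
             0.0 uses only confidence
    """
    n = len(docs)
    vocab = set().union(*docs) if docs else set()
    if not vocab:
        return {}
    support, confidence = {}, {}
    for term in vocab:
        with_term = sum(1 for d in docs if term in d)
        with_both = sum(1 for d, y in zip(docs, labels) if term in d and y == target)
        support[term] = with_both / n             # support of rule term -> target
        confidence[term] = with_both / with_term  # confidence of rule term -> target
    # Min-max normalize supports to the range 0-1
    lo, hi = min(support.values()), max(support.values())
    span = (hi - lo) or 1.0
    return {
        t: alpha * (support[t] - lo) / span + (1 - alpha) * confidence[t]
        for t in vocab
    }

# Hypothetical toy comments
docs = [{"good", "phone"}, {"good", "price"}, {"bad", "phone"}, {"bad", "battery"}]
labels = ["pos", "pos", "neg", "neg"]
weights = feature_weights(docs, labels, target="pos", alpha=0.5)
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(term, round(w, 2))
```

Terms with the highest weights would then be kept as features for the classifier.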

FSDroid:- A feature selection technique to detect malware from Android using Machine Learning Techniques

Multimedia Tools and Applications ◽

10.1007/s11042-020-10367-w ◽

2021 ◽

Author(s):

Arvind Mahindru ◽

A.L. Sangal

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Machine Learning Techniques ◽

Feature Selection Technique ◽

Selection Technique ◽

Learning Techniques

Diagnosis of COVID-19 Using CT image Radiomics Features: A Comprehensive Machine Learning Study Involving 26,307 Patients

10.1101/2021.12.07.21267367 ◽

2021 ◽

Author(s):

Isaac Shiri ◽

Yazdan Salimi ◽

Abdollah Saberi ◽

Masoumeh Pakbin ◽

Ghasem Hajianfar ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Strategies ◽

Lung Diseases ◽

Pearson Correlation ◽

Characteristic Curve ◽

Univariate Analysis ◽

Machine Learning Algorithms ◽

Recursive Feature Elimination ◽

Classifier Combination

Purpose: To derive and validate an effective radiomics-based model for differentiating COVID-19 pneumonia from other lung diseases using a very large cohort of patients.

Methods: We collected 19 private and 5 public datasets, totaling 26,307 individual patient images (15,148 COVID-19; 9,657 with other lung diseases, e.g. non-COVID-19 pneumonia, lung cancer, pulmonary embolism; 1,502 normal cases). Images were automatically segmented using a validated deep learning (DL) model and the results were carefully reviewed. Images were first cropped into lung-only region boxes, then resized to 296×216 voxels. Voxel dimensions were resized to 1×1×1 mm³, followed by 64-bin discretization. The 108 extracted features included shape, first-order histogram, and texture features. Univariate analysis was first performed using simple logistic regression. The thresholds were fixed in the training set and evaluation was then performed on the test set. False discovery rate (FDR) correction was applied to the p-values. Z-score normalization was applied to all features. For multivariate analysis, features with high correlation (R² > 0.99) were first eliminated using Pearson correlation. We tested 96 different machine learning strategies by cross-combining 4 feature selectors or 8 dimensionality reduction techniques with 8 classifiers. We trained and evaluated our models using 3 different datasets: 1) the entire dataset (26,307 patients: 15,148 COVID-19; 11,159 non-COVID-19); 2) excluding normal patients from the non-COVID-19 class and including only RT-PCR-positive cases in the COVID-19 class (20,697 patients: 12,419 COVID-19; 8,278 non-COVID-19); 3) including only non-COVID-19 pneumonia patients and a random sample of COVID-19 patients (5,582 patients: 3,000 COVID-19; 2,582 non-COVID-19) to provide balanced classes. Each of these 3 datasets was then randomly split into 70% for training and 30% for testing. All steps, including feature preprocessing, feature selection, and classification, were performed separately in each dataset. Classification algorithms were optimized during training using grid search. The best models were chosen by a one-standard-deviation rule in 10-fold cross-validation and were then evaluated on the test sets.

Results: In dataset #1, the combination of Relief feature selection and a Random Forest (RF) classifier resulted in the highest performance (area under the receiver operating characteristic curve (AUC) = 0.99, sensitivity = 0.98, specificity = 0.94, accuracy = 0.96, positive predictive value (PPV) = 0.96, and negative predictive value (NPV) = 0.96). In dataset #2, the combination of Recursive Feature Elimination (RFE) feature selection and an RF classifier resulted in the highest performance (AUC = 0.99, sensitivity = 0.98, specificity = 0.95, accuracy = 0.97, PPV = 0.96, and NPV = 0.98). In dataset #3, the combination of ANOVA feature selection and an RF classifier resulted in the highest performance (AUC = 0.98, sensitivity = 0.96, specificity = 0.93, accuracy = 0.94, PPV = 0.93, and NPV = 0.96).

Conclusion: Radiomic features extracted from the entire lung, combined with machine learning algorithms, can enable very effective, routine diagnosis of COVID-19 pneumonia from CT images without the use of any other diagnostic test.
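
The correlation-based elimination described in the methods (dropping features with R² above 0.99) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the rule for which member of a correlated pair is kept (here, the first one seen), the function names, and the example feature names are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / math.sqrt(vx * vy)

def drop_correlated(features, threshold=0.99):
    """Keep a feature only if its R^2 with every kept feature is <= threshold.

    features : dict mapping feature name -> list of values
    """
    kept = []
    for name, values in features.items():
        if all(pearson_r(values, features[k]) ** 2 <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical radiomic features measured on five patients
features = {
    "volume": [1, 2, 3, 4, 5],
    "volume_scaled": [2, 4, 6, 8, 10],   # perfectly correlated with "volume"
    "texture_entropy": [5, 1, 4, 2, 3],
}
print(drop_correlated(features))
```

Removing near-duplicate features before multivariate modeling reduces redundancy without discarding independent information.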

Dominant Feature Selection and Machine Learning-Based Hybrid Approach to Analyze Android Ransomware

Security and Communication Networks ◽

10.1155/2021/7035233 ◽

2021 ◽

Vol 2021 ◽

pp. 1-22

Author(s):

Tanya Gera ◽

Jaiteg Singh ◽

Abolfazl Mehbodniya ◽

Julian L. Webber ◽

Mohammad Shabaz ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Hybrid Approach ◽

Machine Learning Algorithms ◽

Dynamic Monitoring ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Detection Techniques ◽

Dominant Feature

Ransomware is a special kind of malware designed to extort money in return for unlocking a device and its personal data files. Smartphone users store personal as well as official data on these devices, which makes them an attractive target for financially motivated ransomware attackers. Financial losses due to ransomware attacks are increasing rapidly. Recent studies report that, out of 87% reported cyber-attacks, 41% are due to ransomware. The inability of application-signature-based solutions to detect unknown malware has inspired many researchers to build automated classification models using machine learning algorithms. Advanced malware can delay its malicious actions when it senses an emulated environment, which also poses a challenge to dynamic monitoring of applications. Existing hybrid approaches use a variety of feature combinations for detection and analysis. The rapidly changing nature and distribution strategies of ransomware are possible reasons behind the deteriorating performance of primitive ransomware detection techniques. The limitations of existing studies include ambiguity in selecting the feature set. Enlarging the feature set may give adept attackers more freedom against learning algorithms. In this work, we propose a hybrid approach to identify and mitigate Android ransomware. The study employs a novel dominant feature selection algorithm to extract the dominant feature set. The experimental results show that the proposed model can differentiate between clean applications and ransomware with improved precision. The proposed hybrid solution achieves an accuracy of 99.85% with zero false positives while considering 60 prominent features, which also justifies the feature selection algorithm used. A comparison with existing frameworks indicates that the proposed method performs better.

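Implementasi teknik seleksi fitur pada klasifikasi malware Android menggunakan support vector machine (SVM) [Implementation of feature selection techniques for Android malware classification using a support vector machine (SVM)]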

Repositor ◽

10.22219/repositor.v1i1.1 ◽

2019 ◽

Vol 1 (1) ◽

pp. 1

Author(s):

Hendra Saputra ◽

Setio Basuki ◽

Mahar Faiqurahman

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Chi Square ◽

Android Malware ◽

Correlation Based Feature Selection ◽

Selection Of

Abstract: Android malware has grown significantly with the passage of time and the increasing variety of techniques used in Android development. Machine learning can now be used to model the patterns of static and dynamic features of Android malware. To classify malware types accurately, the researchers link application features to the features required by each malware category. The malware categories used are types that are widely circulating today. A support vector machine (SVM) is used to classify the malware types; specifically, a multi-class one-against-one SVM with an RBF kernel. The features used in the classification are Permission and Broadcast Receiver. To improve classification accuracy, feature selection methods are applied: Correlation-based Feature Selection (CFS), Gain Ratio (GR), and Chi-Square (CHI). The results with feature selection are evaluated alongside the results without it. Classification accuracy was 90.83% with CFS, 91.25% with GR and with CHI, and 91.67% without feature selection. The tests indicate that Permission and Broadcast Receiver can be used to classify malware types, but the feature selection methods used yield accuracy slightly below that obtained without feature selection. Keywords: Android malware classification, feature selection, SVM, multi-class one-against-one SVM
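
For binary features such as requested permissions, the chi-square score used for ranking can be computed from a 2×2 contingency table. The sketch below uses the standard closed form for that table; the one-vs-rest treatment of the malware classes, the function name, and the example labels are assumptions, since the abstract does not describe the implementation.

```python
def chi_square_score(feature, labels, target):
    """Chi-square statistic of a binary feature against a one-vs-rest class.

    feature : list of 0/1 values (e.g. whether an app requests a permission)
    labels  : list of class labels, aligned with `feature`
    """
    pairs = list(zip(feature, labels))
    a = sum(1 for f, y in pairs if f and y == target)      # present, target class
    b = sum(1 for f, y in pairs if f and y != target)      # present, other class
    c = sum(1 for f, y in pairs if not f and y == target)  # absent, target class
    d = sum(1 for f, y in pairs if not f and y != target)  # absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # feature or class is constant, so there is nothing to rank
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical example: four apps, two malware families
labels = ["ransomware", "ransomware", "adware", "adware"]
informative = [1, 1, 0, 0]    # permission requested only by ransomware samples
uninformative = [1, 0, 1, 0]  # permission spread evenly across families
print(chi_square_score(informative, labels, "ransomware"))    # 4.0
print(chi_square_score(uninformative, labels, "ransomware"))  # 0.0
```

Features are then sorted by score and the top-ranked ones are passed to the classifier.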

Developing Machine Learning Models to Predict Roadway Traffic Noise: An Opportunity to Escape Conventional Techniques

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198119838514 ◽

2019 ◽

Vol 2673 (4) ◽

pp. 158-172 ◽

Cited By ~ 1

Author(s):

Mohamad Ali Khalil ◽

Khaled Hamad ◽

Abdallah Shanableh

Keyword(s):

Machine Learning ◽

Feature Selection ◽

United Arab Emirates ◽

Traffic Noise ◽

Free Field ◽

Model Performance ◽

Noise Model ◽

Support Vector ◽

Feature Selection Technique ◽

Performance Results

Accurate prediction of roadway traffic noise remains challenging. Many researchers continue to improve the performance of their models by either adding more variables or improving their modeling algorithms. In this research, machine learning (ML) modeling techniques were developed to predict roadway traffic noise accurately. The ML techniques applied were: regression decision trees, support vector machine, ensembles, and artificial neural network. The parameters of each of these models were fine-tuned to achieve the best performance results. In addition, a state-of-the-art hybrid feature-selection technique has been employed to select a minimum set of input features (variables) while maintaining the accuracy of the developed models. By optimizing the number of features used in the model, the resources needed to develop and utilize a model to predict roadway noise would be less, hence decreasing the development cost. The proposed approach has been applied to develop a free-field roadway traffic noise model for Sharjah City in the United Arab Emirates. The best developed ML model was compared with a conventional regression model which was developed earlier under the same conditions. The cross-validated results clearly indicate that the best ML model outperformed the regression modeling. The performance of the ML model was also assessed after reducing the number of its input features based on the outcome of the feature-selection algorithm; the model performance was slightly affected. This result emphasizes the importance of considering only features that greatly influence the roadway traffic noise.

Text Classification of Cornell Movie Data using Data Mining with Feature Selection

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b2329.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 2950-2955

Keyword(s):

Feature Selection ◽

Text Mining ◽

Text Classification ◽

Feature Subset Selection ◽

Support Vector ◽

Svm Classifier ◽

Feature Subset ◽

Chi Square ◽

Feature Selection Technique ◽

Data Set

Text classification is a branch of text mining through which we can analyze the sentiment of movie data. In this paper, we apply different preprocessing techniques to reduce the number of features in the Cornell movie data set. We also apply correlation-based feature subset selection and chi-square feature selection to gather the most valuable words of each category. A new Cornell movie data set is formed after applying the preprocessing steps and feature selection techniques. We classify the Cornell movie data as positive or negative using several classifiers: Support Vector Machine (SVM), Multilayer Perceptron (MLP), Naive Bayes (NB), Bayes Net (BN), and Random Forest (RF). We also compare classification accuracy among the classifiers and achieve better accuracy, 87%, with the SVM classifier using a reduced number of features. The suggested classifier can be useful for opinion mining of movie reviews and for the analysis of blogs and other documents.

Tumor Grade and Overall Survival Prediction of Gliomas Using Radiomics

Scientific Programming ◽

10.1155/2021/9913466 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Jianming Ye ◽

He Huang ◽

Weiwei Jiang ◽

Xiaomei Xu ◽

Chun Xie ◽

...

Keyword(s):

Machine Learning ◽

Overall Survival ◽

Feature Selection ◽

Model Performance ◽

Predictive Performance ◽

Tumor Grade ◽

Survival Prediction ◽

Lower Grade ◽

Feature Selection Technique ◽

Mri Scans

Glioma is one of the most common and deadly malignant brain tumors originating from glial cells. For personalized treatment, an accurate preoperative prognosis for glioma patients is highly desired. Recently, various machine learning-based approaches have been developed to predict the prognosis based on preoperative magnetic resonance imaging (MRI) radiomics, which extract quantitative features from radiographic images. However, major challenges remain for methodologic developments to optimize feature extraction and provide rapid information flow in clinical settings. This study investigates two machine learning-based prognosis prediction tasks using radiomic features extracted from preoperative multimodal MRI brain data: (i) prediction of tumor grade (higher-grade vs. lower-grade gliomas) from preoperative MRI scans and (ii) prediction of patient overall survival (OS) in higher-grade gliomas (<12 months vs. >12 months) from preoperative MRI scans. Specifically, these two tasks use conventional machine learning-based models built with various classifiers. Moreover, feature selection methods are applied to increase model performance and decrease computational costs. In the experiments, models are evaluated in terms of their predictive performance and stability using a bootstrap approach. Experimental results show that classifier choice and feature selection technique play a significant role in model performance and stability for both tasks; a variability analysis indicates that the choice of classification method is the most dominant source of performance variation for both tasks.
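
One simple way to gauge performance stability with a bootstrap is to resample the evaluated cases with replacement and look at the spread of the metric. The sketch below does this for accuracy on fixed predictions; it is a simplification, since the abstract does not say whether the authors resample the test cases or retrain on resampled training data, and the function name and parameters are assumptions.

```python
import random
import statistics

def bootstrap_accuracy(y_true, y_pred, n_rounds=200, seed=0):
    """Mean and standard deviation of accuracy over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_rounds):
        # Sample case indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        correct = sum(1 for i in idx if y_true[i] == y_pred[i])
        scores.append(correct / n)
    return statistics.mean(scores), statistics.pstdev(scores)

# Hypothetical tumor-grade labels and predictions (1 = higher grade)
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
mean_acc, std_acc = bootstrap_accuracy(y_true, y_pred)
print(f"accuracy = {mean_acc:.3f} +/- {std_acc:.3f}")
```

A smaller standard deviation across resamples indicates a more stable classifier and feature selection combination.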

Sentiment Analysis on E-commerce Product using Machine Learning and Combination of TF-IDF and Backward Elimination

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7889.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 2862-2867

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Feature Selection ◽

Sentiment Analysis ◽

Opinion Mining ◽

Classification Performance ◽

Support Vector ◽

Product Reviews ◽

Feature Selection Technique ◽

Backward Elimination

E-commerce refers to website or mobile application platforms that help people buy products. Before purchasing a product, a customer decides whether to buy it by reading reviews from previous buyers. The problem is that there are so many reviews that reading them all takes a long time. This research uses sentiment analysis to classify the review data. Sentiment analysis, or opinion mining, is a machine learning approach to classifying and analysing texts or documents about human sentiments, emotions, and opinions. Here, sentiment analysis is used to classify product reviews from e-commerce websites into positive or negative classes. The results can be processed further and used to summarize customers' opinions about a certain product without reading every single review. The goal of this research is to optimize classification performance by using a feature selection technique. Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, Backward Elimination feature selection, and five classifiers (Naïve Bayes, Support Vector Machine, K-Nearest Neighbour, Decision Tree, Random Forest) were used to analyse the sentiment of the reviews. The dataset is in the Indonesian language and is classified into two classes (positive and negative). The best accuracy, 85.97%, is achieved with TF-IDF, Backward Elimination, and Support Vector Machine (SVM), an increase of 7.91% over the process without feature selection. Based on the results, Backward Elimination feature selection improved performance for all classifiers used in this research.
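
The TF-IDF feature extraction step can be sketched in a few lines. Several TF-IDF variants exist and the abstract does not say which one the authors use; the version below (term frequency divided by document length, inverse document frequency as a natural logarithm without smoothing) and the example reviews are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / document frequency),
    one of several common TF-IDF variants.
    """
    n_docs = len(docs)
    # Number of documents in which each term appears at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized reviews (shown in English for readability)
reviews = [
    ["product", "good", "fast"],
    ["product", "bad"],
    ["product", "good"],
]
for row in tf_idf(reviews):
    print({term: round(w, 3) for term, w in row.items()})
```

A term that appears in every review, such as "product" here, receives weight zero, which is why TF-IDF already suppresses some uninformative terms before Backward Elimination is applied.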

Android Malware Detection Using Machine Learning with Feature Selection Based on the Genetic Algorithm

Mathematics ◽

10.3390/math9212813 ◽

2021 ◽

Vol 9 (21) ◽

pp. 2813

Author(s):

Jaehyeong Lee ◽

Hyuk Jang ◽

Sungmin Ha ◽

Yourim Yoon

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Genetic Algorithms ◽

Feature Selection ◽

Malware Detection ◽

Feature Selection Method ◽

Machine Learning Algorithms ◽

Android Malware ◽

Detection Techniques ◽

Android Malware Detection

Since the discovery that machine learning can be used to effectively detect Android malware, many studies on machine learning-based malware detection techniques have been conducted. Several methods based on feature selection, particularly genetic algorithms, have been proposed to increase performance and reduce costs. However, such methods have certain limitations: they have yet to be compared with other methods, and their many features have not been sufficiently verified. This study investigates whether genetic algorithm-based feature selection helps Android malware detection. We applied nine machine learning algorithms with genetic algorithm-based feature selection to 1104 static features extracted from the 5000 benign applications and 2500 malware samples included in the Andro-AutoPsy dataset. Comparative experimental results show that the genetic algorithm performed better than the information gain-based method, which is generally used as a feature selection method. Moreover, machine learning with the proposed genetic algorithm-based feature selection has an absolute advantage in terms of time compared to machine learning without feature selection. The results indicate that incorporating genetic algorithms into Android malware detection is a valuable approach, and that applying genetic algorithm-based feature selection to machine learning is useful for improving malware detection performance.
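
A genetic algorithm for feature selection typically evolves bitmasks in which each bit marks a feature as selected or not. The sketch below is a generic version with elitism, single-point crossover, and bit-flip mutation; the operators, parameter values, and especially the toy fitness function are assumptions, since in a real setting the fitness would be the validation score of a classifier trained on the selected features.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Search for a feature bitmask that maximizes `fitness(mask)`.

    Assumes n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]  # elitism: carry the two best masks over unchanged
        while len(next_pop) < pop_size:
            # Pick two parents from the better half of the population
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Toy fitness (hypothetical): features 0-2 are useful, every selected feature has a cost
USEFUL = {0, 1, 2}

def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in USEFUL)
    return hits - 0.1 * sum(mask)

best = ga_select(n_features=10, fitness=toy_fitness)
print(best, toy_fitness(best))
```

Because the two best masks are always carried over, the best fitness in the population never decreases from one generation to the next.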
