Malicious web pages detection using feature selection techniques and machine learning

Wireless Sensor Networks (WSNs) continue to face two major challenges: energy and security. One WSN-related security task is therefore to protect them from Denial of Service (DoS) and Distributed DoS (DDoS) attacks. Machine learning-based systems are the only viable option for these types of attacks, because traditional deep packet inspection systems depend on inspecting open fields in transport layer security packets, and the trend is toward encrypting those fields. Moreover, network traffic will become more complex as growing usage increases the amount of data transmitted between WSN nodes. Therefore, feature selection techniques are needed alongside machine learning to determine which data are most important in the DoS detection process. This paper examined techniques for improving DoS anomaly detection while balancing it against power conservation in WSNs. A new clustering technique, called the CH_Rotations algorithm, was introduced to improve anomaly detection efficiency over a WSN’s lifetime. Furthermore, the use of feature selection techniques with machine learning algorithms in examining WSN node traffic, and the effect of these techniques on the lifetime of WSNs, was evaluated. The evaluation results showed that Water Cycle (WC) feature selection achieved the best average accuracy, exceeding Particle Swarm Optimization (PSO), Simulated Annealing (SA), Harmony Search (HS), and Genetic Algorithm (GA) by 2%, 5%, 3%, and 3%, respectively. Moreover, the WC with Decision Tree (DT) classifier showed 100% accuracy with only one feature. In addition, the CH_Rotations algorithm improved network lifetime by 30% compared to the standard LEACH protocol. Network lifetime using the WC + DT technique was reduced by 5% compared to scenarios without WC + DT.


Automated Feature Selection and Classification for High-Dimensional Biomedical Data

10.21203/rs.3.rs-563410/v1 ◽

2021 ◽

Author(s):

Tammo P.A. Beishuizen ◽

Joaquin Vanschoren ◽

Peter A.J. Hilbers ◽

Dragan Bošnački

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Automated System ◽

Complex Data ◽

Biomedical Data ◽

Selection Methods ◽

Model Predictions ◽

Automated Machine Learning ◽

Feature Selection Techniques ◽

Best Fit

Abstract Background: Automated machine learning aims to automate the building of accurate predictive models, including the creation of complex data preprocessing pipelines. Although successful in many fields, such systems struggle to produce good results on biomedical datasets, especially given the high dimensionality of the data. Result: In this paper, we explore the automation of feature selection in these scenarios. We analyze which feature selection techniques are ideally included in an automated system, determine how to efficiently find the ones that best fit a given dataset, integrate this into an existing AutoML tool (TPOT), and evaluate it on four very different yet representative types of biomedical data: microarray, mass spectrometry, clinical and survey datasets. We focus on feature selection rather than latent feature generation since we often want to explain the model predictions in terms of the intrinsic features of the data. Conclusion: Our experiments show that none of these datasets requires more than 200 features to accurately explain the output. Additional features did not increase the quality significantly. We also find that the automated machine learning results are significantly improved after adding further feature selection methods and prior knowledge on how to select and tune them.


Improve the Accuracy of Heart Disease Predictions Using Machine Learning and Feature Selection Techniques

Communications in Computer and Information Science - Machine Learning, Image Processing, Network Security and Data Sciences ◽

10.1007/978-981-15-6318-8_19 ◽

2020 ◽

pp. 214-228

Author(s):

Abdelmegeid Amin Ali ◽

Hassan Shaban Hassan ◽

Eman M. Anwar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Heart Disease ◽

Feature Selection Techniques


A Survey on Diagnosis and Analysis of Diabetic Retinopathy using Feature Selection

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207132 ◽

2020 ◽

pp. 170-176

Author(s):

Amalu Michael ◽

Deepa S S

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Diabetic Retinopathy ◽

High Ratio ◽

Training Data ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Diabetic Eye Disease ◽

Selection Mechanisms ◽

Feature Selection Techniques

Diabetic retinopathy (DR) is one of the common forms of diabetic eye disease. DR occurs due to a high ratio of glucose in the blood, which causes alterations in the retinal vessels. Machine learning (ML) is a broad multidisciplinary field that has its roots in statistics, algebra, data processing, and information analytics. ML is used to discover patterns in medical data and provides an efficient way to predict diseases. ML is an application of artificial intelligence that collects information from training data. Several machine learning techniques are used for the diagnosis of diabetic retinopathy. This paper mainly surveys such techniques along with various feature selection mechanisms. The study provides a basic categorization of feature selection techniques and discusses their use.


Binary chemical reaction optimization based feature selection techniques for machine learning classification problems

Expert Systems with Applications ◽

10.1016/j.eswa.2020.114169 ◽

2020 ◽

pp. 114169

Author(s):

P.C. Srinivasa Rao ◽

A.J. Sravan Kumar ◽

Quamar Niyaz ◽

Paheding Sidike ◽

Vijay K Devabhaktuni

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Chemical Reaction ◽

Classification Problems ◽

Chemical Reaction Optimization ◽

Machine Learning Classification ◽

Reaction Optimization ◽

Feature Selection Techniques


Phishing Detection Based on Machine Learning and Feature Selection Methods

International Journal of Interactive Mobile Technologies (iJIM) ◽

10.3991/ijim.v13i12.11411 ◽

2019 ◽

Vol 13 (12) ◽

pp. 171 ◽

Cited By ~ 1

Author(s):

Mohammad Almseidin ◽

AlMaha Abu Zuraiq ◽

Mouhammd Al-kasassbeh ◽

Nidal Alnidami

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Web Pages ◽

Selection Methods ◽

Random Forest Algorithm ◽

Phishing Detection ◽

Enormous Number

With increasing technological development, the Internet has become ubiquitous and accessible to everyone. There are a considerable number of web pages serving different purposes. Despite this enormous number, not all of these sites are legitimate. So-called phishing sites deceive users in order to serve their own interests. This paper addressed the problem using machine learning algorithms, employing a novel phishing detection dataset that contains 5000 legitimate web pages and 5000 phishing ones. In order to obtain the best results, various machine learning algorithms were tested, and J48, Random Forest, and Multilayer Perceptron were then chosen. Different feature selection tools were applied to the dataset in order to improve the efficiency of the models. The best result in the experiment was achieved by using 20 of the 48 features with the Random Forest algorithm. The accuracy was 98.11%.
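The reduction from 48 features to 20 described in this abstract can be illustrated with a simple filter-style selector. The following is only a minimal sketch under stated assumptions: the abstract does not name the feature selection tools that were used, so the correlation-based ranking, the helper names (`pearson`, `select_top_k`, `project`), and the synthetic data are all hypothetical stand-ins rather than the authors' method or dataset.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(vx * vy)


def select_top_k(rows, labels, k):
    """Rank features by |correlation with the label| and keep the k best."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


def project(rows, keep):
    """Drop every column whose index is not in `keep`."""
    return [[row[j] for j in keep] for row in rows]


# Synthetic stand-in data (assumption): 48 features, of which the first 20
# are correlated with the label (0 = legitimate, 1 = phishing) and the
# remaining 28 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = []
for y in labels:
    informative = [y + rng.gauss(0, 0.3) for _ in range(20)]
    noise = [rng.gauss(0, 1) for _ in range(28)]
    rows.append(informative + noise)

keep = select_top_k(rows, labels, k=20)
reduced = project(rows, keep)
print(len(rows[0]), "->", len(reduced[0]), "features")
```

The reduced matrix would then be passed to a classifier such as Random Forest. A filter of this kind scores each feature independently of the classifier, which keeps it cheap but ignores interactions between features.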


Feature Selection for Web Page Classification

Web Technologies ◽

10.4018/978-1-60566-982-3.ch078 ◽

2011 ◽

pp. 1462-1477 ◽

Cited By ~ 1

Author(s):

K. Selvakuberan ◽

M. Indra Devi ◽

R. Rajaram

Keyword(s):

Feature Selection ◽

Financial Management ◽

Contextual Information ◽

Information Service ◽

Web Pages ◽

Web Page ◽

Customer Information ◽

Web Access ◽

Feature Selection Techniques ◽

The Web

The World Wide Web serves as a huge, widely distributed, global information service center for news, advertisements, customer information, financial management, education, government, e-commerce and many others. The Web contains a rich and dynamic collection of hyperlink information. Web page access and usage information provides rich sources for data mining. Web pages are classified based on the content and/or contextual information embedded in them. Because Web pages contain many irrelevant, infrequent, and stop words that reduce the performance of the classifier, selecting relevant, representative features from the Web page is an essential preprocessing step. This provides secure access to the required information. Web access and usage information can be mined to predict the authentication of the user accessing the Web page. This information may be used to personalize the information needed by users and to preserve their privacy by hiding personal details. The issue lies in selecting the features that represent the Web pages and in processing the details that the user needs. In this article, we focus on feature selection, issues in feature selection, and the most important feature selection techniques described and used by researchers.
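The preprocessing step mentioned in this abstract, discarding stop words and infrequent terms before classification, can be sketched as follows. This is an illustrative sketch only: the stop-word list, the document-frequency threshold `min_doc_freq`, the function names, and the sample pages are assumptions introduced here, not details taken from the chapter.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (assumption); real lists are far longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(pages, min_doc_freq=2):
    """Keep terms that are not stop words and occur in enough pages."""
    doc_freq = Counter()
    for page in pages:
        doc_freq.update(set(tokenize(page)) - STOP_WORDS)
    return sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)


def vectorize(page, vocabulary):
    """Represent a page as term counts over the selected vocabulary."""
    counts = Counter(tokenize(page))
    return [counts[t] for t in vocabulary]


# Hypothetical page texts used only to exercise the functions.
pages = [
    "The bank offers loans and financial management for customers.",
    "Financial news: the bank is raising loan rates.",
    "Government education portal for students and teachers.",
    "Education news for teachers in the government sector.",
]

vocabulary = build_vocabulary(pages)
vectors = [vectorize(p, vocabulary) for p in pages]
print(vocabulary)
```

Terms that appear in only one page, such as "portal" or "sector", are dropped along with the stop words, so each page is described by a much shorter vector than its raw word list.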
