Bio inspired Ensemble Feature Selection (BEFS) Model with Machine Learning and Data Mining Algorithms for Disease Risk Prediction

Phishing is a common attack that tricks credulous people into disclosing their personal information. It is a type of cybercrime in which fake sites lure victims into giving up sensitive data. This paper deals with methods for detecting phishing websites by analyzing various features of URLs using machine learning techniques. The study discusses methods for detecting phishing websites based on lexical features, host properties and page importance properties. We consider various data mining algorithms for evaluating the features in order to better understand the structure of URLs that spread phishing. To protect end users from visiting these sites, we can try to identify phishing URLs by analyzing their lexical and host-based features. A particular challenge in this domain is that criminals constantly devise new strategies to counter our defense measures. To succeed in this contest, we need machine learning algorithms that continually adapt to new examples and features of phishing URLs.
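
As a rough illustration of what "lexical features" of a URL might look like, the following minimal sketch computes a few of them with the Python standard library. The function name `lexical_features` and the specific features chosen are assumptions made for illustration; the abstract does not list the features the authors actually used.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few simple lexical features of a URL.

    The feature set is a hypothetical example, not the paper's own.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # An "@" in a URL can hide the real host behind a familiar-looking prefix.
        "has_at_symbol": "@" in url,
        # A raw IPv4 address in place of a domain name (shape check only).
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    for example in (
        "http://192.168.0.1/login/verify?id=1",
        "http://secure-login.example.com/account",
    ):
        print(example, lexical_features(example))
```

A feature dictionary of this kind could then be turned into a numeric vector and passed to whichever classifier is being evaluated.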

A Survey of Feature Selection Techniques

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch289 ◽

2011 ◽

pp. 1888-1895 ◽

Cited By ~ 17

Author(s):

Barak Chizi ◽

Lior Rokach ◽

Oded Maimon

Keyword(s):

Data Mining ◽

Feature Selection ◽

Mining Method ◽

Data Set ◽

Data Mining Method ◽

Data Mining Algorithms ◽

Wrapper Approach ◽

Computationally Intensive ◽

Filter Approach ◽

Mining Algorithms

Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000). The main reason for this is that data mining algorithms are computationally intensive. This obstacle is sometimes known as the “curse of dimensionality” (Bellman, 1961). The objective of Feature Selection is to identify the important features in the data set and to discard all other features as irrelevant or redundant information. Since Feature Selection reduces the dimensionality of the data, data mining algorithms can run faster and more effectively. In some cases, feature selection also improves the performance of the data mining method, mainly because it yields a more compact, easily interpreted representation of the target concept. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed subsequently: undesirable features are filtered out of the data before learning begins. These algorithms use heuristics based on general characteristics of the data to evaluate the merit of feature subsets. A sub-category of filter methods, referred to as rankers, comprises methods that employ some criterion to score each feature and provide a ranking. From this ordering, several feature subsets can be chosen by manually setting ... There are three main approaches for feature selection: wrapper, filter and embedded. The wrapper approach (Kohavi, 1995; Kohavi and John, 1996) uses an inducer as a black box along with a statistical re-sampling technique such as cross-validation to select the best feature subset according to some predictive measure. The embedded approach (see for instance Guyon and Elisseeff, 2003) is similar to the wrapper approach in the sense that the features are specifically selected for a certain inducer, but it selects the features in the process of learning.
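
To make the contrast between a ranker-style filter and a wrapper concrete, here is a minimal pure-Python sketch. Everything in it is an assumption chosen for illustration: the filter scores features by absolute Pearson correlation with the class label, the wrapper uses a 1-nearest-neighbour classifier as the black-box inducer with leave-one-out cross-validation, and the toy data set is invented. The cited works may use different criteria and inducers.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def filter_rank(X, y):
    """Filter (ranker): score each feature without consulting any inducer."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-NN inducer restricted to `subset`."""
    hits = 0
    for i, row in enumerate(X):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in subset),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def wrapper_forward_select(X, y):
    """Wrapper: greedy forward selection driven by the inducer's accuracy."""
    selected, best_score = [], -1.0
    remaining = set(range(len(X[0])))
    while remaining:
        candidates = sorted(remaining)
        scores = {j: loo_accuracy(X, y, selected + [j]) for j in candidates}
        best = max(candidates, key=lambda j: scores[j])
        if scores[best] <= best_score:
            break  # no candidate improves the cross-validated accuracy
        selected.append(best)
        best_score = scores[best]
        remaining.remove(best)
    return selected, best_score


# Invented toy data: feature 0 separates the classes, feature 1 is noise,
# feature 2 is constant.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.0],
    [0.3, 4.0, 1.0],
    [0.9, 2.0, 1.0],
    [1.0, 5.5, 1.0],
    [1.1, 1.5, 1.0],
]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print("filter ranking:", filter_rank(X, y))
    print("wrapper selection:", wrapper_forward_select(X, y))
```

The filter touches each feature once and never trains a model, whereas the wrapper retrains and re-evaluates the inducer for every candidate subset, which is why the wrapper approach is the more computationally intensive of the two.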

Benchmarking Data Mining Algorithms

Data Warehousing and Web Engineering ◽

10.4018/978-1-931777-02-5.ch003 ◽

2011 ◽

pp. 77-99

Author(s):

Balaji Rajagopalan ◽

Ravi Krovi

Keyword(s):

Machine Learning ◽

Data Mining ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Successful Implementation ◽

Basic Premise ◽

Data Mining Algorithms ◽

External Data ◽

Mining Algorithms ◽

Careful Assessment

Data mining is the process of sifting through the mass of organizational (internal and external) data to identify patterns critical for decision support. Successful implementation of the data mining effort requires a careful assessment of the various tools and algorithms available. The basic premise of this study is that machine learning algorithms, which are assumption-free, should outperform their traditional counterparts when mining business databases. The objective of this study is to test this proposition by investigating the performance of the algorithms in several scenarios. The scenarios are based on simulations designed to reflect the extent to which typical statistical assumptions are violated in the business domain. The results of the computational experiments support the proposition that machine learning algorithms generally outperform their statistical counterparts under certain conditions. These results can be used as prescriptive guidelines for the applicability of data mining techniques.

Phishing websites blacklisting using machine learning algorithms

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.7.10646 ◽

2018 ◽

Vol 7 (1.7) ◽

pp. 179

Author(s):

Nivedhitha G ◽

Carmel Mary Belinda M.J ◽

Rupavathy N

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Extraction ◽

Learning Algorithms ◽

Source Code ◽

Machine Learning Algorithms ◽

Paper Machine ◽

Data Mining Algorithms ◽

Mining Algorithms ◽

The Web

The growth of phishing sites is, by all accounts, remarkable. Even though web users are aware of these kinds of phishing attacks, many of them still fall victim to them. Numerous attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, and phishing is one of them. Phishing keeps growing because it is easy to copy an entire website using its HTML source code. By making slight changes to the source code, an attacker can direct the victim to the phishing site. Phishers use many tactics to lure unsuspecting web users. Consequently, an efficient mechanism is required to distinguish phishing sites from legitimate sites in order to protect credential data. To detect phishing websites and identify them as information-leaking sites, the system proposes data mining algorithms. In this paper, machine learning algorithms are used to model the prediction task. The processes of identity extraction and feature extraction are discussed, and the various experiments carried out to assess the performance of the models are presented.

Anemia Selection in Pregnant Women by using Random prediction (Rp) Classification Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3016.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2623-2630 ◽

Cited By ~ 1

Keyword(s):

Data Mining ◽

Feature Selection ◽

Classification Algorithm ◽

Computational Time ◽

The Novel ◽

Data Mining Algorithms ◽

Novel Method ◽

Mining Algorithms ◽

Median Vector ◽

Selection Of

Anemia is a global hematological disorder that occurs in pregnancy. Data mining techniques make it possible to select features and extract unknown knowledge from a large dataset. The paper evaluates the anemia feature classes (Non-anemic, Mild, and Severe or moderate) in a real-time, high-dimensional dataset. In previous works, anemia has been classified with a selection of approaches based on Artificial Neural Networks (ANN), Gausnominal classification and VectNeighbour classification. These previous studies attain proper feature selection and classification accuracy, but the feature selection takes a long time. To reduce the computational time of feature selection, the current paper presents an improved Median vector feature selection (IMVFS) algorithm and a new RandomPrediction (RP) classification algorithm to predict the anemia classes (Mild, Not anemic, and Severe or moderate) based on data mining algorithms. The results show that the novel method is effective compared with our previous ANN, Gausnominal and VectNeighbour classification algorithms. The experimental results show that the proposed RandomPrediction (RP) classification with IMVFS feature selection clearly outperforms our previous methods.
