Data Mining Using $\mathcal{MLC}++$, a Machine Learning Library in C++

1997 ◽  
Vol 06 (04) ◽  
pp. 537-566 ◽  
Author(s):  
Ron Kohavi ◽  
Dan Sommerfield ◽  
James Dougherty

Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of data warehouses, which are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called $\mathcal{MLC}++$, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. $\mathcal{MLC}++$ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers.

2021 ◽  
Vol 297 ◽  
pp. 01032
Author(s):  
Harish Kumar ◽  
Anshal Prasad ◽  
Ninad Rane ◽  
Nilay Tamane ◽  
Anjali Yeole

Phishing is a common attack that tricks credulous people into disclosing their personal information. It is a type of cyber-crime in which fake sites lure victims into giving up sensitive data. This paper deals with methods for detecting phishing websites by analyzing various features of URLs with machine learning techniques. It discusses the methods used for detection of phishing websites based on lexical features, host properties and page importance properties. We consider various data mining algorithms for evaluation of the features in order to get a better understanding of the structure of URLs that spread phishing. To protect end users from visiting these sites, we can try to identify the phishing URLs by analyzing their lexical and host-based features. A particular challenge in this domain is that criminals are constantly devising new strategies to counter our defense measures. To succeed in this contest, we need machine learning algorithms that continually adapt to new examples and features of phishing URLs.
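
As a rough illustration of what "lexical features" of a URL can look like, the following minimal sketch computes a few simple ones using only the Python standard library. The function name `lexical_features` and the particular features chosen are assumptions made for illustration; the abstract does not list the features the authors actually used, and host-based and page importance properties are not covered here.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Return a few simple lexical features of a URL.

    Hypothetical feature set for illustration only; not the
    feature set used in the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Number of non-empty path segments, e.g. /a/b -> 2
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


# Made-up example URL, used only to show the shape of the output.
features = lexical_features(
    "http://secure-login.example.com/account/verify?id=123"
)
```

A feature dictionary like this could then be turned into a numeric vector and passed to any classifier; which features are actually discriminative would have to be established on labelled data.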


Author(s):  
Balaji Rajagopalan ◽  
Ravi Krovi

Data mining is the process of sifting through the mass of organizational (internal and external) data to identify patterns critical for decision support. Successful implementation of the data mining effort requires a careful assessment of the various tools and algorithms available. The basic premise of this study is that machine learning algorithms, which are assumption-free, should outperform their traditional statistical counterparts when mining business databases. The objective of this study is to test this proposition by investigating the performance of the algorithms in several scenarios. The scenarios are based on simulations designed to reflect the extent to which typical statistical assumptions are violated in the business domain. The results of the computational experiments support the proposition that machine learning algorithms generally outperform their statistical counterparts under certain conditions. These results can be used as prescriptive guidelines for the applicability of data mining techniques.


2018 ◽  
Vol 7 (1.7) ◽  
pp. 179
Author(s):  
Nivedhitha G ◽  
Carmel Mary Belinda M.J ◽  
Rupavathy N

The growth of phishing sites is by all accounts remarkable. Although web users are aware of phishing attacks, many still fall victim to them. Numerous attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, and phishing is one of them. Phishing keeps growing because it is easy to copy an entire site using its HTML source code. By making slight changes to the source code, an attacker can direct the victim to the phishing site. Phishers use many strategies to lure unsuspecting web users. Consequently, an efficient mechanism is required to distinguish phishing sites from legitimate sites in order to protect credential data. To detect phishing websites and identify them as information-leaking sites, the system uses data mining algorithms. In this paper, machine learning algorithms are used to model the prediction task. The processes of identity extraction and feature extraction are discussed, and the experiments carried out to evaluate the performance of the models are presented.


2018 ◽  
Vol 17 (04) ◽  
pp. 1850043
Author(s):  
Faisal Aburub ◽  
Wa’el Hadi

In this paper, we study the problem of predicting new locations of groundwater in Jordan through the application of a proposed new method, Groundwater Prediction using Associative Classification (GwPAC). We identify features that differentiate locations of groundwater wells according to whether or not they contain water. In addition, we survey intelligent-based methods related to groundwater exploration and management. Three experimental analyses were conducted with the objective of evaluating the capability of data mining algorithms using real groundwater data from the Ministry of Water and Irrigation. In the first experiment, we compared the performance of GwPAC against three well-known associative classification algorithms, namely CBA, CMAR and FACA. In the second experiment, three rule-based algorithms (C4.5, Random Forest and PBC4cip) were investigated. In the third experiment, so as to generalise the capability of using data mining for solving the groundwater detection problem, four benchmark algorithms (SVMs, NB, KNN and ANNs) were evaluated. Across all the experiments, the results indicated that all considered data mining algorithms predict locations of groundwater with an acceptable classification rate (all classification accuracies [Formula: see text]%), and can be useful methods when seeking to address the problem of exploring new groundwater locations.


Author(s):  
Kamel Ahsene Djaballah ◽  
Kamel Boukhalfa ◽  
Omar Boussaid ◽  
Yassine Ramdane

Social networks are used by terrorist groups and people who support them to propagate their ideas, ideologies, or doctrines and share their views on terrorism. To analyze tweets related to terrorism, several studies have been proposed in the literature. Some works rely on data mining algorithms; others use lexicon-based or machine learning sentiment analysis. Some recent works adopt other methods that combine multiple techniques. This paper proposes an improved approach for sentiment analysis of radical content related to terrorist activity on Twitter. Unlike other solutions, the proposed approach focuses on using a dictionary of weighted terms, the Word2vec method, and trigrams, with a classification based on fuzzy logic. The authors have conducted experiments with 600 manually annotated tweets and 200,000 automatically collected tweets in English and Arabic to evaluate this approach. The experimental results revealed that the new technique achieves between 75% and 78% precision for radicality detection and between 61% and 64% for detecting degrees of radicality.


2011 ◽  
pp. 2096-2108
Author(s):  
Amandeep S. Sidhu ◽  
Paul J. Kennedy ◽  
Simeon Simoff ◽  
Tharam S. Dillon ◽  
Elizabeth Chang

In some real-world areas, it is important to enrich the data with external background knowledge so as to provide context and to facilitate pattern recognition. These areas may be described as data rich but knowledge poor. There are two challenges in incorporating this biological knowledge into the data mining cycle: (1) generating the ontologies; and (2) adapting the data mining algorithms to make use of the ontologies. This chapter presents the state of the art in bringing background ontology knowledge into the pattern recognition task for biomedical data.

