Improving Knowledge Discovery through the Integration of Data Mining Techniques - Advances in Data Mining and Database Management
Latest Publications


TOTAL DOCUMENTS: 16 (FIVE YEARS 0)

H-INDEX: 2 (FIVE YEARS 0)

Published By IGI Global

9781466685130, 9781466685147

Author(s):  
Rizwan Aqeel ◽  
Saif Ur Rehman ◽  
Saira Gillani ◽  
Sohail Asghar

This chapter focuses on the Autonomous Ground Vehicle (AGV), also known as an intelligent vehicle, which can navigate without human supervision. AGV navigation over an unstructured road is a challenging task and a well-known research problem. The chapter aims to detect the road area in an unstructured environment by applying a proposed classification model. The proposed model is divided into three stages: (1) preprocessing, (2) road area clustering, and (3) road pixel classification. A combination of clustering and classification is used to achieve this goal: the K-means clustering algorithm is used to discover the biggest cluster in the road scene, and the big cluster area is then classified as road or non-road using the well-known support vector machine technique. The proposed approach is validated through extensive experiments on an RGB dataset, which show that it detects the road area successfully and is robust to diverse road conditions such as unstructured surroundings, different weather, and lighting variations.
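The clustering stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it runs plain Lloyd's k-means on a handful of made-up RGB pixels, uses a deterministic farthest-first seeding (an assumption chosen for reproducibility), and stops at picking the largest cluster. The support vector machine step is omitted, and the names `kmeans` and `largest_cluster` are hypothetical.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    # Farthest-first seeding: an assumption made here so the result is
    # deterministic; the chapter does not say how centroids are seeded.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, labels


def largest_cluster(labels):
    """Label of the cluster that holds the most pixels."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical pixels: six greyish "road" samples and three green ones.
pixels = [
    (100, 100, 100), (105, 102, 98), (98, 101, 103),
    (110, 108, 107), (95, 97, 99), (102, 100, 104),
    (30, 160, 40), (35, 150, 45), (28, 155, 38),
]
centroids, labels = kmeans(pixels, k=2)
candidate = largest_cluster(labels)  # cluster that a classifier would inspect next
```

In a full pipeline, the pixels of the `candidate` cluster would be passed to a trained classifier to decide road versus non-road.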


Author(s):  
Noureen Zafar ◽  
Saif Ur Rehman ◽  
Saira Gillani ◽  
Sohail Asghar

In this article, the segmentation of weeds and crops is investigated using supervised learning based on a feed-forward neural network. The images are taken from satellite imagery of a specified geographical region in Pakistan, and edge detection is performed with a classical image processing scheme. The obtained samples are classified with a data mining approach based on an artificial neural network model that uses a linear activation function at the input and output layers and a threshold ramp function at the hidden layer. Scenario-based results are obtained on a large set of samples of weeds and crops from a corn field, evaluated with a fitness function based on the mean square error. The advantages of the given scheme over existing schemes are the applicability of the designed framework, ease of implementation, and lower hardware requirements.


Author(s):  
Mohsin Iqbal ◽  
Saif Ur Rehman ◽  
Saira Gillani ◽  
Sohail Asghar

The key objective of this chapter is to study classification accuracy when feature selection is combined with machine learning algorithms. Feature selection reduces the dimensionality of the data and improves the accuracy of the learning algorithm. We test how integrated feature selection affects the accuracy of three classifiers by applying several feature selection methods. Among the filter methods, Information Gain (IG), Gain Ratio (GR), and Relief-F, and among the wrapper methods, Bagging and Naive Bayes (NB), enabled the classifiers to achieve the largest average increase in classification accuracy while reducing the number of unnecessary attributes. These conclusions can advise machine learning users on which classifier and feature selection methods to use to optimize classification accuracy. This can be especially important in risk-sensitive applications of machine learning, where one of the aims is to reduce the costs of collecting, processing, and storing unnecessary data.
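As an illustration of one filter method named above, the following sketch computes Information Gain for categorical features and ranks them. It uses the standard entropy-based definition; the toy dataset and the function names are assumptions for illustration and do not come from the chapter.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Feature names sorted from most to least informative."""
    return sorted(
        columns, key=lambda name: information_gain(columns[name], labels), reverse=True
    )


# Hypothetical toy data: "outlook" separates the classes, "humidity" does not.
labels = ["yes", "yes", "no", "no"]
columns = {
    "humidity": ["high", "low", "high", "low"],
    "outlook": ["sunny", "sunny", "rainy", "rainy"],
}
ranking = rank_features(columns, labels)
```

A filter of this kind would keep only the top-ranked features before any classifier is trained, which is where the reduction in data volume comes from.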


Author(s):  
Tasawar Hussain ◽  
Sohail Asghar

Web-based applications are maturing and gradually gaining the confidence of their users; however, the WWW still lacks a mechanism to stop hackers. Implementing security measures such as intrusion detection systems and firewalls is no longer an effective barrier against online fraud. The Web Backtracking Technique (WBT) is proposed for fraud detection in online financial applications by applying a hierarchical sessionization technique to the web log file. Hierarchical sessionization of the web log brings out focused groups of users and paves the way for in-depth visualization for knowledge discovery. User clicks are compared with user profiles to detect changes from previous user click records. Transactions that do not conform to business rules are stopped from business activities. The WBT analyzes suspicious behavior and produces reports for security and risk mitigation purposes. Furthermore, suspicious transactions are mined to upgrade the business rules from hierarchical sessionization. The proposed WBT is validated against university web log data.
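The abstract does not define the hierarchical sessionization procedure, so the sketch below shows only the common flat baseline it presumably builds on: grouping web log entries per user and starting a new session whenever the gap between clicks exceeds a timeout. The 30-minute timeout, the log tuple layout, and the helper `unfamiliar_clicks` (a stand-in for comparing clicks with a user profile) are all assumptions.

```python
from collections import defaultdict


def sessionize(entries, timeout=1800):
    """Group (user, timestamp_seconds, url) log entries into per-user sessions.

    A new session starts when two consecutive clicks by the same user are
    more than `timeout` seconds apart (30 minutes is an assumed default).
    """
    by_user = defaultdict(list)
    for user, ts, url in entries:
        by_user[user].append((ts, url))

    sessions = {}
    for user, clicks in by_user.items():
        clicks.sort()
        user_sessions = [[clicks[0]]]
        for prev, cur in zip(clicks, clicks[1:]):
            if cur[0] - prev[0] > timeout:
                user_sessions.append([cur])
            else:
                user_sessions[-1].append(cur)
        sessions[user] = user_sessions
    return sessions


def unfamiliar_clicks(session, profile):
    """URLs in a session that never appeared in the user's past profile."""
    return [url for _, url in session if url not in profile]


# Hypothetical log: alice returns after a long gap and visits a new page.
log = [
    ("alice", 0, "/login"),
    ("alice", 60, "/balance"),
    ("alice", 4000, "/transfer"),
    ("bob", 10, "/login"),
]
sessions = sessionize(log)
suspicious = unfamiliar_clicks(sessions["alice"][1], {"/login", "/balance"})
```

Clicks flagged this way would still need to be checked against business rules before any transaction is stopped.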


Author(s):  
Umair Abdullah ◽  
Aftab Ahmed ◽  
Sohail Asghar ◽  
Kashif Zafar

Most data mining projects generate information (summarized in the form of graphs and charts) for business executives and decision makers; however, it is left to the decision makers whether to use it or disregard it. This manual use of the extracted knowledge considerably limits the effectiveness of data mining technology. This chapter proposes an architecture in which a data mining module provides a continuous supply of knowledge to a rule-based expert system. The proposed approach solves the knowledge acquisition problem of rule-based systems and also makes more effective use of data mining techniques (i.e., by supplying extracted knowledge to the rule-based system for automated use). The chapter describes the details of a data-mining-driven rule-based expert system applied in the medical billing domain. The main modules of the system are presented, along with a final analysis of its performance.


Author(s):  
Negar Sadat Soleimani Zakeri ◽  
Saeid Pashazadeh

Active faults are sources of earthquakes, and one of them is the North Tabriz Fault in northwestern Iran. The activation of faults can endanger human life and damage structures. Analysis of seismic data in active regions can help in dealing with earthquake hazards and devising prevention strategies. In this chapter, the structure of earthquake events is studied along with the application of various intelligent data mining algorithms to earthquake prediction. The main focus is on categorizing the seismic data of local regions according to event location, using clustering algorithms for classification and then an artificial neural network for cluster prediction. As a result, the target data were clustered into six groups, and the proposed model yielded an accuracy of 98.3% with 10-fold cross-validation. Also, as a case study, the tectonic stress on concentration zones of the Tabriz fault was identified, using five features of the events. Finally, the most important points for evaluating the nonlinear model predictions are proposed as future directions.
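The 10-fold cross-validation mentioned above can be sketched as follows. This is a generic illustration, not the chapter's evaluation code: the interleaved, unshuffled fold assignment is an assumption, and `fit` and `predict` are hypothetical callables standing in for whatever model is being evaluated.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are dealt to folds in an interleaved fashion without shuffling
    (an assumption). Requires n_samples >= k so that no fold is empty.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Mean accuracy over k folds.

    `fit(X_train, y_train)` returns a model; `predict(model, x)` returns a label.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Each sample is used for testing exactly once and for training in the other nine folds, so the reported accuracy is an average over ten held-out evaluations.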


Author(s):  
Waseem Ahmad ◽  
Ajit Narayanan

In recent years, several artificial immune system (AIS) approaches have been proposed for unsupervised learning. Generally, in these approaches antibodies (or B-cells) are considered as clusters and antigens are data samples or instances. Moreover, antigens are trapped through free-floating antibodies or immunoglobulins. In all these approaches, hypermutation plays an important role. Hypermutation is responsible for producing mutated copies of stimulated antibodies/B-cells to capture similar antigens with a higher affinity (similarity) measure, and for creating a diverse pool of solutions. The Humoral-Mediated Artificial Immune System (HAIS) is an example of such algorithms. However, there is currently little understanding of the effectiveness of the hypermutation operator in AIS approaches. In this chapter, we investigate the role of the hypermutation operator as well as the affinity threshold (AT) parameters in achieving efficient clustering solutions. We propose a three-step methodology to examine the importance of hypermutation and the AT parameters in AIS approaches to clustering, using basic concepts of the HAIS algorithm. The role of hypermutation in under-fitting and over-fitting the data is discussed in the context of an entropy measure.


Author(s):  
Simon Fong ◽  
Jinyan Li ◽  
Xueyuan Gong ◽  
Athanasios V. Vasilakos

Metaheuristics have lately gained popularity among researchers. Their underlying designs are inspired by biological entities and their behaviors, e.g., schools of fish, colonies of insects, and other land animals. They have been used successfully in optimization applications ranging from financial modeling, image processing, resource allocation, and job scheduling to bioinformatics. In particular, metaheuristics have proven effective in many combinatorial optimization problems, so that it is not necessary to attempt all possible candidate solutions to a problem through exhaustive enumeration and evaluation, which is computationally intractable. The aim of this paper is to highlight some recent research related to metaheuristics and to discuss how they can enhance the efficacy of data mining algorithms. A foremost challenge in data mining is combinatorial optimization, which often leads to performance degradation and scalability issues. Two case studies are presented in which metaheuristics improve the accuracy of classification and clustering by avoiding local optima.
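To make the idea concrete, here is a minimal sketch of one well-known swarm-inspired metaheuristic, particle swarm optimization (PSO), minimizing the Rastrigin function, a standard test function with many local optima. The abstract does not say which metaheuristics the case studies use, so the choice of PSO, the parameter values (`w`, `c1`, `c2`), and the function names are assumptions.

```python
import math
import random


def rastrigin(x):
    """Standard multimodal test function; global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def pso_minimize(f, dim, lo, hi, n_particles=20, iters=100, seed=0,
                 w=0.7, c1=1.5, c2=1.5):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_x, best_val = pso_minimize(rastrigin, dim=2, lo=-5.12, hi=5.12)
```

The swarm evaluates only a few thousand candidate points here instead of enumerating the search space, which is the trade-off the abstract refers to.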


Author(s):  
Simon Fong ◽  
Dong Han ◽  
Athanasios V. Vasilakos

Multi-dimensional outlier detection (MOD) over data streams is one of the most significant data stream mining techniques. When multivariate data are streaming at high speed, outliers must be detected efficiently and accurately. The conventional outlier detection method is based on observing the full dataset and its statistical distribution, and the data are assumed to be stationary. However, this conventional method has an inherent limitation: it always assumes the availability of the entire dataset. In modern applications, especially those that operate in a real-time environment, the data arrive in the form of a live data feed; they are dynamic and ever evolving in terms of their statistical distribution and concepts. Outlier detection should no longer be done in batches, but in an incremental manner. In this chapter, we investigate this important concept of MOD. In particular, we evaluate the effectiveness of a collection of incremental learning algorithms which are the underlying pattern recognition mechanisms for MOD. Specifically, we combine incremental learning algorithms into three types of MOD: Global Analysis, Cumulative Analysis, and Lightweight Analysis with Sliding Window. Different classification algorithms are tested for performance comparison.
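The sliding-window idea can be illustrated with a deliberately simple sketch. It does not reproduce the chapter's method, which relies on incremental learning algorithms; instead it flags a multivariate point when any of its dimensions lies more than a chosen number of standard deviations from the mean of the most recent window. The window size, the threshold, and the function name are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def sliding_window_outliers(stream, window=5, threshold=3.0):
    """Return indices of points that deviate strongly from the recent window.

    `stream` is an iterable of equal-length numeric tuples. A point is flagged
    when, in at least one dimension, it lies more than `threshold` standard
    deviations from the mean of the previous `window` points.
    """
    recent = deque(maxlen=window)
    flagged = []
    for t, x in enumerate(stream):
        if len(recent) == window:
            for d, value in enumerate(x):
                column = [row[d] for row in recent]
                mu, sigma = mean(column), pstdev(column)
                if sigma > 0 and abs(value - mu) > threshold * sigma:
                    flagged.append(t)
                    break
        # Every point enters the window, so the statistics follow the stream
        # as its distribution drifts; only the last `window` points are kept.
        recent.append(x)
    return flagged


# Hypothetical two-dimensional feed with one abnormal reading at index 5.
feed = [
    (10.0, 1.0), (11.0, 1.1), (9.0, 0.9), (10.5, 1.0),
    (9.5, 1.05), (10.2, 9.0), (10.1, 1.0),
]
outlier_indices = sliding_window_outliers(feed)
```

Because only the last few points are stored, memory use stays constant regardless of how long the stream runs, in contrast to batch methods that need the full dataset.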


Author(s):  
Ali H. Gazala ◽  
Waseem Ahmad

Multi-Relational Data Mining (MRDM) is a growing research area that focuses on discovering hidden patterns and useful knowledge from relational databases. While the vast majority of data mining algorithms and techniques look for patterns in a flat single-table data representation, the sub-domain of MRDM looks for patterns that involve multiple tables (relations) from a relational database. This sub-domain has received increased research attention during the last two decades due to the wide range of possible applications. As a result of that growing attention, many successful multi-relational data mining algorithms and techniques have been presented. This chapter presents a comprehensive review of multi-relational data mining. It discusses the different approaches researchers have followed to explore the relational search space while highlighting some of the most significant challenges facing researchers working in this sub-domain. The chapter also describes a number of MRDM systems that have been developed during the last few years and discusses some future research directions in this sub-domain.

