Advances in Data Mining and Database Management - Biologically-Inspired Techniques for Knowledge Discovery and Data Mining
Latest Publications

Total documents: 15 (five years: 0)
H-index: 3 (five years: 0)
Published by: IGI Global
ISBN: 9781466660786, 9781466660793

Author(s): Khaled Eskaf, Tim Ritchings, Osama Bedawy

Diabetes mellitus is one of the most common chronic diseases. The number of cases of diabetes in the world is likely to increase more than twofold in the next 30 years: from 115 million in 2000 to 284 million in 2030. This chapter is concerned with helping diabetic patients manage themselves by developing a computer system that predicts their Blood Glucose Level (BGL) 30 minutes ahead on the basis of their current levels, so that they can administer insulin accordingly. This will enable the diabetic patient to continue living a normal daily life, as far as possible. The prediction of BGLs from current levels has become feasible through the advent of Continuous Glucose Monitoring (CGM) systems, which are able to sample patients' BGLs, typically every 5 minutes, and computer systems that can process and analyse these samples. The approach taken in this chapter uses machine-learning techniques, specifically Genetic Algorithms (GA), to learn BGL patterns over an hour and the resulting value 30 minutes later, without questioning the patients about their food intake and activities. The GAs were investigated using the raw BGLs as input and metadata derived from a Diabetic Dynamic Model of BGLs supplemented by the changes in patients' BGLs over the previous hour. The results obtained in a preliminary study, including 4 virtual patients taken from the AIDA diabetes simulation software and 3 volunteers using the DexCom SEVEN system, show that the metadata approach gives more accurate predictions. Online learning, whereby new BGL patterns were incorporated into the prediction system as they were encountered, improved the results further.
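A minimal sketch of the general idea, not the chapter's actual system: a toy genetic algorithm evolves the weights of a linear predictor that maps the last hour of 5-minute BGL samples (12 values) to a value 30 minutes ahead. All data, operators, and parameters below are invented for illustration.

```python
import random

random.seed(42)

WINDOW = 12  # one hour of 5-minute samples

def predict(weights, window):
    # Linear predictor: weighted sum of the last WINDOW samples.
    return sum(w * x for w, x in zip(weights, window))

def fitness(weights, data):
    # Mean squared error over (window, target) training pairs; lower is better.
    return sum((predict(weights, w) - t) ** 2 for w, t in data) / len(data)

def evolve(data, pop_size=30, generations=40):
    pop = [[random.uniform(-1, 1) for _ in range(WINDOW)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, data))
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, WINDOW)     # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(WINDOW)] += random.gauss(0, 0.1)  # mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda ind: fitness(ind, data))

# Toy training pairs: the "true" rule is simply the most recent sample.
windows = [[random.uniform(4.0, 10.0) for _ in range(WINDOW)] for _ in range(20)]
data = [(w, w[-1]) for w in windows]
best = evolve(data)
```

The evolved weight vector should predict the toy targets far better than an all-zero baseline, which is the essence of letting the GA learn BGL patterns without extra patient input.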


Author(s): Dimitris Kalles, Alexis Kaporis, Vassiliki Mperoukli, Anthony Chatzinouskas

The authors in this chapter use simple local comparison and swap operators and demonstrate that their repeated application ends up in sorted sequences across a range of variants, most of which are also genetically evolved. They experimentally validate a quadratic run-time behavior for emergent sorting, suggesting that not knowing in advance which direction to sort, and allowing that direction to emerge, imposes an n/log n penalty over conventional techniques. The authors validate the emergent sorting algorithms by genetically searching for the most favorable parameter configuration using a grid infrastructure.
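As an illustration of the emergent-sorting idea (a bare sketch, not the authors' evolved variants), the following code repeatedly applies a purely local operator, comparing a randomly chosen adjacent pair and swapping it when out of order, until a sorted sequence emerges:

```python
import random

random.seed(0)

def emergent_sort(seq):
    """Repeatedly pick a random adjacent pair and swap it when out of order;
    global sortedness emerges from the purely local operator."""
    seq = list(seq)
    swaps = 0
    while any(seq[i] > seq[i + 1] for i in range(len(seq) - 1)):
        i = random.randrange(len(seq) - 1)
        if seq[i] > seq[i + 1]:
            seq[i], seq[i + 1] = seq[i + 1], seq[i]
            swaps += 1
    return seq, swaps

result, swaps = emergent_sort([5, 3, 8, 1, 9, 2, 7])
```

Because only adjacent out-of-order pairs are ever swapped, the number of executed swaps always equals the number of inversions in the input; it is the number of random attempts that grows quadratically in expectation, in line with the run-time behavior reported above.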


Author(s): Miroslav Hudec, Miljan Vučetić, Mirko Vujošević

Data mining methods based on fuzzy logic have been developed recently and have become an increasingly important research area. In this chapter, the authors examine possibilities for discovering potentially useful knowledge from relational databases by integrating fuzzy functional dependencies and linguistic summaries. Both methods use fuzzy logic tools for data analysis and for acquiring and representing expert knowledge. Fuzzy functional dependencies can detect whether a dependency exists between two examined attributes across the whole database. If a dependency exists only between parts of the examined attributes' domains, fuzzy functional dependencies cannot detect its character; linguistic summaries are a convenient method for revealing this kind of dependency. Used in this complementary way, fuzzy functional dependencies and linguistic summaries can mine valuable information from relational databases. Mining the intensities of dependencies between database attributes can support decision making, reduce the number of attributes in databases, and estimate missing values. The proposed approach is evaluated with case studies using real data from official statistics. Strengths and weaknesses of the described methods are discussed. At the end of the chapter, topics for further research activities are outlined.
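A hypothetical sketch of how a linguistic summary such as "most records have a high value" can be evaluated: a membership function scores each record, and a fuzzy quantifier turns the average membership into a truth degree for the whole summary. All membership boundaries below are invented for illustration.

```python
def high(x, a=50.0, b=80.0):
    # Membership of "high": 0 below a, 1 above b, linear in between (assumed).
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def most(p):
    # Zadeh-style quantifier "most": truth 0 below 0.3, 1 above 0.8 (assumed).
    return max(0.0, min(1.0, (p - 0.3) / 0.5))

def summary_truth(values):
    # Truth degree of "most values are high".
    return most(sum(high(v) for v in values) / len(values))

mostly_high = summary_truth([85, 90, 78, 95, 88])
mostly_low = summary_truth([20, 30, 45, 85, 10])
```

A summary with a high truth degree over a sub-range of an attribute's domain is exactly the kind of partial dependency that fuzzy functional dependencies alone cannot characterize.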


Author(s): Kesheng Wang, Zhenyou Zhang, Yi Wang

This chapter proposes a Self-Organizing Map (SOM) method for fault diagnosis and prognosis of manufacturing systems, machines, components, and processes. The aim of this work is to optimize the condition monitoring of the health of the system. With this method, manufacturing faults can be classified, and degradations can be predicted very effectively and clearly. A good maintenance schedule can then be created, and the number of corrective maintenance actions can be reduced. The results of the experiment show that the SOM method can be used to classify faults and predict the degradation of machines, components, and processes effectively, clearly, and easily.
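A minimal 1-D SOM sketch of the classification side of this idea, with toy "fault signatures" and assumed parameters (not the chapter's experimental setup): each unit holds a weight vector, and training drags the best-matching unit and its neighbors toward each input, so similar condition signatures end up on nearby units.

```python
import math
import random

random.seed(1)

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(data, units=4, epochs=50, lr=0.3):
    dim = len(data[0])
    grid = [[random.random() for _ in range(dim)] for _ in range(units)]
    for epoch in range(epochs):
        # Neighborhood radius shrinks over time so units specialize.
        radius = max(1e-2, (1.0 - epoch / epochs) * units / 2)
        for x in data:
            bmu = min(range(units), key=lambda u: dist2(grid[u], x))
            for u in range(units):
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                grid[u] = [w + lr * h * (xi - w) for w, xi in zip(grid[u], x)]
    return grid

def classify(grid, x):
    # A new signature is assigned to its best-matching unit.
    return min(range(len(grid)), key=lambda u: dist2(grid[u], x))

# Toy fault signatures: two condition classes with small sensor noise.
normal = [[random.gauss(0.0, 0.05), random.gauss(0.0, 0.05)] for _ in range(10)]
faulty = [[random.gauss(1.0, 0.05), random.gauss(1.0, 0.05)] for _ in range(10)]
grid = train_som(normal + faulty)
```

After training, the two condition prototypes map to different units, which is the mechanism by which a SOM separates normal operation from fault states.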


Author(s): Peng Cao, Osmar Zaiane, Dazhe Zhao

Class imbalance is one of the challenging problems for machine learning in many real-world applications. Many methods have been proposed to address the problem, including sampling and cost-sensitive learning. The latter has attracted significant attention in recent years, but it is difficult to determine precise misclassification costs in practice. Other factors also influence classification performance, including the input feature subset and the intrinsic parameters of the classifier. This chapter presents an effective wrapper framework that incorporates the evaluation measure (AUC or G-mean) directly into the objective function of cost-sensitive learning, improving classification performance by simultaneously optimizing the best combination of feature subset, intrinsic classifier parameters, and misclassification cost parameter. The optimization is based on Particle Swarm Optimization (PSO). The authors use two common methods, support vector machines and feed-forward neural networks, to evaluate the proposed framework. Experimental results on various standard benchmark datasets with different imbalance ratios, and on a real-world problem, show that the proposed method is effective in comparison with commonly used sampling techniques.
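A toy sketch of the wrapper idea (all data and parameters invented, and a 1-D threshold standing in for the classifier's tunable parameters): PSO searches for the decision threshold that maximizes the G-mean, so the evaluation measure itself is the objective function, as in the chapter.

```python
import math
import random

random.seed(3)

neg = [random.gauss(0.0, 1.0) for _ in range(95)]   # majority-class scores
pos = [random.gauss(2.5, 1.0) for _ in range(5)]    # minority-class scores

def g_mean(threshold):
    # Geometric mean of true-positive and true-negative rates.
    tpr = sum(x > threshold for x in pos) / len(pos)
    tnr = sum(x <= threshold for x in neg) / len(neg)
    return math.sqrt(tpr * tnr)

def pso(objective, lo, hi, particles=10, iters=30):
    xs = [random.uniform(lo, hi) for _ in range(particles)]
    vs = [0.0] * particles
    pbest = list(xs)                       # per-particle best positions
    gbest = max(xs, key=objective)         # global best position
    for _ in range(iters):
        for i in range(particles):
            r1, r2 = random.random(), random.random()
            vs[i] = (0.7 * vs[i] + 1.5 * r1 * (pbest[i] - xs[i])
                     + 1.5 * r2 * (gbest - xs[i]))
            xs[i] += vs[i]
            if objective(xs[i]) > objective(pbest[i]):
                pbest[i] = xs[i]
            if objective(xs[i]) > objective(gbest):
                gbest = xs[i]
    return gbest

threshold = pso(g_mean, -3.0, 5.0)
```

Maximizing accuracy instead would simply push the threshold high and ignore the 5-sample minority class; using G-mean as the objective is what forces a balanced operating point.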


Author(s): Shafiq Alam, Gillian Dobbie, Yun Sing Koh, Saeed ur Rehman

Knowledge Discovery and Data mining (KDD) helps uncover hidden knowledge in huge amounts of data. Recently, however, different researchers have questioned the capability of traditional KDD techniques to tackle the information extraction problem efficiently and accurately as the amount of data grows. One way to overcome this problem is to treat data mining as an optimization problem. A huge increase in the use of Swarm Intelligence (SI)-based optimization techniques for KDD has been observed, owing to the flexibility, simplicity, and extendibility of these techniques across different data mining tasks. In this chapter, the authors overview the use of Particle Swarm Optimization (PSO), one of the most cited SI-based techniques, in three different application areas of KDD: data clustering, outlier detection, and recommender systems. The chapter shows that these techniques have tremendous potential to revolutionize the process of extracting knowledge from big data.
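Of the three application areas, clustering is the most direct example of data mining cast as optimization. The sketch below (toy data, assumed parameters) encodes k=2 candidate centroids in each particle and lets PSO minimize the within-cluster sum of squared error:

```python
import random

random.seed(7)

# Two well-separated toy clusters in the plane.
data = ([(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(20)]
        + [(random.gauss(4, 0.3), random.gauss(4, 0.3)) for _ in range(20)])

def sse(flat):
    # flat = [c1x, c1y, c2x, c2y]; within-cluster sum of squared error.
    cents = [flat[0:2], flat[2:4]]
    return sum(min((x - c[0]) ** 2 + (y - c[1]) ** 2 for c in cents)
               for x, y in data)

def pso_cluster(particles=15, iters=40, dim=4):
    xs = [[random.uniform(-1, 5) for _ in range(dim)] for _ in range(particles)]
    vs = [[0.0] * dim for _ in range(particles)]
    pbest = [list(p) for p in xs]
    gbest = list(min(xs, key=sse))
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vs[i][d] = (0.7 * vs[i][d]
                            + 1.5 * r1 * (pbest[i][d] - xs[i][d])
                            + 1.5 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            if sse(xs[i]) < sse(pbest[i]):
                pbest[i] = list(xs[i])
            if sse(xs[i]) < sse(gbest):
                gbest = list(xs[i])
    return gbest

centroids = pso_cluster()
```

Unlike k-means, nothing here depends on a closed-form centroid update, which is why SI-based optimizers extend so easily to other KDD objectives such as outlier scores or recommendation error.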


Author(s): Apurva Shah

Biologically inspired data mining techniques have been used intensively in different data mining applications. Recently, Ant Colony Optimization (ACO) has been applied to scheduling real-time distributed systems. Real-time processing requires both parallel activities and fast response: the work must be completed and services delivered on a timely basis. In the presence of timing constraints, a real-time system's performance does not always improve as processor speed increases. ACO performs quite well for scheduling real-time distributed systems under overloaded conditions, while Earliest Deadline First (EDF) is the optimal scheduling algorithm for single-processor real-time systems under under-loaded conditions. This chapter proposes an adaptive algorithm that takes advantage of both EDF- and ACO-based algorithms and overcomes their limitations.
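The EDF half of the adaptive scheme can be sketched directly (the task sets below are invented): at every time unit the processor runs the ready job with the earliest absolute deadline, which meets all deadlines whenever the task set is under-loaded.

```python
def edf(jobs):
    """jobs: list of (release, deadline, cost) tuples. Simulate a single
    processor one time unit at a time under Earliest Deadline First and
    report whether every job finishes by its deadline."""
    remaining = [c for _, _, c in jobs]
    finish = [0] * len(jobs)
    time = 0
    while any(r > 0 for r in remaining):
        ready = [i for i, (rel, _, _) in enumerate(jobs)
                 if rel <= time and remaining[i] > 0]
        if not ready:                      # idle until the next release
            time += 1
            continue
        i = min(ready, key=lambda j: jobs[j][1])   # earliest absolute deadline
        remaining[i] -= 1
        time += 1
        if remaining[i] == 0:
            finish[i] = time
    return all(f <= d for f, (_, d, _) in zip(finish, jobs))
```

Under overload (total demand exceeding capacity), EDF misses deadlines, which is exactly the regime where the chapter switches to the ACO-based heuristic instead.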


Author(s): Tianxing Cai

Industrial and environmental research will always involve the study of the cause-effect relationship between emissions and the surrounding environment. Qualitative and mixed-methods researchers have employed a variety of Information and Communication Technology (ICT) tools, simulated or virtual environments, information systems, information devices, and data analysis tools in this field. Machine-enhanced analytics has enabled the identification of aspects of interest such as correlations and anomalies in large datasets. Chemical facilities carry a high risk of originating air emission events. Based on an available air-quality monitoring network, data integration technologies are applied to identify possible emission-source scenarios and the dynamic pollutant monitoring result, so as to support diagnostic and prognostic decisions in a timely and effective way. In this chapter, an application of artificial neural networks has been developed for this purpose. It includes two stages of modeling and optimization work: 1) the determination of background normal emission rates from multiple emission sources and 2) single-objective or multi-objective optimization for impact scenario identification and quantification. These models can identify the potential emission profile and the spatial-temporal characterization of pollutant dispersion for a specific region, including reverse estimation of air quality issues. The methodology provides valuable information for accident investigation and root cause analysis of an emission event; meanwhile, it also helps evaluate the regional air quality impact caused by such an event. Case studies are employed to demonstrate the efficacy of the developed methodology.
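The reverse-estimation step can be illustrated with a drastically simplified stand-in for the chapter's models (all numbers invented): if each monitor's reading is assumed to be a known dispersion factor times the unknown emission rate, the rate can be recovered by single-objective minimization of squared error over candidate rates.

```python
# Assumed linear dispersion factors, one per monitoring station.
factors = [0.8, 0.5, 0.2]
true_rate = 10.0
readings = [f * true_rate for f in factors]   # noiseless toy observations

def estimate(readings, factors):
    # Grid search over candidate emission rates 0.0 .. 30.0 in 0.1 steps,
    # minimizing squared error between predicted and observed readings.
    candidates = [r / 10 for r in range(0, 301)]
    def err(rate):
        return sum((f * rate - y) ** 2 for f, y in zip(factors, readings))
    return min(candidates, key=err)
```

In the chapter this role is played by trained neural-network dispersion models and multi-objective optimizers, but the inverse-problem structure, predicting forward and searching backward over source scenarios, is the same.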


Author(s): Zuriani Mustaffa, Yuhanis Yusof, Siti Sakira Kamaruddin

As energy fuels play a significant role in many parts of human life, it is of great importance to have effective predictive analysis of their prices. In this chapter, the hybridization of Least Squares Support Vector Machines (LSSVM) with an enhanced Artificial Bee Colony (eABC) algorithm is proposed to meet this challenge. The eABC, which serves as an optimization tool for the LSSVM, is enhanced by two types of mutation, namely Levy mutation and conventional mutation. Levy mutation is introduced to keep the model from falling into local minima, while conventional mutation prevents the model from over-fitting and/or under-fitting during learning. The LSSVM then carries out the predictive analysis. Applied to the predictive analysis of heating oil prices, the empirical findings not only demonstrate the superior prediction accuracy of eABC-LSSVM but also show its advantage in escaping premature convergence.
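A heavily simplified sketch of the eABC idea (the objective is a stand-in for the LSSVM validation error, the onlooker phase is omitted, and all parameters are assumptions): employed bees greedily improve food sources using either a heavy-tailed Levy-style jump or a small conventional Gaussian mutation, and stagnant sources are abandoned by scouts.

```python
import math
import random

random.seed(11)

def objective(x):
    # Stand-in for the LSSVM hyper-parameter validation error being minimized.
    return (x - 3.0) ** 2

def levy_step(scale=0.5):
    # Simplified heavy-tailed jump: mostly small moves, occasional large
    # escapes from local regions (a stand-in for a true Levy flight).
    u = 1.0 - random.random()              # u in (0, 1]
    return math.copysign(scale * (u ** -0.5 - 1.0), random.random() - 0.5)

def eabc(bees=10, iters=200, lo=-10.0, hi=10.0, limit=20):
    sources = [random.uniform(lo, hi) for _ in range(bees)]
    trials = [0] * bees
    gbest = min(sources, key=objective)
    for _ in range(iters):
        for i in range(bees):
            # Employed phase: Levy or conventional mutation, greedy acceptance.
            step = levy_step() if random.random() < 0.5 else random.gauss(0.0, 0.3)
            cand = sources[i] + step
            if objective(cand) < objective(sources[i]):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
            if trials[i] > limit:          # scout: abandon a stagnant source
                sources[i], trials[i] = random.uniform(lo, hi), 0
            if objective(sources[i]) < objective(gbest):
                gbest = sources[i]
    return gbest

best = eabc()
```

The occasional large Levy jumps are what let a source leave a basin it would otherwise stagnate in, mirroring the premature-convergence advantage reported above.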


Author(s): Fatai Anifowose, Jane Labadin, Abdulazeez Abdulraheem

Artificial Neural Networks (ANN) have been widely applied in petroleum reservoir characterization. Despite their wide use, they are very unstable in terms of performance. Ensemble machine learning is capable of improving the performance of such unstable techniques. One of the challenges of using ANN is choosing the appropriate number of hidden neurons. Previous studies have proposed ANN ensemble models with a maximum of 50 hidden neurons in the search space, leaving room for further improvement. This chapter presents extended versions of those studies with increased search spaces, using a linear search and randomized assignment of the number of hidden neurons. Using standard model evaluation criteria and novel ensemble combination rules, the results of this study suggest that a large number of "unbiased" randomized guesses of the number of hidden neurons beyond 50 performs better than the very few occurrences that were optimally determined.
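The two search strategies can be contrasted with a stand-in validation-error curve (no network is actually trained here; the curve and its minimum near 130 hidden neurons are invented to illustrate why a 1-50 search space can be too small):

```python
import random

random.seed(5)

def validation_error(hidden_neurons):
    # Stand-in for a trained network's validation error: minimized near 130
    # hidden neurons, with noise mimicking training instability.
    return (hidden_neurons - 130) ** 2 / 1000.0 + random.random()

# Linear search, capped at 50 hidden neurons as in the earlier studies.
linear = min(range(1, 51), key=validation_error)

# Randomized assignment: 30 "unbiased" guesses from a much larger range.
randomized = min(random.sample(range(1, 301), 30), key=validation_error)
```

When the best architecture lies beyond the cap, the randomized guesses land closer to it than any exhaustively searched value below 50, which is the pattern the chapter reports.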

