A Study of Algorithm Selection in Data Mining using Meta-Learning

With the explosion of available data mining algorithms, it is becoming increasingly important to have a method that helps users select the most appropriate algorithm, or combination of algorithms, for a given problem and that reduces the cognitive overload caused by the sheer number of algorithms. This chapter presents a meta-learning approach that supports users by automatically selecting the most suitable algorithms during the data mining model building process. The authors discuss the meta-learning method in detail and present empirical results showing the improvement achieved by a hybrid model that combines the meta-learning method with Rough Set feature reduction. Rough Set feature reduction identifies the redundant properties of the dataset; using the resulting reduct of the dataset’s properties speeds up the ranking process and increases accuracy. The reduced search space in turn lowers users’ cognitive load.
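The abstract does not say how the reduct is computed. As a rough illustration only, the sketch below applies the textbook Rough Set idea to a small, made-up decision table: a reduct is a minimal set of properties that still determines the decision. The table, the property names (`size`, `classes`, `missing`) and the brute-force search are all assumptions for this example, not the chapter's method.

```python
from itertools import combinations

# Hypothetical decision table: each row describes a past dataset by three
# discretised properties and records the best-performing algorithm.
ROWS = [
    {"size": "small", "classes": "few",  "missing": "no",  "best": "knn"},
    {"size": "small", "classes": "many", "missing": "no",  "best": "tree"},
    {"size": "large", "classes": "few",  "missing": "no",  "best": "tree"},
    {"size": "large", "classes": "many", "missing": "yes", "best": "tree"},
    {"size": "small", "classes": "few",  "missing": "yes", "best": "knn"},
]


def is_consistent(rows, attrs, decision):
    """True if rows that agree on `attrs` never disagree on `decision`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def smallest_reducts(rows, attrs, decision):
    """Brute-force search for the smallest consistent attribute subsets.

    Subsets are tried in order of increasing size, so every subset returned
    is minimal: all of its proper subsets were already found inconsistent.
    """
    for size in range(len(attrs) + 1):
        found = [c for c in combinations(attrs, size)
                 if is_consistent(rows, c, decision)]
        if found:
            return found
    return []


reducts = smallest_reducts(ROWS, ["size", "classes", "missing"], "best")
print(reducts)
```

In this toy table the `missing` property turns out to be redundant, so a ranking step would only need to compare datasets on the two remaining properties. The exhaustive search is exponential in the number of properties and is meant only to show the definition.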

Meta-Learning

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch188 ◽

2011 ◽

pp. 1207-1215 ◽

Cited By ~ 2

Author(s):

Christophe Giraud-Carrier ◽

Pavel Brazdil ◽

Carlos Soares ◽

Ricardo Vilalta

Keyword(s):

Data Mining ◽

Credit Rating ◽

Learning Task ◽

End Users ◽

Algorithm Selection ◽

Local Optima ◽

Base Level ◽

Meta Learning ◽

Automatic Mechanism ◽

User Friendly

The application of Machine Learning (ML) and Data Mining (DM) tools to classification and regression tasks has become standard, not only in research but also in administrative agencies, commerce and industry (e.g., finance, medicine, engineering). Unfortunately, due in part to the number of available techniques and the overall complexity of the process, users facing a new data mining task must generally resort either to trial-and-error or to consultation of experts. Clearly, neither solution is completely satisfactory for the non-expert end-users who wish to access the technology more directly and cost-effectively. What is needed is an informed search process to reduce the amount of experimentation with different techniques while avoiding the pitfalls of local optima that may result from low quality models. Informed search requires meta-knowledge, that is, knowledge about the performance of those techniques. Meta-learning provides a robust, automatic mechanism for building such meta-knowledge. One of the underlying goals of meta-learning is to understand the interaction between the mechanism of learning and the concrete contexts in which that mechanism is applicable. Meta-learning differs from base-level learning in the scope of adaptation. Whereas learning at the base-level focuses on accumulating experience on a specific learning task (e.g., credit rating, medical diagnosis, mine-rock discrimination, fraud detection), learning at the meta-level is concerned with accumulating experience on the performance of multiple applications of a learning system. The meta-knowledge induced by meta-learning provides the means to inform decisions about the precise conditions under which a given algorithm, or sequence of algorithms, is better than others for a given task.
While Data Mining software packages (e.g., SAS Enterprise Miner, SPSS Clementine, Insightful Miner, PolyAnalyst, KnowledgeStudio, Weka, Yale, Xelopes) provide user-friendly access to rich collections of algorithms, they generally offer no real decision support to non-expert end-users. Similarly, tools with emphasis on advanced visualization help users understand the data (e.g., to select adequate transformations) and the models (e.g., to tweak parameters, compare results, and focus on specific parts of the model), but treat algorithm selection as a post-processing activity driven by the users rather than the system. Data mining practitioners need systems that guide them by producing explicit advice automatically. This chapter shows how meta-learning can be leveraged to provide such advice in the context of algorithm selection.
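One simple way to turn meta-knowledge into algorithm-selection advice can be sketched as a nearest-neighbour lookup over previously seen datasets. The sketch below is a minimal illustration under stated assumptions, not the chapter's system: the meta-knowledge base, the three meta-features (assumed already scaled to [0, 1]), the algorithm names and all accuracy figures are invented.

```python
import math

# Hypothetical meta-knowledge base: meta-features of past datasets and the
# accuracy each algorithm achieved on them (all values are made up).
META_BASE = [
    {"features": (0.1, 0.2, 0.9), "accuracy": {"tree": 0.80, "knn": 0.90, "nb": 0.70}},
    {"features": (0.2, 0.1, 0.8), "accuracy": {"tree": 0.78, "knn": 0.88, "nb": 0.75}},
    {"features": (0.9, 0.8, 0.1), "accuracy": {"tree": 0.92, "knn": 0.70, "nb": 0.85}},
    {"features": (0.8, 0.9, 0.2), "accuracy": {"tree": 0.90, "knn": 0.72, "nb": 0.80}},
]


def distance(a, b):
    """Euclidean distance between two meta-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(new_features, meta_base, k=2):
    """Rank algorithms by their mean accuracy on the k most similar datasets."""
    neighbours = sorted(
        meta_base, key=lambda entry: distance(new_features, entry["features"])
    )[:k]
    algorithms = neighbours[0]["accuracy"].keys()
    mean_accuracy = {
        name: sum(n["accuracy"][name] for n in neighbours) / len(neighbours)
        for name in algorithms
    }
    return sorted(mean_accuracy, key=mean_accuracy.get, reverse=True)


ranking = recommend((0.15, 0.15, 0.85), META_BASE)
print(ranking)
```

The returned list is a ranking rather than a single choice, so a user can try the top candidate first and fall back to the next one if it disappoints.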

On the predictive power of meta-features in OpenML

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2017-0048 ◽

2017 ◽

Vol 27 (4) ◽

pp. 697-712 ◽

Cited By ~ 7

Author(s):

Besim Bilalli ◽

Alberto Abelló ◽

Tomàs Aluja-Banet

Keyword(s):

Data Mining ◽

Predictive Power ◽

Algorithm Selection ◽

Meta Data ◽

Data Mining Algorithms ◽

Learning Platform ◽

Latent Features ◽

Meta Learning ◽

Mining Model ◽

Mining Algorithms

The demand for performing data analysis is steadily rising. As a consequence, people of different profiles (i.e., non-experienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (model/algorithm selection problem). Meta-learning is a technique used for assisting non-expert users in this step. The effectiveness of meta-learning is, however, largely dependent on the description/characterization of datasets (i.e., meta-features used for meta-learning). There is a need for improving the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, which is a collaborative machine learning platform that is designed to store and organize meta-data about datasets, data mining algorithms, models and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and we show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML as the biggest source of data in this domain.
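To make the notion of a meta-feature concrete, the sketch below computes a few simple dataset descriptors (instance count, feature count, class count, class entropy). These are commonly used examples chosen here as an assumption; the abstract does not list which meta-features the authors study, and the function name `simple_meta_features` and the toy data are hypothetical. The factor-analysis step itself is not shown.

```python
import math
from collections import Counter


def simple_meta_features(rows, labels):
    """Describe a labelled dataset with a few simple meta-features."""
    n = len(rows)
    counts = Counter(labels)
    # Shannon entropy of the class distribution, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "n_instances": n,
        "n_features": len(rows[0]),
        "n_classes": len(counts),
        "class_entropy": entropy,
    }


# Toy dataset: four instances, two features, two balanced classes.
rows = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (6.0, 9.0)]
labels = ["a", "a", "b", "b"]

meta = simple_meta_features(rows, labels)
print(meta)
```

A vector of such descriptors, one per dataset, is the kind of input on which a latent-feature extraction or a meta-learner would operate.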

Meta-learning Based Optimization of Metabolic Pathway Data-Mining Inference System

Lecture Notes in Computer Science - Modern Approaches in Applied Intelligence ◽

10.1007/978-3-642-21827-9_19 ◽

2011 ◽

pp. 183-192 ◽

Cited By ~ 1

Author(s):

Tomás V. Arredondo ◽

Wladimir O. Ormazábal ◽

Diego C. Candel ◽

Werner Creixell

Keyword(s):

Data Mining ◽

Metabolic Pathway ◽

Inference System ◽

Pathway Data ◽

Meta Learning

Data mining for simulation algorithm selection

Proceedings of the Second International ICST Conference on Simulation Tools and Techniques ◽

10.4108/icst.simutools2009.5659 ◽

2009 ◽

Cited By ~ 5

Author(s):

Roland Ewald ◽

Adelinde M. Uhrmacher ◽

Kaustav Saha

Keyword(s):

Data Mining ◽

Simulation Algorithm ◽

Algorithm Selection

Pattern Based Feature Construction in Semantic Data Mining

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2014010102 ◽

2014 ◽

Vol 10 (1) ◽

pp. 27-65 ◽

Cited By ~ 11

Author(s):

Agnieszka Ławrynowicz ◽

Jędrzej Potoniec

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Semantic Features ◽

Semantic Data ◽

Data Mining Approach ◽

Meta Learning ◽

New Type ◽

Domain Ontologies ◽

Semantic Data Mining

The authors propose a new method for mining sets of patterns for classification, where patterns are represented as SPARQL queries over RDFS. The method contributes to so-called semantic data mining, a data mining approach where domain ontologies are used as background knowledge, and where the new challenge is to mine knowledge encoded in domain ontologies, rather than only purely empirical data. The authors have developed a tool that implements this approach. Using this tool, the authors have conducted an experimental evaluation that includes a comparison of their method to state-of-the-art approaches to classification of semantic data and an experimental study within the emerging subfield of meta-learning called semantic meta-mining. The most important research contributions of the paper to the state of the art are as follows. For pattern mining research, or relational learning in general, the paper contributes a new algorithm for the discovery of a new type of patterns. For Semantic Web research, it theoretically and empirically illustrates how semantic, structured data can be used in traditional machine learning methods through a pattern-based approach for constructing semantic features.
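The idea of constructing features from patterns can be illustrated with a toy example. In the paper the patterns are SPARQL queries over RDFS; the sketch below swaps in a much simpler stand-in, where a pattern is a set of (predicate, object) pairs that an entity must satisfy. The triples, entity names and patterns are all invented for illustration, and no pattern mining is performed here.

```python
# Tiny hypothetical knowledge graph of (subject, predicate, object) triples.
TRIPLES = {
    ("alice", "worksFor", "acme"),
    ("alice", "hasDegree", "phd"),
    ("bob", "worksFor", "acme"),
    ("carol", "hasDegree", "phd"),
}

# Each pattern is a set of (predicate, object) pairs that must all hold.
# This is a simplified stand-in for the SPARQL patterns used in the paper.
PATTERNS = [
    {("worksFor", "acme")},
    {("hasDegree", "phd")},
    {("worksFor", "acme"), ("hasDegree", "phd")},
]


def matches(entity, pattern, triples):
    """True if the entity satisfies every (predicate, object) pair."""
    return all((entity, pred, obj) in triples for pred, obj in pattern)


def to_features(entity, patterns, triples):
    """Binary feature vector: one 0/1 entry per pattern."""
    return [int(matches(entity, pattern, triples)) for pattern in patterns]


features = {name: to_features(name, PATTERNS, TRIPLES)
            for name in ("alice", "bob", "carol")}
print(features)
```

The resulting binary vectors can be passed to any conventional classifier, which is the sense in which patterns make structured, semantic data usable by traditional machine learning methods.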