A Study of Algorithm Selection in Data Mining using Meta-Learning

2017 ◽  
Vol 10 (2) ◽  
pp. 51-64 ◽  
Author(s):  
Murchhana Tripathy ◽  
Anita Panda

Author(s):  
Lisa Fan ◽  
Minxiao Lei

With the explosion of available data mining algorithms, methods that help users select the most appropriate algorithm, or combination of algorithms, for a given problem are becoming increasingly important, as is reducing the cognitive overload caused by the sheer number of algorithms on offer. This chapter presents a meta-learning approach that supports users by automatically selecting the most suitable algorithms during the data mining model-building process. The authors discuss the meta-learning method in detail and present empirical results showing the improvement achieved by a hybrid model that combines meta-learning with Rough Set feature reduction. Rough Set reduction identifies the redundant properties of the dataset; using the resulting reduct of the dataset’s properties speeds up the ranking process and increases accuracy. The reduced search space, in turn, lowers users’ cognitive load.


Author(s):  
Christophe Giraud-Carrier ◽  
Pavel Brazdil ◽  
Carlos Soares ◽  
Ricardo Vilalta

The application of Machine Learning (ML) and Data Mining (DM) tools to classification and regression tasks has become standard, not only in research but also in administrative agencies, commerce and industry (e.g., finance, medicine, engineering). Unfortunately, due in part to the number of available techniques and the overall complexity of the process, users facing a new data mining task must generally resort either to trial-and-error or to consultation of experts. Clearly, neither solution is completely satisfactory for non-expert end-users who wish to access the technology more directly and cost-effectively. What is needed is an informed search process that reduces the amount of experimentation with different techniques while avoiding the pitfalls of local optima that may result from low-quality models. Informed search requires meta-knowledge, that is, knowledge about the performance of those techniques. Meta-learning provides a robust, automatic mechanism for building such meta-knowledge. One of the underlying goals of meta-learning is to understand the interaction between the mechanism of learning and the concrete contexts in which that mechanism is applicable. Meta-learning differs from base-level learning in the scope of adaptation. Whereas learning at the base level focuses on accumulating experience on a specific learning task (e.g., credit rating, medical diagnosis, mine-rock discrimination, fraud detection), learning at the meta-level is concerned with accumulating experience on the performance of multiple applications of a learning system. The meta-knowledge induced by meta-learning provides the means to inform decisions about the precise conditions under which a given algorithm, or sequence of algorithms, is better than others for a given task.
While Data Mining software packages (e.g., SAS Enterprise Miner, SPSS Clementine, Insightful Miner, PolyAnalyst, KnowledgeStudio, Weka, Yale, Xelopes) provide user-friendly access to rich collections of algorithms, they generally offer no real decision support to non-expert end-users. Similarly, tools with emphasis on advanced visualization help users understand the data (e.g., to select adequate transformations) and the models (e.g., to tweak parameters, compare results, and focus on specific parts of the model), but treat algorithm selection as a post-processing activity driven by the users rather than the system. Data mining practitioners need systems that guide them by producing explicit advice automatically. This chapter shows how meta-learning can be leveraged to provide such advice in the context of algorithm selection.
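As a rough illustration of how meta-knowledge can inform algorithm selection, the sketch below ranks algorithms for a new dataset by looking up the most similar previously seen datasets in meta-feature space. It is a minimal nearest-neighbour sketch, not the method of any chapter listed here; the function name `rank_algorithms`, the `META_BASE` table, and all numbers in it are hypothetical and exist only for illustration.

```python
import math

# Hypothetical meta-knowledge: one entry per previously seen dataset, holding
# a (normalised) meta-feature vector and the accuracy each algorithm achieved.
META_BASE = [
    ([0.1, 0.2], {"tree": 0.90, "knn": 0.70}),
    ([0.2, 0.1], {"tree": 0.80, "knn": 0.75}),
    ([0.9, 0.8], {"tree": 0.60, "knn": 0.95}),
]


def rank_algorithms(new_meta, meta_base, k=2):
    """Rank algorithms by mean accuracy on the k datasets nearest to new_meta."""
    # Sort known datasets by Euclidean distance in meta-feature space.
    neighbours = sorted(
        meta_base, key=lambda entry: math.dist(new_meta, entry[0])
    )[:k]

    # Collect each algorithm's accuracies over the selected neighbours.
    collected = {}
    for _, scores in neighbours:
        for algorithm, accuracy in scores.items():
            collected.setdefault(algorithm, []).append(accuracy)

    # Average per algorithm and return names from best to worst.
    means = {name: sum(vals) / len(vals) for name, vals in collected.items()}
    return sorted(means, key=means.get, reverse=True)
```

Calling `rank_algorithms([0.15, 0.15], META_BASE)` would recommend trying the algorithm that did best on the two most similar datasets first. The sketch assumes the meta-features have already been scaled to comparable ranges, since unscaled features would dominate the distance.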


2017 ◽  
Vol 27 (4) ◽  
pp. 697-712 ◽  
Author(s):  
Besim Bilalli ◽  
Alberto Abelló ◽  
Tomàs Aluja-Banet

The demand for performing data analysis is steadily rising. As a consequence, people of different profiles (i.e., non-experienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (the model/algorithm selection problem). Meta-learning is a technique used for assisting non-expert users in this step. The effectiveness of meta-learning is, however, largely dependent on the description/characterization of datasets (i.e., the meta-features used for meta-learning). There is a need to improve the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, a collaborative machine learning platform designed to store and organize meta-data about datasets, data mining algorithms, models and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and we show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML, the biggest source of data in this domain.
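To make the notion of a meta-feature concrete, the sketch below computes a few simple dataset descriptors of the kind commonly used to characterize classification datasets. It is an illustrative assumption, not the authors' feature set or the OpenML API; the function name `simple_meta_features` and the choice of descriptors are hypothetical.

```python
import math
from collections import Counter


def simple_meta_features(rows, labels):
    """Return a few illustrative meta-features for a classification dataset.

    rows   -- list of feature vectors (one list per instance)
    labels -- list of class labels, same length as rows
    """
    n_instances = len(rows)
    n_features = len(rows[0]) if rows else 0
    class_counts = Counter(labels)

    # Shannon entropy of the class distribution, in bits: 0 for a single
    # class, log2(n_classes) when all classes are equally frequent.
    class_entropy = -sum(
        (count / n_instances) * math.log2(count / n_instances)
        for count in class_counts.values()
    )

    return {
        "n_instances": n_instances,
        "n_features": n_features,
        "n_classes": len(class_counts),
        "class_entropy": class_entropy,
    }
```

Descriptors like these form the raw vector that a latent-feature method would then compress; the factor-analysis step itself is not shown here.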


Author(s):  
Agnieszka Ławrynowicz ◽  
Jędrzej Potoniec

The authors propose a new method for mining sets of patterns for classification, where patterns are represented as SPARQL queries over RDFS. The method contributes to so-called semantic data mining, a data mining approach where domain ontologies are used as background knowledge, and where the new challenge is to mine knowledge encoded in domain ontologies, rather than only purely empirical data. The authors have developed a tool that implements this approach. Using this tool, the authors have conducted an experimental evaluation that includes a comparison of their method with state-of-the-art approaches to the classification of semantic data, and an experimental study within an emerging subfield of meta-learning called semantic meta-mining. The most important research contributions of the paper to the state of the art are as follows. For pattern mining research, or relational learning in general, the paper contributes a new algorithm for the discovery of a new type of pattern. For Semantic Web research, it illustrates theoretically and empirically how semantic, structured data can be used in traditional machine learning methods through a pattern-based approach to constructing semantic features.


Author(s):  
Andre Luis Debiaso Rossi ◽  
Andre C.P.L.F. Carvalho ◽  
Carlos Soares

10.5772/36710 ◽  
2012 ◽  
Author(s):  
Laura Cruz-Reyes ◽  
Claudia Gómez-Santillán ◽  
Joaquín Pérez-Ortega ◽  
Vanesa Landero ◽  
Marcela Quiroz ◽  
...  

2010 ◽  
Vol 2 (5) ◽  
pp. 215-230 ◽  
Author(s):  
Moez Ben Haj Hmida ◽  
Yahya Slimani