Ontology-Based Construction of Grid Data Mining Workflows

2008 ◽  
pp. 913-941 ◽  
Author(s):  
Peter Brezany ◽  
Ivan Janciak ◽  
A Min Tjoa

This chapter introduces an ontology-based framework for the automated construction of complex interactive data mining workflows as a means of improving the productivity of Grid-enabled data exploration systems. The authors first characterize existing manual and automated workflow composition approaches and then present their solution, called GridMiner Assistant (GMA), which addresses the whole life cycle of the knowledge discovery process. GMA is specified in the OWL language and is being developed around a novel data mining ontology, which is based on concepts from industry standards such as the Predictive Model Markup Language (PMML), the Cross Industry Standard Process for Data Mining (CRISP-DM), and the Java Data Mining API. The ontology introduces basic data mining concepts such as data mining elements, tasks, and services. In addition, conceptual and implementation architectures of the framework are presented, and its application is illustrated with an example taken from the medical domain. The authors hope that further research and development of this framework can lead to productivity improvements with a significant impact on many real-life spheres; for example, it can be a crucial factor in achieving scientific discoveries, optimal treatment of patients, productive decision making, and cost cutting.
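
To make the ontology idea concrete, the minimal sketch below encodes a few of the concepts named above (data mining elements, tasks, services) as OWL classes using the rdflib library; the namespace, the class names, and the realizedBy property are illustrative assumptions, not the actual GMA ontology schema.

```python
# Illustrative sketch only: the namespace and the class/property names are
# hypothetical stand-ins for the GMA data mining ontology, not its real schema.
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

DM = Namespace("http://example.org/gridminer/dm#")  # assumed namespace

g = Graph()
g.bind("dm", DM)
g.bind("owl", OWL)

# Core concepts mentioned in the abstract: data mining elements, tasks, services.
for cls in (DM.DataMiningElement, DM.Task, DM.Service):
    g.add((cls, RDF.type, OWL.Class))

# An example task subclass and a property linking tasks to realizing services.
g.add((DM.ClassificationTask, RDFS.subClassOf, DM.Task))
g.add((DM.realizedBy, RDF.type, OWL.ObjectProperty))
g.add((DM.realizedBy, RDFS.domain, DM.Task))
g.add((DM.realizedBy, RDFS.range, DM.Service))

print(g.serialize(format="turtle"))
```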


2011 ◽  
Vol 7 (1) ◽  
pp. 24-45 ◽  
Author(s):  
Roberto Trasarti ◽  
Fosca Giannotti ◽  
Mirco Nanni ◽  
Dino Pedreschi ◽  
Chiara Renso

The technologies of mobile communications and ubiquitous computing pervade society. Wireless networks sense the movement of people and vehicles, generating large volumes of mobility data, such as mobile phone call records and GPS tracks. These data can yield useful knowledge, supporting sustainable mobility and intelligent transportation systems, provided that a suitable knowledge discovery process is enacted for mining them. In this paper, the authors examine a formal framework, and the associated implementation, for a data mining query language for mobility data, created as a result of a Europe-wide research project called GeoPKDD (Geographic Privacy-Aware Knowledge Discovery and Delivery). The authors discuss how the system provides comprehensive support for the Mobility Knowledge Discovery process and illustrate its analytical power in unveiling the complexity of urban mobility in a large metropolitan area, based on a massive real-life GPS dataset.
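
The paper's actual query language is not reproduced here; the hypothetical Python sketch below only conveys the flavor of a mobility-mining query as a select-transform-mine pipeline over GPS trajectories, with every type and step assumed for illustration.

```python
# Hypothetical illustration of a mobility-mining query pipeline; the types and
# steps are assumptions, not the GeoPKDD system's actual query syntax.
from dataclasses import dataclass

@dataclass
class Point:
    t: float    # timestamp in seconds
    x: float    # longitude
    y: float    # latitude

@dataclass
class Trajectory:
    user_id: str
    points: list  # time-ordered list of Point

def in_window(traj, t_start, t_end):
    """Select step: keep only the GPS points inside a time window."""
    pts = [p for p in traj.points if t_start <= p.t <= t_end]
    return Trajectory(traj.user_id, pts)

def frequent_origins(trajs, cell=0.01):
    """Mine step (toy): count trips per origin grid cell."""
    counts = {}
    for tr in trajs:
        if tr.points:
            p = tr.points[0]
            key = (round(p.x / cell), round(p.y / cell))
            counts[key] = counts.get(key, 0) + 1
    return counts

def morning_origin_query(trajs):
    """Query: 'where do morning trips start?' as select -> transform -> mine."""
    morning = [in_window(t, 6 * 3600, 10 * 3600) for t in trajs]
    return frequent_origins([t for t in morning if t.points])
```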



2020 ◽  
Vol 10 (1) ◽  
pp. 12
Author(s):  
Ekka Pujo Ariesanto Akhmad

The bank's marketing department has collected data from bank customers by marketing credit cards over the telephone (telemarketing). The bank's evaluations of its credit card telemarketing have so far been neither effective nor productive. A suitable way to evaluate the bank's credit card telemarketing reports is to apply data mining techniques. The purpose of using data mining here is to discover the tendencies and patterns of customers who are likely to subscribe to the credit card the bank offers. The research method follows the Cross Industry Standard Process for Data Mining (CRISP-DM), using a Genetic Algorithm for Feature Selection (GAFS) and Naive Bayes (NB). The results show that the bank's credit card telemarketing dataset has 15 attributes: 14 regular attributes and 1 special (target) attribute. Because the dataset is high-dimensional, the GAFS method was applied, yielding 7 optimal attributes: 6 regular attributes (job, balance, housing, loan, duration, and poutcome) plus the special target attribute. The NB algorithm alone achieved an accuracy of 86.71%; combining GAFS with NB raised the accuracy to 90.27% for predicting which bank customers will take up a credit card.
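
A minimal sketch of the GAFS + NB pipeline described above is given below, assuming a numeric feature matrix X and a binary target y (as in the telemarketing data); the GA operators and settings are illustrative choices, not those used in the study.

```python
# Sketch of GA-based feature selection wrapped around Naive Bayes; the GA
# parameters and operators are assumptions, not the paper's configuration.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated NB accuracy on the selected feature subset."""
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=5).mean()

def gafs(X, y, pop_size=20, generations=30, p_mut=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5          # random boolean feature masks
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in pop])
        pop = pop[np.argsort(scores)[::-1]]        # sort best-first (elitism)
        children = []
        while len(children) < pop_size - pop_size // 2:
            a, b = pop[rng.integers(0, pop_size // 2, size=2)]
            cut = int(rng.integers(1, n))          # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut         # bit-flip mutation
            children.append(child)
        pop[pop_size // 2:] = children             # replace the worst half
    scores = np.array([fitness(m, X, y) for m in pop])
    return pop[int(np.argmax(scores))]

# Usage: mask = gafs(X, y); accuracy = fitness(mask, X, y)
```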


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is one of the alternatives to top-down inducers: it searches for the tree structure and the split tests simultaneously and thus, in many situations, improves the prediction quality and size of the resulting classifiers. However, as a population-based, iterative approach, it can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach that combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The search for the tree structure and the split tests is performed on the CPU, while the fitness calculations are delegated to the GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets, and in both cases the obtained acceleration is very satisfactory: the solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
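
The sketch below illustrates the data-parallel decomposition idea under stated assumptions: the dataset is split into shards (one per device), each shard computes a partial fitness contribution for the same candidate tree, and the results are aggregated centrally. A CPU process pool stands in for the GPUs here; this is not the paper's CUDA implementation.

```python
# Simplified data-parallel fitness evaluation for evolutionary DT induction;
# a process pool plays the role of the GPUs for illustration purposes only.
import numpy as np
from multiprocessing import Pool

def predict(tree, x):
    """Walk a decision tree given as nested dicts down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def partial_errors(args):
    """Per-shard fitness contribution: number of misclassified instances."""
    tree, X_shard, y_shard = args
    preds = np.array([predict(tree, x) for x in X_shard])
    return int((preds != y_shard).sum())

def fitness(tree, X, y, n_devices=4):
    """Split the data into one shard per device and aggregate error counts."""
    shards = list(zip([tree] * n_devices,
                      np.array_split(X, n_devices),
                      np.array_split(y, n_devices)))
    with Pool(n_devices) as pool:
        errors = sum(pool.map(partial_errors, shards))
    return 1.0 - errors / len(y)   # accuracy as the fitness value
```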


Mathematics ◽  
2021 ◽  
Vol 9 (14) ◽  
pp. 1679
Author(s):  
Jacopo Giacomelli ◽  
Luca Passalacqua

The CreditRisk+ model is one of the industry standards for the valuation of default risk in credit loan portfolios. Calibrating CreditRisk+ requires, inter alia, the specification of the parameters describing the structure of dependence among default events. This work addresses the calibration of these parameters. In particular, we study how the calibration procedure depends on the sampling period of the default rate time series, which may differ from the time horizon over which the model is used for forecasting, as is often the case in real-life applications. The case of autocorrelated time series and the role of the statistical error as a function of the time series period are also discussed. The proposed calibration technique is illustrated with an application to real data.
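
As a hedged sketch of the underlying idea (not the paper's procedure), a textbook moment-matching calibration estimates the variance sigma^2 of the mean-one gamma factor that drives default correlation from the relative variance of the observed default rates; mapping that estimate from the sampling period to a different forecasting horizon is precisely the issue the paper investigates.

```python
# Hedged sketch, not the paper's procedure: textbook moment matching for the
# CreditRisk+ factor variance. It ignores binomial sampling noise and
# autocorrelation, both of which the paper treats explicitly.
import numpy as np

def calibrate_sigma2(default_rates):
    """Estimate sigma^2 on the sampling period of the default rate series.

    With default rate DR_t ~ p * Gamma_t and E[Gamma_t] = 1, the relative
    variance Var(DR) / E[DR]^2 matches Var(Gamma_t) = sigma^2.
    """
    dr = np.asarray(default_rates, dtype=float)
    return dr.var(ddof=1) / dr.mean() ** 2

# Example with illustrative quarterly default rates; converting the estimate
# to a one-year forecasting horizon is exactly the question the paper studies.
quarterly_rates = [0.012, 0.009, 0.015, 0.011, 0.013, 0.008, 0.014, 0.010]
print(calibrate_sigma2(quarterly_rates))
```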

