Ontology-Based Construction of Grid Data Mining Workflows

2008 ◽  
pp. 913-941 ◽  
Author(s):  
Peter Brezany ◽  
Ivan Janciak ◽  
A Min Tjoa

This chapter introduces an ontology-based framework for the automated construction of complex interactive data mining workflows as a means of improving the productivity of Grid-enabled data exploration systems. The authors first characterize existing manual and automated workflow composition approaches and then present their solution, called GridMiner Assistant (GMA), which addresses the whole life cycle of the knowledge discovery process. GMA is specified in the OWL language and is being developed around a novel data mining ontology, which is based on concepts from industry standards such as the Predictive Model Markup Language (PMML), the Cross Industry Standard Process for Data Mining (CRISP-DM), and the Java Data Mining API. The ontology introduces basic data mining concepts such as data mining elements, tasks, and services. In addition, conceptual and implementation architectures of the framework are presented, and its application is illustrated with an example taken from the medical domain. The authors hope that further research and development of this framework can lead to productivity improvements with a significant impact on many real-life spheres; for example, it can be a crucial factor in achieving scientific discoveries, optimal treatment of patients, productive decision making, and cost cutting.
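
To make the ontology idea concrete, the minimal sketch below encodes a few of the concepts named above (data mining elements, tasks, services) as OWL classes using the rdflib library; the namespace, the class names, and the realizedBy property are illustrative assumptions, not the actual GMA ontology schema.

```python
# Illustrative sketch only: the namespace and the class/property names are
# hypothetical stand-ins for the GMA data mining ontology, not its real schema.
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

DM = Namespace("http://example.org/gridminer/dm#")  # assumed namespace

g = Graph()
g.bind("dm", DM)
g.bind("owl", OWL)

# Core concepts mentioned in the abstract: data mining elements, tasks, services.
for cls in (DM.DataMiningElement, DM.Task, DM.Service):
    g.add((cls, RDF.type, OWL.Class))

# An example task subclass and a property linking tasks to realizing services.
g.add((DM.ClassificationTask, RDFS.subClassOf, DM.Task))
g.add((DM.realizedBy, RDF.type, OWL.ObjectProperty))
g.add((DM.realizedBy, RDFS.domain, DM.Task))
g.add((DM.realizedBy, RDFS.range, DM.Service))

print(g.serialize(format="turtle"))
```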


2011 ◽  
Vol 7 (1) ◽  
pp. 24-45 ◽  
Author(s):  
Roberto Trasarti ◽  
Fosca Giannotti ◽  
Mirco Nanni ◽  
Dino Pedreschi ◽  
Chiara Renso

The technologies of mobile communications and ubiquitous computing pervade society. Wireless networks sense the movement of people and vehicles, generating large volumes of mobility data, such as mobile phone call records and GPS tracks. These data can yield useful knowledge, supporting sustainable mobility and intelligent transportation systems, provided that a suitable knowledge discovery process is enacted for mining them. In this paper, the authors examine a formal framework, and the associated implementation, for a data mining query language for mobility data, created as a result of a Europe-wide research project called GeoPKDD (Geographic Privacy-Aware Knowledge Discovery and Delivery). The authors discuss how the system provides comprehensive support for the Mobility Knowledge Discovery process and illustrate its analytical power in unveiling the complexity of urban mobility in a large metropolitan area, based on a massive real-life GPS dataset.
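
The paper's actual query language is not reproduced here; the hypothetical Python sketch below only conveys the flavor of a mobility-mining query as a select-transform-mine pipeline over GPS trajectories, with every type and step assumed for illustration.

```python
# Hypothetical illustration of a mobility-mining query pipeline; the types and
# steps are assumptions, not the GeoPKDD system's actual query syntax.
from dataclasses import dataclass

@dataclass
class Point:
    t: float    # timestamp in seconds
    x: float    # longitude
    y: float    # latitude

@dataclass
class Trajectory:
    user_id: str
    points: list  # time-ordered list of Point

def in_window(traj, t_start, t_end):
    """Select step: keep only the GPS points inside a time window."""
    pts = [p for p in traj.points if t_start <= p.t <= t_end]
    return Trajectory(traj.user_id, pts)

def frequent_origins(trajs, cell=0.01):
    """Mine step (toy): count trips per origin grid cell."""
    counts = {}
    for tr in trajs:
        if tr.points:
            p = tr.points[0]
            key = (round(p.x / cell), round(p.y / cell))
            counts[key] = counts.get(key, 0) + 1
    return counts

def morning_origin_query(trajs):
    """Query: 'where do morning trips start?' as select -> transform -> mine."""
    morning = [in_window(t, 6 * 3600, 10 * 3600) for t in trajs]
    return frequent_origins([t for t in morning if t.points])
```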



2020 ◽  
Vol 10 (1) ◽  
pp. 12
Author(s):  
Ekka Pujo Ariesanto Akhmad

The bank's marketing department has collected data from bank customers by marketing credit cards over the telephone (telemarketing). The bank's evaluations of its credit card telemarketing have so far been neither effective nor productive. A suitable way to evaluate the bank's credit card telemarketing reports is to apply data mining techniques. The purpose of using data mining here is to discover the tendencies and patterns of customers who are likely to subscribe to the credit card the bank offers. The research method follows the Cross Industry Standard Process for Data Mining (CRISP-DM), using a Genetic Algorithm for Feature Selection (GAFS) and Naive Bayes (NB). The results show that the bank's credit card telemarketing dataset has 15 attributes: 14 regular attributes and 1 special (target) attribute. Because the dataset is high-dimensional, the GAFS method was applied, yielding 7 optimal attributes: 6 regular attributes (job, balance, housing, loan, duration, and poutcome) plus the special target attribute. The NB algorithm alone achieved an accuracy of 86.71%; combining GAFS with NB raised the accuracy to 90.27% for predicting which bank customers will take up a credit card.
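
A minimal sketch of the GAFS + NB pipeline described above is given below, assuming a numeric feature matrix X and a binary target y (as in the telemarketing data); the GA operators and settings are illustrative choices, not those used in the study.

```python
# Sketch of GA-based feature selection wrapped around Naive Bayes; the GA
# parameters and operators are assumptions, not the paper's configuration.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated NB accuracy on the selected feature subset."""
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=5).mean()

def gafs(X, y, pop_size=20, generations=30, p_mut=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5          # random boolean feature masks
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in pop])
        pop = pop[np.argsort(scores)[::-1]]        # sort best-first (elitism)
        children = []
        while len(children) < pop_size - pop_size // 2:
            a, b = pop[rng.integers(0, pop_size // 2, size=2)]
            cut = int(rng.integers(1, n))          # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut         # bit-flip mutation
            children.append(child)
        pop[pop_size // 2:] = children             # replace the worst half
    scores = np.array([fitness(m, X, y) for m in pop])
    return pop[int(np.argmax(scores))]

# Usage: mask = gafs(X, y); accuracy = fitness(mask, X, y)
```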


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is one of the alternatives to top-down inducers: it searches for the tree structure and the split tests simultaneously and thus, in many situations, improves the prediction quality and size of the resulting classifiers. However, as a population-based, iterative approach, it can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach that combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The search for the tree structure and the split tests is performed on the CPU, while the fitness calculations are delegated to the GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets, and in both cases the obtained acceleration is very satisfactory: the solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
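
The sketch below illustrates the data-parallel decomposition idea under stated assumptions: the dataset is split into shards (one per device), each shard computes a partial fitness contribution for the same candidate tree, and the results are aggregated centrally. A CPU process pool stands in for the GPUs here; this is not the paper's CUDA implementation.

```python
# Simplified data-parallel fitness evaluation for evolutionary DT induction;
# a process pool plays the role of the GPUs for illustration purposes only.
import numpy as np
from multiprocessing import Pool

def predict(tree, x):
    """Walk a decision tree given as nested dicts down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def partial_errors(args):
    """Per-shard fitness contribution: number of misclassified instances."""
    tree, X_shard, y_shard = args
    preds = np.array([predict(tree, x) for x in X_shard])
    return int((preds != y_shard).sum())

def fitness(tree, X, y, n_devices=4):
    """Split the data into one shard per device and aggregate error counts."""
    shards = list(zip([tree] * n_devices,
                      np.array_split(X, n_devices),
                      np.array_split(y, n_devices)))
    with Pool(n_devices) as pool:
        errors = sum(pool.map(partial_errors, shards))
    return 1.0 - errors / len(y)   # accuracy as the fitness value
```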


Mathematics ◽  
2021 ◽  
Vol 9 (14) ◽  
pp. 1679
Author(s):  
Jacopo Giacomelli ◽  
Luca Passalacqua

The CreditRisk+ model is one of the industry standards for the valuation of default risk in credit loan portfolios. Calibrating CreditRisk+ requires, inter alia, the specification of the parameters describing the structure of dependence among default events. This work addresses the calibration of these parameters. In particular, we study how the calibration procedure depends on the sampling period of the default rate time series, which may differ from the time horizon over which the model is used for forecasting, as is often the case in real-life applications. The case of autocorrelated time series and the role of the statistical error as a function of the time series period are also discussed. The proposed calibration technique is illustrated with an application to real data.
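
As a hedged sketch of the underlying idea (not the paper's procedure), a textbook moment-matching calibration estimates the variance sigma^2 of the mean-one gamma factor that drives default correlation from the relative variance of the observed default rates; mapping that estimate from the sampling period to a different forecasting horizon is precisely the issue the paper investigates.

```python
# Hedged sketch, not the paper's procedure: textbook moment matching for the
# CreditRisk+ factor variance. It ignores binomial sampling noise and
# autocorrelation, both of which the paper treats explicitly.
import numpy as np

def calibrate_sigma2(default_rates):
    """Estimate sigma^2 on the sampling period of the default rate series.

    With default rate DR_t ~ p * Gamma_t and E[Gamma_t] = 1, the relative
    variance Var(DR) / E[DR]^2 matches Var(Gamma_t) = sigma^2.
    """
    dr = np.asarray(default_rates, dtype=float)
    return dr.var(ddof=1) / dr.mean() ** 2

# Example with illustrative quarterly default rates; converting the estimate
# to a one-year forecasting horizon is exactly the question the paper studies.
quarterly_rates = [0.012, 0.009, 0.015, 0.011, 0.013, 0.008, 0.014, 0.010]
print(calibrate_sigma2(quarterly_rates))
```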

