IMPROVING PERFORMANCE OF INDUCTIVE MODELS THROUGH AN ALGORITHM AND SAMPLE COMBINATION STRATEGY
Multiple approaches have been developed for improving the predictive performance of a system by creating and combining various learned models. There are two main approaches to creating model ensembles: the first creates a set of learned models by applying an algorithm repeatedly to different training samples; the second applies several learning algorithms to the same sample data. The predictions of the models are then combined according to a voting scheme. This paper presents a method for combining models developed using numerous samples, modeling algorithms, and modelers, and compares it with the alternative approaches. The presented results are based on findings from an ongoing operational data mining initiative concerned with selecting, from among the trained models, a model set that best meets defined goals. The operational goal of this initiative is to deploy data mining model(s) that maximize specificity with minimal negative impact on sensitivity. The results of the model combination methods are evaluated with respect to sensitivity and false alarm rate and are compared against the other approaches.
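The voting-based combination of model predictions mentioned above can be sketched as follows. This is a minimal illustration assuming a simple majority-vote rule over class labels; the function name and the rule itself are illustrative assumptions, not the paper's specific combination scheme:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several models by majority vote.

    `predictions` is a list of per-model prediction lists, one label per
    example.  Ties are broken by first-seen label order in Counter.
    """
    combined = []
    for example_preds in zip(*predictions):
        label, _count = Counter(example_preds).most_common(1)[0]
        combined.append(label)
    return combined

# Three models' predictions for five examples (0 = negative, 1 = positive);
# the models could come from different samples or different algorithms.
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 1]

print(majority_vote([model_a, model_b, model_c]))  # → [1, 0, 1, 1, 0]
```

A weighted variant (e.g., weighting each model's vote by its validation-set specificity) would fit the stated goal of maximizing specificity while limiting the loss in sensitivity.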