Efficient Sampling and Handling of Variance in Tuning Data Mining Models

Abstract

Background: Sampling a small number of participants from an entire country is not straightforward. Researchers in this position often resort to sampling from a single setting or a few settings, which limits the generalizability of their findings. There is therefore a need for an efficient sampling method for small-sample surveys that can produce results generalizable at the country level.

Methods: The data comprised twenty proxy variables measuring health service demands, structures, and outcomes in 413 districts of Iran. We used two data mining methods, the hierarchical clustering method (HCM) and the model-based clustering method (MCM), to create homogeneous groups of districts (strata) based on these variables. We compared the internal validity and stability of the two methods using statistical indices. An expert group checked the face validity of the methods, particularly the total number of strata and the combination of districts in each stratum. The efficiency of the selected method, measured by the inverse of the variance, was compared with that of simple random sampling (SRS) through simulation. The sampling design was tested in a national study in Iran that aimed to evaluate the quality and costs of medical care for eight selected diseases while recruiting only 300 participants per disease at the country level.

Results: MCM and HCM divided the districts into eight and two clusters, respectively. The measures of internal validity and stability showed that the clusters created by MCM were more separated, compact, and stable, so they were chosen as the optimum strata. The probability of death from stroke, the probability of death from chronic obstructive pulmonary disease, and the in-hospital mortality rate were the most important indicators distinguishing the eight strata. In the simulation, MCM increased the efficiency of the sampling design by up to 1.7 times compared with SRS.

Conclusions: The use of data mining improved sampling efficiency by up to 1.7 times relative to SRS and reduced the number of strata for the entire country to eight. The proposed sampling design also identified key variables that could be used to classify districts in Iran when sampling from these target populations in future studies.
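The efficiency comparison described in the abstract (inverse of the variance, stratified design versus SRS, by simulation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the population is synthetic, the eight strata are equal in size, allocation is proportional, and the function names are hypothetical. It does not reproduce the study's data or its 1.7 figure.

```python
import random
import statistics


def make_population(rng, n_strata=8, per_stratum=50):
    """Build hypothetical strata whose mean outcomes differ from one another."""
    return [
        [rng.gauss(10.0 * s, 5.0) for _ in range(per_stratum)]
        for s in range(n_strata)
    ]


def srs_mean(rng, population, n):
    """Estimate the population mean from a simple random sample of n units."""
    units = [y for stratum in population for y in stratum]
    return statistics.fmean(rng.sample(units, n))


def stratified_mean(rng, population, n):
    """Estimate the mean using proportional allocation across strata."""
    total = sum(len(stratum) for stratum in population)
    estimate = 0.0
    for stratum in population:
        share = len(stratum) / total
        n_h = max(1, round(n * share))  # sample size drawn from this stratum
        estimate += share * statistics.fmean(rng.sample(stratum, n_h))
    return estimate


def relative_efficiency(seed=0, n=40, reps=2000):
    """Ratio of estimator variances, Var(SRS) / Var(stratified).

    Efficiency is the inverse of the variance, so a ratio above 1 means
    the stratified design is more efficient than SRS.
    """
    rng = random.Random(seed)
    population = make_population(rng)
    srs = [srs_mean(rng, population, n) for _ in range(reps)]
    strat = [stratified_mean(rng, population, n) for _ in range(reps)]
    return statistics.variance(srs) / statistics.variance(strat)


efficiency = relative_efficiency()
print(f"Relative efficiency of stratified sampling vs SRS: {efficiency:.2f}")
```

The gain in this toy setting comes entirely from how strongly the stratum means differ. When strata are homogeneous inside and distinct from each other, as the clustering step aims to achieve, most of the between-stratum variance drops out of the estimator.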

Data Mining and Machine Learning

10.1017/9781108564175 ◽

2020 ◽

Cited By ~ 2

Author(s):

Mohammed J. Zaki ◽

Wagner Meira, Jr

Keyword(s):

Machine Learning ◽

Data Mining

The economics of selection of mail orders

Drs. Zahavi and Levin are the masterminds behind the development of AMOS, a customized predictive modeling system for the Franklin Mint in Philadelphia, and GainSmarts, a general-purpose data mining system that is the two-time winner of the KDD-CUP competition for the best data mining tools (1997 and 1998), sponsored by the American Association for Artificial Intelligence.

Journal of Interactive Marketing ◽

10.1002/dir.1016.abs ◽

2001 ◽

Vol 15 (3) ◽

pp. 53

Author(s):

Nissan Levin ◽

Jacob Zahavi

Keyword(s):

Artificial Intelligence ◽

Data Mining ◽

Predictive Modeling ◽

American Association ◽

General Purpose ◽

Mining System ◽

Data Mining System ◽

Mining Tools ◽

Selection Of

Heart Rate Variability, Emotions, and Music

Journal of Psychophysiology ◽

10.1027/0269-8803/a000021 ◽

2010 ◽

Vol 24 (2) ◽

pp. 112-119 ◽

Cited By ~ 9

Author(s):

F. Riganello ◽

A. Candelieri ◽

M. Quintieri ◽

G. Dolce

Keyword(s):

Data Mining ◽

Heart Rate ◽

Heart Rate Variability ◽

Vegetative State ◽

Low Frequency ◽

Emotional Reactions ◽

Heart Beat ◽

Healthy Controls ◽

Frequency Spectra ◽

Emotional Value

The purpose of the study was to identify significant changes in heart rate variability (HRV), an emerging descriptor of emotional conditions, concomitant with complex auditory stimuli of emotional value (music). In healthy controls, patients with traumatic brain injury (TBI), and subjects in the vegetative state (VS), the heart beat was recorded continuously while the subjects listened passively to each of four music samples of different authorship. The parametric and nonparametric frequency spectra of the heart rate were computed, and the spectral descriptors were processed by data-mining procedures. Data mining identified nu_lf (the normalized unit of the low-frequency range of the spectrum) as the significant descriptor by which the HRV responses to music of the healthy controls, TBI patients, and VS subjects could be clustered into classes matching those defined by the subjective reports of the controls and TBI patients. These findings support the potential of HRV to reflect complex emotional stimuli and suggest that residual emotional reactions continue to occur in VS. HRV descriptors and data mining appear applicable to brain function research in the absence of consciousness.

Post, Mine, and Be Disturbed: Social Media Data Mining

PsycCRITIQUES ◽

10.1037/a0040619 ◽

2016 ◽

Vol 61 (51) ◽

Author(s):

Daniel Keyes

Keyword(s):

Data Mining ◽

Social Media ◽

Social Media Data ◽

Media Data

Application of data mining techniques for identifying the holistic athlete's characteristics

PsycEXTRA Dataset ◽

10.1037/e548052012-458 ◽

2007 ◽

Author(s):

Stavroula Psouni ◽

Dimitris Psounis

Keyword(s):

Data Mining ◽

Data Mining Techniques

Teaching introductory data mining using problem-based Learning

PsycEXTRA Dataset ◽

10.1037/e605122012-006 ◽

2011 ◽

Author(s):

Matthew A. North

Keyword(s):

Data Mining ◽

Problem Based Learning

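RIS-gestütztes Data Mining von Expertenwissen mit graphischer Visualisierung und Möglichkeit einer ad-hoc Expertenkonsultation [RIS-supported data mining of expert knowledge with graphical visualization and the option of ad hoc expert consultation]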

RöFo - Fortschritte auf dem Gebiet der Röntgenstrahlen und der bildgebenden Verfahren ◽

10.1055/s-2007-977052 ◽

2007 ◽

Vol 179 (S 1) ◽

Author(s):

J Hohmann ◽

T Schaaf ◽

B Bühring ◽

J Bischof ◽

H Tepe ◽

...

Keyword(s):

Data Mining ◽

Ad Hoc

Challenges and Cloud Computing Environments Towards Big Data

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207277 ◽

2014 ◽

pp. 203-208

Author(s):

Kiran Kumar S V N Madupu

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Big Data ◽

Technology Development ◽

Computing Environments ◽

Modern Technologies

Big Data has a strong influence on scientific discovery and on value creation. This paper presents approaches to data mining and modern technologies for Big Data. It discusses the difficulties of data mining in general and of data mining with Big Data in particular, and presents some technological developments in both areas.

Penerapan Data Mining Untuk Prediksi Penjualan Mobil Menggunakan Metode K-Means Clustering [Application of Data Mining to Predict Car Sales Using the K-Means Clustering Method]

Jurnal Nasional Komputasi dan Teknologi Informasi (JNKTI) ◽

10.32672/jnkti.v3i3.2428 ◽

2020 ◽

Vol 3 (3) ◽

pp. 187-201

Author(s):

Sufajar Butsianto ◽

Nindi Tya Mayangwulan

Keyword(s):

Data Mining ◽

Clustering Data ◽

Cluster 2

Car usage in Indonesia increases every year, which drives automotive companies to compete to raise their sales. The aim of this study is to group sales data into clusters using the K-Means Clustering data mining algorithm. The sales data are grouped by similarity, so that records with the same characteristics fall into the same cluster. The attributes used are brand and sales. The K-Means process produced three clusters: Cluster 0 with 235 members (26%), categorized as Selling Well (Laris); Cluster 1 with 604 members (67%), categorized as Selling Poorly (Kurang Laris); and Cluster 2 with 61 members (7%), categorized as Best-Selling (Paling Laris). The clustering was validated with the Davies-Bouldin Index (DBI), which gave a value of 0.341.
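The two steps named in this abstract, K-Means grouping followed by validation with the Davies-Bouldin Index, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the nine two-feature points are invented, the function names are hypothetical, and the centroids are initialized from evenly spaced rows of the input instead of at random so that the run is deterministic.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm. Initial centroids are evenly spaced rows of the
    input (a simplifying assumption; random initialization is more usual)."""
    stride = len(points) // k
    centroids = [points[i * stride] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def davies_bouldin(centroids, clusters):
    """Davies-Bouldin Index; lower values mean compact, well-separated
    clusters. Assumes no cluster is empty and no two centroids coincide."""
    scatter = [
        sum(math.dist(p, c) for p in cl) / len(cl)
        for c, cl in zip(centroids, clusters)
    ]
    k = len(centroids)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst_ratios) / k


# Hypothetical records with two numeric features each.
points = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 9.0), (1.5, 8.0), (2.0, 9.0),
]
centroids, clusters = kmeans(points, k=3)
dbi = davies_bouldin(centroids, clusters)
print([len(cl) for cl in clusters], round(dbi, 3))
```

A categorical attribute such as brand would need a numeric encoding before distances can be computed. How the study encoded it is not stated in the abstract, so the sketch uses numeric features only.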
