OVERSAMPLING METHOD TO HANDLING IMBALANCED DATASETS PROBLEM IN BINARY LOGISTIC REGRESSION ALGORITHM

Windyaning Ustyannie; S Suprapto

doi:10.22146/ijccs.37415

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.37415 ◽

2020 ◽

Vol 14 (1) ◽

pp. 1

Author(s):

Windyaning Ustyannie ◽

S Suprapto

Keyword(s):

Logistic Regression ◽

Sampling Method ◽

Synthetic Data ◽

Class Imbalance ◽

Binary Logistic Regression ◽

Data Generation ◽

Synthetic Data Generation ◽

Logistic Regression Method ◽

Increase In Accuracy ◽

Logistic Regression Algorithm

Class imbalance is a condition in which one class makes up a much larger share of the data than the other, which can degrade classification accuracy. Logistic regression is one data mining method that can be used for classification. This research uses the RWO-sampling method with a random replicate approach to generate synthetic data for discrete attributes. The results show that the method can handle the class imbalance problem: RWO-sampling with the random replicate approach gives better accuracy than both RWO-sampling with the roulette approach and ROS. Relative to RWO-sampling with the roulette approach, RWO-sampling with the random replicate approach increased accuracy by an average of 15.55% per dataset. Relative to the ROS method, accuracy increased by an average of 3.7% per dataset. Furthermore, in testing the underfitting problem in logistic regression, oversampling performed better than no oversampling, with accuracy increasing by an average of 2.3% per dataset.
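As a rough illustration of the baseline the abstract compares against, the sketch below shows plain random oversampling (ROS) for a binary dataset: minority-class rows are duplicated at random until both classes have the same size. It is not the RWO-sampling method, whose details are not given in this record. The function name `random_oversample` and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes match in size.

    Hypothetical helper for illustration only; plain ROS, not RWO-sampling.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # most_common(2) returns the larger class first, then the smaller one.
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    # Draw with replacement from the minority class to close the gap.
    extra = [rng.choice(minority_rows) for _ in range(n_major - n_minor)]
    return rows + extra, labels + [minority] * len(extra)


# Toy dataset with discrete attributes: four rows of class 0, two of class 1.
rows = [[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 1]]
labels = [0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

The balanced rows and labels could then be passed to any binary logistic regression implementation. Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.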

Machine learning based Synthetic Data Generation using Iterative Regression Analysis

2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA) ◽

10.1109/iceca49313.2020.9297491 ◽

2020 ◽

Author(s):

Sanskar Shah ◽

Darshan Gandhi ◽

Jil Kothari

Keyword(s):

Machine Learning ◽

Regression Analysis ◽

Synthetic Data ◽

Data Generation ◽

Synthetic Data Generation

Synthetic data generation of high-resolution hyperspectral data using DIRSIG

10.1117/12.735264 ◽

2007 ◽

Cited By ~ 2

Author(s):

Marek K. Jakubowski ◽

David Pogorzala ◽

Timothy J. Hattenberger ◽

Scott D. Brown ◽

John R. Schott

Keyword(s):

High Resolution ◽

Synthetic Data ◽

Hyperspectral Data ◽

Data Generation ◽

Synthetic Data Generation

Synthetic Data Generation to Support Irregular Sampling in Sensor Networks

GeoSensor Networks ◽

10.1201/9780203356869.ch12 ◽

2004 ◽

pp. 211-234 ◽

Cited By ~ 2

Author(s):

Lewis Girod ◽

Ramesh Govindan ◽

Deepak Ganesan ◽

Deborah Estrin ◽

Yan Yu

Keyword(s):

Sensor Networks ◽

Synthetic Data ◽

Irregular Sampling ◽

Data Generation ◽

Synthetic Data Generation

Instance Segmentation in CARLA: Methodology and Analysis for Pedestrian-oriented Synthetic Data Generation in Crowded Scenes

10.1109/iccvw54120.2021.00115 ◽

2021 ◽

Author(s):

Maria Lyssenko ◽

Christoph Gladisch ◽

Christian Heinzemann ◽

Matthias Woehrle ◽

Rudolph Triebel

Keyword(s):

Synthetic Data ◽

Data Generation ◽

Synthetic Data Generation ◽

Crowded Scenes ◽

Instance Segmentation

Faktor Keberhasilan Usaha UMKM Jajanan Asing Kaki Lima di Kota Serang [Success Factors of Non-Traditional Street Food MSMEs in Serang City]

MANAJEMEN IKM: Jurnal Manajemen Pengembangan Industri Kecil Menengah ◽

10.29244/mikm.12.2.187-193 ◽

2018 ◽

Vol 12 (2) ◽

pp. 187

Author(s):

Fauzan Anggi Prasatya ◽

Tjahja Muhandri ◽

Eko Ruddy Cahyadi

Keyword(s):

Logistic Regression ◽

Analytical Method ◽

Success Factors ◽

Sampling Method ◽

Descriptive Analysis ◽

Binary Logistic Regression ◽

Success Factor ◽

Street Food ◽

Product Innovations ◽

Start Up

Competition in the food business is currently intense, with diverse product innovations. To gain market share and win the competition, a business needs to know the factors that affect its success. This study has two main objectives: (1) to map the characteristics of non-traditional street food entrepreneurs in Serang City, and (2) to identify the success factors that most affect non-traditional street food businesses. Purposive sampling was used to select 100 respondents. The analysis used descriptive analysis and binary logistic regression. The research showed that most successful vendors are women, because they are more conscientious than men and tend to avoid risk. The factors affecting the success of non-traditional street food businesses were product price, business name, and start-up capital.

Synthetic Data Generation Capabilties for Testing Data Mining Tools

MILCOM 2006 ◽

10.1109/milcom.2006.302440 ◽

2006 ◽

Cited By ~ 7

Author(s):

Daniel Jeske ◽

Pengyue Lin ◽

Carlos Rendon ◽

Rui Xiao ◽

Behrokh Samadi

Keyword(s):

Data Mining ◽

Synthetic Data ◽

Data Generation ◽

Synthetic Data Generation ◽

Testing Data ◽

Mining Tools

When does Synthetic Data Generation Work?

2021 29th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu53274.2021.9477956 ◽

2021 ◽

Author(s):

Ahmet Topal ◽

Mehmet Fatih Amasyali

Keyword(s):

Synthetic Data ◽

Data Generation ◽

Synthetic Data Generation

A Synthetic Data Generation Model for Diabetic Foot Treatment

Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications - Communications in Computer and Information Science ◽

10.1007/978-981-33-4370-2_18 ◽

2020 ◽

pp. 249-264

Author(s):

Jayun Hyun ◽

Seo Hu Lee ◽

Ha Min Son ◽

Ji-Ung Park ◽

Tai-Myoung Chung

Keyword(s):

Diabetic Foot ◽

Synthetic Data ◽

Generation Model ◽

Data Generation ◽

Synthetic Data Generation

Ground penetrating radar measurements: Applications to synthetic data generation and target characterization

2010 IEEE International Geoscience and Remote Sensing Symposium ◽

10.1109/igarss.2010.5650683 ◽

2010 ◽

Cited By ~ 1

Author(s):

Naomi R. Schwartz ◽

Amir I. Zaghloul

Keyword(s):

Ground Penetrating Radar ◽

Synthetic Data ◽

Data Generation ◽

Synthetic Data Generation ◽

Target Characterization ◽

Radar Measurements ◽

Ground Penetrating

Spectral density-based and measure-preserving ABC for partially observed diffusion processes. An illustration on Hamiltonian SDEs

Statistics and Computing ◽

10.1007/s11222-019-09909-6 ◽

2019 ◽

Vol 30 (3) ◽

pp. 627-648 ◽

Cited By ~ 1

Author(s):

Evelyn Buckwar ◽

Massimiliano Tamborrino ◽

Irene Tubikanec

Keyword(s):

Numerical Methods ◽

Spectral Density ◽

Diffusion Processes ◽

Model Simulation ◽

Broad Class ◽

Synthetic Data ◽

Summary Statistics ◽

Data Generation ◽

Synthetic Data Generation ◽

Partially Observed

Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise. First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler–Maruyama discretisation) fails. The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterised by an invariant distribution and for which a measure-preserving numerical method can be derived.
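For readers unfamiliar with ABC, the sketch below shows the generic rejection form of the algorithm on a deliberately simple model: draw a parameter from the prior, generate synthetic data, and accept the draw when a summary statistic of the synthetic data is close to that of the observed data. It does not implement the spectral-density summaries or the measure-preserving splitting schemes described in the abstract. The Gaussian model, the uniform prior, the sample mean as summary, and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy model (assumed): n independent draws from a normal with mean theta, sd 1."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws=5000, tolerance=0.1, seed=0):
    """Generic rejection ABC using the sample mean as the summary statistic."""
    rng = random.Random(seed)
    s_obs = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the (assumed) uniform prior
        s_sim = statistics.fmean(simulate(theta, len(observed), rng))
        # Keep the draw only if simulated and observed summaries are close.
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# "Observed" data generated from the toy model with true mean 2.0.
data_rng = random.Random(1)
observed = simulate(2.0, 50, data_rng)
posterior = abc_rejection(observed)
print(len(posterior), statistics.fmean(posterior))
```

The accepted values approximate the posterior of the parameter. The abstract's two difficulties correspond to the two assumed pieces here: the choice of summary statistic and distance, and the scheme used inside `simulate`.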