imbalanced class distribution Latest Research Papers

A Deep Learning Technique for Classification of Breast Cancer Disease

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a3119.1011121 ◽

2021 ◽

Vol 11 (1) ◽

pp. 9-14

Author(s):

Dr. Yelepi Usha Rani ◽

Lakshmi Sowmya Kotturi ◽

Dr. G. Sudhakar ◽

...

Keyword(s):

Breast Cancer ◽

Breast Cancers ◽

Cancer Disease ◽

Cancer Breast ◽

Performance Metric ◽

Subsequent Decrease ◽

Learning Technique ◽

Proper Diagnosis ◽

Imbalanced Class Distribution

In recent years, researchers have made intensive use of machine learning and AI techniques in the medical field, particularly in the domain of cancer. Breast cancer is one such example, and many studies have proposed CAD systems and algorithms to efficiently detect cancer cells and tumors. Breast cancer, which mostly affects women, accounts for a large portion of cancer deaths worldwide and needs early detection for proper diagnosis and a subsequent decrease in the death rate. Thus, for efficient classification, we implemented different ML techniques on the Wisconsin dataset [1], namely SVM, KNN, Decision Tree, Random Forest, and Naive Bayes, using accuracy as the performance metric; in our observations, SVM showed better results than the other algorithms. We also worked on Breast Histopathology Images [2] scanned at 40x, which contain images of IDC, one of the most common types of breast cancer. To work with the image dataset, along with EDA we used techniques such as MobileNet, where SMOTE, a resampling method, was used to handle the imbalanced class distribution, as well as CNN, SVC, and InceptionResNetV2, with frameworks such as TensorFlow and Keras loaded to support the environment and implement the algorithms smoothly.
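
The abstract names SMOTE but does not say how it was applied to the image data. As a rough illustration only, the core idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) can be sketched as follows. The function name `smote_sketch`, the toy 2-D points, and the parameters `k` and `seed` are assumptions made for this example, not details from the paper.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with four 2-D feature vectors (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Each synthetic point lies on a segment between two real minority samples, so the minority class grows without exact duplicates. In practice a maintained implementation would normally be preferred over a hand-written one.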

Cohort Characteristics and Factors Associated With Cannabis Use Among Adolescents in Canada Using Pattern Discovery and Disentanglement Method

10.21203/rs.3.rs-928545/v1 ◽

2021 ◽

Author(s):

Peiyuan Zhou ◽

Andrew K.C. Wong ◽

Yang Yang ◽

Scott T. Leatherdale ◽

Kate Battista ◽

...

Keyword(s):

Population Health ◽

Common Sense ◽

Pattern Discovery ◽

Health Data ◽

Cannabis Use ◽

Minority Class ◽

Factors Associated ◽

Special Cases ◽

Significant Patterns ◽

Imbalanced Class Distribution

Abstract. Background: COMPASS is a longitudinal, prospective cohort study collecting data annually from students attending high school in jurisdictions across Canada. We aimed to discover significant frequent and rare associations of behavioral factors related to cannabis use among Canadian adolescents. Methods: We used a subset of the COMPASS dataset containing 18,761 records of students in grades 9 to 12, with 31 selected features (attributes) covering various characteristics, from living habits to academic performance. We then used the Pattern Discovery and Disentanglement (PDD) algorithm to detect strong and rare (yet statistically significant) associations in the dataset. Results: Cohort characteristics and factors associated with cannabis use, and other associations detected by PDD, are consistent with common sense and literature surveys. In addition, PDD outperformed methods using other criteria popular in the literature (i.e., support and confidence). Association results showed that PDD could discover: i) a smaller set of succinct significant associations in clusters; ii) frequent and rare, yet significant, patterns supported by population-health-relevant studies; iii) patterns from a dataset with extremely imbalanced groups (majority class (None-user) : minority class (Regular) = 88.3% : 11.7%). Conclusions: Results on the COMPASS dataset have validated PDD’s efficacy in discovering succinct, interpretable frequent associations with comprehensive coverage, as well as rare yet significant associations, from datasets with extremely imbalanced class distribution without relying on any balancing process. The frequent associations are consistent with common sense and literature surveys, while the rare patterns reveal very special cases. The success of PDD on this project indicates that PDD has great potential for population health data analysis.

INVESTIGATIONS ON FEATURE SIMILARITY AND THE IMPACT OF TRAINING DATA FOR LAND COVER CLASSIFICATION

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-3-2021-181-2021 ◽

2021 ◽

Vol V-3-2021 ◽

pp. 181-189

Author(s):

M. Voelsen ◽

D. Lobo Torres ◽

R. Q. Feitosa ◽

F. Rottensteiner ◽

C. Heipke

Keyword(s):

Land Cover ◽

Feature Space ◽

Land Cover Classification ◽

Classification Performance ◽

Training Data ◽

Label Noise ◽

Feature Vectors ◽

Real World Datasets ◽

Imbalanced Class Distribution ◽

The Impact

Abstract. Fully convolutional neural networks (FCN) are successfully used for pixel-wise land cover classification, the task of identifying the physical material of the Earth’s surface for every pixel in an image. The acquisition of large training datasets is challenging, especially in remote sensing, but necessary for an FCN to perform well. One way to circumvent manual labelling is to use existing databases, which usually contain a certain amount of label noise when combined with another data source. In the first part of this work, we investigate the impact of training data on an FCN. We experiment with different amounts of training data, varying w.r.t. the covered area, the available acquisition dates and the amount of label noise. We conclude that the more data is used for training, the better the generalization performance of the model, and that the FCN is able to mitigate the effect of label noise to a high degree. Another challenge is the imbalanced class distribution in most real-world datasets, which can cause the classifier to focus on the majority classes, leading to poor classification performance for minority classes. To tackle this problem, in this paper, we use the cosine similarity loss to force feature vectors of the same class to be close to each other in feature space. Our experiments show that the cosine loss helps to obtain more similar feature vectors, but the similarity of the cluster centers also increases.
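
The abstract does not give the exact form of the cosine similarity loss. A common formulation, shown here only as a minimal sketch, penalises a feature vector by one minus its cosine similarity to a reference vector of the same class. The names `cosine_similarity`, `cosine_loss`, and `class_center` are assumptions introduced for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_loss(feature, class_center):
    """Assumed loss form: 0 when the vectors point the same way,
    1 when they are orthogonal, 2 when they point in opposite directions."""
    return 1.0 - cosine_similarity(feature, class_center)


# Toy feature vectors (made-up values).
aligned = cosine_loss((1.0, 0.0), (2.0, 0.0))
orthogonal = cosine_loss((1.0, 0.0), (0.0, 3.0))
opposite = cosine_loss((1.0, 0.0), (-1.0, 0.0))
```

Because the loss depends only on direction and not on magnitude, minimising it pulls same-class feature vectors toward a common direction in feature space.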

Deep ConvLSTM Network with Dataset Resampling for Upper Body Activity Recognition Using Minimal Number of IMU Sensors

Applied Sciences ◽

10.3390/app11083543 ◽

2021 ◽

Vol 11 (8) ◽

pp. 3543

Author(s):

Xiang Yang Lim ◽

Kok Beng Gan ◽

Noor Azah Abd Aziz

Keyword(s):

Loss Function ◽

Real Life ◽

Sensor Data ◽

Upper Body ◽

Imbalanced Dataset ◽

Model Accuracy ◽

Class Distribution ◽

Resampling Method ◽

Imbalanced Class ◽

Imbalanced Class Distribution

Human activity recognition (HAR) is the study of identifying specific human movements and actions based on images, accelerometer data and inertial measurement unit (IMU) sensors. In sensor-based HAR applications, most researchers have used many IMU sensors to obtain an accurate HAR classification. The use of many IMU sensors not only limits the deployment phase but also increases the difficulty and discomfort for users. As reported in the literature, the original model used 19 sensor data consisting of accelerometers and IMU sensors. The imbalanced class distribution is another challenge to the recognition of human activity in real life. This is a real-life scenario, and the classifier may predict some of the imbalanced classes with very high accuracy. When a model is trained using an imbalanced dataset, the model’s performance can degrade. In this paper, two approaches, namely resampling and multiclass focal loss, were used to address the imbalanced dataset. The resampling method was used to reconstruct the imbalanced class distribution of the IMU sensor dataset prior to model development and learning using the cross-entropy loss function. A deep ConvLSTM network with a minimal number of IMU sensor data was used to develop the upper-body HAR model. On the other hand, the multiclass focal loss function was used in the HAR model and classified minority classes without the need to resample the imbalanced dataset. Based on the experimental results, the HAR model developed using a cross-entropy loss function and the reconstructed dataset achieved a good performance of 0.91 in both model accuracy and F1-score. The HAR model with a multiclass focal loss function and the imbalanced dataset has slightly lower model accuracy and F1-score, each differing by 1% from the resampling method. In conclusion, the upper-body HAR model using a minimal number of IMU sensors and proper handling of the imbalanced class distribution by the resampling method is useful for the assessment of home-based rehabilitation involving activities of daily living.
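
To make the contrast between the two loss functions concrete, the sketch below compares cross-entropy with the standard focal loss form, FL = -(1 - p_t)^gamma * log(p_t), for a single sample. The value `gamma=2.0` and the toy probability vectors are assumptions for illustration; the abstract does not state the settings the authors used.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy for one sample, given predicted class probabilities."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample; gamma=0 reduces it to cross-entropy."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Made-up predictions for a three-class problem; the true class is index 0.
easy = [0.9, 0.05, 0.05]  # confidently correct, typical of a majority class
hard = [0.2, 0.5, 0.3]    # poorly classified, typical of a minority class

easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)  # (1 - 0.9) ** 2
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)  # (1 - 0.2) ** 2
```

The modulating factor shrinks the loss of the well-classified sample far more than that of the hard one, which is why focal loss can emphasise minority classes without resampling the dataset.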

Developing Crash Severity Model Handling Class Imbalance and Implementing Ordered Nature: Focusing on Elderly Drivers

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18041966 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1966

Author(s):

Seunghoon Kim ◽

Youngbin Lym ◽

Ki-Jung Kim

Keyword(s):

Random Forest ◽

Class Imbalance ◽

Crash Severity ◽

Crash Data ◽

Elderly Drivers ◽

Class Distribution ◽

Severity Class ◽

Imbalanced Classes ◽

Imbalanced Class Distribution

Along with rapid demographic change, there has been increased attention to the risk of vehicle crashes involving older drivers. Due to senior involvement and their physical vulnerability, it is crucial to develop models that accurately predict the severity of senior-involved crashes. However, the challenge is how to cope with an imbalanced severity class distribution and the ordered nature of crash severities, as these can complicate the classification of crash severity. In that regard, this study investigates the influence of implementing the ordinal nature and handling the imbalanced class distribution on prediction performance. Using vehicle crash data in Ohio, U.S., as an example, eight machine learning classifiers (logistic and ordered logistic regressions and random forest and ordered random forest, each with or without handling imbalanced classes) are proposed and their performances compared. The analysis outcomes show that a balancing strategy enhances performance in predicting severe crashes. In contrast, the effects of implementing the ordinal nature vary across models. Specifically, the ordered random forest classifier without balancing appears to be superior in terms of overall prediction accuracy, and the ordered random forest with balancing outperforms the others in predicting more severe crashes.

Impact of Balancing Techniques for Imbalanced Class Distribution on Twitter Data for Emotion Analysis

Data Preprocessing, Active Learning, and Cost Perceptive Approaches for Resolving Data Imbalance - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-7371-6.ch012 ◽

2021 ◽

pp. 211-231

Author(s):

Shivani Vasantbhai Vora ◽

Rupa G. Mehta ◽

Shreyas Kishorkumar Patel

Keyword(s):

Data Augmentation ◽

Imbalanced Data ◽

Training Data ◽

Human Needs ◽

Class Distribution ◽

Redundant Data ◽

Imbalanced Class ◽

Understanding Emotions ◽

Imbalanced Class Distribution

Continuously growing technology enhances creativity, simplifies people's lives, and offers the possibility to anticipate and satisfy their unmet needs. Understanding emotions is a crucial part of understanding human behavior. Machines must deeply understand emotions to be able to predict human needs. Most tweets carry the sentiments of the user, and tweet data inherit an imbalanced class distribution. Most machine learning (ML) algorithms are likely to become biased towards the majority classes. The imbalanced distribution of classes has gained extensive attention, as it has produced many research challenges, and it demands efficient approaches to handle the imbalanced data set. The strategies used for balancing the distribution of classes in the case study are handling redundant data, resampling training data, and data augmentation. Six methods related to these techniques have been examined in the case study. Experiments on the Twitter dataset show that the merging-minority-classes and sentence-shuffling methods outperform the other techniques.

Online federated learning with imbalanced class distribution

24th Pan-Hellenic Conference on Informatics ◽

10.1145/3437120.3437282 ◽

2020 ◽

Author(s):

Konstantinos Giorgas ◽

Iraklis Varlamis

Keyword(s):

Class Distribution ◽

Imbalanced Class ◽

Imbalanced Class Distribution

The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets

Applied Bionics and Biomechanics ◽

10.1155/2020/8824625 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Zina Z. R. Al-Shamaa ◽

Sefer Kurnaz ◽

Adil Deniz Duru ◽

Nadia Peppa ◽

Alex H. Mirnezami ◽

...

Keyword(s):

State Of The Art ◽

Hellinger Distance ◽

Baseline Model ◽

Discrimination Power ◽

Minority Class ◽

Disease Class ◽

Imbalanced Class Distribution ◽

Medical Dataset ◽

Better Than

Imbalanced class distribution in medical datasets is a challenging problem that hinders correct disease classification. It emerges when the number of healthy class instances is much larger than the number of disease class instances. To solve this problem, we propose undersampling the healthy class instances to improve disease class classification. This model is named Hellinger Distance Undersampling (HDUS). It employs the Hellinger distance to measure the resemblance between a majority class instance and its neighbouring minority class instances, in order to separate the classes effectively and boost the discrimination power for each class. An extensive experiment has been conducted on four imbalanced medical datasets using three classifiers to compare HDUS with a baseline model and three state-of-the-art undersampling models. The results show that HDUS can perform better than the other models in terms of sensitivity, F1 measure, and balanced accuracy.
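
The abstract does not describe how HDUS maps instances to distributions or how it selects which majority instances to drop. The sketch below shows only the underlying measure, the Hellinger distance between two discrete probability distributions, H(P, Q) = sqrt(sum((sqrt(p_i) - sqrt(q_i))^2)) / sqrt(2). The function name and the toy distributions are assumptions for this example.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions
    of equal length; the result lies in [0, 1]."""
    squared_diffs = sum(
        (math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)
    )
    return math.sqrt(squared_diffs) / math.sqrt(2.0)


# Made-up distributions over three bins.
same = hellinger_distance([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])
disjoint = hellinger_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
partial = hellinger_distance([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

The distance is 0 for identical distributions and 1 for distributions with no overlap, and it is symmetric in its arguments, which makes it a convenient bounded measure of resemblance.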

A proactive decision support system for predicting traffic crash events: A critical analysis of imbalanced class distribution

Knowledge-Based Systems ◽

10.1016/j.knosys.2020.106314 ◽

2020 ◽

Vol 205 ◽

pp. 106314 ◽

Cited By ~ 2

Author(s):

Zouhair Elamrani Abou Elassad ◽

Hajar Mousannif ◽

Hassan Al Moatassime

Keyword(s):

Decision Support ◽

Decision Support System ◽

Support System ◽

Critical Analysis ◽

Traffic Crash ◽

Class Distribution ◽

Imbalanced Class ◽

Imbalanced Class Distribution

A Detailed Analysis on Classification Algorithms for Imbalanced Class Distribution on Credit Score Datasets

Adalya Journal ◽

10.37896/aj9.7/023 ◽

2020 ◽

Vol 9 (7) ◽

Keyword(s):

Detailed Analysis ◽

Classification Algorithms ◽

Credit Score ◽

Class Distribution ◽

Imbalanced Class ◽

Imbalanced Class Distribution
