COPING WITH CLASS IMBALANCE IN CLASSIFICATION OF TRAFFIC CRASH SEVERITY BASED ON SENSOR AND ROAD DATA: A FEATURE SELECTION AND DATA AUGMENTATION APPROACH

Developing Crash Severity Model Handling Class Imbalance and Implementing Ordered Nature: Focusing on Elderly Drivers

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18041966 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1966

Author(s):

Seunghoon Kim ◽

Youngbin Lym ◽

Ki-Jung Kim

Keyword(s):

Random Forest ◽

Class Imbalance ◽

Crash Severity ◽

Crash Data ◽

Elderly Drivers ◽

Class Distribution ◽

Severity Class ◽

Imbalanced Classes ◽

Imbalanced Class Distribution

Along with the rapid demographic change, there has been increased attention to the risk of vehicle crashes relative to older drivers. Due to senior involvement and their physical vulnerability, it is crucial to develop models that accurately predict the severity of senior-involved crashes. However, the challenge is how to cope with an imbalanced severity class distribution and the ordered nature of crash severities, as these can complicate the classification of the severity of crashes. In that regard, this study investigates the influence of implementing ordinal nature and handling imbalanced class distribution on the prediction performance. Using vehicle crash data in Ohio, U.S., as an example, the eight machine learning classifiers (logistic and ordered logistic regressions and random forest and ordered random forest with or without handling imbalanced classes) are suggested and then compared with their respective performances. The analysis outcomes show that balancing strategy enhances performance in predicting severe crashes. In contrast, the effects of implementing ordinal nature vary across models. Specifically, the ordered random forest classifier without balancing appears to be superior in terms of overall prediction accuracy, and the ordered random forest with balancing outperforms others in predicting severer crashes.

Download Full-text

Fine-Grained Multi-label Sexism Classification Using a Semi-Supervised Multi-level Neural Approach

Data Science and Engineering ◽

10.1007/s41019-021-00168-y ◽

2021 ◽

Author(s):

Harika Abburi ◽

Pulkit Parikh ◽

Niyati Chhaya ◽

Vasudeva Varma

Keyword(s):

Data Augmentation ◽

Class Imbalance ◽

Neural Model ◽

Problem Formulation ◽

Fine Grained ◽

Training Approach ◽

Multi Level ◽

Level Training ◽

Different Levels

AbstractSexism, a permeate form of oppression, causes profound suffering through various manifestations. Given the increasing number of experiences of sexism shared online, categorizing these recollections automatically can support the battle against sexism, since it can promote successful evaluations by gender studies researchers and government representatives engaged in policy making. In this paper, we examine the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, we consider substantially more categories of sexism than any related prior work through our 23-class problem formulation. Moreover, we present the first semi-supervised work for the multi-label classification of accounts describing any type(s) of sexism. We devise self-training-based techniques tailor-made for the multi-label nature of the problem to utilize unlabeled samples for augmenting the labeled set. We identify high textual diversity with respect to the existing labeled set as a desirable quality for candidate unlabeled instances and develop methods for incorporating it into our approach. We also explore ways of infusing class imbalance alleviation for multi-label classification into our semi-supervised learning, independently and in conjunction with the method involving diversity. In addition to data augmentation methods, we develop a neural model which combines biLSTM and attention with a domain-adapted BERT model in an end-to-end trainable manner. Further, we formulate a multi-level training approach in which models are sequentially trained using categories of sexism of different levels of granularity. Moreover, we devise a loss function that exploits any label confidence scores associated with the data. Several proposed methods outperform various baselines on a recently released dataset for multi-label sexism categorization across several standard metrics.

Download Full-text

Comparison of Nonlinear Spatial Correlation Models by the Influence of the Data Augmentation to the Classification Risk

Nonlinear Analysis Modelling and Control ◽

10.15388/na.2002.7.1.15200 ◽

2002 ◽

Vol 7 (1) ◽

pp. 31-42

Author(s):

J. Šaltytė ◽

K. Dučinskas

Keyword(s):

Spatial Correlation ◽

Random Fields ◽

Data Augmentation ◽

Gaussian Random Fields ◽

Classification Rule ◽

Numerical Comparison ◽

First Order ◽

Bayesian Risk ◽

Correlation Models

The Bayesian classification rule used for the classification of the observations of the (second-order) stationary Gaussian random fields with different means and common factorised covariance matrices is investigated. The influence of the observed data augmentation to the Bayesian risk is examined for three different nonlinear widely applicable spatial correlation models. The explicit expression of the Bayesian risk for the classification of augmented data is derived. Numerical comparison of these models by the variability of Bayesian risk in case of the first-order neighbourhood scheme is performed.

Download Full-text

Integrating Improved U-Net and Continuous Maximum Flow Algorithm for 3D Brain Tumor Image Segmentation

Journal of Imaging Science and Technology ◽

10.2352/j.imagingsci.technol.2020.64.4.040412 ◽

2020 ◽

Vol 64 (4) ◽

pp. 40412-1-40412-11

Author(s):

Kexin Bai ◽

Qiang Li ◽

Ching-Hsin Wang

Keyword(s):

Brain Tumor ◽

Data Augmentation ◽

A Priori ◽

Class Imbalance ◽

Maximum Flow ◽

Magnetic Resonance Images ◽

Tumor Segmentation ◽

Similarity Coefficients ◽

Segmentation Algorithms ◽

Flow Algorithm

Abstract To address the issues of the relatively small size of brain tumor image datasets, severe class imbalance, and low precision in existing segmentation algorithms for brain tumor images, this study proposes a two-stage segmentation algorithm integrating convolutional neural networks (CNNs) and conventional methods. Four modalities of the original magnetic resonance images were first preprocessed separately. Next, preliminary segmentation was performed using an improved U-Net CNN containing deep monitoring, residual structures, dense connection structures, and dense skip connections. The authors adopted a multiclass Dice loss function to deal with class imbalance and successfully prevented overfitting using data augmentation. The preliminary segmentation results subsequently served as the a priori knowledge for a continuous maximum flow algorithm for fine segmentation of target edges. Experiments revealed that the mean Dice similarity coefficients of the proposed algorithm in whole tumor, tumor core, and enhancing tumor segmentation were 0.9072, 0.8578, and 0.7837, respectively. The proposed algorithm presents higher accuracy and better stability in comparison with some of the more advanced segmentation algorithms for brain tumor images.

Download Full-text

An Imbalanced Image Classification Method for the Cell Cycle Phase

Information ◽

10.3390/info12060249 ◽

2021 ◽

Vol 12 (6) ◽

pp. 249

Author(s):

Xin Jin ◽

Yuanwen Zou ◽

Zhongbing Huang

Keyword(s):

Cell Cycle ◽

Deep Learning ◽

Image Classification ◽

Classification Accuracy ◽

Data Augmentation ◽

Cycle Phase ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Cellular Life

The cell cycle is an important process in cellular life. In recent years, some image processing methods have been developed to determine the cell cycle stages of individual cells. However, in most of these methods, cells have to be segmented, and their features need to be extracted. During feature extraction, some important information may be lost, resulting in lower classification accuracy. Thus, we used a deep learning method to retain all cell features. In order to solve the problems surrounding insufficient numbers of original images and the imbalanced distribution of original images, we used the Wasserstein generative adversarial network-gradient penalty (WGAN-GP) for data augmentation. At the same time, a residual network (ResNet) was used for image classification. ResNet is one of the most used deep learning classification networks. The classification accuracy of cell cycle images was achieved more effectively with our method, reaching 83.88%. Compared with an accuracy of 79.40% in previous experiments, our accuracy increased by 4.48%. Another dataset was used to verify the effect of our model and, compared with the accuracy from previous results, our accuracy increased by 12.52%. The results showed that our new cell cycle image classification system based on WGAN-GP and ResNet is useful for the classification of imbalanced images. Moreover, our method could potentially solve the low classification accuracy in biomedical images caused by insufficient numbers of original images and the imbalanced distribution of original images.

Download Full-text

Using class-based feature selection for the classification of hyperspectral data

International Journal of Remote Sensing ◽

10.1080/01431161.2010.486416 ◽

2011 ◽

Vol 32 (15) ◽

pp. 4311-4326 ◽

Cited By ~ 19

Author(s):

Yasser Maghsoudi ◽

Mohammad Javad Valadan Zoej ◽

Michael Collins

Keyword(s):

Feature Selection ◽

Hyperspectral Data ◽

Selection For

Download Full-text

UAV Image Multi-Labeling with Data-Efficient Transformers

Applied Sciences ◽

10.3390/app11093974 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3974

Author(s):

Laila Bashmal ◽

Yakoub Bazi ◽

Mohamad Mahmoud Al Rahhal ◽

Haikel Alhichri ◽

Naif Al Ajlan

Keyword(s):

Data Augmentation ◽

Feature Representation ◽

Aerial Image ◽

Remote Sensing Images ◽

Training Set ◽

Proposed Model ◽

Class Labels ◽

Using Data ◽

Uav Image

In this paper, we present an approach for the multi-label classification of remote sensing images based on data-efficient transformers. During the training phase, we generated a second view for each image from the training set using data augmentation. Then, both the image and its augmented version were reshaped into a sequence of flattened patches and then fed to the transformer encoder. The latter extracts a compact feature representation from each image with the help of a self-attention mechanism, which can handle the global dependencies between different regions of the high-resolution aerial image. On the top of the encoder, we mounted two classifiers, a token and a distiller classifier. During training, we minimized a global loss consisting of two terms, each corresponding to one of the two classifiers. In the test phase, we considered the average of the two classifiers as the final class labels. Experiments on two datasets acquired over the cities of Trento and Civezzano with a ground resolution of two-centimeter demonstrated the effectiveness of the proposed model.

Download Full-text

Automatic Evaluation of the Lung Condition of COVID-19 Patients Using X-ray Images and Convolutional Neural Networks

Journal of Personalized Medicine ◽

10.3390/jpm11010028 ◽

2021 ◽

Vol 11 (1) ◽

pp. 28

Author(s):

Ivan Lorencin ◽

Sandi Baressi Šegota ◽

Nikola Anđelić ◽

Anđela Blagojević ◽

Tijana Šušteršić ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Clinical Picture ◽

Data Augmentation ◽

Lower Amount ◽

Training Procedure ◽

X Ray ◽

Severe Clinical Picture ◽

Lung Condition

COVID-19 represents one of the greatest challenges in modern history. Its impact is most noticeable in the health care system, mostly due to the accelerated and increased influx of patients with a more severe clinical picture. These facts are increasing the pressure on health systems. For this reason, the aim is to automate the process of diagnosis and treatment. The research presented in this article conducted an examination of the possibility of classifying the clinical picture of a patient using X-ray images and convolutional neural networks. The research was conducted on the dataset of 185 images that consists of four classes. Due to a lower amount of images, a data augmentation procedure was performed. In order to define the CNN architecture with highest classification performances, multiple CNNs were designed. Results show that the best classification performances can be achieved if ResNet152 is used. This CNN has achieved AUCmacro¯ and AUCmicro¯ up to 0.94, suggesting the possibility of applying CNN to the classification of the clinical picture of COVID-19 patients using an X-ray image of the lungs. When higher layers are frozen during the training procedure, higher AUCmacro¯ and AUCmicro¯ values are achieved. If ResNet152 is utilized, AUCmacro¯ and AUCmicro¯ values up to 0.96 are achieved if all layers except the last 12 are frozen during the training procedure.

Download Full-text

Artificial Intelligence, 3D Documentation, and Rock Art—Approaching and Reflecting on the Automation of Identification and Classification of Rock Art Images

Journal of Archaeological Method and Theory ◽

10.1007/s10816-021-09518-6 ◽

2021 ◽

Author(s):

Christian Horn ◽

Oscar Ivarsson ◽

Cecilia Lindhé ◽

Rich Potter ◽

Ashely Green ◽

...

Keyword(s):

Artificial Intelligence ◽

Bronze Age ◽

Data Augmentation ◽

Rock Art ◽

Training Data ◽

Future Research ◽

Rock Surface ◽

Southern Scandinavia ◽

Bounding Boxes

AbstractRock art carvings, which are best described as petroglyphs, were produced by removing parts of the rock surface to create a negative relief. This tradition was particularly strong during the Nordic Bronze Age (1700–550 BC) in southern Scandinavia with over 20,000 boats and thousands of humans, animals, wagons, etc. This vivid and highly engaging material provides quantitative data of high potential to understand Bronze Age social structures and ideologies. The ability to provide the technically best possible documentation and to automate identification and classification of images would help to take full advantage of the research potential of petroglyphs in southern Scandinavia and elsewhere. We, therefore, attempted to train a model that locates and classifies image objects using faster region-based convolutional neural network (Faster-RCNN) based on data produced by a novel method to improve visualizing the content of 3D documentations. A newly created layer of 3D rock art documentation provides the best data currently available and has reduced inscribed bias compared to older methods. Several models were trained based on input images annotated with bounding boxes produced with different parameters to find the best solution. The data included 4305 individual images in 408 scans of rock art sites. To enhance the models and enrich the training data, we used data augmentation and transfer learning. The successful models perform exceptionally well on boats and circles, as well as with human figures and wheels. This work was an interdisciplinary undertaking which led to important reflections about archaeology, digital humanities, and artificial intelligence. The reflections and the success represented by the trained models open novel avenues for future research on rock art.

Download Full-text

An Ensemble Feature Selection Approach to Identify Relevant Features from EEG Signals

Applied Sciences ◽

10.3390/app11156983 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6983

Author(s):

Maritza Mera-Gaona ◽

Diego M. López ◽

Rubiel Vargas-Canas

Keyword(s):

Feature Selection ◽

Sensitivity And Specificity ◽

Eeg Signals ◽

Specificity And Sensitivity ◽

Selection Approach ◽

Feature Selection Approach ◽

The Stability ◽

Selection Algorithms ◽

Epileptiform Events

Identifying relevant data to support the automatic analysis of electroencephalograms (EEG) has become a challenge. Although there are many proposals to support the diagnosis of neurological pathologies, the current challenge is to improve the reliability of the tools to classify or detect abnormalities. In this study, we used an ensemble feature selection approach to integrate the advantages of several feature selection algorithms to improve the identification of the characteristics with high power of differentiation in the classification of normal and abnormal EEG signals. Discrimination was evaluated using several classifiers, i.e., decision tree, logistic regression, random forest, and Support Vecctor Machine (SVM); furthermore, performance was assessed by accuracy, specificity, and sensitivity metrics. The evaluation results showed that Ensemble Feature Selection (EFS) is a helpful tool to select relevant features from the EEGs. Thus, the stability calculated for the EFS method proposed was almost perfect in most of the cases evaluated. Moreover, the assessed classifiers evidenced that the models improved in performance when trained with the EFS approach’s features. In addition, the classifier of epileptiform events built using the features selected by the EFS method achieved an accuracy, sensitivity, and specificity of 97.64%, 96.78%, and 97.95%, respectively; finally, the stability of the EFS method evidenced a reliable subset of relevant features. Moreover, the accuracy, sensitivity, and specificity of the EEG detector are equal to or greater than the values reported in the literature.

Download Full-text