COPING WITH CLASS IMBALANCE IN CLASSIFICATION OF TRAFFIC CRASH SEVERITY BASED ON SENSOR AND ROAD DATA: A FEATURE SELECTION AND DATA AUGMENTATION APPROACH

Author(s):  
Deepti Lamba ◽  
Majed Alsadhan ◽  
William Hsu ◽  
Eric Fitzsimmons
Author(s):  
Seunghoon Kim ◽  
Youngbin Lym ◽  
Ki-Jung Kim

Along with rapid demographic change, there has been increased attention to the risk of vehicle crashes involving older drivers. Because of seniors' involvement in crashes and their physical vulnerability, it is crucial to develop models that accurately predict the severity of senior-involved crashes. The challenge, however, is how to cope with an imbalanced severity class distribution and the ordered nature of crash severities, as both can complicate severity classification. In that regard, this study investigates how accounting for the ordinal nature of severity and handling the imbalanced class distribution influence prediction performance. Using vehicle crash data from Ohio, U.S., as an example, eight machine learning classifiers (logistic regression, ordered logistic regression, random forest, and ordered random forest, each with and without handling of imbalanced classes) are proposed and their performances compared. The results show that a balancing strategy enhances performance in predicting severe crashes. In contrast, the effects of accounting for the ordinal nature vary across models. Specifically, the ordered random forest classifier without balancing appears to be superior in terms of overall prediction accuracy, and the ordered random forest with balancing outperforms the others in predicting more severe crashes.
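
The abstract above does not say which balancing strategy was used. As an illustration only, the following minimal sketch shows random oversampling, one common way to balance classes; the function name, the severity labels, and the seed are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical severity labels with a strong imbalance.
labels = ["minor"] * 6 + ["serious"] * 3 + ["fatal"] * 1
rows = list(range(len(labels)))
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.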


Author(s):  
Harika Abburi ◽  
Pulkit Parikh ◽  
Niyati Chhaya ◽  
Vasudeva Varma

Abstract Sexism, a pervasive form of oppression, causes profound suffering through various manifestations. Given the increasing number of experiences of sexism shared online, categorizing these recollections automatically can support the battle against sexism, since it can promote successful evaluations by gender studies researchers and government representatives engaged in policy making. In this paper, we examine the fine-grained, multi-label classification of accounts (reports) of sexism. To the best of our knowledge, we consider substantially more categories of sexism than any related prior work through our 23-class problem formulation. Moreover, we present the first semi-supervised work for the multi-label classification of accounts describing any type(s) of sexism. We devise self-training-based techniques tailor-made for the multi-label nature of the problem to utilize unlabeled samples for augmenting the labeled set. We identify high textual diversity with respect to the existing labeled set as a desirable quality for candidate unlabeled instances and develop methods for incorporating it into our approach. We also explore ways of infusing class imbalance alleviation for multi-label classification into our semi-supervised learning, independently and in conjunction with the method involving diversity. In addition to data augmentation methods, we develop a neural model which combines biLSTM and attention with a domain-adapted BERT model in an end-to-end trainable manner. Further, we formulate a multi-level training approach in which models are sequentially trained using categories of sexism of different levels of granularity. Moreover, we devise a loss function that exploits any label confidence scores associated with the data. Several proposed methods outperform various baselines on a recently released dataset for multi-label sexism categorization across several standard metrics.
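
The self-training idea mentioned above can be illustrated with a generic selection step. This is a minimal sketch, not the authors' tailored method: it assumes a model has already produced per-label probabilities for unlabeled samples, and the function name, `threshold`, and `margin` are hypothetical.

```python
def select_pseudo_labels(probabilities, threshold=0.5, margin=0.4):
    """Return (index, label_vector) pairs for confidently predicted samples.

    A sample is kept only if every per-label probability is at least
    `margin` away from `threshold` and at least one label is positive.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        confident = all(abs(p - threshold) >= margin for p in probs)
        labels = [1 if p >= threshold else 0 for p in probs]
        if confident and any(labels):
            selected.append((index, labels))
    return selected


# Hypothetical predictions for three unlabeled accounts and three labels.
predictions = [
    [0.95, 0.02, 0.91],  # confident on every label, kept
    [0.60, 0.10, 0.90],  # first label is uncertain, dropped
    [0.05, 0.03, 0.01],  # confident but no positive label, dropped
]
print(select_pseudo_labels(predictions))
```

The selected samples would then be added to the labeled set before retraining; criteria such as textual diversity or class imbalance, which the abstract describes, are not modeled here.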


2002 ◽  
Vol 7 (1) ◽  
pp. 31-42
Author(s):  
J. Šaltytė ◽  
K. Dučinskas

The Bayesian classification rule used for the classification of observations of (second-order) stationary Gaussian random fields with different means and common factorised covariance matrices is investigated. The influence of augmenting the observed data on the Bayesian risk is examined for three different, widely applicable nonlinear spatial correlation models. An explicit expression of the Bayesian risk for the classification of augmented data is derived. A numerical comparison of these models by the variability of the Bayesian risk under the first-order neighbourhood scheme is performed.


2020 ◽  
Vol 64 (4) ◽  
pp. 40412-1-40412-11
Author(s):  
Kexin Bai ◽  
Qiang Li ◽  
Ching-Hsin Wang

Abstract To address the issues of the relatively small size of brain tumor image datasets, severe class imbalance, and low precision in existing segmentation algorithms for brain tumor images, this study proposes a two-stage segmentation algorithm integrating convolutional neural networks (CNNs) and conventional methods. Four modalities of the original magnetic resonance images were first preprocessed separately. Next, preliminary segmentation was performed using an improved U-Net CNN containing deep monitoring, residual structures, dense connection structures, and dense skip connections. The authors adopted a multiclass Dice loss function to deal with class imbalance and successfully prevented overfitting using data augmentation. The preliminary segmentation results subsequently served as the a priori knowledge for a continuous maximum flow algorithm for fine segmentation of target edges. Experiments revealed that the mean Dice similarity coefficients of the proposed algorithm in whole tumor, tumor core, and enhancing tumor segmentation were 0.9072, 0.8578, and 0.7837, respectively. The proposed algorithm presents higher accuracy and better stability in comparison with some of the more advanced segmentation algorithms for brain tumor images.
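
To make the multiclass Dice loss mentioned above concrete, here is a minimal sketch of one common soft-Dice formulation, averaged over classes. The abstract does not give the exact form the authors used, so the smoothing term `eps` and the unweighted class average are assumptions.

```python
def multiclass_dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss averaged over classes.

    probs and targets are lists with one entry per pixel; each entry is a
    list with one value per class (predicted probability or one-hot target).
    """
    n_classes = len(probs[0])
    dice_scores = []
    for c in range(n_classes):
        intersection = sum(p[c] * t[c] for p, t in zip(probs, targets))
        total = sum(p[c] for p in probs) + sum(t[c] for t in targets)
        # eps keeps the ratio defined when a class is absent from both inputs.
        dice_scores.append((2.0 * intersection + eps) / (total + eps))
    return 1.0 - sum(dice_scores) / n_classes


# Two pixels, two classes: a perfect prediction gives a loss near 0.
targets = [[1, 0], [0, 1]]
print(multiclass_dice_loss(targets, targets))
```

Because each class contributes its own overlap ratio, a small class counts as much as a large one, which is why Dice-style losses are often chosen for imbalanced segmentation.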


Information ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 249
Author(s):  
Xin Jin ◽  
Yuanwen Zou ◽  
Zhongbing Huang

The cell cycle is an important process in cellular life. In recent years, some image processing methods have been developed to determine the cell cycle stages of individual cells. However, in most of these methods, cells have to be segmented, and their features need to be extracted. During feature extraction, some important information may be lost, resulting in lower classification accuracy. Thus, we used a deep learning method to retain all cell features. To address the insufficient number of original images and their imbalanced distribution, we used the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) for data augmentation. At the same time, a residual network (ResNet) was used for image classification. ResNet is one of the most widely used deep learning classification networks. With our method, the classification accuracy for cell cycle images reached 83.88%. Compared with an accuracy of 79.40% in previous experiments, this is an increase of 4.48 percentage points. Another dataset was used to verify the effect of our model and, compared with the accuracy from previous results, our accuracy increased by 12.52%. The results showed that our new cell cycle image classification system based on WGAN-GP and ResNet is useful for the classification of imbalanced images. Moreover, our method could potentially solve the low classification accuracy in biomedical images caused by insufficient numbers of original images and the imbalanced distribution of original images.
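
For reference, the critic objective in the standard WGAN-GP formulation is shown below. The abstract does not state the penalty weight or any other hyperparameters, so the symbols are the generic ones: $D$ is the critic, $P_r$ and $P_g$ are the real and generated image distributions, and $\lambda$ is the gradient penalty coefficient.

```latex
L_{\text{critic}} =
  \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]
  - \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
  + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
    \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The penalty term pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples, which replaces the weight clipping of the original WGAN.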


2011 ◽  
Vol 32 (15) ◽  
pp. 4311-4326
Author(s):  
Yasser Maghsoudi ◽  
Mohammad Javad Valadan Zoej ◽  
Michael Collins

2021 ◽  
Vol 11 (9) ◽  
pp. 3974
Author(s):  
Laila Bashmal ◽  
Yakoub Bazi ◽  
Mohamad Mahmoud Al Rahhal ◽  
Haikel Alhichri ◽  
Naif Al Ajlan

In this paper, we present an approach for the multi-label classification of remote sensing images based on data-efficient transformers. During the training phase, we generated a second view for each image from the training set using data augmentation. Then, both the image and its augmented version were reshaped into a sequence of flattened patches and fed to the transformer encoder. The latter extracts a compact feature representation from each image with the help of a self-attention mechanism, which can handle the global dependencies between different regions of the high-resolution aerial image. On top of the encoder, we mounted two classifiers, a token classifier and a distiller classifier. During training, we minimized a global loss consisting of two terms, each corresponding to one of the two classifiers. In the test phase, we took the average of the two classifiers' outputs to obtain the final class labels. Experiments on two datasets acquired over the cities of Trento and Civezzano with a ground resolution of two centimeters demonstrated the effectiveness of the proposed model.
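
The test-phase averaging of the two classifiers described above can be sketched as follows. This is an illustration under assumptions: the abstract does not say whether logits or probabilities are averaged or which decision threshold is used, so the sketch averages sigmoid probabilities and thresholds at 0.5. All names are hypothetical.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def average_heads(token_logits, distill_logits, threshold=0.5):
    """Average the per-label probabilities of two classifier heads and
    threshold the result to obtain multi-label predictions."""
    probs = [
        (sigmoid(a) + sigmoid(b)) / 2.0
        for a, b in zip(token_logits, distill_logits)
    ]
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical logits for three labels from the two heads.
print(average_heads([2.0, -1.0, 0.5], [1.0, -3.0, -2.0]))
```

For the third label the heads disagree, and the averaged probability falls below the threshold, so the label is not assigned.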


2021 ◽  
Vol 11 (1) ◽  
pp. 28
Author(s):  
Ivan Lorencin ◽  
Sandi Baressi Šegota ◽  
Nikola Anđelić ◽  
Anđela Blagojević ◽  
Tijana Šušteršić ◽  
...  

COVID-19 represents one of the greatest challenges in modern history. Its impact is most noticeable in the health care system, mostly due to the accelerated and increased influx of patients with a more severe clinical picture. These facts are increasing the pressure on health systems. For this reason, the aim is to automate the process of diagnosis and treatment. The research presented in this article examined the possibility of classifying the clinical picture of a patient using X-ray images and convolutional neural networks (CNNs). The research was conducted on a dataset of 185 images that consists of four classes. Because of the small number of images, a data augmentation procedure was performed. In order to define the CNN architecture with the highest classification performance, multiple CNNs were designed. Results show that the best classification performance can be achieved if ResNet152 is used. This CNN achieved $\overline{AUC}_{macro}$ and $\overline{AUC}_{micro}$ values of up to 0.94, suggesting the possibility of applying CNNs to the classification of the clinical picture of COVID-19 patients using an X-ray image of the lungs. When higher layers are frozen during the training procedure, higher $\overline{AUC}_{macro}$ and $\overline{AUC}_{micro}$ values are achieved. If ResNet152 is utilized, $\overline{AUC}_{macro}$ and $\overline{AUC}_{micro}$ values of up to 0.96 are achieved if all layers except the last 12 are frozen during the training procedure.
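
The macro- and micro-averaged AUC values reported above differ in how the classes are pooled. The following minimal sketch uses the common one-vs-rest definitions and a pairwise-ranking AUC; it is an assumption that the article follows the same definitions, and the scores are made up for illustration.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_micro_auc(onehot, scores):
    """Macro: mean of the per-class one-vs-rest AUCs.
    Micro: one AUC over all (sample, class) entries pooled together."""
    n_classes = len(onehot[0])
    per_class = [
        binary_auc([row[c] for row in onehot], [row[c] for row in scores])
        for c in range(n_classes)
    ]
    macro = sum(per_class) / n_classes
    flat_labels = [v for row in onehot for v in row]
    flat_scores = [v for row in scores for v in row]
    micro = binary_auc(flat_labels, flat_scores)
    return macro, micro


# Made-up one-hot targets and scores for three images and two classes.
onehot = [[1, 0], [0, 1], [1, 0]]
scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
print(macro_micro_auc(onehot, scores))
```

Macro averaging weights every class equally, while micro averaging weights every sample-class decision equally, so the two can diverge when classes are imbalanced.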


Author(s):  
Christian Horn ◽  
Oscar Ivarsson ◽  
Cecilia Lindhé ◽  
Rich Potter ◽  
Ashely Green ◽  
...  

Abstract Rock art carvings, which are best described as petroglyphs, were produced by removing parts of the rock surface to create a negative relief. This tradition was particularly strong during the Nordic Bronze Age (1700–550 BC) in southern Scandinavia with over 20,000 boats and thousands of humans, animals, wagons, etc. This vivid and highly engaging material provides quantitative data of high potential to understand Bronze Age social structures and ideologies. The ability to provide the technically best possible documentation and to automate identification and classification of images would help to take full advantage of the research potential of petroglyphs in southern Scandinavia and elsewhere. We, therefore, attempted to train a model that locates and classifies image objects using a faster region-based convolutional neural network (Faster-RCNN) based on data produced by a novel method to improve visualizing the content of 3D documentation. A newly created layer of 3D rock art documentation provides the best data currently available and has reduced inscribed bias compared to older methods. Several models were trained based on input images annotated with bounding boxes produced with different parameters to find the best solution. The data included 4305 individual images in 408 scans of rock art sites. To enhance the models and enrich the training data, we used data augmentation and transfer learning. The successful models perform exceptionally well on boats and circles, as well as with human figures and wheels. This work was an interdisciplinary undertaking which led to important reflections about archaeology, digital humanities, and artificial intelligence. The reflections and the success represented by the trained models open novel avenues for future research on rock art.


2021 ◽  
Vol 11 (15) ◽  
pp. 6983
Author(s):  
Maritza Mera-Gaona ◽  
Diego M. López ◽  
Rubiel Vargas-Canas

Identifying relevant data to support the automatic analysis of electroencephalograms (EEG) has become a challenge. Although there are many proposals to support the diagnosis of neurological pathologies, the current challenge is to improve the reliability of the tools to classify or detect abnormalities. In this study, we used an ensemble feature selection approach to integrate the advantages of several feature selection algorithms to improve the identification of the characteristics with high power of differentiation in the classification of normal and abnormal EEG signals. Discrimination was evaluated using several classifiers, i.e., decision tree, logistic regression, random forest, and Support Vector Machine (SVM); furthermore, performance was assessed by accuracy, specificity, and sensitivity metrics. The evaluation results showed that Ensemble Feature Selection (EFS) is a helpful tool to select relevant features from the EEGs. Thus, the stability calculated for the proposed EFS method was almost perfect in most of the cases evaluated. Moreover, the assessed classifiers showed that the models improved in performance when trained with the features selected by the EFS approach. In addition, the classifier of epileptiform events built using the features selected by the EFS method achieved an accuracy, sensitivity, and specificity of 97.64%, 96.78%, and 97.95%, respectively; finally, the stability of the EFS method indicated a reliable subset of relevant features. Moreover, the accuracy, sensitivity, and specificity of the EEG detector are equal to or greater than the values reported in the literature.
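
The abstract does not describe how the individual feature selectors are combined. As an illustration only, the following minimal sketch uses simple vote counting over each selector's top-ranked features; the function name, feature names, `top_k`, and `min_votes` are hypothetical.

```python
from collections import Counter


def ensemble_select(rankings, top_k, min_votes):
    """Keep the features that appear in the top_k of at least
    min_votes of the individual feature rankings."""
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:top_k])
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Hypothetical rankings (best first) from three feature selection algorithms.
rankings = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["c", "b", "a", "d"],
]
print(ensemble_select(rankings, top_k=2, min_votes=2))
```

Requiring agreement between several selectors tends to make the chosen subset less sensitive to the quirks of any single algorithm, which is the kind of stability the abstract reports.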

