scholarly journals Classification of imbalanced cloud image data using deep neural networks: performance improvement through a data science competition

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Daisuke Matsuoka

AbstractImage data classification using machine learning is an effective method for detecting atmospheric phenomena. However, extreme weather events with a small number of cases cause a decrease in classification prediction accuracy owing to the imbalance in data between the target class and the other classes. To build a highly accurate classification model, I held a data analysis competition to determine the best classification performance for two classes of cloud image data, specifically tropical cyclones including precursors and other classes. For the top models in the competition, minority data oversampling, majority data undersampling, ensemble learning, deep layer neural networks, and cost-effective loss functions were used to improve the classification performance of the imbalanced data. In particular, the best model of 209 submissions succeeded in improving the classification capability by 65.4% over similar conventional methods in a measure of the low false alarm ratio.

2021 ◽  
Author(s):  
Daisuke Matsuoka

Abstract Image data classification using machine learning is one of the effective methods for detecting atmospheric phenomena. However, extreme weather events with a small number of cases cause a decrease in classification prediction accuracy owing to the imbalance of data between the target class and the other classes. In order to build a highly accurate classification model, we held a data analysis competition to determine the best classification performance for two classes of cloud image data: tropical cyclones including precursors and other classes. For the top models in the competition, minority data oversampling, majority data undersampling, ensemble learning, deep layer neural networks, and cost-effective loss functions were used to improve the imbalanced classification performance. In particular, the best model out of 209 submissions succeeded in improving the classification capability by 65.4% over similar conventional methods in a measure of low false alarm ratio.


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 624
Author(s):  
Stefan Rohrmanstorfer ◽  
Mikhail Komarov ◽  
Felix Mödritscher

With the always increasing amount of image data, it has become a necessity to automatically look for and process information in these images. As fashion is captured in images, the fashion sector provides the perfect foundation to be supported by the integration of a service or application that is built on an image classification model. In this article, the state of the art for image classification is analyzed and discussed. Based on the elaborated knowledge, four different approaches will be implemented to successfully extract features out of fashion data. For this purpose, a human-worn fashion dataset with 2567 images was created, but it was significantly enlarged by the performed image operations. The results show that convolutional neural networks are the undisputed standard for classifying images, and that TensorFlow is the best library to build them. Moreover, through the introduction of dropout layers, data augmentation and transfer learning, model overfitting was successfully prevented, and it was possible to incrementally improve the validation accuracy of the created dataset from an initial 69% to a final validation accuracy of 84%. More distinct apparel like trousers, shoes and hats were better classified than other upper body clothes.


2019 ◽  
Vol 14 (1) ◽  
pp. 124-134 ◽  
Author(s):  
Shuai Zhang ◽  
Yong Chen ◽  
Xiaoling Huang ◽  
Yishuai Cai

Online feedback is an effective way of communication between government departments and citizens. However, the daily high number of public feedbacks has increased the burden on government administrators. The deep learning method is good at automatically analyzing and extracting deep features of data, and then improving the accuracy of classification prediction. In this study, we aim to use the text classification model to achieve the automatic classification of public feedbacks to reduce the work pressure of administrator. In particular, a convolutional neural network model combined with word embedding and optimized by differential evolution algorithm is adopted. At the same time, we compared it with seven common text classification models, and the results show that the model we explored has good classification performance under different evaluation metrics, including accuracy, precision, recall, and F1-score.


2020 ◽  
Vol 34 (04) ◽  
pp. 6680-6687
Author(s):  
Jian Yin ◽  
Chunjing Gan ◽  
Kaiqi Zhao ◽  
Xuan Lin ◽  
Zhe Quan ◽  
...  

Recently, imbalanced data classification has received much attention due to its wide applications. In the literature, existing researches have attempted to improve the classification performance by considering various factors such as the imbalanced distribution, cost-sensitive learning, data space improvement, and ensemble learning. Nevertheless, most of the existing methods focus on only part of these main aspects/factors. In this work, we propose a novel imbalanced data classification model that considers all these main aspects. To evaluate the performance of our proposed model, we have conducted experiments based on 14 public datasets. The results show that our model outperforms the state-of-the-art methods in terms of recall, G-mean, F-measure and AUC.


2021 ◽  
Vol 5 (6) ◽  
pp. 1153-1160
Author(s):  
Mayanda Mega Santoni ◽  
Nurul Chamidah ◽  
Desta Sandya Prasvita ◽  
Helena Nurramdhani Irmanda ◽  
Ria Astriratma ◽  
...  

One of efforts by the Indonesian people to defend the country is to preserve and to maintain the regional languages. The current era of modernity makes the regional language image become old-fashioned, so that most them are no longer spoken.  If it is ignored, then there will be a cultural identity crisis that causes regional languages to be vulnerable to extinction. Technological developments can be used as a way to preserve regional languages. Digital image-based artificial intelligence technology using machine learning methods such as machine translation can be used to answer the problems. This research will use Deep Learning method, namely Convolutional Neural Networks (CNN). Data of this research were 1300 alphabetic images, 5000 text images and 200 vocabularies of Minangkabau regional language. Alphabetic image data is used for the formation of the CNN classification model. This model is used for text image recognition, the results of which will be translated into regional languages. The accuracy of the CNN model is 98.97%, while the accuracy for text image recognition (OCR) is 50.72%. This low accuracy is due to the failure of segmentation on the letters i and j. However, the translation accuracy increases after the implementation of the Leveinstan Distance algorithm which can correct text classification errors, with an accuracy value of 75.78%. Therefore, this research has succeeded in implementing the Convolutional Neural Networks (CNN) method in identifying text in text images and the Leveinstan Distance method in translating Indonesian text into regional language texts.  


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Wei Hu ◽  
Yangyu Huang ◽  
Li Wei ◽  
Fan Zhang ◽  
Hengchao Li

Recently, convolutional neural networks have demonstrated excellent performance on various visual tasks, including the classification of common two-dimensional images. In this paper, deep convolutional neural networks are employed to classify hyperspectral images directly in spectral domain. More specifically, the architecture of the proposed classifier contains five layers with weights which are the input layer, the convolutional layer, the max pooling layer, the full connection layer, and the output layer. These five layers are implemented on each spectral signature to discriminate against others. Experimental results based on several hyperspectral image data sets demonstrate that the proposed method can achieve better classification performance than some traditional methods, such as support vector machines and the conventional deep learning-based methods.


2020 ◽  
Author(s):  
Ishanu Chattopadhyay ◽  
Yi Huang ◽  
James Evans

Abstract Complex phenomena of societal interest such as weather, seismic activity and urban crime, are often punctuated by rare and extreme events, which are difficult to model and predict. Evidence of long-range persistence of such events has underscored the need to learn deep stochastic structures in data for effective forecasts. Recently neural networks (NN) have emerged as a defacto standard for deep learning. However, key problems remain with NN inference, including a high sample complexity, a general lack of transparency, and a limited ability to directly model stochastic phenomena. In this study we suggest that deep learning and the NN paradigm are conceptually distinct -- and that it is possible to learn ``deep' associations without invoking the ubiquitous NN strategy of global optimization via back-propagation. We show that deep learning of stochastic phenomena is related to uncovering the emergent self-similarities in data, which avoids the NN pitfalls offering crucial insights into underlying mechanisms. Using the Fractal Net (FN) architecture introduced here, we actionably forecast various categories of rare weather and seismic events, and property and violent crimes in major US cities. Compared to carefully tuned NNs, we boost recall at 90% precision by 161.9% for extreme weather events, 191.3% for light-to-severe seismic events with magnitudes above the local third quartile, and 50.8% - 404.9% for urban crime, demonstrating applicability in diverse systems of societal interest. This study opens the door to precise prediction of rare events in spatio-temporal phenomena, adding a new tool to the data science revolution.


Author(s):  
Jingzhao Hu ◽  
Hao Zhang ◽  
Yang Liu ◽  
Richard Sutcliffe ◽  
Jun Feng

AbstractIn recent years, Deep Neural Networks (DNNs) have achieved excellent performance on many tasks, but it is very difficult to train good models from imbalanced datasets. Creating balanced batches either by majority data down-sampling or by minority data up-sampling can solve the problem in certain cases. However, it may lead to learning process instability and overfitting. In this paper, we propose the Batch Balance Wrapper (BBW), a novel framework which can adapt a general DNN to be well trained from extremely imbalanced datasets with few minority samples. In BBW, two extra network layers are added to the start of a DNN. The layers prevent overfitting of minority samples and improve the expressiveness of the sample distribution of minority samples. Furthermore, Batch Balance (BB), a class-based sampling algorithm, is proposed to make sure the samples in each batch are always balanced during the learning process. We test BBW on three well-known extremely imbalanced datasets with few minority samples. The maximum imbalance ratio reaches 1167:1 with only 16 positive samples. Compared with existing approaches, BBW achieves better classification performance. In addition, BBW-wrapped DNNs are 16.39 times faster, relative to unwrapped DNNs. Moreover, BBW does not require data preprocessing or additional hyper-parameter tuning, operations that may require additional processing time. The experiments prove that BBW can be applied to common applications of extremely imbalanced data with few minority samples, such as the classification of EEG signals, medical images and so on.


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2592
Author(s):  
Xuemin Cheng ◽  
Yong Ren ◽  
Kaichang Cheng ◽  
Jie Cao ◽  
Qun Hao

In this study, we propose a method for training convolutional neural networks to make them identify and classify images with higher classification accuracy. By combining the Cartesian and polar coordinate systems when describing the images, the method of recognition and classification for plankton images is discussed. The optimized classification and recognition networks are constructed. They are available for in situ plankton images, exploiting the advantages of both coordinate systems in the network training process. Fusing the two types of vectors and using them as the input for conventional machine learning models for classification, support vector machines (SVMs) are selected as the classifiers to combine these two features of vectors, coming from different image coordinate descriptions. The accuracy of the proposed model was markedly higher than those of the initial classical convolutional neural networks when using the in situ plankton image data, with the increases in classification accuracy and recall rate being 5.3% and 5.1% respectively. In addition, the proposed training method can improve the classification performance considerably when used on the public CIFAR-10 dataset.


2019 ◽  
Vol 277 ◽  
pp. 02001 ◽  
Author(s):  
Qiwei Yin ◽  
Ruixun Zhang ◽  
XiuLi Shao

In this paper, we propose a CNN(Convolutional neural networks) and RNN(recurrent neural networks) mixed model for image classification, the proposed network, called CNN-RNN model. Image data can be viewed as two-dimensional wave data, and convolution calculation is a filtering process. It can filter non-critical band information in an image, leaving behind important features of image information. The CNN-RNN model can use the RNN to Calculate the Dependency and Continuity Features of the Intermediate Layer Output of the CNN Model, connect the characteristics of these middle tiers to the final full-connection network for classification prediction, which will result in better classification accuracy. At the same time, in order to satisfy the restriction of the length of the input sequence by the RNN model and prevent the gradient explosion or gradient disappearing in the network, this paper combines the wavelet transform (WT) method in the Fourier transform to filter the input data. We will test the proposed CNN-RNN model on a widely-used datasets CIFAR-10. The results prove the proposed method has a better classification effect than the original CNN network, and that further investigation is needed.


Sign in / Sign up

Export Citation Format

Share Document