Enhance Image Classification Performance Via Unsupervised Pre-trained Transformers Language Models

2020 ◽  
Author(s):  
Dezhou Shen

Abstract Image classification and categorization are essential to a machine's ability to distinguish between images. As Bidirectional Encoder Representations from Transformers (BERT) has become popular in many natural language processing tasks in recent years, it is intuitive to use these pre-trained language models to enhance computer vision tasks, e.g., image classification. In this paper, by encoding image pixels using pre-trained transformers and connecting the representations to a fully connected layer, the classification model outperforms the Wide ResNet model and the linear-probe iGPT-L model, achieving accuracy of 99.60%~99.74% on the CIFAR-10 image set and 99.10%~99.76% on the CIFAR-100 image set.
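The pipeline described above (frozen pre-trained encoder, then a fully connected classification head) can be sketched in plain Python. This is a structural sketch only: the stub embedding table stands in for the pre-trained transformer, and the vocabulary size, embedding width, and pooling choice are illustrative assumptions, not the paper's configuration.

```python
import random

random.seed(0)

VOCAB = 256    # one token per 8-bit pixel intensity (illustrative)
DIM = 16       # embedding width of the stub encoder
CLASSES = 10   # e.g., CIFAR-10

# Stub "pre-trained" embedding table standing in for the frozen transformer.
embed = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
# Fully connected classification head: CLASSES x DIM weights plus biases.
head_w = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(CLASSES)]
head_b = [0.0] * CLASSES

def classify(pixels):
    """Encode pixel tokens, mean-pool, and apply the linear head."""
    vecs = [embed[p] for p in pixels]                      # token embeddings
    pooled = [sum(col) / len(vecs) for col in zip(*vecs)]  # mean pooling
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(head_w, head_b)]             # class logits

logits = classify([0, 127, 255, 64])
print(len(logits))  # one logit per class
```

In practice only the head (and optionally the encoder) would be trained; the sketch shows the data flow, not the training loop.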

2020 ◽  
Author(s):  
Dezhou Shen

Abstract An accurate and efficient image classification algorithm for COVID-19 detection in lung tomography can be of great help to doctors working in places without advanced equipment. A machine with a high-accuracy COVID-19 classification model can relieve this burden by making it easy to test and check thousands of people's tomography images in a region suffering from a COVID-19 outbreak. By encoding image pixels and metadata using the pre-trained Bidirectional Encoder Representations from Transformers language models and connecting the representations to a fully connected layer, the classification model outperforms the ResNet and DenseNet image classification models, achieving accuracy of 99.51% ∼ 100.00% on the COVID-19 tomography image test set.


2021 ◽  
Vol 11 (1) ◽  
pp. 428
Author(s):  
Donghoon Oh ◽  
Jeong-Sik Park ◽  
Ji-Hwan Kim ◽  
Gil-Jin Jang

Speech recognition consists of converting input sound into a sequence of phonemes, then finding text for the input using language models. Therefore, phoneme classification performance is a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies recognition models better suited to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
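The clustering step described above can be illustrated with a minimal sketch: phonemes whose mutual confusion counts exceed a threshold are merged into one group. The toy confusion counts and the threshold are illustrative placeholders, not TIMIT statistics or the paper's clustering algorithm.

```python
def cluster_phonemes(confusion, threshold):
    """Greedy merge: phonemes confused more than `threshold` times share a group."""
    parent = {p: p for p in confusion}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for a in confusion:
        for b, count in confusion[a].items():
            if a != b and count > threshold:
                parent[find(b)] = find(a)  # union the two phonemes' groups

    groups = {}
    for p in confusion:
        groups.setdefault(find(p), set()).add(p)
    return [sorted(g) for g in sorted(groups.values(), key=lambda g: sorted(g))]

# Toy confusion counts among four phonemes (rows: reference, cols: hypothesis).
conf = {
    "s":  {"s": 90, "sh": 8, "t": 1, "d": 1},
    "sh": {"s": 9,  "sh": 88, "t": 2, "d": 1},
    "t":  {"s": 1,  "sh": 1, "t": 85, "d": 12},
    "d":  {"s": 0,  "sh": 1, "t": 11, "d": 88},
}
print(cluster_phonemes(conf, threshold=5))  # [['d', 't'], ['s', 'sh']]
```

Each resulting group would then get its own specialized classifier in the hierarchical scheme.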


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Yong Liang ◽  
Qi Cui ◽  
Xing Luo ◽  
Zhisong Xie

Rock classification is a significant branch of geology that can help us understand the formation and evolution of the planet, search for mineral resources, and so on. Traditionally, rock classification has been done based on the experience of a professional; however, this approach suffers from low efficiency and susceptibility to subjective factors. It is therefore of great significance to establish a simple, fast, and accurate rock classification model. This paper proposes a fine-grained image classification network combining an image cutting method and the SBV algorithm to improve classification performance on a small number of fine-grained rock samples. The method uses image cutting to achieve data augmentation without adding additional datasets and uses image block voting scoring to obtain richer complementary information, thereby improving the accuracy of image classification. On a test set of 32 images, the classification accuracies are 75%, 68.75%, and 75%, which are 34.375%, 18.75%, and 43.75% higher than those of the original algorithm. The results show that the proposed method significantly improves image classification accuracy, verifying the effectiveness of the algorithm and demonstrating that deep learning has great application value in the field of geology.
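The cut-and-vote idea above can be sketched in a few lines: cut the image into blocks, classify each block, and take the majority vote. The 4x4 "image", the 2x2 block size, and the brightness-based stub classifier are illustrative assumptions, not the paper's SBV configuration.

```python
from collections import Counter

def cut_blocks(image, size):
    """Cut a 2-D image (list of rows) into non-overlapping size x size blocks."""
    blocks = []
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            blocks.append([row[c:c + size] for row in image[r:r + size]])
    return blocks

def classify_by_voting(image, size, block_classifier):
    """Each block votes for a class; the majority vote is the final label."""
    votes = [block_classifier(b) for b in cut_blocks(image, size)]
    return Counter(votes).most_common(1)[0][0]

# Stub classifier: label a block "bright" or "dark" by its mean intensity.
def stub_classifier(block):
    flat = [v for row in block for v in row]
    return "bright" if sum(flat) / len(flat) >= 128 else "dark"

img = [[200, 210, 30, 40],
       [220, 205, 20, 35],
       [190, 250, 45, 25],
       [230, 240, 200, 250]]
print(classify_by_voting(img, 2, stub_classifier))  # prints "bright" (3 of 4 blocks)
```

Cutting also multiplies the number of training samples per image, which is the data-augmentation effect the abstract describes.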


2012 ◽  
Vol 5 (11) ◽  
pp. 2881-2892 ◽  
Author(s):  
M. S. Ghonima ◽  
B. Urquhart ◽  
C. W. Chow ◽  
J. E. Shields ◽  
A. Cazorla ◽  
...  

Abstract. Digital images of the sky obtained using a total sky imager (TSI) are classified pixel by pixel into clear sky, optically thin clouds, and optically thick clouds. A new classification algorithm was developed that compares the pixel red-blue ratio (RBR) to the RBR of a clear sky library (CSL) generated from images captured on clear days. The difference, rather than the ratio, between pixel RBR and CSL RBR resulted in more accurate cloud classification. High correlation between TSI image RBR and aerosol optical depth (AOD) measured by an AERONET photometer was observed and motivated the addition of a haze correction factor (HCF) to the classification model to account for variations in AOD. Thresholds for clear and thick clouds were chosen based on a training image set and validated with a set of manually annotated images. Misclassifications of clear and thick clouds into the opposite category were less than 1%. Thin clouds were classified with an accuracy of 60%. Accurate cloud detection and opacity classification techniques will improve the accuracy of short-term solar power forecasting.
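The per-pixel decision rule described above can be sketched directly: compare the RBR difference against two thresholds, with the haze correction factor shifting the library value. The threshold values and the HCF magnitude below are illustrative placeholders, not the paper's fitted values.

```python
def classify_pixel(pixel_rbr, csl_rbr, hcf=0.0,
                   thin_threshold=0.05, thick_threshold=0.20):
    """Classify one sky pixel from the RBR difference against the library.

    The difference (not the ratio) between the observed red-blue ratio and
    the clear-sky-library value is compared to two thresholds; the haze
    correction factor shifts the library RBR to account for aerosol load.
    """
    diff = pixel_rbr - (csl_rbr + hcf)
    if diff < thin_threshold:
        return "clear"
    if diff < thick_threshold:
        return "thin cloud"
    return "thick cloud"

print(classify_pixel(0.62, 0.60))        # clear
print(classify_pixel(0.72, 0.60))        # thin cloud
print(classify_pixel(0.95, 0.60))        # thick cloud
print(classify_pixel(0.72, 0.60, 0.10))  # haze-corrected: clear
```

The last call shows why the HCF matters: under high aerosol load, a hazy but cloud-free pixel would otherwise be misclassified as thin cloud.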


Author(s):  
Ivana Clairine Irsan ◽  
Masayu Leylia Khodra

Automatic news categorization is essential for handling the classification of multi-label news articles in online portals. This research employs several potential methods to improve the performance of a hierarchical multi-label classifier for Indonesian news articles. The first method uses a Convolutional Neural Network (CNN) to build the top-level classifier. The second method improves classification performance by calculating the average of the word vectors obtained from a distributed semantic model. The third method combines lexical and semantic features by multiplying word term frequency (lexical) with the word vector average (semantic). The model built with Calibrated Label Ranking as the multi-label classification method and trained with the Naïve Bayes algorithm achieves the best F1-measure of 0.7531. The multiplication of word term frequency and the average of word vectors was also used to build these classifiers; this configuration improved multi-label classification performance by 4.25% compared to the baseline. The distributed semantic model that gave the best performance in this experiment was a 300-dimension word2vec model trained on Wikipedia articles. The multi-label classification model's performance is also influenced by the news release date: a larger gap between the training and testing periods decreases model performance.
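One plausible reading of the lexical-semantic combination above is to weight each word's vector by its term frequency before averaging. The tiny 3-dimension "word2vec" table below is illustrative (the paper uses 300-dimension vectors trained on Wikipedia), and the Indonesian tokens are hypothetical examples.

```python
from collections import Counter

# Illustrative 3-D stand-ins for 300-D word2vec vectors.
word_vectors = {
    "pemilu":   [0.9, 0.1, 0.0],   # "election"
    "presiden": [0.8, 0.2, 0.1],   # "president"
    "gol":      [0.0, 0.9, 0.3],   # "goal"
}

def document_vector(tokens):
    """Average the word vectors, weighting each word by its term frequency."""
    counts = Counter(t for t in tokens if t in word_vectors)
    total = sum(counts.values())
    vec = [0.0] * len(next(iter(word_vectors.values())))
    for word, tf in counts.items():
        for i, v in enumerate(word_vectors[word]):
            vec[i] += tf * v                 # tf (lexical) x vector (semantic)
    return [v / total for v in vec]

doc = ["pemilu", "presiden", "pemilu"]
print(document_vector(doc))  # tf-weighted average vector of the document
```

The resulting vector is then the feature input to the multi-label classifier.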


2021 ◽  
pp. 016555152098550
Author(s):  
Alaettin Uçan ◽  
Murat Dörterler ◽  
Ebru Akçapınar Sezer

Emotion classification is a research field that aims to detect the emotions in a text using machine learning methods. In traditional machine learning (TML) methods, feature engineering processes cause the loss of some meaningful information, and classification performance is negatively affected. In addition, the success of modelling using deep learning (DL) approaches depends on the sample size. More samples are needed for Turkish due to the unique characteristics of the language. However, emotion classification data sets in Turkish are quite limited. In this study, the pretrained language model approach was used to create a stronger emotion classification model for Turkish. Well-known pretrained language models were fine-tuned for this purpose. The performances of these fine-tuned models for Turkish emotion classification were comprehensively compared with the performances of TML and DL methods in experimental studies. The proposed approach provides state-of-the-art performance for Turkish emotion classification.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alireza Roshanzamir ◽  
Hamid Aghajan ◽  
Mahdieh Soleymani Baghshah

Abstract Background We developed transformer-based deep learning models based on natural language processing for early risk assessment of Alzheimer’s disease from the picture description test. Methods The lack of large datasets poses the most important limitation for using complex models that do not require feature engineering. Transformer-based pre-trained deep language models have recently made a large leap in NLP research and application. These models are pre-trained on available large datasets to understand natural language texts appropriately, and have been shown to subsequently perform well on classification tasks with small training sets. The overall classification model is a simple classifier on top of the pre-trained deep language model. Results The models are evaluated on picture description test transcripts of the Pitt corpus, which contains data from 170 AD patients with 257 interviews and 99 healthy controls with 243 interviews. The large Bidirectional Encoder Representations from Transformers (BERT-Large) embedding with a logistic regression classifier achieves a classification accuracy of 88.08%, improving the state of the art by 2.48%. Conclusions Using pre-trained language models can improve AD prediction. This not only addresses the lack of sufficiently large datasets, but also reduces the need for expert-defined features.
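The overall classifier described above (a simple classifier on top of frozen language-model embeddings) can be sketched with logistic regression trained by gradient descent. The 2-D toy "embeddings" stand in for BERT-Large features; the data, dimensions, and hyperparameters are illustrative, not the study's.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights w and bias b by gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))    # sigmoid probability
            g = p - yi                        # gradient of the loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Toy "embeddings": class 1 (patients) clusters high, class 0 (controls) low.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 0, 0]
```

Because the language model is frozen, only this small classifier needs training, which is why the approach works with small clinical datasets.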


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1751
Author(s):  
Xiang Hu ◽  
Wenjing Yang ◽  
Hao Wen ◽  
Yu Liu ◽  
Yuanxi Peng

Hyperspectral image (HSI) classification is the subject of intense research in remote sensing. The tremendous success of deep learning in computer vision has recently sparked interest in applying deep learning to hyperspectral image classification. However, most deep learning methods for HSI classification are based on convolutional neural networks (CNNs), which require heavy GPU memory resources and long run times. Recently, another deep learning model, the transformer, has been applied to image recognition, and the results demonstrate the great potential of transformer networks for computer vision tasks. In this paper, we propose a model for hyperspectral image classification based on the transformer, which is widely used in natural language processing. In addition, we believe we are the first to combine metric learning and the transformer model in hyperspectral image classification. Moreover, to improve classification performance when the available training samples are limited, we use 1-D convolution and the Mish activation function. Experimental results on three widely used hyperspectral image data sets demonstrate the proposed model's advantages in accuracy, GPU memory cost, and running time.


2021 ◽  
Author(s):  
Arousha Haghighian Roudsari ◽  
Jafar Afshar ◽  
Wookey Lee ◽  
Suan Lee

Abstract Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way that efficiently conveys knowledge. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents, facilitating reliable search, retrieval, and further patent analysis tasks. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we focus on investigating the effect of fine-tuning the pre-trained language models, namely BERT, XLNet, RoBERTa, and ELECTRA, on the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification, using various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for the experiments. We conclude that fine-tuning the pre-trained language models on the patent text improves multi-label patent classification performance. Our findings indicate that XLNet performs best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, coverage error, and LRAP.
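LRAP (label ranking average precision), one of the multi-label metrics reported above, averages, over samples and true labels, the precision at each true label's rank. A minimal implementation on a standard toy example (not patent data):

```python
def lrap(y_true, y_score):
    """Label ranking average precision for multi-label predictions.

    For each true label j of a sample: (# true labels scored >= score_j)
    divided by (# all labels scored >= score_j), averaged over true labels,
    then over samples. Perfect ranking gives 1.0.
    """
    total = 0.0
    for truths, scores in zip(y_true, y_score):
        true_idx = [i for i, t in enumerate(truths) if t]
        sample = 0.0
        for j in true_idx:
            rank = sum(1 for s in scores if s >= scores[j])
            hits = sum(1 for i in true_idx if scores[i] >= scores[j])
            sample += hits / rank
        total += sample / len(true_idx)
    return total / len(y_true)

y_true = [[1, 0, 0], [0, 0, 1]]
y_score = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
print(round(lrap(y_true, y_score), 3))  # 0.417
```

Unlike F1, LRAP evaluates the full score ranking rather than a thresholded decision, which is why it complements precision/recall in multi-label settings with many labels.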

