Fine-grained Image Classification by Visual-Semantic Embedding

Author(s):  
Huapeng Xu ◽  
Guilin Qi ◽  
Jingjing Li ◽  
Meng Wang ◽  
Kang Xu ◽  
...  

This paper investigates a challenging problem, which is known as fine-grained image classification (FGIC). Different from conventional computer vision problems, FGIC suffers from the large intra-class diversities and subtle inter-class differences. Existing FGIC approaches are limited to explore only the visual information embedded in the images. In this paper, we present a novel approach which can use handy prior knowledge from either structured knowledge bases or unstructured text to facilitate FGIC. Specifically, we propose a visual-semantic embedding model which explores semantic embedding from knowledge bases and text, and further trains a novel end-to-end CNN framework to linearly map image features to a rich semantic embedding space. Experimental results on a challenging large-scale UCSD Bird-200-2011 dataset verify that our approach outperforms several state-of-the-art methods with significant advances.
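The linear mapping into a semantic embedding space described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the end-to-end CNN is omitted, and the names `project` and `classify`, the weight matrix, and the class embeddings are all hypothetical. The sketch only shows the scoring step, assuming image features are already extracted and each class has a fixed embedding vector.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(weights: Sequence[Sequence[float]], features: Sequence[float]) -> List[float]:
    """Linearly map image features into the semantic space (one row of weights per output dimension)."""
    return [dot(row, features) for row in weights]


def classify(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    class_embeddings: Dict[str, Sequence[float]],
) -> str:
    """Return the class whose (assumed, precomputed) embedding scores highest against the projected features."""
    projected = project(weights, features)
    return max(class_embeddings, key=lambda name: dot(projected, class_embeddings[name]))


if __name__ == "__main__":
    # Toy values chosen for illustration only.
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    embeddings = {"sparrow": [1.0, 0.0], "finch": [0.0, 1.0]}
    print(classify([0.2, 0.9, 0.5], W, embeddings))
```

In a trained system the weight matrix would be learned jointly with the feature extractor; here it is fixed by hand to keep the example self-contained.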

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yang He ◽  
Ling Tian ◽  
Lizong Zhang ◽  
Xi Zeng

Autonomous object detection powered by cutting-edge artificial intelligence techniques has been an essential component for sustaining complex smart city systems. Fine-grained image classification focuses on recognizing subcategories of specific levels of images. As a result of the high similarity between images in the same category and the high dissimilarity in the same subcategories, it has always been a challenging problem in computer vision. Traditional approaches usually rely on exploring only the visual information in images. Therefore, this paper proposes a novel Knowledge Graph Representation Fusion (KGRF) framework to introduce prior knowledge into the fine-grained image classification task. Specifically, the Graph Attention Network (GAT) is employed to learn the knowledge representation from the constructed knowledge graph modeling the categories-subcategories and subcategories-attributes associations. By introducing the Multimodal Compact Bilinear (MCB) module, the framework can fully integrate the knowledge representation and visual features for learning the high-level image features. Extensive experiments on the Caltech-UCSD Birds-200-2011 dataset verify the superiority of our proposed framework over several existing state-of-the-art methods.
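To make the fusion step in this abstract more concrete, the following is a rough sketch of compact bilinear pooling in its usual formulation: each input vector is projected with a count sketch, and the two sketches are combined by circular convolution. Everything here is an assumption for illustration (function names, the output dimension of 8, the seeds, and the direct convolution in place of an FFT); it is not the KGRF implementation.

```python
import random
from typing import List, Sequence, Tuple


def make_sketch_params(input_dim: int, output_dim: int, seed: int) -> Tuple[List[int], List[int]]:
    """Draw a random bucket index and a random sign for every input dimension."""
    rng = random.Random(seed)
    buckets = [rng.randrange(output_dim) for _ in range(input_dim)]
    signs = [rng.choice((-1, 1)) for _ in range(input_dim)]
    return buckets, signs


def count_sketch(vec: Sequence[float], buckets: Sequence[int], signs: Sequence[int], output_dim: int) -> List[float]:
    """Add each signed input value into its assigned bucket."""
    out = [0.0] * output_dim
    for value, bucket, sign in zip(vec, buckets, signs):
        out[bucket] += sign * value
    return out


def circular_convolution(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Direct O(d^2) circular convolution; an FFT would normally be used for speed."""
    d = len(a)
    return [sum(a[j] * b[(k - j) % d] for j in range(d)) for k in range(d)]


def compact_bilinear(visual: Sequence[float], knowledge: Sequence[float], output_dim: int = 8, seed: int = 0) -> List[float]:
    """Fuse two vectors; equivalent to a count sketch of their outer product."""
    vb, vs = make_sketch_params(len(visual), output_dim, seed)
    kb, ks = make_sketch_params(len(knowledge), output_dim, seed + 1)
    return circular_convolution(
        count_sketch(visual, vb, vs, output_dim),
        count_sketch(knowledge, kb, ks, output_dim),
    )


if __name__ == "__main__":
    print(compact_bilinear([1.0, 2.0, 3.0], [0.5, -1.0]))
```

The appeal of this construction is that the fused vector has a fixed, small dimension, whereas the full outer product grows with the product of the two input dimensions.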


Author(s):  
Xiangteng He ◽  
Yuxin Peng ◽  
Junjie Zhao

Fine-grained visual categorization (FGVC) is the discrimination of similar subcategories, whose main challenge is to localize the quite subtle visual distinctions between similar subcategories. There are two pivotal problems: discovering which region is discriminative and representative, and determining how many discriminative regions are necessary to achieve the best performance. Existing methods generally solve these two problems by relying on prior knowledge or experimental validation, which severely restricts the usability and scalability of FGVC. To address the "which" and "how many" problems adaptively and intelligently, this paper proposes a stacked deep reinforcement learning approach (StackDRL). It adopts a two-stage learning architecture, which is driven by the semantic reward function. Two-stage learning localizes the object and its parts in sequence ("which"), and determines the number of discriminative regions adaptively ("how many"), which is quite appealing in FGVC. The semantic reward function drives StackDRL to fully learn the discriminative and conceptual visual information by jointly combining the attention-based reward and the category-based reward. Furthermore, unsupervised discriminative localization avoids the heavy labor cost of labeling and greatly strengthens the usability and scalability of our StackDRL approach. Compared with ten state-of-the-art methods on the CUB-200-2011 dataset, our StackDRL approach achieves the best categorization accuracy.


2021 ◽  
Author(s):  
Gianni Brauwers ◽  
Flavius Frasincar

With the constantly growing number of reviews and other sentiment-bearing texts on the Web, the demand for automatic sentiment analysis algorithms continues to expand. Aspect-based sentiment classification (ABSC) allows for the automatic extraction of highly fine-grained sentiment information from text documents or sentences. In this survey, the rapidly evolving state of the research on ABSC is reviewed. A novel taxonomy is proposed that categorizes the ABSC models into three major categories: knowledge-based, machine learning, and hybrid models. This taxonomy is accompanied with summarizing overviews of the reported model performances, and both technical and intuitive explanations of the various ABSC models. State-of-the-art ABSC models are discussed, such as models based on the transformer model, and hybrid deep learning models that incorporate knowledge bases. Additionally, various techniques for representing the model inputs and evaluating the model outputs are reviewed. Furthermore, trends in the research on ABSC are identified and a discussion is provided on the ways in which the field of ABSC can be advanced in the future.


2021 ◽  
Vol 11 (18) ◽  
pp. 8694
Author(s):  
Mehak Maqbool Memon ◽  
Manzoor Ahmed Hashmani ◽  
Aisha Zahid Junejo ◽  
Syed Sajjad Rizvi ◽  
Adnan Ashraf Arain

Image classification of a visual scene based on visibility is significant due to the rise in readily available automated solutions. Currently, there are only two known spectrums of image visibility i.e., dark, and bright. However, normal environments include semi-dark scenarios. Hence, visual extremes that will lead to the accurate extraction of image features should be duly discarded. Fundamentally speaking there are two broad methods to perform visual scene-based image classification, i.e., machine learning (ML) methods and computer vision methods. In ML, the issues of insufficient data, sophisticated hardware and inadequate image classifier training time remain significant problems to be handled. These techniques fail to classify the visual scene-based images with high accuracy. The other alternative is computer vision (CV) methods, which also have major issues. CV methods do provide some basic procedures which may assist in such classification but, to the best of our knowledge, no CV algorithm exists to perform such classification, i.e., these do not account for semi-dark images in the first place. Moreover, these methods do not provide a well-defined protocol to calculate images’ content visibility and thereby classify images. One of the key algorithms for calculation of images’ content visibility is backed by the HSL (hue, saturation, lightness) color model. The HSL color model allows the visibility calculation of a scene by calculating the lightness/luminance of a single pixel. Recognizing the high potential of the HSL color model, we propose a novel framework relying on the simple approach of the statistical manipulation of an entire image’s pixel intensities, represented by HSL color model. The proposed algorithm, namely, Relative Perceived Luminance Classification (RPLC) uses the HSL (hue, saturation, lightness) color model to correctly identify the luminosity values of the entire image. 
Our findings prove that the proposed method yields high classification accuracy (over 78%) with a small error rate. We show that the computational complexity of RPLC is much less than that of the state-of-the-art ML algorithms.
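As a rough illustration of lightness-based visibility classification, the sketch below computes HSL lightness per pixel with Python's standard `colorsys` module and labels an image by its mean lightness. The thresholds `dark_max` and `bright_min`, the function names, and the use of a plain mean are assumptions made for the example; the abstract does not specify the statistics RPLC actually applies.

```python
import colorsys
from statistics import mean
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB


def pixel_lightness(rgb: Pixel) -> float:
    """HSL lightness in [0, 1]: the midpoint of the largest and smallest RGB channels."""
    r, g, b = (channel / 255.0 for channel in rgb)
    _hue, lightness, _saturation = colorsys.rgb_to_hls(r, g, b)
    return lightness


def classify_visibility(pixels: Iterable[Pixel], dark_max: float = 0.25, bright_min: float = 0.6) -> str:
    """Label an image by mean lightness; both thresholds are illustrative assumptions."""
    average = mean(pixel_lightness(p) for p in pixels)
    if average < dark_max:
        return "dark"
    if average >= bright_min:
        return "bright"
    return "semi-dark"


if __name__ == "__main__":
    print(classify_visibility([(102, 102, 102)] * 4))
```

Note that `colorsys.rgb_to_hls` returns its components in hue, lightness, saturation order, which differs from the "HSL" naming used in the abstract.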


Symmetry ◽  
2019 ◽  
Vol 11 (8) ◽  
pp. 1033 ◽  
Author(s):  
Wang ◽  
Zhang ◽  
Wang

Fine-grained image classification is a challenging problem because of its large intra-class differences and low inter-class variance. Bilinear pooling based models have been shown to be effective at fine-grained classification, while most previous approaches neglect the fact that distinctive features or modeling distinguishing regions usually have an important role in solving the fine-grained problem. In this paper, we propose a novel convolutional neural network framework, i.e., attention bilinear pooling, for fine-grained classification with attention. This framework can learn the distinctive feature information from the channel or spatial attention. Specifically, the channel and spatial attention allows the network to better focus on where the key targets are in the image. This paper embeds spatial attention and channel attention in the underlying network architecture to better represent image features. To further explore the differences between channels and spatial attention, we propose channel attention bilinear pooling (CAB), spatial attention bilinear pooling (SAB), channel spatial attention bilinear pooling (CSAB), and spatial channel attention bilinear pooling (SCAB) as four alternative frames. A variety of experiments on several datasets show that our proposed method has a very impressive performance compared to other methods based on bilinear pooling.
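A small sketch may help show how channel attention and bilinear pooling fit together. It assumes a feature map stored as a list of channels, each a list of spatial activations, and it uses a parameter-free sigmoid gate on the channel mean as a stand-in for a learned attention module. The names `channel_attention` and `bilinear_pool` are hypothetical, and this is not the paper's CAB architecture.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are channels, columns are spatial locations


def channel_attention(features: Matrix) -> Matrix:
    """Scale each channel by a sigmoid of its spatial mean (a stand-in for a learned gate)."""
    attended = []
    for channel in features:
        gate = 1.0 / (1.0 + math.exp(-sum(channel) / len(channel)))
        attended.append([gate * value for value in channel])
    return attended


def bilinear_pool(features: Matrix) -> List[float]:
    """Average the outer product of channel activations over locations, then normalize."""
    locations = len(features[0])
    pooled = [
        sum(a * b for a, b in zip(ci, cj)) / locations
        for ci in features
        for cj in features
    ]
    # Signed square root followed by L2 normalization, as is common for bilinear features.
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in rooted)) or 1.0
    return [v / norm for v in rooted]


if __name__ == "__main__":
    feature_map = [[1.0, 2.0], [0.0, 1.0]]
    print(bilinear_pool(channel_attention(feature_map)))
```

With C channels the descriptor has C squared entries, which is why bilinear models are often paired with compact or low-rank approximations.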


2021 ◽  
Author(s):  
Qurrat Ul Ain

Skin image classification involves the development of computational methods for solving problems such as cancer detection in lesion images, and their use for biomedical research and clinical care. Such methods aim at extracting relevant information or knowledge from skin images that can significantly assist in the early detection of disease. Skin images are enormous, and come with various artifacts that hinder effective feature extraction leading to inaccurate classification. Feature selection and feature construction can significantly reduce the amount of data while improving classification performance by selecting prominent features and constructing high-level features. Existing approaches mostly rely on expert intervention and follow multiple stages for pre-processing, feature extraction, and classification, which decreases the reliability, and increases the computational complexity. Since good generalization accuracy is not always the primary objective, clinicians are also interested in analyzing specific features such as pigment network, streaks, and blobs responsible for developing the disease; interpretable methods are favored. In Evolutionary Computation, Genetic Programming (GP) can automatically evolve an interpretable model and address the curse of dimensionality (through feature selection and construction). GP has been successfully applied to many areas, but its potential for feature selection, feature construction, and classification in skin images has not been thoroughly investigated. The overall goal of this thesis is to develop a new GP approach to skin image classification by utilizing GP to evolve programs that are capable of automatically selecting prominent image features, constructing new high-level features, interpreting useful image features which can help dermatologists diagnose a type of cancer, and are robust to processing skin images captured from specialized instruments and standard cameras.
This thesis focuses on utilizing a wide range of texture, color, frequency-based, local, and global image properties at the terminal nodes of GP to classify skin cancer images from multiple modalities effectively. This thesis develops new two-stage GP methods using embedded and wrapper feature selection and construction approaches to automatically generate a feature vector of selected and constructed features for classification. The results show that the wrapper approach outperforms the embedded approach, the existing baseline GP and other machine learning methods, but the embedded approach is faster than the wrapper approach. This thesis develops a multi-tree GP based embedded feature selection approach for melanoma detection using domain specific and domain independent features. It explores suitable crossover and mutation operators to evolve GP classifiers effectively and further extends this approach using a weighted fitness function. The results show that these multi-tree approaches outperformed single tree GP and other classification methods. They identify that a specific feature extraction method extracts the most suitable features for particular images taken from a specific optical instrument. This thesis develops the first GP method utilizing frequency-based wavelet features, where the wrapper based feature selection and construction methods automatically evolve useful constructed features to improve the classification performance. The results show evidence of successful feature construction by significantly outperforming existing GP approaches, state-of-the-art CNN, and other classification methods. This thesis develops a GP approach to multiple feature construction for ensemble learning in classification. The results show that the ensemble method outperformed existing GP approaches, state-of-the-art skin image classification, and commonly used ensemble methods.
Further analysis of the evolved constructed features identified important image features that can potentially help the dermatologist identify further medical procedures in real-world situations.


Author(s):  
Ningyu Zhang ◽  
Shumin Deng ◽  
Xu Cheng ◽  
Xi Chen ◽  
Yichi Zhang ◽  
...  

Previous research has demonstrated the power of leveraging prior knowledge to improve the performance of deep models in natural language processing. However, traditional methods neglect the fact that redundant and irrelevant knowledge exists in external knowledge bases. In this study, we launched an in-depth empirical investigation into downstream tasks and found that knowledge-enhanced approaches do not always exhibit satisfactory improvements. To this end, we investigate the fundamental reasons for ineffective knowledge infusion and present selective injection for language pretraining, which constitutes a model-agnostic method and is readily pluggable into previous approaches. Experimental results on benchmark datasets demonstrate that our approach can enhance state-of-the-art knowledge injection methods.


1995 ◽  
Vol 38 (5) ◽  
pp. 1126-1142 ◽  
Author(s):  
Jeffrey W. Gilger

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals it is suggested that: (a) more behavioral genetic work of all types should be done, including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests, etc.).


2019 ◽  
Vol 11 (16) ◽  
pp. 1933 ◽  
Author(s):  
Yangyang Li ◽  
Ruoting Xing ◽  
Licheng Jiao ◽  
Yanqiao Chen ◽  
Yingte Chai ◽  
...  

Polarimetric synthetic aperture radar (PolSAR) image classification is a recent technology with great practical value in the field of remote sensing. However, due to the time-consuming and labor-intensive data collection, there are few labeled datasets available. Furthermore, most available state-of-the-art classification methods heavily suffer from the speckle noise. To solve these problems, in this paper, a novel semi-supervised algorithm based on self-training and superpixels is proposed. First, the Pauli-RGB image is over-segmented into superpixels to obtain a large number of homogeneous areas. Then, features that can mitigate the effects of the speckle noise are obtained using spatial weighting in the same superpixel. Next, the training set is expanded iteratively utilizing a semi-supervised unlabeled sample selection strategy that elaborately makes use of spatial relations provided by superpixels. In addition, a stacked sparse auto-encoder is self-trained using the expanded training set to obtain classification results. Experiments on two typical PolSAR datasets verified its capability of suppressing the speckle noise and showed excellent classification performance with limited labeled data.
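The idea of smoothing features within a superpixel to reduce speckle can be sketched as follows. This is a simplification under stated assumptions: every pixel in a superpixel gets equal weight (the abstract mentions spatial weighting but does not give the weights), the superpixel labels are assumed to come from some prior over-segmentation step, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def superpixel_mean_features(
    features: Sequence[Sequence[float]],
    superpixel_ids: Sequence[int],
) -> List[List[float]]:
    """Replace each pixel's feature vector with the mean vector of its superpixel."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for feature, sp in zip(features, superpixel_ids):
        if sp not in sums:
            sums[sp] = [0.0] * len(feature)
        for d, value in enumerate(feature):
            sums[sp][d] += value
        counts[sp] += 1
    means = {sp: [v / counts[sp] for v in total] for sp, total in sums.items()}
    # Each pixel receives its own copy so later edits do not alias across pixels.
    return [list(means[sp]) for sp in superpixel_ids]


if __name__ == "__main__":
    pixel_features = [[1.0, 0.0], [3.0, 2.0], [10.0, 4.0]]
    labels = [0, 0, 1]  # assumed output of an over-segmentation step
    print(superpixel_mean_features(pixel_features, labels))
```

Averaging over a homogeneous region trades spatial detail for noise suppression, which is the usual motivation for working at the superpixel level with speckled data.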

