Knowledge Graph Representation Fusion Framework for Fine-Grained Object Recognition in Smart Cities

Autonomous object detection powered by cutting-edge artificial intelligent techniques has been an essential component for sustaining complex smart city systems. Fine-grained image classification focuses on recognizing subcategories of specific levels of images. As a result of the high similarity between images in the same category and the high dissimilarity in the same subcategories, it has always been a challenging problem in computer vision. Traditional approaches usually rely on exploring only the visual information in images. Therefore, this paper proposes a novel Knowledge Graph Representation Fusion (KGRF) framework to introduce prior knowledge into fine-grained image classification task. Specifically, the Graph Attention Network (GAT) is employed to learn the knowledge representation from the constructed knowledge graph modeling the categories-subcategories and subcategories-attributes associations. By introducing the Multimodal Compact Bilinear (MCB) module, the framework can fully integrate the knowledge representation and visual features for learning the high-level image features. Extensive experiments on the Caltech-UCSD Birds-200-2011 dataset verify the superiority of our proposed framework over several existing state-of-the-art methods.

Download Full-text

Fine-grained Image Classification by Visual-Semantic Embedding

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/145 ◽

2018 ◽

Cited By ~ 3

Author(s):

Huapeng Xu ◽

Guilin Qi ◽

Jingjing Li ◽

Meng Wang ◽

Kang Xu ◽

...

Keyword(s):

Image Classification ◽

Prior Knowledge ◽

Visual Information ◽

State Of The Art ◽

Knowledge Bases ◽

Image Features ◽

Challenging Problem ◽

Fine Grained ◽

Unstructured Text ◽

Conventional Computer

This paper investigates a challenging problem,which is known as fine-grained image classification(FGIC). Different from conventional computer visionproblems, FGIC suffers from the large intraclassdiversities and subtle inter-class differences.Existing FGIC approaches are limited to exploreonly the visual information embedded in the images.In this paper, we present a novel approachwhich can use handy prior knowledge from eitherstructured knowledge bases or unstructured text tofacilitate FGIC. Specifically, we propose a visual-semanticembedding model which explores semanticembedding from knowledge bases and text, andfurther trains a novel end-to-end CNN frameworkto linearly map image features to a rich semanticembedding space. Experimental results on a challenginglarge-scale UCSD Bird-200-2011 datasetverify that our approach outperforms several state-of-the-art methods with significant advances.

Download Full-text

Hybrid Attention Network for Language-Based Person Search

Sensors ◽

10.3390/s20185279 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5279

Author(s):

Yang Li ◽

Huahu Xu ◽

Junsheng Xiao

Keyword(s):

Image Features ◽

Attention Mechanism ◽

Feature Representation ◽

Semantic Features ◽

Retrieval Task ◽

Attention Network ◽

Fine Grained ◽

Person Search ◽

High Level ◽

Language Description

Language-based person search retrieves images of a target person using natural language description and is a challenging fine-grained cross-modal retrieval task. A novel hybrid attention network is proposed for the task. The network includes the following three aspects: First, a cubic attention mechanism for person image, which combines cross-layer spatial attention and channel attention. It can fully excavate both important midlevel details and key high-level semantics to obtain better discriminative fine-grained feature representation of a person image. Second, a text attention network for language description, which is based on bidirectional LSTM (BiLSTM) and self-attention mechanism. It can better learn the bidirectional semantic dependency and capture the key words of sentences, so as to extract the context information and key semantic features of the language description more effectively and accurately. Third, a cross-modal attention mechanism and a joint loss function for cross-modal learning, which can pay more attention to the relevant parts between text and image features. It can better exploit both the cross-modal and intra-modal correlation and can better solve the problem of cross-modal heterogeneity. Extensive experiments have been conducted on the CUHK-PEDES dataset. Our approach obtains higher performance than state-of-the-art approaches, demonstrating the advantage of the approach we propose.

Download Full-text

Image-embodied Knowledge Representation Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/438 ◽

2017 ◽

Cited By ~ 13

Author(s):

Ruobing Xie ◽

Zhiyuan Liu ◽

Huanbo Luan ◽

Maosong Sun

Keyword(s):

Knowledge Representation ◽

Visual Information ◽

Representation Learning ◽

Learning Model ◽

Experimental Results ◽

Knowledge Graph ◽

Embodied Knowledge ◽

Knowledge Representations ◽

Conventional Methods ◽

Image Representations

Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.

Download Full-text

A Privacy-Preserving Attribute-Based Encryption System for Data Sharing in Smart Cities

Wireless Communications and Mobile Computing ◽

10.1155/2021/6686675 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Xieyang Shen ◽

Chuanhe Huang ◽

Danxin Wang ◽

Jiaoli Shi

Keyword(s):

Data Sharing ◽

Service Providers ◽

Smart Cities ◽

Information Leakage ◽

Cloud Service ◽

Data Access ◽

Smart Devices ◽

Fine Grained ◽

Data Access Control ◽

High Level

Information leakage and efficiency are the two main concerns of data sharing in cloud-aided IoT. The main problem is that smart devices cannot afford both energy and computation costs and tend to outsource data to a cloud server. Furthermore, most schemes focus on preserving the data stored in the cloud but omitting the access policy is typically stored in unencrypted form. In this paper, we proposed a fine-grained data access control scheme based on CP-ABE to implement access policies with a greater degree of expressiveness as well as hidden policies from curious cloud service providers. Moreover, to mitigate the extra computation cost generated by complex policies, an outsourcing service for decryption can be used by data users. Further experiments and extensive analysis show that we significantly decrease the communication and computation overhead while providing a high-level security scheme compared with the existing schemes.

Download Full-text

Embodying the Number of an Entity’s Relations for Knowledge Representation Learning

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500509 ◽

2021 ◽

Vol 31 (10) ◽

pp. 1495-1515

Author(s):

Xinhua Suo ◽

Bing Guo ◽

Yan Shen ◽

Wei Wang ◽

Yaosen Chen ◽

...

Keyword(s):

Knowledge Representation ◽

Visual Information ◽

Large Scale ◽

Critical Role ◽

Representation Learning ◽

Knowledge Graph ◽

Source Information ◽

Structure Information ◽

Common Information ◽

Additional Information

Knowledge representation learning (knowledge graph embedding) plays a critical role in the application of knowledge graph construction. The multi-source information knowledge representation learning, which is one class of the most promising knowledge representation learning at present, mainly focuses on learning a large number of useful additional information of entities and relations in the knowledge graph into their embeddings, such as the text description information, entity type information, visual information, graph structure information, etc. However, there is a kind of simple but very common information — the number of an entity’s relations which means the number of an entity’s semantic types has been ignored. This work proposes a multi-source knowledge representation learning model KRL-NER, which embodies information of the number of an entity’s relations between entities into the entities’ embeddings through the attention mechanism. Specifically, first of all, we design and construct a submodel of the KRL-NER LearnNER which learns an embedding including the information on the number of an entity’s relations; then, we obtain a new embedding by exerting attention onto the embedding learned by the models such as TransE with this embedding; finally, we translate based onto the new embedding. Experiments, such as related tasks on knowledge graph: entity prediction, entity prediction under different relation types, and triple classification, are carried out to verify our model. The results show that our model is effective on the large-scale knowledge graphs, e.g. FB15K.

Download Full-text

Feature Selection in Big Image Datasets

Proceedings ◽

10.3390/proceedings2020054040 ◽

2020 ◽

Vol 54 (1) ◽

pp. 40

Author(s):

J. Guzmán Figueira-Domínguez ◽

Verónica Bolón-Canedo ◽

Beatriz Remeseiro

Keyword(s):

Feature Selection ◽

Image Features ◽

Extraction Techniques ◽

Big Data Technologies ◽

High Level ◽

Traditional Approaches ◽

The Impact ◽

Broad Feature ◽

Current Feature ◽

Selection Of

In computer vision, current feature extraction techniques generate high dimensional data. Both convolutional neural networks and traditional approaches like keypoint detectors are used as extractors of high-level features. However, the resulting datasets have grown in the number of features, leading into long training times due to the curse of dimensionality. In this research, some feature selection methods were applied to these image features through big data technologies. Additionally, we analyzed how image resolutions may affect to extracted features and the impact of applying a selection of the most relevant features. Experimental results show that making an important reduction of the extracted features provides classification results similar to those obtained with the full set of features and, in some cases, outperforms the results achieved using broad feature vectors.

Download Full-text

Spatiotemporal Prediction of Theft Risk with Deep Inception-Residual Networks

Smart Cities ◽

10.3390/smartcities4010013 ◽

2021 ◽

Vol 4 (1) ◽

pp. 204-216

Author(s):

Xinyue Ye ◽

Lian Duan ◽

Qiong Peng

Keyword(s):

New York ◽

New York City ◽

Emergency Service ◽

Prediction Models ◽

Smart Cities ◽

Low Level ◽

Fine Grained ◽

Crime Prediction ◽

High Level ◽

Better Than

Spatiotemporal prediction of crime is crucial for public safety and smart cities operation. As crime incidents are distributed sparsely across space and time, existing deep-learning methods constrained by coarse spatial scale offer only limited values in prediction of crime density. This paper proposes the use of deep inception-residual networks (DIRNet) to conduct fine-grained, theft-related crime prediction based on non-emergency service request data (311 events). Specifically, it outlines the employment of inception units comprising asymmetrical convolution layers to draw low-level spatiotemporal dependencies hidden in crime events and complaint records in the 311 dataset. Afterward, this paper details how residual units can be applied to capture high-level spatiotemporal features from low-level spatiotemporal dependencies for the final prediction. The effectiveness of the proposed DIRNet is evaluated based on theft-related crime data and 311 data in New York City from 2010 to 2015. The results confirm that the DIRNet obtains an average F1 of 71%, which is better than other prediction models.

Download Full-text

Fine-grained news recommendation by fusing matrix factorization, topic analysis and knowledge graph representation

2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/smc.2017.8122727 ◽

2017 ◽

Cited By ~ 1

Author(s):

Kuai Zhang ◽

Xin Xin ◽

Pei Luo ◽

Ping Guot

Keyword(s):

Matrix Factorization ◽

Graph Representation ◽

Knowledge Graph ◽

Topic Analysis ◽

Fine Grained ◽

News Recommendation

Download Full-text

Genetic Programming based Feature Manipulation for Skin Cancer Image Classification

10.26686/wgtn.17151719.v1 ◽

2021 ◽

Author(s):

◽

~ Qurrat Ul Ain

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Image Classification ◽

State Of The Art ◽

Classification Performance ◽

Image Features ◽

Feature Construction ◽

Classification Methods ◽

Wrapper Approach ◽

High Level

<p>Skin image classification involves the development of computational methods for solving problems such as cancer detection in lesion images, and their use for biomedical research and clinical care. Such methods aim at extracting relevant information or knowledge from skin images that can significantly assist in the early detection of disease. Skin images are enormous, and come with various artifacts that hinder effective feature extraction leading to inaccurate classification. Feature selection and feature construction can significantly reduce the amount of data while improving classification performance by selecting prominent features and constructing high-level features. Existing approaches mostly rely on expert intervention and follow multiple stages for pre-processing, feature extraction, and classification, which decreases the reliability, and increases the computational complexity. Since good generalization accuracy is not always the primary objective, clinicians are also interested in analyzing specific features such as pigment network, streaks, and blobs responsible for developing the disease; interpretable methods are favored. In Evolutionary Computation, Genetic Programming (GP) can automatically evolve an interpretable model and address the curse of dimensionality (through feature selection and construction). GP has been successfully applied to many areas, but its potential for feature selection, feature construction, and classification in skin images has not been thoroughly investigated. The overall goal of this thesis is to develop a new GP approach to skin image classification by utilizing GP to evolve programs that are capable of automatically selecting prominent image features, constructing new high level features, interpreting useful image features which can help dermatologist to diagnose a type of cancer, and are robust to processing skin images captured from specialized instruments and standard cameras. This thesis focuses on utilizing a wide range of texture, color, frequency-based, local, and global image properties at the terminal nodes of GP to classify skin cancer images from multiple modalities effectively. This thesis develops new two-stage GP methods using embedded and wrapper feature selection and construction approaches to automatically generating a feature vector of selected and constructed features for classification. The results show that wrapper approach outperforms the embedded approach, the existing baseline GP and other machine learning methods, but the embedded approach is faster than the wrapper approach. This thesis develops a multi-tree GP based embedded feature selection approach for melanoma detection using domain specific and domain independent features. It explores suitable crossover and mutation operators to evolve GP classifiers effectively and further extends this approach using a weighted fitness function. The results show that these multi-tree approaches outperformed single tree GP and other classification methods. They identify that a specific feature extraction method extracts most suitable features for particular images taken from a specific optical instrument. This thesis develops the first GP method utilizing frequency-based wavelet features, where the wrapper based feature selection and construction methods automatically evolve useful constructed features to improve the classification performance. The results show the evidence of successful feature construction by significantly outperforming existing GP approaches, state-of-the-art CNN, and other classification methods. This thesis develops a GP approach to multiple feature construction for ensemble learning in classification. The results show that the ensemble method outperformed existing GP approaches, state-of-the-art skin image classification, and commonly used ensemble methods. Further analysis of the evolved constructed features identified important image features that can potentially help the dermatologist identify further medical procedures in real-world situations.</p>

Download Full-text

Content-Based Image Retrieval

Encyclopedia of Information Science and Technology, Second Edition ◽

10.4018/978-1-60566-026-4.ch121 ◽

2011 ◽

pp. 744-749

Author(s):

Alan Wee-Chung Liew ◽

Ngai-Fong Law

Keyword(s):

Image Retrieval ◽

Visual Information ◽

Image Features ◽

User Preferences ◽

Multimedia Systems ◽

Content Based Image Retrieval ◽

Top Down ◽

Low Level ◽

Feature Based ◽

High Level

With the rapid growth of Internet and multimedia systems, the use of visual information has increased enormously, such that indexing and retrieval techniques have become important. Historically, images are usually manually annotated with metadata such as captions or keywords (Chang & Hsu, 1992). Image retrieval is then performed by searching images with similar keywords. However, the keywords used may differ from one person to another. Also, many keywords can be used for describing the same image. Consequently, retrieval results are often inconsistent and unreliable. Due to these limitations, there is a growing interest in content-based image retrieval (CBIR). These techniques extract meaningful information or features from an image so that images can be classified and retrieved automatically based on their contents. Existing image retrieval systems such as QBIC and Virage extract the so-called low-level features such as color, texture and shape from an image in the spatial domain for indexing. Low-level features sometimes fail to represent high level semantic image features as they are subjective and depend greatly upon user preferences. To bridge the gap, a top-down retrieval approach involving high level knowledge can complement these low-level features. This articles deals with various aspects of CBIR. This includes bottom-up feature- based image retrieval in both the spatial and compressed domains, as well as top-down task-based image retrieval using prior knowledge.

Download Full-text