Unsupervised Stylish Image Description Generation via Domain Layer Norm

Author(s): Cheng-Kuan Chen, Zhufeng Pan, Ming-Yu Liu, Min Sun

Most existing work on image description focuses on generating expressive descriptions. The few works dedicated to generating stylish (e.g., romantic or lyric) descriptions suffer from limited style variation and content digression. To address these limitations, we propose a controllable stylish image description generation model. It learns to generate stylish image descriptions that are more closely related to the image content, and it can be trained on an arbitrary monolingual corpus without collecting new pairs of images and stylish descriptions. Moreover, users can generate descriptions in a variety of styles, and new styles can be added to an existing model by plugging in style-specific parameters. We achieve this capability with a novel layer normalization design, which we refer to as the Domain Layer Norm (DLN). Extensive experiments and a user study on various stylish image description generation tasks demonstrate the competitive advantages of the proposed model.
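The abstract states only that new styles are added by plugging in style-specific parameters through a layer normalization design. A minimal sketch of one way this could work, assuming that every style shares the normalization itself (and, in a full model, the rest of the decoder) while keeping its own gain (gamma) and bias (beta) vectors; the class name and style names are hypothetical, not the authors' code:

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard layer normalization of one feature vector x."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


class DomainLayerNorm:
    """Hypothetical sketch: normalization statistics are computed the same way
    for every style; only the gain (gamma) and bias (beta) are style-specific."""

    def __init__(self, dim):
        self.dim = dim
        self.styles = {}  # style name -> (gamma, beta)

    def add_style(self, name, gamma=None, beta=None):
        # Adding a style only registers a new (gamma, beta) pair.
        gamma = list(gamma) if gamma is not None else [1.0] * self.dim
        beta = list(beta) if beta is not None else [0.0] * self.dim
        self.styles[name] = (gamma, beta)

    def __call__(self, x, style):
        gamma, beta = self.styles[style]
        return layer_norm(x, gamma, beta)


dln = DomainLayerNorm(dim=4)
dln.add_style("factual")
dln.add_style("romantic", gamma=[2.0] * 4, beta=[0.5] * 4)
hidden = [1.0, 2.0, 3.0, 4.0]
print(dln(hidden, "factual"))
print(dln(hidden, "romantic"))
```

In this sketch a new style costs only two vectors of length `dim`; whether DLN also freezes the shared weights when a style is added is not stated in the abstract.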

Author(s): Huimin Lu, Rui Yang, Zhenrong Deng, Yonglin Zhang, Guangwei Gao, ...

Chinese image description generation faces several challenges, such as single-feature extraction, a lack of global information, and a lack of detailed description of the image content. To address these limitations, we propose a fuzzy attention-based DenseNet-BiLSTM method for Chinese image captioning. In the proposed method, we first improve the densely connected network to extract image features at different scales and to strengthen the model's ability to capture weak features. A bidirectional LSTM is used as the decoder to make better use of context information. An improved fuzzy attention mechanism further strengthens the correspondence between image features and contextual information. We evaluate the model on the AI Challenger dataset. Compared with other models, the proposed model achieves higher scores on objective quantitative metrics, including two BLEU variants, METEOR, ROUGE-L, and CIDEr, and the generated sentences accurately express the image content.
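The abstract does not define the improved fuzzy attention mechanism. For orientation only, the sketch below shows the ordinary soft attention step such mechanisms build on: the decoder state scores each image feature, a softmax turns the scores into weights, and the weighted sum becomes the context vector for the next word. Dot-product scoring and the toy feature vectors are assumptions; this is not the paper's fuzzy variant.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, features):
    """Plain dot-product soft attention (not the paper's fuzzy variant).

    query:    decoder hidden state, length d
    features: list of image region feature vectors, each of length d
    Returns the attention weights and the weighted context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(dim)]
    return weights, context


regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical region features
weights, context = soft_attention([3.0, 0.0], regions)
print(weights, context)
```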


Author(s): Gangavarapu Venkata Satya Kumar, Pillutla Gopala Krishna Mohan

In diverse computer applications, the analysis of image content plays a key role. Image content can be textual (such as text appearing in the image) or visual (such as shape, color, and texture). Together, these two kinds of content make up an image's basic features, which makes them central to any implementation. Many state-of-the-art Content-Based Image Retrieval (CBIR) models are based on either visual search or annotated text. As the demand for multitasking grows, a new method that combines textual and visual features is needed. This paper develops an intelligent CBIR system for a collection of benchmark texture datasets. A new descriptor, named Information Oriented Angle-based Local Tri-directional Weber Patterns (IOA-LTriWPs), is adopted. The pattern is computed not only from three directions and the eight neighborhood pixels but also from four angles, [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text]. Once the patterns for the three directions, the eight neighborhood pixels, and the four angles are obtained, the best patterns are selected by maximum mutual information. The histograms of these patterns form the final feature vector, from which a new weighted feature extraction is performed. As a further contribution, the new weight function is optimized by an Improved MVO on random basis (IMVO-RB) so that the precision and recall of the retrieved images are high. The proposed model then uses a logarithmic similarity, the Mean Square Logarithmic Error (MSLE), between the features of the query image and those of the training images to retrieve the relevant images. Analyses on diverse texture image datasets confirm the accuracy and efficiency of the developed pattern compared with existing ones.
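The retrieval step ranks stored images by the MSLE between their feature vectors and the query's, MSLE(q, d) = (1/n) * sum_i (ln(1 + q_i) - ln(1 + d_i))^2, with lower values meaning more similar. A minimal sketch, assuming non-negative (histogram-like) features and a plain dictionary as the image database; the image names and values are hypothetical, and the IOA-LTriWP extraction and IMVO-RB weighting are not reproduced:

```python
import math


def msle(a, b):
    """Mean squared logarithmic error between two non-negative feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(a, b)) / len(a)


def retrieve(query, database, k=3):
    """Return the names of the k database images with the lowest MSLE to the query."""
    return sorted(database, key=lambda name: msle(query, database[name]))[:k]


# Hypothetical histogram features for three stored images.
database = {"img_a": [1.0, 2.0, 3.0], "img_b": [10.0, 0.0, 0.0], "img_c": [1.0, 2.0, 4.0]}
print(retrieve([1.0, 2.0, 3.0], database, k=2))  # ['img_a', 'img_c']
```

Taking logs before squaring damps the influence of large histogram bins, so a few dominant bins do not swamp the comparison.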


Author(s): Wei Zhao, Benyou Wang, Jianbo Ye, Min Yang, Zhou Zhao, ...

In this paper, we propose a Multi-task Learning Approach for Image Captioning (MLAIC), motivated by the fact that humans have no difficulty performing this task because they possess capabilities across multiple domains. Specifically, MLAIC consists of three key components: (i) a multi-object classification model that learns rich category-aware image representations using a CNN image encoder; (ii) a syntax generation model that learns a better syntax-aware LSTM-based decoder; and (iii) an image captioning model that generates textual image descriptions and shares its CNN encoder with the object classification task and its LSTM decoder with the syntax generation task. As a result, the image captioning model can benefit from additional object categorization and syntax knowledge. To verify the effectiveness of our approach, we conduct extensive experiments on the MS-COCO dataset. The experimental results demonstrate that our model achieves impressive results compared with other strong competitors.
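A common way to train shared components on several tasks is to add the task losses together. The sketch below assumes that MLAIC is trained on a weighted sum of a captioning loss, a multi-label object-classification loss, and a syntax-sequence loss; the abstract does not give the actual objective, and the weights `w_obj` and `w_syn` as well as the toy probabilities are purely illustrative.

```python
import math


def sequence_nll(step_probs, targets):
    """Negative log-likelihood of a token sequence (caption words or syntax tags).
    step_probs[t] is the predicted distribution at step t; targets[t] is the gold index."""
    return -sum(math.log(p[t]) for p, t in zip(step_probs, targets))


def multilabel_bce(probs, labels):
    """Binary cross-entropy summed over object categories."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in zip(probs, labels))


def joint_loss(caption_loss, object_loss, syntax_loss, w_obj=0.5, w_syn=0.5):
    """Hypothetical multi-task objective: captioning loss plus weighted auxiliary losses."""
    return caption_loss + w_obj * object_loss + w_syn * syntax_loss


cap = sequence_nll([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
obj = multilabel_bce([0.9, 0.2], [1, 0])
syn = sequence_nll([[0.6, 0.4], [0.3, 0.7]], [0, 1])
print(joint_loss(cap, obj, syn))
```

In a real model, the gradient of the object term would reach the shared CNN encoder and the gradient of the syntax term would reach the shared LSTM decoder; only the loss arithmetic is shown here.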


Author(s): Yanzhen Li, Rapinder S. Sawhney, Joseph H. Wilck IV

To retain their competitive advantages, many manufacturing organizations have applied Lean Six Sigma techniques to improve production processes. The general approach to implementing Lean Six Sigma is to run individual projects that tackle specific problems or areas. However, as the manufacturing system and its external environment become increasingly complex, it is simply not possible to solve every problem with limited resources. The purpose of this chapter is to develop a model that systematically evaluates potential opportunities to enhance the effectiveness of Lean Six Sigma. Derived from the Bayesian Network methodology, the proposed model combines a graphical representation of cause-and-effect relationships between events of interest with probabilistic inference that estimates their likelihoods in the area of process improvement. The model can be used to assess the problems associated with Lean Six Sigma initiatives and to prioritize efforts to solve them.
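A tiny illustration of the kind of probabilistic inference described above: given an observed problem (a defect), compute how likely each candidate cause is and rank the causes to prioritize effort. The two-cause network, its variable names, and all probabilities are hypothetical; the chapter's actual network is not given in the abstract.

```python
from itertools import product

# Hypothetical two-cause network: machine_wear -> defect <- operator_error.
# All probabilities are illustrative, not taken from the chapter.
P_WEAR = 0.2
P_ERROR = 0.1
P_DEFECT = {  # P(defect = True | wear, error)
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.05,
}


def joint(wear, error, defect):
    """Joint probability P(wear, error, defect) from the network factorization."""
    p = (P_WEAR if wear else 1 - P_WEAR) * (P_ERROR if error else 1 - P_ERROR)
    p_def = P_DEFECT[(wear, error)]
    return p * (p_def if defect else 1 - p_def)


def cause_posteriors(defect=True):
    """P(cause = True | observed defect state), computed by enumeration."""
    evidence = sum(joint(w, e, defect) for w, e in product([True, False], repeat=2))
    p_wear = sum(joint(True, e, defect) for e in (True, False)) / evidence
    p_error = sum(joint(w, True, defect) for w in (True, False)) / evidence
    return {"machine_wear": p_wear, "operator_error": p_error}


posteriors = cause_posteriors(defect=True)
# Rank causes so improvement effort goes to the most probable one first.
priorities = sorted(posteriors, key=posteriors.get, reverse=True)
print(posteriors, priorities)
```

Enumeration is exact but grows exponentially with the number of variables; larger networks would normally use a dedicated inference algorithm.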


2019, Vol 277, pp. 02036
Author(s): Yu Li, Lizhuang Liu

In this work, we investigate the use of deep learning for the image quality classification problem. We use a pre-trained Convolutional Neural Network (CNN) to describe images, and a Support Vector Machine (SVM) is trained as an image quality classifier on normalized features extracted by the CNN. We report on different design choices, ranging from the use of various CNN architectures to the use of features extracted from different layers of a CNN. To cope with the lack of adequate amounts of distorted image data, we introduce a novel multi-scale training strategy, which selects a new training image size after every few batches, combined with data augmentation. Experiments on real monitoring video images show that the proposed model can accurately classify distorted images.
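The multi-scale strategy draws a new input size after several batches. A sketch of just that schedule, assuming a fixed list of candidate sizes, a switch every 10 batches, and a uniform random choice; the sizes and interval are illustrative, not values from the paper, and the resizing and data augmentation themselves are left out.

```python
import random


def multiscale_schedule(num_batches, sizes=(224, 256, 288, 320), every=10, seed=0):
    """Yield the training image size for each batch; a new size is drawn
    uniformly from `sizes` every `every` batches (values are illustrative)."""
    rng = random.Random(seed)
    size = rng.choice(sizes)
    for batch in range(num_batches):
        if batch > 0 and batch % every == 0:
            size = rng.choice(sizes)
        yield size


schedule = list(multiscale_schedule(35, every=10))
print(schedule)
```

Changing the size only between groups of batches keeps every batch a single tensor shape while still exposing the network to several scales.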


2018, Vol 8 (10), pp. 1850
Author(s): Zhibin Guan, Kang Liu, Yan Ma, Xu Qian, Tongkai Ji

Image caption generation is an attractive research topic that focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). Existing image captioning methods mainly focus on generating the final image caption directly, which may lose significant identification information about the objects contained in the raw image. Therefore, we propose a new middle-level attribute-based language retouching (MLALR) method to solve this problem. Our MLALR method uses middle-level attributes predicted from object regions to retouch the intermediate image description generated by our language generation model. The advantage of MLALR is that it can correct descriptive errors in the intermediate image description and make the final image caption more accurate. Moreover, evaluation on the MSCOCO, Flickr8K, and Flickr30K benchmark datasets with the BLEU, METEOR, ROUGE-L, CIDEr, and SPICE metrics validates the impressive performance of our MLALR method.


2020, Vol 12 (6), pp. 2241
Author(s): Muhammad Umar Afzaal, Intisar Ali Sajjad, Ahmed Bilal Awan, Kashif Nisar Paracha, Muhammad Faisal Nadeem Khan, ...

Around the world, countries are integrating photovoltaic generating systems into the grid to support climate change initiatives. However, solar power generation is highly uncertain because solar irradiance varies over the hours of the day. Inaccurate modelling of this variability can lead to non-optimal dispatch of system resources. Therefore, accurate characterization of solar irradiance patterns is essential for effective management of renewable energy resources in an electrical power grid. In this paper, a Weibull-distribution-based probabilistic model is presented for characterizing solar irradiance patterns. First, the Weibull distribution is used to model inter-temporal variations in reference solar irradiance data through a moving-window averaging technique; the proposed model is then used to generate irradiance patterns. To make the discrete Weibull parameters calculated at each step of the moving window continuous, a Generalized Regression Neural Network (GRNN) is employed. Goodness of Fit (GOF) techniques are used to quantify the error between the means and standard deviations of the generated and reference patterns. Comparing the GOF results with those reported in the literature shows that the proposed model performs better. The presented model can be used in power system planning studies where the uncertainty of different resources, such as generation, load, and network, needs to be considered for their better management.
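The abstract does not say how the Weibull parameters are estimated at each window step. The sketch below swaps in a widely used empirical moment approximation, shape k = (std/mean)^(-1.086) and scale lambda = mean / Gamma(1 + 1/k), applied at every position of a sliding window over a reference series. The irradiance values are made up, and the GRNN step that makes the parameters continuous is not shown.

```python
import math
import statistics


def weibull_params(samples):
    """Estimate Weibull shape k and scale lam from sample mean and standard
    deviation with the empirical approximation k = (std / mean) ** -1.086.
    Requires a positive mean and non-zero spread."""
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    k = (std / mean) ** -1.086
    lam = mean / math.gamma(1 + 1 / k)
    return k, lam


def moving_window_params(series, window):
    """Fit (k, lam) at every position of a sliding window over the series."""
    return [weibull_params(series[i:i + window]) for i in range(len(series) - window + 1)]


irradiance = [400.0, 520.0, 610.0, 580.0, 450.0, 300.0, 350.0]  # hypothetical W/m^2 samples
params = moving_window_params(irradiance, window=3)
for k, lam in params:
    print(f"k={k:.2f}, lambda={lam:.1f}")
```

By construction, lambda * Gamma(1 + 1/k) reproduces each window's mean, so the fitted distribution matches the first moment exactly and the second moment approximately.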


2010, Vol 09 (05), pp. 759–778
Author(s): O. O. Olugbara, S. O. Ojo, M. I. Mphahlele

This paper demonstrates how image content can be used to realize a location-based shopping recommender system that intuitively supports mobile users in decision making. Generic Fourier Descriptor (GFD) features were extracted from each item's image to exploit the knowledge contained in the item and user profile databases for learning to rank recommendations. The Analytic Hierarchy Process (AHP) was used to automatically select a query item from a user profile. Single-Criterion Decision Ranking (SCDR) and Multiple-Criteria Decision Ranking (MCDR) techniques were compared to study the effect of multidimensional item ratings on recommendation effectiveness. The SCDR and MCDR techniques are based on an Image Content Similarity Score (ICSS) and a Relative Ratio (RR) aggregating function, respectively. Results of a user study with real users showed that an MCDR system yields higher user satisfaction and recommendation effectiveness than an SCDR system.
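The abstract says AHP selects the query item from the user profile but gives neither the criteria nor the comparison judgements. The sketch below assumes three hypothetical criteria, derives AHP priority weights with the row geometric-mean approximation (classic AHP uses the principal eigenvector, which gives the same result for a perfectly consistent matrix like this one), and picks the profile item with the highest weighted score. Item names and scores are invented for illustration.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.
    pairwise[i][j] says how much more important criterion i is than j."""
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]


def select_query_item(items, weights):
    """Pick the profile item with the highest weighted criterion score.
    items maps item name -> list of normalized criterion scores in [0, 1]."""
    return max(items, key=lambda name: sum(w * s for w, s in zip(weights, items[name])))


# Hypothetical criteria: recency of purchase, user rating, purchase frequency.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
profile = {"jacket": [0.9, 0.4, 0.2], "sneakers": [0.3, 0.9, 0.8], "watch": [0.5, 0.5, 0.5]}
print(weights, select_query_item(profile, weights))
```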


1998, Vol 16 (4), pp. 441–449
Author(s): B. A. Shand, M. Lester, T. K. Yeoman

Abstract. Substorm-associated radar auroral surges (SARAS) are short-lived (15–90 minutes), spatially localised (~5° of latitude) perturbations of the plasma convection pattern observed within the auroral E-region. Understanding such phenomena has important ramifications for the investigation of larger-scale plasma convection and, ultimately, of the coupling of the solar wind, magnetosphere, and ionosphere system. A statistical investigation of SARAS observed by the Sweden And Britain Radar Experiment (SABRE) is undertaken to provide a more extensive examination of the local time occurrence and propagation characteristics of the events. The statistical analysis shows that observations occur between 1420 MLT and 2200 MLT, with maximum occurrence centred around 1700 MLT. The propagation velocity of the SARAS feature through the SABRE field of view was found to be predominantly L-shell aligned, with velocities centred around 1750 m s⁻¹ and ranging from 500 m s⁻¹ to 3500 m s⁻¹. This comprehensive examination of SARAS provides the opportunity to discuss, qualitatively, a possible generation mechanism for SARAS based on a proposed model for the production of a similar phenomenon known as sub-auroral ion drifts (SAIDs). The comparison suggests that SARAS may result from a geophysical mechanism similar to the one that produces SAID events, but probably at a different time in the evolution of the event.

Key words. Substorms · Auroral surges · Plasma convection · Sub-auroral ion drifts

