Multimodal Summarization with Guidance of Multimodal Reference

2020 ◽  
Vol 34 (05) ◽  
pp. 9749-9756
Author(s):  
Junnan Zhu ◽  
Yu Zhou ◽  
Jiajun Zhang ◽  
Haoran Li ◽  
Chengqing Zong ◽  
...  

Multimodal summarization with multimodal output (MSMO) aims to generate a multimodal summary for a multimodal news report, which has been proven to effectively improve users' satisfaction. Existing MSMO methods are trained with a text-only target, leading to a modality-bias problem in which the quality of the model-selected image is ignored during training. To alleviate this problem, we propose a multimodal objective function guided by a multimodal reference, which combines the losses from summary generation and image selection. Due to the lack of multimodal reference data, we present two strategies, i.e., ROUGE-ranking and Order-ranking, to construct the multimodal reference by extending the text reference. Meanwhile, to better evaluate multimodal outputs, we propose a novel evaluation metric based on joint multimodal representation, projecting the model output and multimodal reference into a joint semantic space during evaluation. Experimental results show that our proposed model achieves a new state of the art on both automatic and manual evaluation metrics. Besides, our proposed evaluation method can effectively improve the correlation with human judgments.
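One possible reading of a multimodal objective is a weighted sum of a text-generation loss and an image-selection loss. The sketch below assumes exactly that; the function name `multimodal_loss`, the `image_weight` parameter, and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math


def multimodal_loss(token_probs, image_probs, ref_image, image_weight=1.0):
    """Hypothetical joint objective: text NLL plus weighted image-selection NLL.

    token_probs: probability the model assigned to each reference summary token.
    image_probs: model distribution over the candidate images.
    ref_image:   index of the image chosen as the multimodal reference.
    """
    # Average negative log-likelihood of the reference summary tokens.
    text_loss = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Cross-entropy of the image selector against the reference image.
    image_loss = -math.log(image_probs[ref_image])
    return text_loss + image_weight * image_loss


# Toy values: a model that puts more mass on the reference image gets a lower loss.
tokens = [0.9, 0.8, 0.7]
good = multimodal_loss(tokens, [0.1, 0.8, 0.1], ref_image=1)
bad = multimodal_loss(tokens, [0.8, 0.1, 0.1], ref_image=1)
```

With `image_weight=0.0` the objective reduces to the text-only loss, which is the setting the abstract describes as modality-biased.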

2019 ◽  
Vol 9 (13) ◽  
pp. 2684 ◽  
Author(s):  
Hongyang Li ◽  
Lizhuang Liu ◽  
Zhenqi Han ◽  
Dan Zhao

Peeling fibre is an indispensable process in the production of preserved Szechuan pickle, and its accuracy can significantly influence product quality. This paper therefore studies contour-based fibre detection, the core algorithm of the automatic peeling device. The fibre contour is a kind of non-salient contour, characterized by large intra-class differences and small inter-class differences, meaning that the features of the contour are not discriminative. A method called dilated-holistically-nested edge detection (Dilated-HED), built on the HED network and dilated convolution, is proposed to detect the fibre contour. The experimental results on our dataset show that the Pixel Accuracy (PA) is 99.52% and the Mean Intersection over Union (MIoU) is 49.99%, achieving state-of-the-art performance.
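The two reported metrics can be computed from flattened label maps as in the sketch below. It uses the standard definitions of PA and MIoU; the function name and the toy labels are invented for illustration and say nothing about the paper's dataset. The toy case shows why the two numbers can differ widely when contour pixels are rare.

```python
def pixel_accuracy_and_miou(pred, target, num_classes=2):
    """Return (pixel accuracy, mean IoU) for flat lists of integer class labels."""
    correct = sum(p == t for p, t in zip(pred, target))
    pa = correct / len(target)
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, target))
        union = sum(p == c or t == c for p, t in zip(pred, target))
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return pa, sum(ious) / len(ious)


# Toy example: 1 contour pixel (class 1) among 100, predicted as all background.
target = [0] * 99 + [1]
pred = [0] * 100
pa, miou = pixel_accuracy_and_miou(pred, target)
```

Here PA is dominated by the background class, while MIoU averages the per-class IoU and so penalizes the missed contour pixel heavily.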


2011 ◽  
Vol 1 ◽  
pp. 375-380
Author(s):  
Shu Ai Wan ◽  
Kai Fang Yang ◽  
Hai Yong Zhou

This paper addresses the important issue of multimedia quality evaluation given the unimodal quality of audio and video. First, the quality integration model recommended in G.1070 is evaluated using experimental results. Theoretical analyses and empirical observations suggest that the constant coefficients used in the G.1070 model should actually be piecewise adjusted for different levels of audio and visual quality. Then a piecewise function is proposed to perform multimedia quality integration under different levels of audio and visual quality. The performance gain observed in the experimental results substantiates the effectiveness of the proposed model.


2019 ◽  
Vol 9 (18) ◽  
pp. 3908 ◽  
Author(s):  
Jintae Kim ◽  
Shinhyeok Oh ◽  
Oh-Woog Kwon ◽  
Harksoo Kim

To generate proper responses to user queries, multi-turn chatbot models should selectively consider dialogue histories. However, previous chatbot models have simply concatenated or averaged vector representations of all previous utterances without considering contextual importance. To mitigate this problem, we propose a multi-turn chatbot model in which previous utterances participate in response generation using different weights. The proposed model calculates the contextual importance of previous utterances by using an attention mechanism. In addition, we propose a training method that uses two types of Wasserstein generative adversarial networks to improve the quality of responses. In experiments with the DailyDialog dataset, the proposed model outperformed the previous state-of-the-art models based on various performance measures.
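The idea of weighting previous utterances by contextual importance can be sketched with plain dot-product attention. This is a minimal illustration, assuming dot-product scoring and toy two-dimensional vectors; the function name `attention_context` is hypothetical and the paper's actual scoring function is not specified in the abstract.

```python
import math


def attention_context(query, utterances):
    """Return (attention weights, weighted context vector) over utterance vectors."""
    # Dot-product relevance of each previous utterance to the current query.
    scores = [sum(q * u for q, u in zip(query, vec)) for vec in utterances]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of utterance vectors, instead of a plain average.
    context = [
        sum(w * vec[d] for w, vec in zip(weights, utterances))
        for d in range(len(query))
    ]
    return weights, context


query = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attention_context(query, history)
```

Unlike concatenation or averaging, the utterance most aligned with the query receives the largest weight and therefore contributes most to the context vector.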


Author(s):  
Ziming Li ◽  
Julia Kiseleva ◽  
Maarten De Rijke

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.


2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Lei He ◽  
Yan Xing ◽  
Kangxiong Xia ◽  
Jieqing Tan

To address the drawback that most image inpainting algorithms fail to restore prominent texture, an adaptive inpainting algorithm based on continued fractions was proposed in this paper. To restore each damaged point, the information of the known pixels around it was used to interpolate its intensity. The proposed method comprised two steps: first, Thiele’s rational interpolation combined with the mask image was used to adaptively interpolate the intensities of damaged points to obtain an initial repaired image; then Newton-Thiele’s rational interpolation was used to refine the initial repaired image to obtain the final result. To show the superiority of the proposed algorithm, extensive experiments were conducted on damaged images. Both subjective and objective evaluation were used to assess the quality of the repaired images, with the objective evaluation being a comparison of Peak Signal to Noise Ratios (PSNRs). The experimental results showed that the proposed algorithm had a better visual effect and a higher PSNR than the state-of-the-art methods.
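As a rough illustration of the first step, Thiele's rational interpolation can be sketched in one dimension using inverse differences. The function names, the 1-D setting, and the sample values are assumptions made for illustration; the paper's adaptive, mask-guided scheme and the Newton-Thiele refinement are not reproduced here.

```python
from fractions import Fraction


def thiele_coefficients(xs, ys):
    """Inverse-difference coefficients b_0..b_n of Thiele's continued fraction.

    Raises ZeroDivisionError if two inverse differences coincide, in which
    case the interpolant does not exist in this form for the given nodes.
    """
    phi = list(ys)
    coeffs = [phi[0]]
    for k in range(1, len(xs)):
        # phi[i] holds the order-(k-1) inverse difference at x_i; raise it to order k.
        for i in range(len(xs) - 1, k - 1, -1):
            phi[i] = (xs[i] - xs[k - 1]) / (phi[i] - phi[k - 1])
        coeffs.append(phi[k])
    return coeffs


def thiele_eval(xs, coeffs, x):
    """Evaluate b_0 + (x - x_0) / (b_1 + (x - x_1) / (b_2 + ...)) bottom-up."""
    value = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        value = coeffs[k] + (x - xs[k]) / value
    return value


# Samples of f(x) = 1 / (1 + x); exact rationals keep the arithmetic exact.
xs = [Fraction(0), Fraction(1), Fraction(2)]
ys = [Fraction(1), Fraction(1, 2), Fraction(1, 3)]
coeffs = thiele_coefficients(xs, ys)
print(thiele_eval(xs, coeffs, Fraction(3)))  # 1/4
```

In an inpainting setting, `xs` would be the positions of known pixels near a damaged point and `ys` their intensities, with the interpolant evaluated at the damaged position.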


Author(s):  
Wei Li ◽  
Haiyu Song ◽  
Hongda Zhang ◽  
Houjie Li ◽  
Pengjie Wang

The ever-increasing size of images has made automatic image annotation one of the most important tasks in the fields of machine learning and computer vision. Despite continuous efforts in inventing new annotation algorithms and new models, the results of state-of-the-art image annotation methods are often unsatisfactory. In this paper, to further improve annotation refinement performance, a novel approach based on weighted mutual information is proposed to automatically refine the original annotations of images. Unlike traditional refinement models that use only visual features, the proposed model uses semantic embedding to map labels and visual features to a meaningful semantic space. To accurately measure the relevance between a particular image and its original annotations, the proposed model utilizes all available information, including image-to-image, label-to-label and image-to-label relations. Experiments conducted on three typical datasets show not only the validity of the refinement, but also the superiority of the proposed algorithm over existing ones. The improvement largely benefits from the proposed mutual information method and from utilizing all available information.


Organizational decisions are based on data-based analysis and predictions. Effective decisions require accurate predictions, which in turn depend on the quality of the data. Real-time data is prone to inconsistencies, which have a negative impact on the quality of the predictions. This creates the need for data imputation techniques. This work presents a prediction-based data imputation technique, Rank Based Multivariate Imputation (RBMI), that operates on multivariate data. The proposed model is composed of a ranking phase and an imputation phase. Ranking dictates the attribute order in which imputation is to be performed. The proposed model utilizes a tree-based approach for the actual imputation process. Experiments were performed on Pima, a diabetes dataset. The data was amputed at rates between 5% and 30%. The obtained results were compared with existing state-of-the-art models in terms of MAE and MSE levels. The proposed RBMI model exhibits a reduction of 0.03 in MAE levels and 0.001 in MSE levels.
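For reference, MAE and MSE for an imputation experiment are typically computed over the cells that were deliberately removed, comparing the imputed values with the held-out originals. The sketch below uses the standard definitions; the function name and the sample values are made up for illustration and are not results from the paper.

```python
def mae_mse(true_values, imputed_values):
    """Mean absolute error and mean squared error over the amputed cells."""
    errors = [t - i for t, i in zip(true_values, imputed_values)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse


# Hypothetical held-out originals and the values an imputer filled in.
original = [1.0, 2.0, 4.0]
imputed = [1.0, 3.0, 2.0]
mae, mse = mae_mse(original, imputed)
```

Because MSE squares each error, a single badly imputed cell moves it more than it moves MAE, which is why the two are usually reported together.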


Author(s):  
Penghui Wei ◽  
Wenji Mao ◽  
Guandan Chen

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, but existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the noisy labeling problem. To address the above two issues, in this paper, we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning for identifying stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data based on off-policy reinforcement learning. The two networks are alternately optimized to improve each other’s performance. Experimental results demonstrate that our proposed model TARM outperforms the state-of-the-art approaches.


Author(s):  
Qunsheng Ruan ◽  
Qingfeng Wu ◽  
Junfeng Yao ◽  
Yingdong Wang ◽  
Hsien-Wei Tseng ◽  
...  

In intelligent tongue image processing, one of the most important tasks is to accurately segment the tongue body from a whole tongue image, and high-quality processing of the tongue body edge is of great significance for the subsequent tongue feature extraction. To improve the performance of the segmentation model for tongue images, we propose an efficient tongue segmentation model based on U-Net. Three contributions are made: optimizing the model’s main network, designing a new network that specifically handles tongue edge segmentation, and proposing a weighted binary cross-entropy loss function. The purpose of optimizing the main segmentation network is to make the model recognize the foreground and background features of the tongue image as well as possible. A novel tongue edge segmentation network focuses on the tongue edge because the edge contains a great deal of important information. Furthermore, the proposed loss function is adopted to enhance the pixel-level supervision for tongue images. Moreover, owing to a lack of tongue image resources in Traditional Chinese Medicine (TCM), special measures are adopted to augment the training samples. Various comparative experiments on two datasets were conducted to verify the performance of the segmentation model. The experimental results indicate that the loss of our model converges faster than that of the others, and that our model segments tongue images captured in poor environments with better stability and robustness. The results also indicate that our model outperforms state-of-the-art models on the two most important tongue image segmentation indexes, IoU and Dice. Finally, experimental results on augmented samples demonstrate that our model performs better.
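A weighted binary cross-entropy loss can take several forms; the sketch below assumes the common variant in which foreground (tongue) pixels are scaled by a single positive-class weight. The name `weighted_bce`, the `pos_weight` parameter, and the sample values are hypothetical, and the paper's exact weighting scheme may differ.

```python
import math


def weighted_bce(probs, labels, pos_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy with foreground pixels scaled by pos_weight.

    probs:  predicted foreground probabilities, one per pixel.
    labels: ground-truth labels, 1 for foreground and 0 for background.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# A poorly predicted foreground pixel costs more once pos_weight exceeds 1.
plain = weighted_bce([0.3, 0.1], [1, 0])
weighted = weighted_bce([0.3, 0.1], [1, 0], pos_weight=2.0)
```

With `pos_weight=1.0` this is ordinary binary cross-entropy; raising the weight increases the penalty for missing foreground pixels.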


2019 ◽  
Vol 9 (24) ◽  
pp. 5427 ◽  
Author(s):  
Beomjun Kim ◽  
Sungwon Kang ◽  
Seonah Lee

For software maintenance, bug reports provide useful information to developers because they can be used for various tasks such as debugging and understanding previous changes. However, as they are typically written in the form of conversations among developers, bug reports tend to be unnecessarily long and verbose, with the consequence that developers often have difficulties reading or understanding bug reports. To mitigate this problem, methods that automatically generate a summary of bug reports have been proposed, and various related studies have been conducted. However, existing bug report summarization methods have not fully exploited the inherent characteristics of bug reports. In this paper, we propose a bug report summarization method that uses the weighted-PageRank algorithm and exploits the ‘duplicates’, ‘blocks’, and ‘depends-on’ relationships between bug reports. The experimental results show that our method outperforms the state-of-the-art method in terms of both the quality of the summary and the number of applicable bug reports.
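A weighted PageRank can be sketched as a power iteration in which each node distributes its score to its neighbours in proportion to edge weight. This is a generic sketch, not the paper's formulation: the node names, the edge weights, and the way relationships between bug reports would be turned into weights are all assumptions.

```python
def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank where weights[u][v] is the weight of edge u -> v."""
    nodes = sorted(set(weights) | {v for outs in weights.values() for v in outs})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for u in nodes:
            outs = weights.get(u, {})
            total = sum(outs.values())
            if total == 0:
                # Dangling node: spread its rank uniformly over all nodes.
                for n in nodes:
                    new_rank[n] += damping * rank[u] / len(nodes)
            else:
                # Pass rank along each edge in proportion to its weight.
                for v, w in outs.items():
                    new_rank[v] += damping * rank[u] * w / total
        rank = new_rank
    return rank


# Hypothetical sentence graph; s1 links more strongly to s2 than to s3.
graph = {"s1": {"s2": 3.0, "s3": 1.0}, "s2": {"s1": 1.0}, "s3": {"s1": 1.0}}
ranks = weighted_pagerank(graph)
```

A summarizer built on this would keep the highest-ranked items; in this toy graph `s2` outranks `s3` only because of the heavier incoming edge.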

