Graph-based Multimodal Ranking Models for Multimodal Summarization

Author(s):  
Junnan Zhu ◽  
Lu Xiang ◽  
Yu Zhou ◽  
Jiajun Zhang ◽  
Chengqing Zong

Multimodal summarization aims to extract the most important information from multimedia input. It is becoming increasingly popular due to the rapid growth of multimedia data in recent years. Various studies focus on different multimodal summarization tasks. However, existing methods can generate either single-modal output or multimodal output, but not both. In addition, most of them need many annotated samples for training, which makes them difficult to generalize to other tasks or domains. Motivated by this, we propose a unified framework for multimodal summarization that covers both single-modal output summarization and multimodal output summarization. In our framework, we consider three different scenarios and propose a corresponding unsupervised graph-based multimodal summarization model for each, none of which requires manually annotated document-summary pairs for training: (1) generic multimodal ranking, (2) modal-dominated multimodal ranking, and (3) non-redundant text-image multimodal ranking. Furthermore, an image-text similarity estimation model is introduced to measure the semantic similarity between image and text. Experiments show that our proposed models outperform the single-modal summarization methods on both automatic and human evaluation metrics. In addition, our models can improve single-modal summarization with the guidance of multimedia information. This study can serve as a benchmark for further work on the multimodal summarization task.
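
As an illustration of what graph-based ranking over mixed text and image nodes can look like, the sketch below runs a generic PageRank-style power iteration over a similarity graph. It is not the authors' model: the function name `rank_nodes`, the damping factor, and the toy similarity values are all assumptions made for this example, and the abstract does not specify how the three ranking variants differ from this baseline.

```python
def rank_nodes(sim, damping=0.85, iters=100, tol=1e-8):
    """Score graph nodes with a PageRank-style power iteration.

    sim[i][j] is a non-negative similarity (edge weight) between nodes i and j.
    Nodes may be sentences or images. Returns one score per node, summing to 1.
    """
    n = len(sim)
    # Row-normalize the edge weights, ignoring self-loops.
    trans = []
    for i in range(n):
        row = [sim[i][j] if i != j else 0.0 for j in range(n)]
        total = sum(row)
        # A node with no outgoing weight jumps uniformly to every node.
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [
            (1 - damping) / n
            + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:  # stop once the scores have converged
            break
    return scores


# Hypothetical toy graph: nodes 0-2 are sentences, node 3 is an image.
toy_sim = [
    [0.0, 0.6, 0.1, 0.7],
    [0.6, 0.0, 0.2, 0.5],
    [0.1, 0.2, 0.0, 0.1],
    [0.7, 0.5, 0.1, 0.0],
]
scores = rank_nodes(toy_sim)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy graph, node 2 is only weakly connected to the others and therefore ends up last in `ranking`. A summary would then be assembled from the top-ranked nodes, with the image-text edge weights coming from whatever cross-modal similarity model is available.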

Author(s):  
Rahmath Safeena ◽  
Abdullah Kammani

Studying technology adoption has become a crucial measure for understanding the success or effectiveness of evolving technologies. Adoption of technology in general, and of Internet Banking Technology (IBT) in particular, leads to a decrease in coordination cost and an increase in the efficiency of banking processes. The Indian economy has experienced rapid growth over the last decade, developing IBT to transform the traditional lines of banking products and services. This shift has brought profound challenges and opportunities to both banks and their customers. Banks have used the potential of technology to provide new capabilities in banking. Customers have found in IBT a new ease in carrying out financial transactions. However, the literature reports high levels of uncertainty related to IBT adoption. This research attempts to formulate an integrated framework to investigate the factors of IBT adoption in India.


Author(s):  
S. Geisler ◽  
O. Kao

Sensing and processing multimedia information is one of the basic traits of human beings. The development of digital technologies and applications allows the production of huge amounts of multimedia data. Rapidly decreasing prices for hardware such as digital cameras and camcorders, sound cards, and the corresponding displays have led to the wide distribution of multimedia-capable input and output devices in all areas of everyday life, from home entertainment to companies and educational organisations. Thus, multimedia information in the form of digital pictures, videos, and music can be created intuitively and is affordable for a broad spectrum of users.


Informatics ◽  
2019 ◽  
Vol 6 (2) ◽  
pp. 19 ◽  
Author(s):  
Rajat Pandit ◽  
Saptarshi Sengupta ◽  
Sudip Kumar Naskar ◽  
Niladri Sekhar Dash ◽  
Mohini Mohan Sardar

Semantic similarity is a long-standing problem in natural language processing (NLP). It is a topic of great interest because understanding it can offer insight into how human beings comprehend meaning and make associations between words. However, when this problem is viewed from the standpoint of machine understanding, particularly for under-resourced languages, it poses a different problem altogether. In this paper, semantic similarity is explored in Bangla, a less-resourced language. To improve the situation in such languages, the most rudimentary method (path-based) and a state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented using cross-lingual resources in English, and the results obtained are striking. Two semantic similarity approaches were explored in Bangla, namely the path-based and distributional models, and their cross-lingual counterparts were synthesized in light of the English WordNet and corpora. The proposed methods were evaluated on a dataset comprising 162 Bangla word pairs, which were annotated by five expert raters. The correlation scores obtained between the four metrics and human evaluation scores demonstrate the marked improvement that the cross-lingual approach brings to semantic similarity calculation for Bangla.
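
To make the path-based approach concrete, the sketch below computes a similarity of 1 / (1 + shortest path length) between two concepts in a small taxonomy, which is the usual form of WordNet-style path similarity. The taxonomy, the helper names `build_graph` and `path_similarity`, and the English concept labels are hypothetical and stand in for a real WordNet. The abstract does not state which exact path formula the authors used, so this is an assumption.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (child, parent) pairs."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_similarity(graph, a, b):
    """Return 1 / (1 + shortest path length), or 0.0 if b is unreachable."""
    if a == b:
        return 1.0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                # The path length is dist + 1, so similarity is 1 / (dist + 2).
                return 1.0 / (dist + 2)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


# Hypothetical toy taxonomy (child, parent).
toy_graph = build_graph([
    ("dog", "animal"),
    ("cat", "animal"),
    ("animal", "entity"),
    ("plant", "entity"),
])
sim_dog_cat = path_similarity(toy_graph, "dog", "cat")      # path length 2
sim_dog_plant = path_similarity(toy_graph, "dog", "plant")  # path length 3
```

Here "dog" and "cat" are two edges apart and score 1/3, while "dog" and "plant" are three edges apart and score 1/4. Scores of this kind are what would then be correlated with the human ratings.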


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 109120-109132
Author(s):  
Job Isaias Quiroz-Mercado ◽  
Ricardo Barron-Fernandez ◽  
Marco Antonio Ramirez-Salinas

2011 ◽  
Vol 14 (1) ◽  
Author(s):  
Rocío L. Cecchini ◽  
Carlos M. Lorenzetti ◽  
Ana G. Maguitman ◽  
Filippo Menczer

The absence of reliable and efficient techniques to evaluate information retrieval systems has become a bottleneck in the development of novel retrieval methods. In traditional approaches, users or hired evaluators provide manual assessments of relevance. However, these approaches are neither efficient nor reliable, since they do not scale with the complexity and heterogeneity of available digital information. Automatic approaches, on the other hand, could be efficient but disregard semantic data, which is usually important for assessing the actual performance of the evaluated methods. This article proposes to use topic ontologies and semantic similarity data derived from these ontologies to implement an automatic semantic evaluation framework for information retrieval systems. The use of semantic similarity data makes it possible to capture the notion of partial relevance, generalizing traditional evaluation metrics and giving rise to novel performance measures such as semantic precision and semantic harmonic mean. The validity of the approach is supported by user studies, and the application of the proposed framework is illustrated with the evaluation of topical retrieval systems. The evaluated systems include a baseline, a supervised version of the Bo1 query refinement method, and two multi-objective evolutionary algorithms for context-based retrieval. Finally, we discuss the advantages of applying evaluation metrics that account for semantic similarity data and partial relevance over existing metrics based on the notion of total relevance.
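
One way to read "partial relevance" is to replace the binary relevant/not-relevant judgment in precision and recall with a similarity score in [0, 1]. The sketch below follows that reading. The exact definitions used in the article are not given in the abstract, so the formulas, the function names, and the toy similarity values are assumptions for illustration only.

```python
def semantic_precision(retrieved, sim_to_target):
    """Mean similarity of the retrieved items to the target topic."""
    if not retrieved:
        return 0.0
    return sum(sim_to_target.get(doc, 0.0) for doc in retrieved) / len(retrieved)


def semantic_recall(retrieved, sim_to_target):
    """Share of the total available similarity mass that was retrieved."""
    total = sum(sim_to_target.values())
    if total == 0:
        return 0.0
    # Use a set so that a document retrieved twice is only counted once.
    return sum(sim_to_target.get(doc, 0.0) for doc in set(retrieved)) / total


def harmonic_mean(p, r):
    """Harmonic mean of two scores, defined as 0.0 when both are 0."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


# Hypothetical similarity of each document's topic to the query topic.
toy_sim = {"d1": 1.0, "d2": 0.5, "d3": 0.0, "d4": 0.5}
toy_retrieved = ["d1", "d2", "d3"]

p = semantic_precision(toy_retrieved, toy_sim)  # (1.0 + 0.5 + 0.0) / 3
r = semantic_recall(toy_retrieved, toy_sim)     # 1.5 / 2.0
f = harmonic_mean(p, r)
```

When every similarity is either 0 or 1, these functions reduce to classical precision, recall, and F1, which is the sense in which similarity-based measures generalize the traditional ones.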


2020 ◽  
Vol 34 (05) ◽  
pp. 9725-9732
Author(s):  
Xiaorui Zhou ◽  
Senlin Luo ◽  
Yunfang Wu

In reading comprehension, generating sentence-level distractors is a significant task that requires a deep understanding of the article and question. Traditional entity-centered methods can only generate word-level or phrase-level distractors. Although recently proposed neural methods such as the sequence-to-sequence (Seq2Seq) model show great potential in generating creative text, previous neural methods for distractor generation ignore two important aspects. First, they did not model the interactions between the article and question, so the generated distractors tend to be too general or not relevant to the question context. Second, they did not emphasize the relationship between the distractor and article, so the generated distractors are not semantically relevant to the article and thus fail to form a set of meaningful options. To solve the first problem, we propose a co-attention enhanced hierarchical architecture to better capture the interactions between the article and question, thus guiding the decoder to generate more coherent distractors. To alleviate the second problem, we add a semantic similarity loss to make the generated distractors more relevant to the article. Experimental results show that our model outperforms several strong baselines on automatic metrics, achieving state-of-the-art performance. Further human evaluation indicates that our generated distractors are more coherent and more educative than those generated by the baselines.
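
The idea of adding a semantic similarity term to the training objective can be sketched as follows. The abstract does not give the form of the loss, so this example assumes a cosine-based penalty (1 minus the cosine similarity between a distractor vector and an article vector) added to the generation loss with a hypothetical weight. The function names and the vectors are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_loss(generation_loss, distractor_vec, article_vec, weight=0.5):
    """Generation loss plus a weighted penalty for low article similarity.

    `weight` is a hypothetical hyperparameter, not a value from the paper.
    """
    similarity_loss = 1.0 - cosine_similarity(distractor_vec, article_vec)
    return generation_loss + weight * similarity_loss


# A distractor aligned with the article adds no penalty.
aligned = combined_loss(2.0, [1.0, 0.0], [1.0, 0.0])
# An orthogonal distractor adds the full weighted penalty.
unrelated = combined_loss(2.0, [1.0, 0.0], [0.0, 1.0])
```

Under this assumed form, minimizing the combined loss pushes the distractor representation toward the article representation while the generation term continues to drive fluency.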


Author(s):  
Nora Aranberri-Monasterio ◽  
Sharon O’Brien

-ing forms in English are reported to be problematic for Machine Translation and are often the focus of rules in Controlled Language rule sets. We investigated how problematic -ing forms are for an RBMT system, translating into four target languages in the IT domain. Constituent-based human evaluation was used and the results showed that, in general, -ing forms do not deserve their bad reputation. A comparison with the results of five automated MT evaluation metrics showed promising correlations. Some issues prevail, however, and can vary from target language to target language. We propose different strategies for dealing with these problems, such as Controlled Language rules, semi-automatic post-editing, source text tagging and “post-editing” the source text.

