Searching for memory-lighter architectures for OCR-augmented image captioning

2021 ◽  
pp. 1-12
Author(s):  
Rafael Gallardo García ◽  
Beatriz Beltrán Martínez ◽  
Carlos Hernández Gracidas ◽  
Darnes Vilariño Ayala

Current state-of-the-art image captioning systems that can read text and integrate it into the generated descriptions demand high processing power and memory, which limits their sustainability and usability, as they require expensive and highly specialized hardware. The present work introduces two alternative versions (L-M4C and L-CNMT) of top architectures on the TextCaps challenge, adapted to achieve near-state-of-the-art performance while being memory-lighter than the originals. This is achieved mainly by using distilled or smaller pre-trained models in the text- and OCR-embedding modules. First, a distilled version of BERT was used to reduce the size of the text-embedding module (the distilled model has 59% fewer parameters). Second, the OCR context processor in both architectures was replaced by Global Vectors (GloVe) instead of pre-trained FastText vectors, which can reduce the memory used by the OCR-embedding module by up to 94%. Two of the three models presented in this work surpassed the challenge baseline (M4C-Captioner) on the evaluation and test sets, and our best lighter architecture reached a CIDEr score of 88.24 on the test set, 7.25 points above the baseline.
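A back-of-the-envelope calculation makes the embedding-module savings concrete. The vocabulary sizes below are assumptions based on common public releases (FastText wiki-news with 2M words, GloVe 6B with 400K words, both 300-dimensional float32), not figures from the paper; the up-to-94% figure will depend on the exact pre-trained files the authors used:

```python
# Rough memory comparison of OCR-embedding matrices for two common
# pre-trained releases (vocabulary sizes are assumptions, see above).

def embedding_bytes(vocab_size: int, dim: int = 300, bytes_per_float: int = 4) -> int:
    """Memory needed to hold a dense vocab_size x dim embedding matrix."""
    return vocab_size * dim * bytes_per_float

fasttext_mb = embedding_bytes(2_000_000) / 2**20  # FastText wiki-news, 2M words
glove_mb = embedding_bytes(400_000) / 2**20       # GloVe 6B release, 400K words

reduction = 1 - glove_mb / fasttext_mb
print(f"FastText: {fasttext_mb:.0f} MiB, GloVe: {glove_mb:.0f} MiB, saving: {reduction:.0%}")
```

With these assumed releases the saving is about 80%; larger crawl-based FastText releases would push the ratio closer to the paper's reported figure.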

2020 ◽  
Vol 34 (05) ◽  
pp. 9523-9530
Author(s):  
Junlang Zhan ◽  
Hai Zhao

Open Information Extraction (Open IE) is a challenging task, especially due to its brittle data basis. Most Open IE systems have to be trained on automatically built corpora and evaluated on inaccurate test sets. In this work, we first alleviate this difficulty from both the training and test sides. For the former, we propose an improved model design to more fully exploit the training dataset. For the latter, we present our accurately re-annotated benchmark test set (Re-OIE2016), based on a series of linguistic observations and analyses. Then, we introduce a span model in place of the previously adopted sequence labeling formulation for n-ary Open IE. Our newly introduced model achieves new state-of-the-art performance on both benchmark evaluation datasets.
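As a minimal sketch of the span formulation the paper adopts (as opposed to token-level sequence labeling), candidate spans can be enumerated and then scored by a learned model. `enumerate_spans` and `max_len` are illustrative names, and the scoring model is omitted:

```python
# Span view of Open IE: enumerate candidate (start, end) spans over a
# sentence; a learned scorer (not shown) would then classify each span
# as predicate, argument, or neither.

def enumerate_spans(tokens, max_len=3):
    """All spans (i, j), end-exclusive, of length 1..max_len."""
    n = len(tokens)
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, min(i + 1 + max_len, n + 1))]

tokens = "Marie Curie won the Nobel Prize".split()
spans = enumerate_spans(tokens)
print(len(spans), "candidate spans")
```

The quadratic (bounded by `max_len`) candidate set is the price paid for modeling spans jointly rather than labeling tokens independently.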


2018 ◽  
pp. 41-63 ◽  
Author(s):  
Francesca Helm

The long and winding road is a metaphor for a journey, often used to describe life journeys and the challenges encountered. The metaphor was used in the title of my keynote to refer both to the journey towards the current position of virtual exchange in education policy and to the long road ahead. This paper aims to explore the emergence of virtual exchange in educational policy and how it has been adopted by non-profit organisations, educational institutions, and policy makers to address geo- and socio-political tensions. Though still a relatively new field, in recent years there have been some important developments in terms of policy statements and public investments in virtual exchange. The paper starts by looking at the current state of the art in terms of virtual exchange in education policy and initiatives in Europe. Then, using an approach based on ‘episode studies’ from the policy literature, the paper explores the main virtual exchange schemes and initiatives that have drawn the attention of European policy makers. The paper closes by looking at some of the lessons we have learnt from research on the practice of virtual exchange, and how this can inform us as we face the long road ahead of us. The focus of this paper is on the European context not because I assume it to be the most important or influential, but rather because it is the one I know best, since it is the context in which I have been working.


2020 ◽  
Vol 34 (05) ◽  
pp. 9354-9361
Author(s):  
Kun Xu ◽  
Linfeng Song ◽  
Yansong Feng ◽  
Yan Song ◽  
Dong Yu

Existing entity alignment methods mainly vary in how they encode the knowledge graph, but they typically use the same decoding method, which independently chooses the locally optimal match for each source entity. This decoding method may not only cause the “many-to-one” problem but also neglect the coordinated nature of the task, that is, each alignment decision may be highly correlated with the others. In this paper, we introduce two coordinated reasoning methods: an Easy-to-Hard decoding strategy and a joint entity alignment algorithm. Specifically, the Easy-to-Hard strategy first retrieves the model-confident alignments from the predicted results and then incorporates them as additional knowledge to resolve the remaining model-uncertain alignments. To achieve this, we further propose an enhanced alignment model built on the current state-of-the-art baseline. In addition, to address the many-to-one problem, we propose to jointly predict entity alignments so that the one-to-one constraint can be naturally incorporated into the alignment prediction. Experimental results show that our model achieves state-of-the-art performance and our reasoning methods can also significantly improve existing baselines.
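The “many-to-one” problem and its joint, one-to-one resolution can be illustrated on a toy similarity matrix. The brute-force permutation search below is only a stand-in for the paper's joint alignment algorithm, which it does not reproduce:

```python
from itertools import permutations

# Toy source-to-target similarity matrix: greedy per-row argmax maps both
# source entities 0 and 1 to target 0 (the "many-to-one" problem).
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
    [0.10, 0.20, 0.70],
]

# Independent (greedy) decoding: each source picks its best target alone.
greedy = [max(range(3), key=lambda j, row=row: row[j]) for row in sim]

# Joint decoding: search one-to-one assignments for the best total score.
best = max(permutations(range(3)),
           key=lambda p: sum(sim[i][p[i]] for i in range(3)))

print("greedy:", greedy, "joint:", list(best))
```

Greedy decoding collides on target 0, while the joint assignment sacrifices a little per-entity similarity (source 0 takes target 1) to maximize the total score under the one-to-one constraint; at realistic scale one would use the Hungarian algorithm rather than enumerating permutations.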


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2983 ◽  
Author(s):  
Tiago Oliveira ◽  
Ana Silva ◽  
Ken Satoh ◽  
Vicente Julian ◽  
Pedro Leão ◽  
...  

Prediction in health care is closely related to the decision-making process. On the one hand, accurate survivability prediction can help physicians decide between palliative care and other treatments for a patient. On the other hand, the notion of remaining lifetime can be an incentive for patients to live a fuller and more fulfilling life. This work presents a pipeline for the development of survivability prediction models and a system that provides survivability predictions for years one to five after the treatment of patients with colon or rectal cancer. The functionalities of the system are made available through a tool that balances the number of necessary inputs against prediction performance. It is mobile-friendly and gives health care professionals access to an instrument capable of enriching their practice and improving outcomes. The performance of the survivability models was compared with other existing works in the literature and found to be an improvement over the current state of the art. The underlying system is capable of recalculating its prediction models upon the addition of new data, continuously evolving as time passes.


1990 ◽  
Vol 5 (4) ◽  
pp. 225-249 ◽  
Author(s):  
Ann Copestake ◽  
Karen Sparck Jones

This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.


2021 ◽  
Vol 11 (24) ◽  
pp. 11635
Author(s):  
Raymond Ian Osolo ◽  
Zhan Yang ◽  
Jun Long

In the quest to make deep learning systems more capable, a number of more complex, more computationally expensive and memory-intensive algorithms have been proposed. This switchover glosses over the capabilities of many simpler systems, or modules within them, to adequately address current and future problems. This has led to some deep learning research being inaccessible to researchers who don’t possess top-of-the-line hardware. The use of simple feed-forward networks has not been explicitly explored in the current transformer-based vision-language field. In this paper, we use a series of feed-forward layers to encode image features and caption embeddings, alleviating some of the computational complexities that accompany the use of the self-attention mechanism and limit its application in long-sequence task scenarios. We demonstrate that a decoder does not require masking for conditional short-sequence generation where the task depends not only on the previously generated sequence but also on another input, such as image features. We perform an empirical and qualitative analysis of the use of linear transforms in place of self-attention layers in vision-language models, and obtain competitive results on the MSCOCO dataset. Our best feed-forward model obtains average scores of over 90% of the current state-of-the-art pre-trained Oscar model on the conventional image captioning metrics. We also demonstrate that the proposed models take less time to train and use less memory at larger batch sizes and longer sequence lengths.
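The motivation for replacing self-attention can be made concrete with a rough FLOP count: attention cost grows quadratically in sequence length n, while a position-wise feed-forward layer grows only linearly. The constant factors below are simplified assumptions for illustration, not measurements from the paper:

```python
# Simplified cost model: self-attention vs. a position-wise linear layer.
# n = sequence length, d = hidden size; constants are deliberately coarse.

def attention_flops(n: int, d: int) -> int:
    # QK^T score matrix plus the weighted sum over values: two n x n x d products.
    return 2 * n * n * d

def feedforward_flops(n: int, d: int) -> int:
    # One d x d linear transform applied independently at each of the n positions.
    return n * d * d

for n in (64, 512, 4096):
    ratio = attention_flops(n, 512) / feedforward_flops(n, 512)
    print(f"n={n:5d}: attention / feed-forward cost ratio = {ratio:.2f}")
```

Under this model the crossover sits at n = d/2: for short sequences attention is comparable or cheaper, but for long sequences (the regime the abstract mentions) the feed-forward substitute wins by the factor 2n/d.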


2020 ◽  
Vol 6 (6) ◽  
pp. 41 ◽  
Author(s):  
Björn Barz ◽  
Joachim Denzler

The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the “fair CIFAR” (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. The training set remains unchanged, in order not to invalidate pre-trained models. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. We make both the ciFAIR dataset and pre-trained models publicly available and furthermore maintain a leaderboard for tracking the state of the art.
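A minimal sketch of the exact-duplicate half of such a train/test leakage check, hashing raw image bytes against the training set. (The ciFAIR analysis also catches near-duplicates, which byte-level hashing cannot; the byte strings below are placeholders for real image data.)

```python
import hashlib

def digest(image_bytes: bytes) -> str:
    """Content fingerprint of one image; identical bytes -> identical hash."""
    return hashlib.sha256(image_bytes).hexdigest()

# Placeholder byte strings standing in for raw image data.
train = [b"cat_1", b"dog_1", b"truck_1"]
test = [b"dog_1", b"ship_1"]   # "dog_1" has leaked from the training set

train_hashes = {digest(img) for img in train}
leaked = [i for i, img in enumerate(test) if digest(img) in train_hashes]
print("leaked test indices:", leaked)
```

Flagged test images would then be replaced with fresh same-domain samples, as ciFAIR does, so that test accuracy measures generalization rather than memorization.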


10.37236/35 ◽  
2013 ◽  
Vol 1000 ◽  
Author(s):  
Mirka Miller ◽  
Jozef Sirán

The degree/diameter problem is to determine the largest graphs or digraphs of given maximum degree and given diameter. General upper bounds - called Moore bounds - for the order of such graphs and digraphs are attainable only for certain special graphs and digraphs. Finding better (tighter) upper bounds for the maximum possible number of vertices, given the other two parameters, and thus attacking the degree/diameter problem 'from above', remains a largely unexplored area. Constructions producing large graphs and digraphs of given degree and diameter represent a way of attacking the degree/diameter problem 'from below'. This survey aims to give an overview of the current state of the art of the degree/diameter problem. We focus mainly on the above two streams of research. However, we could not resist mentioning also results on various related problems. These include considering Moore-like bounds for special types of graphs and digraphs, such as vertex-transitive, Cayley, planar, bipartite, and many others, on the one hand, and related properties such as connectivity, regularity, and surface embeddability, on the other hand.
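For undirected graphs, the Moore bound mentioned above has a closed form: 1 + d(1 + (d-1) + ... + (d-1)^(k-1)) for maximum degree d and diameter k, obtained by counting, from a root vertex, the most vertices each distance layer can hold. A minimal sketch:

```python
def moore_bound(degree: int, diameter: int) -> int:
    """Moore bound: max possible order of a graph with given max degree and diameter.

    One root vertex, at most `degree` neighbours at distance 1, and at most
    degree * (degree - 1)**(i - 1) vertices at each distance i > 1.
    """
    return 1 + degree * sum((degree - 1) ** i for i in range(diameter))

# Degree 3, diameter 2: bound 10, attained by the Petersen graph.
# Degree 7, diameter 2: bound 50, attained by the Hoffman-Singleton graph.
print(moore_bound(3, 2), moore_bound(7, 2))
```

The rarity of graphs attaining this bound (Moore graphs) is exactly why the survey's two streams exist: tightening the bound from above, and constructing large graphs from below.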


1995 ◽  
Vol 38 (5) ◽  
pp. 1126-1142 ◽  
Author(s):  
Jeffrey W. Gilger

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals it is suggested that: (a) more behavioral genetic work of all types should be done, including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests, etc.).


1976 ◽  
Vol 21 (7) ◽  
pp. 497-498
Author(s):  
STANLEY GRAND
