textual description
Recently Published Documents


TOTAL DOCUMENTS: 103 (five years: 50)

H-INDEX: 6 (five years: 1)

Author(s):  
Santosh Kumar Mishra ◽  
Gaurav Rai ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning refers to the process of generating a textual description of the objects and activities present in a given image. It connects two fields of artificial intelligence: computer vision and natural language processing, which deal with image understanding and language modeling, respectively. Most existing work on image captioning targets the English language. This article presents a novel method for image captioning in Hindi using an encoder–decoder deep learning architecture with efficient channel attention. The key contribution of this work is the combination of an efficient channel attention mechanism with Bahdanau attention and a gated recurrent unit to build an image captioning model for Hindi. Color images usually consist of three channels: red, green, and blue. The channel attention mechanism focuses on an image's important channels while performing convolution, i.e., it assigns higher importance to some channels over others, and has been shown to have great potential for improving the efficiency of deep convolutional neural networks (CNNs). The proposed encoder–decoder architecture uses the recently introduced ECA-Net CNN to integrate the channel attention mechanism. Hindi, the fourth most spoken language globally and the official language of India, is widely spoken in India and South Asia. A dataset for image captioning in Hindi was created manually by translating the well-known MSCOCO dataset from English to Hindi. The efficiency of the proposed method is compared with other baselines in terms of Bilingual Evaluation Understudy (BLEU) scores, and the results show that the proposed method outperforms the baselines, with improvements of 0.59%, 2.51%, 4.38%, and 3.30% in BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores, respectively, over the state of the art. The quality of the generated captions is further assessed manually in terms of adequacy and fluency to demonstrate the proposed method's efficacy.
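The BLEU metrics used in this comparison can be sketched as follows. This is a minimal illustrative implementation of single-reference, sentence-level BLEU (clipped n-gram precision with a brevity penalty, no smoothing), not the exact evaluation script used by the authors:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference.

    Uses clipped (modified) n-gram precision for n = 1..max_n and the
    standard brevity penalty; smoothing is omitted for brevity.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean
```

BLEU-1 through BLEU-4 as reported above correspond to `max_n` = 1 through 4; production evaluations typically use corpus-level BLEU with multiple references and smoothing.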


2021 ◽  
pp. 1-19
Author(s):  
Marcella Cornia ◽  
Lorenzo Baraldi ◽  
Rita Cucchiara

Image captioning is the task of translating an input image into a textual description. As such, it connects vision and language in a generative fashion, with applications ranging from multi-modal search engines to assistive technology for visually impaired people. Although recent years have witnessed an increase in the accuracy of such models, this has also brought increasing complexity and new challenges in interpretability and visualization. In this work, we focus on Transformer-based image captioning models and provide qualitative and quantitative tools to increase interpretability and to assess the grounding and temporal-alignment capabilities of such models. First, we employ attribution methods to visualize what the model concentrates on in the input image at each step of the generation. Second, we propose metrics to evaluate the temporal alignment between model predictions and attribution scores, which allows measuring the grounding capabilities of the model and spotting hallucination flaws. Experiments are conducted on three different Transformer-based architectures, employing both traditional and Vision Transformer-based visual features.
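One way to read the temporal-alignment idea: check, for each object mentioned in the caption, whether the attribution mass on that object's image region peaks at the generation step where its word is emitted. The sketch below is an illustrative metric under that reading, not the authors' exact formulation; the data layout is hypothetical:

```python
def temporal_alignment(attributions, mention_steps):
    """Fraction of mentioned objects whose image region receives its
    maximum attribution exactly at the step their word is generated.

    attributions:  dict mapping region name -> per-step attribution scores
    mention_steps: dict mapping region name -> step index at which the
                   region's object word appears in the caption
    """
    if not mention_steps:
        return 0.0
    aligned = 0
    for region, step in mention_steps.items():
        scores = attributions[region]
        if scores[step] == max(scores):
            aligned += 1
    return aligned / len(mention_steps)
```

A low score under such a metric would suggest the model emits object words while attending elsewhere, one symptom of the hallucination flaws the abstract mentions.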


2021 ◽  
Author(s):  
Jean Peccoud

Abstract Sharing research data is an integral part of the scientific publishing process. By sharing data, authors enable their readers to use their results in ways that the textual description of the results alone does not allow. To achieve this objective, data should be shared in a way that makes it as easy as possible for readers to import them into software where they can be viewed, manipulated, and analyzed. Many authors and reviewers seem to misunderstand the purpose of the data sharing policies developed by journals. Rather than being an administrative burden that authors must comply with to get published, these policies aim to help authors maximize the impact of their work by allowing other members of the scientific community to build upon it. Authors and reviewers need to understand the purpose of data sharing policies in order to assist editors and publishers in their efforts to ensure that every published article complies with them.


Earth ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 605-621
Author(s):  
Maria Gkeli ◽  
Chryssy Potsiou ◽  
Sofia Soile ◽  
Giorgos Vathiotis ◽  
Maria-Eleni Cravariti

In most countries, three-dimensional (3D) property units are registered using two-dimensional (2D) documentation and textual description. This approach has several limitations, as it cannot represent the actual extent of complicated 3D property units in space. Because traditional procedures often lead to increased costs and long delays in 2D cadastral surveying, a fast, cost-effective, and reliable solution is needed to cope with the remaining global cadastral surveying needs. Crowdsourcing has taken on a critical role as a reliable methodology with huge potential for realizing 2D and 3D cadastral registration in both an affordable and a timely manner. All over the world, many large modern constructions are now planned and built using BIM technology. The use of 3D digital models such as building information models (BIMs), together with a connection to the international Land Administration Domain Model (LADM) standard, could enable the rapid integration of these units into a 3D crowdsourced cadaster, with a better representation of their cadastral boundaries, a detailed visualization of complex infrastructures, and enhanced interoperability between different parties and organizations. In this paper, the potential linkage between BIM, the LADM, and crowdsourcing techniques is investigated in order to provide an effective technical solution for the integration of large new constructions into 3D crowdsourced cadastral surveys. The proposed framework is tested on a building block in Athens, Greece. The potential, perspectives, and reliability of such an implementation are assessed and discussed.
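The BIM–LADM linkage described above can be pictured as a mapping between BIM element identifiers and cadastral spatial units. The sketch below is a hypothetical illustration only; the class, field names, and identifiers are our own simplifications, not drawn from the actual LADM (ISO 19152) or IFC schemas:

```python
from dataclasses import dataclass, field

@dataclass
class SpatialUnit:
    """Simplified stand-in for an LADM spatial-unit record."""
    unit_id: str                                            # cadastral identifier
    bim_element_guids: list = field(default_factory=list)   # linked BIM (IFC) GUIDs
    rights_holder: str = ""                                 # simplified rights info

def link_bim_to_unit(registry, unit_id, ifc_guid):
    """Attach a BIM element GUID to a registered cadastral spatial unit."""
    registry[unit_id].bim_element_guids.append(ifc_guid)

# Hypothetical example: one apartment unit in a building block.
registry = {"GR-ATH-001/7": SpatialUnit("GR-ATH-001/7", rights_holder="Owner A")}
link_bim_to_unit(registry, "GR-ATH-001/7", "2N1h4GuT1IvM9Y6a0J8rX")
```

In a real system, the GUIDs would come from the BIM authoring tool, and the spatial-unit records would carry the full LADM rights/restrictions/responsibilities structure.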


2021 ◽  
Author(s):  
Maxwell Adam Levinson ◽  
Justin Niestroy ◽  
Sadnan Al Manir ◽  
Karen Fairchild ◽  
Douglas E. Lake ◽  
...  

Abstract Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves can often be very large scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis should include not only a textual description but also a formal record of the computations that produced the result, including accessible data and software together with runtime parameters, environment, and personnel involved. This article describes FAIRSCAPE, a reusable computational framework enabling simplified access to modern scalable cloud-based components. FAIRSCAPE fully implements the FAIR data principles and extends them to provide fully FAIR Evidence, including machine-interpretable provenance of datasets, software, and computations as metadata for all computed results. The FAIRSCAPE microservices framework creates a complete Evidence Graph for every computational result, including persistent identifiers with metadata resolvable to the software, computations, and datasets used in the computation, and stores a URI to the root of the graph in the result's metadata. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves provenance across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects, including software, are assigned persistent IDs. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software.
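The evidence-graph idea can be sketched as a small provenance structure in which every result node carries a persistent identifier resolving to the software, dataset, and computation that produced it. This is an illustrative sketch only; the identifiers, field names, and helper functions are hypothetical, not FAIRSCAPE's actual API:

```python
def add_node(graph, pid, node_type, used=()):
    """Register a dataset/software/computation node under a persistent ID,
    recording the PIDs of the resources it used as evidence."""
    graph[pid] = {"@type": node_type, "used": list(used)}

def evidence_of(graph, pid):
    """All PIDs transitively reachable from a result: its evidence graph."""
    seen, stack = set(), [pid]
    while stack:
        current = stack.pop()
        for parent in graph[current]["used"]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical PIDs: a computation used a dataset and software,
# and produced a result dataset.
graph = {}
add_node(graph, "ark:/99999/dataset-1", "Dataset")
add_node(graph, "ark:/99999/software-1", "Software")
add_node(graph, "ark:/99999/run-1", "Computation",
         used=["ark:/99999/dataset-1", "ark:/99999/software-1"])
add_node(graph, "ark:/99999/result-1", "Dataset", used=["ark:/99999/run-1"])
```

Resolving the result's root PID and walking `used` links recovers the full chain of evidence, which is the property the abstract describes for reproducibility and validation.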


2021 ◽  
Vol 9 (7) ◽  
pp. 27
Author(s):  
Alessia Eletta Coppi ◽  
Alberto Cattaneo

Professional beauticians regularly perform skin analyses and should be skilled at observing small skin anomalies and skin damage. However, little is done at school, during their training, to directly improve their observation skills. In order to foster apprentices' observation of skin anomalies, a new training scheme exploiting annotations and attention-guiding methods through the Realto platform was developed and tested. A second-year class of apprentice beauticians (N=9) was given a pre-test on visual expertise of skin anomalies. Then, for a semester, they attended multiple training sessions in which, first, teachers explained skin anomaly images with the help of annotations (attention guiding) and, second, students were asked to observe other images of the same anomalies, annotate them, and provide a textual description of the identified anomaly. At the end of the semester, the trained class completed a post-test and a questionnaire, and a group interview was conducted. A group of third-year apprentices (N=19) who had already completed the skin anomaly course served as a baseline. Results showed that the treatment group mentioned almost twice as many details as the baseline group in the post-test; training with annotations and the Realto platform proved effective for developing observation skills compared to normal lessons. Furthermore, apprentices confirmed, both in the questionnaire and in the interviews, that they considered annotations useful for improving their observation skills and that using Realto and its annotation facilities was a good way to achieve this result.


2021 ◽  
Vol 30 (4) ◽  
pp. 1-29
Author(s):  
Philipp Paulweber ◽  
Georg Simhandl ◽  
Uwe Zdun

Abstract State Machine (ASM) theory is a well-known state-based formal method. As in other state-based formal methods, the specification languages proposed for ASMs still lack easy-to-comprehend abstractions for expressing the structural and behavioral aspects of specifications. Our goal is to investigate object-oriented abstractions such as interfaces and traits for ASM-based specification languages. We report on a controlled experiment with 98 participants studying specification efficiency and effectiveness, in which participants had to comprehend an informal specification (the stimulus), given as a textual description, and express a corresponding solution as a textual ASM specification using either interface or trait syntax extensions. The study used a completely randomized design with one alternative (interface or trait) per experimental group. The results indicate that the traits group achieved better specification effectiveness than the interfaces group, while specification efficiency showed no statistically significant differences. To the best of our knowledge, this is the first empirical study of the specification effectiveness and efficiency of object-oriented abstractions in the context of formal methods.
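For readers less familiar with the two abstractions compared in the experiment, the difference can be illustrated by analogy in a general-purpose language (this is not ASM syntax, and the names are our own): an interface only declares required operations, while a trait may additionally supply reusable default behavior built on top of them.

```python
from abc import ABC, abstractmethod

class Counter(ABC):
    """Interface-style abstraction: declares operations, provides no behavior."""
    @abstractmethod
    def value(self): ...
    @abstractmethod
    def increment(self): ...
    @abstractmethod
    def decrement(self): ...

class ResettableTrait:
    """Trait-style abstraction: reusable default behavior expressed in
    terms of the operations an implementor must provide."""
    def reset(self):
        while self.value() > 0:
            self.decrement()

class SimpleCounter(Counter, ResettableTrait):
    """Implements the interface and inherits the trait's behavior."""
    def __init__(self):
        self._n = 0
    def value(self):
        return self._n
    def increment(self):
        self._n += 1
    def decrement(self):
        self._n -= 1
```

The experiment's question is, roughly, which of these two styles of abstraction makes ASM specifications easier to write correctly.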


2021 ◽  
Vol 3 (7) ◽  
Author(s):  
Wael F. Youssef ◽  
Siba Haidar ◽  
Philippe Joly

Abstract The purpose of our work is to automatically generate textual video-description schemas from surveillance video scenes, compatible with police incident reports. Our proposed approach is based on a generic and flexible context-free ontology. The general schema has the form [actuator] [action] [over/with] [actuated object] [+ descriptors: distance, speed, etc.]. We focus on scenes containing exactly two objects. Through a series of elaborated steps, we generate a formatted textual description. We try to identify the existence of an interaction between the two objects, including remote interaction that does not involve physical contact, and we flag cases in which aggression took place. We use supervised deep learning to classify scenes into interaction and no-interaction classes and then into subclasses. The descriptors chosen to represent subclasses are key elements in surveillance systems that help generate live alerts and facilitate offline investigation.
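The [actuator] [action] [over/with] [actuated object] [+ descriptors] schema can be instantiated with a minimal formatter. The function and field names below are hypothetical illustrations of the schema, not the authors' implementation:

```python
def format_description(actuator, action, preposition, actuated, **descriptors):
    """Render one classified scene as a report-style sentence following the
    [actuator] [action] [over/with] [actuated object] [+descriptors] schema."""
    parts = [actuator, action, preposition, actuated]
    extras = ", ".join(f"{k}={v}" for k, v in sorted(descriptors.items()))
    return " ".join(parts) + (f" ({extras})" if extras else "")
```

For a two-object scene classified as a remote interaction, such a formatter would turn the classifier's outputs directly into the kind of formatted text the abstract describes, e.g. `format_description("person", "throws", "over", "object", distance="2m", speed="fast")`.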

