An empirically-based spatial segmentation and coreference annotation scheme for comics

2021 ◽  
Author(s):  
Lauren Edlin ◽  
Joshua Reiss
Corpora ◽  
2020 ◽  
Vol 15 (2) ◽  
pp. 125-140
Author(s):  
Yukiko Ohashi ◽  
Noriaki Katagiri ◽  
Katsutoshi Oka ◽  
Michiko Hanada

This paper reports on two research results: (1) the design of an English for Specific Purposes (ESP) corpus architecture, complete with annotations structured by regular expressions; and (2) a case study testing whether the design supports the creation of a specific vocabulary list from the compiled corpus. The first half of the study involved designing a precisely structured ESP corpus from 190 veterinary medical charts, with the data arranged in a hierarchy. The hierarchy consists of document types, outline elements and inline elements, such as species and breed. Perl scripts extracted the data attached to veterinary-specific categories, and the extracted data were used to create wordlists. The second part of the research tested the corpus mode by creating a list of commonly observed lexical items in veterinary medicine. The coverage of the wordlists by the General Service List (GSL) and the Academic Word List (AWL) was tested: 66.4 percent of all lexical items appeared in the GSL and AWL, whereas 33.7 percent appeared in neither list. The corpus compilation procedures and the annotation scheme introduced in this study enable the compilation of specific corpora with explicit annotations, giving teachers access to the data required for creating ESP classroom materials.
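The coverage test described in the abstract above can be illustrated with a minimal sketch. It assumes coverage is counted over tokens and matched case-insensitively; the abstract does not say whether items were counted as tokens or types, and the function name and toy word sets below are hypothetical, not taken from the study.

```python
def coverage(tokens, reference_words):
    """Return the share of tokens that occur in reference_words.

    Matching is case-insensitive. This is an assumed definition of
    coverage, not the one used in the paper.
    """
    reference = {w.lower() for w in reference_words}
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in reference)
    return hits / len(tokens)


# Hypothetical toy data standing in for GSL + AWL and a chart extract.
gsl_awl = {"the", "dog", "has", "a", "analysis"}
chart_tokens = ["The", "dog", "has", "a", "pyometra"]

rate = coverage(chart_tokens, gsl_awl)
print(f"covered: {rate:.1%}, uncovered: {1 - rate:.1%}")
```

With real data, `gsl_awl` would be loaded from the published lists and `chart_tokens` from the extracted wordlists.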


2020 ◽  
Author(s):  
Sarah Delanys ◽  
Farah Benamara ◽  
Véronique Moriceau ◽  
François Olivier ◽  
Josiane Mothe

BACKGROUND With the advent of digital technology, and specifically user-generated content on social media, new ways have emerged to study the possible stigmatization of people in relation to mental health. Several studies have examined the discourse about psychiatric pathologies on Twitter, mostly considering tweets in English and a limited number of psychiatric disorder terms. This paper proposes the first study to analyze the use of a wide range of psychiatric terms in tweets in French.
OBJECTIVE Our aim is to study how generic, nosographic and therapeutic psychiatric terms are used on Twitter in French. More specifically, our study has three complementary goals: (1) to analyze the types of psychiatric word use, namely medical, misuse and irrelevant; (2) to analyze the polarity conveyed in the tweets that use these terms (positive/negative/neutral); and (3) to compare the frequency of these terms with those observed in related work (mainly in English).
METHODS Our study was conducted on a corpus of tweets in French posted between 01/01/2016 and 12/31/2018 and collected using dedicated keywords. The corpus was manually annotated by clinical psychiatrists following a multilayer annotation scheme that includes the type of word use and the opinion orientation of the tweet. Two analyses were performed: first, a qualitative analysis to measure the reliability of the manual annotation; then, a quantitative analysis considering mainly term frequency in each layer and exploring the interactions between layers.
RESULTS The first result is a resource in the form of an annotated dataset. The initial dataset is composed of 22,579 tweets in French containing at least one of the selected psychiatric terms. From this set, experts in psychiatry annotated 3,040 randomly selected tweets, which constitute the resource resulting from our work. The second result is the analysis of the annotations; it shows that terms are misused in 45.3% of the tweets and that their associated polarity is negative in 86.2% of those cases. When considering all three types of term use, 59.5% of the tweets are associated with a negative polarity. Misused terms related to psychotic disorders (55.5%) are more frequent than those related to mood disorders (26.5%).
CONCLUSIONS Some psychiatric terms are misused in the corpus we studied, which is consistent with the results reported in related work in other languages. Thanks to the great diversity of the terms studied, this work highlights a disparity in the representations and ways of using psychiatric terms. Moreover, our study can help psychiatrists become aware of how these terms are used in widely used communication media such as social networks. The study is reproducible thanks to the framework and guidelines we produced, so it could be repeated to analyze the evolution of term usage. While the newly built dataset is a valuable resource for other analytical studies, it could also serve to train machine learning algorithms to automatically identify stigma in social media.
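The abstract above mentions measuring the reliability of the manual annotation but does not name the measure. As an illustration only, the sketch below computes Cohen's kappa for two annotators, a common agreement statistic; its use here is an assumption, and the label sequences are invented rather than drawn from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Invented labels for the word-use layer (medical / misuse / irrelevant).
annotator_1 = ["medical", "misuse", "misuse", "irrelevant"]
annotator_2 = ["medical", "misuse", "medical", "irrelevant"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

The same function could be applied to the polarity layer by passing positive/negative/neutral labels instead.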


Author(s):  
Jan Mikulka ◽  
Daniel Chalupa ◽  
Jan Svoboda ◽  
Milan Filipovic ◽  
Martin Repko ◽  
...  

2021 ◽  
Vol 184 ◽  
pp. 106095
Author(s):  
He Liu ◽  
Amy R. Reibman ◽  
Aaron C. Ault ◽  
James V. Krogmeier

Author(s):  
Elena Álvarez-Mellado ◽  
María Luisa Díez-Platas ◽  
Pablo Ruiz-Fabo ◽  
Helena Bermúdez ◽  
Salvador Ros ◽  
...  

Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (i.e. journalistic text), and general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to such a corpus, and we propose a novel humanist-friendly, TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.


2020 ◽  
Vol 48 (1) ◽  
pp. 1-46
Author(s):  
Michael Bender ◽  
Marcus Müller

This article presents a comparative study of heuristic textual practices in various scientific disciplines. By this we mean formulation practices with which new knowledge is generated in institutionally influenced routines and connected to existing knowledge, e.g. 'highlighting the relevance of a research topic', 'defining a concept' or 'supporting a statement argumentatively'. The aim is to find out to what extent such textual practices occur in different scientific disciplines, and how they are distributed and combined. Furthermore, we study the effects that domain-specific contexts have on heuristic textual practices. The data basis of our study is a corpus of 65 dissertations from the 13 faculties of the TU Darmstadt. In the pilot study reported here, we examined the introductory chapters of the dissertations. Methodologically, it is an annotation study: based on the current state of research on the subject, we derived a basic annotation scheme, which we developed and refined in a collaborative process of guideline creation. Our study builds on socio-pragmatic research on text production and formulation routines in the sciences. It is theoretically informed by research on heuristics in the philosophy of science; methodologically, we contribute to the scientific debate on collaborative annotation procedures.


2021 ◽  
Vol 13 (11) ◽  
pp. 2208
Author(s):  
Yi Yang ◽  
Zongxu Pan ◽  
Yuxin Hu ◽  
Chibiao Ding

Ship detection is a significant and challenging task in remote sensing. Because of their speed and accuracy, deep learning methods are now widely applied to ship detection. Ship targets are usually arbitrarily oriented and have large aspect ratios. To exploit these characteristics and improve the speed and accuracy of deep learning methods, this article proposes CPS-Det, an anchor-free method for ship detection using rotatable bounding boxes. The main improvements of CPS-Det, and the contributions of this article, are as follows. First, an anchor-free deep learning network is used to improve speed with fewer parameters. Second, an annotation method based on an oblique rectangular frame is proposed, which addresses the loss anomalies that arise when periodic angles and bounded coordinates are combined with the regression calculation. For this annotation scheme, a method for calculating Angle Loss is proposed, which makes the angle loss near the boundary value more accurate and greatly improves the accuracy of angle prediction. Third, the centerness calculation of feature points is optimized so that the center weight distribution of each point suits rotated detection. Finally, a scheme combining centerness and positive sample screening is proposed, and its effectiveness in ship detection is demonstrated. Experiments on the public remote sensing dataset HRSC2016 show the effectiveness of the approach.
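The loss anomaly near the angle boundary mentioned in the abstract above can be illustrated with a generic sketch. This is not the Angle Loss defined for CPS-Det, whose formula the abstract does not give; it only shows, under the assumption that a rectangle's orientation repeats every 180 degrees, why a naive angle difference misbehaves at the boundary and how a wrapped difference avoids it. The function name is hypothetical.

```python
def wrapped_angle_diff(pred_deg, target_deg, period=180.0):
    """Smallest absolute difference between two angles under a given period.

    Assumes orientations that differ by a multiple of `period` describe
    the same box (180 degrees for a rectangle).
    """
    d = (pred_deg - target_deg) % period
    return min(d, period - d)


pred, target = 89.0, -89.0  # nearly the same orientation, opposite ends of the range
naive = abs(pred - target)               # large penalty for a near-correct prediction
wrapped = wrapped_angle_diff(pred, target)  # small penalty, as intended
print(naive, wrapped)
```

A regression loss built on the naive difference would heavily penalize a prediction that is only two degrees off, which is the kind of anomaly a boundary-aware angle loss is meant to remove.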


2021 ◽  
pp. 2-11
Author(s):  
David Aufreiter ◽  
Doris Ehrlinger ◽  
Christian Stadlmann ◽  
Margarethe Uberwimmer ◽  
Anna Biedersberger ◽  
...  

On the servitization journey, manufacturing companies complement their offerings with new industrial and knowledge-based services, which brings uncertainty and risk. In addition to the required adjustment of internal factors, selling services internationally is a major challenge. This paper presents the initial results of an international research project aimed at assisting advanced manufacturers in making decisions about exporting their service offerings to foreign markets. Within this project, a tool is being developed to support managers in their service export decisions through the automated generation of market information based on Natural Language Processing and Machine Learning. The paper presents a roadmap for progressing towards an Artificial Intelligence-based market information solution. It describes the research process steps: analyzing the problem statements of relevant industry partners, selecting target countries and markets, defining parameters for the scope of the tool, classifying different service offerings and their components into categories, and developing an annotation scheme for generating reliable and focused training data for the Artificial Intelligence solution. The paper demonstrates good practices in essential steps and highlights common pitfalls to avoid for researchers and managers working on future research projects supported by Artificial Intelligence. Finally, the paper aims to support and motivate researchers and managers to discover AI applications and research opportunities within the servitization field.


Corpora ◽  
2009 ◽  
Vol 4 (2) ◽  
pp. 189-208 ◽  
Author(s):  
Yufang Qian ◽  
Scott Piao

In this paper, we propose a corpus annotation scheme and lexicon for Chinese kinship terms. We adapt existing traditional Chinese kinship schemes into a comprehensive semantic field framework that covers kinship semantic categories in contemporary Chinese. The scheme is inspired by the Lancaster USAS (UCREL Semantic Analysis System) taxonomy, which contains categories for English kinship terms. We show how our scheme works with a Chinese kinship semantic lexicon which covers parents, siblings, marital relations, offspring and same-sex partnerships. The kinship lexicon was created through a pilot study involving the Lancaster University Mandarin Corpus. We foresee that our annotation scheme and lexicon will provide a framework and resource for the kinship annotation of Chinese corpora and corpus-based kinship studies.

