Semantic Roles or Syntactic Functions: The Effects of Annotation Scheme on the Results of Dependency Measures

2021
Author(s): Jianwei Yan, Haitao Liu

Corpora, 2020, Vol 15 (2), pp. 125-140
Author(s): Yukiko Ohashi, Noriaki Katagiri, Katsutoshi Oka, Michiko Hanada

This paper reports on two research results: (1) designing an English for Specific Purposes (ESP) corpus architecture complete with annotations structured by regular expressions; and (2) a case study testing whether the design supports creating a specific vocabulary list from the compiled corpus. The first half of this study involved designing a precisely structured ESP corpus from 190 veterinary medical charts with a hierarchy of the data. The data hierarchy in the corpus consists of document types, outline elements and inline elements, such as species and breed. Perl scripts extracted the data attached to veterinary-specific categories, and the extraction led to creating wordlists. The second part of the research tested the corpus mode, creating a list of commonly observed lexical items in veterinary medicine. The coverage of the wordlists by the General Service List (GSL) and the Academic Word List (AWL) was tested, with the result that 66.4 percent of all lexical items appeared in GSL and AWL, whereas 33.7 percent appeared in neither list. The corpus compilation procedures and the annotation scheme introduced in this study enable the compilation of specific corpora with explicit annotations, giving teachers access to the data required for creating ESP classroom materials.
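The coverage test described above can be illustrated with a minimal sketch. The abstract says the authors used Perl scripts and does not show their tag format, so the `<species>`/`<breed>` inline tags, the sample chart line, and the tiny `gsl` and `awl` sets below are all hypothetical stand-ins, not the study's data or code.

```python
import re

def coverage(tokens, reference_lists):
    """Return the share of tokens found in at least one reference list."""
    known = set().union(*reference_lists)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in known)
    return hits / len(tokens)

# Hypothetical inline annotation format; the paper's actual tags are not shown.
chart = "<species>dog</species> <breed>beagle</breed> presented with acute emesis"

# Strip the tags with a regular expression, then tokenize on letters only.
text = re.sub(r"</?\w+>", " ", chart)
tokens = re.findall(r"[a-z]+", text.lower())

gsl = {"dog", "with", "presented"}  # toy stand-in for the General Service List
awl = {"acute"}                     # toy stand-in for the Academic Word List

rate = coverage(tokens, [gsl, awl])
print(f"{rate:.1%} of tokens covered by the toy GSL/AWL sets")
```

With real wordlists, the uncovered remainder would be the candidate pool for a domain-specific vocabulary list.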


2010, Vol 3 (2), pp. 181-204
Author(s): Maria Rosenberg

This study addresses agentive nominal compounds in French and Swedish containing N and V constituents. French has only one such compound, VN, whereas Swedish has at least four: NV-are, NV-a, NV and VN. The study explores the semantic characteristics of their constituents and their semantic structures. Formal aspects are also considered within a lexeme-based morphology. The analysis shows that, although French and Swedish compounds differ formally, they share more or less the same semantics. Their V constituent takes one or more arguments, and their N constituents display several semantic roles. Semantically, the compounds generally denote an Actor of verbs taking two arguments, and the N constituents denote an Undergoer, except in Swedish VN compounds, which denote an entity that fills the same role as that of the N constituent, generally an Actor. Non-argumental interpretations, such as Place or Event, are less frequent. In conclusion, the study can have typological value for the semantics of agentive nominal compounds.


2020
Author(s): Sarah Delanys, Farah Benamara, Véronique Moriceau, François Olivier, Josiane Mothe

BACKGROUND With the advent of digital technology, and specifically user-generated content on social media, new ways have emerged for studying the possible stigmatization of people in relation to mental health. Several studies have examined the discourse about psychiatric pathologies on Twitter, considering mostly tweets in English and a limited number of terms for psychiatric disorders. This paper proposes the first study to analyze the use of a wide range of psychiatric terms in tweets in French. OBJECTIVE Our aim is to study how generic, nosographic and therapeutic psychiatric terms are used on Twitter in French. More specifically, our study has three complementary goals: (1) to analyze the types of psychiatric word use, namely medical, misuse and irrelevant, (2) to analyze the polarity conveyed in the tweets that use these terms (positive/negative/neutral), and (3) to compare the frequency of these terms to those observed in related work (mainly in English). METHODS Our study was conducted on a corpus of tweets in French posted between 01/01/2016 and 12/31/2018 and collected using dedicated keywords. The corpus was manually annotated by clinical psychiatrists following a multilayer annotation scheme that includes the type of word use and the opinion orientation of the tweet. Two analyses were performed: first, a qualitative analysis to measure the reliability of the manual annotation; then, a quantitative analysis considering mainly term frequency in each layer and exploring the interactions between layers. RESULTS The first result is a resource in the form of an annotated dataset. The initial dataset is composed of 22,579 tweets in French containing at least one of the selected psychiatric terms. From this set, experts in psychiatry annotated 3,040 randomly selected tweets, which constitute the resource resulting from our work. The second result is the analysis of the annotations; it shows that terms are misused in 45.3% of the tweets and that their associated polarity is negative in 86.2% of the cases. When considering the three types of term use, 59.5% of the tweets are associated with a negative polarity. Misused terms related to psychotic disorders (55.5%) are more frequent than those related to mood disorders (26.5%). CONCLUSIONS Some psychiatric terms are misused in the corpora we studied, which is consistent with the results reported in related work in other languages. Thanks to the great diversity of the terms studied, this work highlights a disparity in the representations and ways of using psychiatric terms. Moreover, our study can help psychiatrists become aware of how these terms are used in new communication media such as social networks, which are widely used. The study has the advantage of being reproducible thanks to the framework and guidelines we produced, so it could be repeated in order to analyze the evolution of term usage. While the newly built dataset is a valuable resource for other analytical studies, it could also serve to train machine learning algorithms to automatically identify stigma in social media.
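The quantitative step in the abstract above (frequencies within each annotation layer and the interaction between layers) can be sketched as a simple cross-tabulation. The five annotated tweets below are invented toy data, and the label names are assumptions based on the categories the abstract lists; this is not the authors' analysis code.

```python
from collections import Counter

# Toy annotations, one per tweet: (type of use, polarity). Invented for illustration.
annotations = [
    ("misuse", "negative"),
    ("misuse", "negative"),
    ("medical", "neutral"),
    ("medical", "negative"),
    ("irrelevant", "positive"),
]

# Frequency within a single layer (type of use).
use_counts = Counter(use for use, _ in annotations)

# Interaction between the two layers (type of use x polarity).
pair_counts = Counter(annotations)

def share(count, total):
    """Percentage of count in total, guarding against an empty total."""
    return 100.0 * count / total if total else 0.0

misuse_share = share(use_counts["misuse"], len(annotations))
negative_given_misuse = share(pair_counts[("misuse", "negative")], use_counts["misuse"])

print(f"misuse: {misuse_share:.1f}% of tweets")
print(f"negative among misuse: {negative_given_misuse:.1f}%")
```

The second figure is a conditional share (negative polarity among misused terms), which is the kind of quantity the abstract's 86.2% figure appears to report.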


2005, Vol 31 (1), pp. 71-106
Author(s): Martha Palmer, Daniel Gildea, Paul Kingsbury

The Proposition Bank project takes a practical approach to semantic representation, adding a layer of predicate-argument information, or semantic role labels, to the syntactic structures of the Penn Treebank. The resulting resource can be thought of as shallow, in that it does not represent coreference, quantification, and many other higher-order phenomena, but also broad, in that it covers every instance of every verb in the corpus and allows representative statistics to be calculated. We discuss the criteria used to define the sets of semantic roles used in the annotation process and to analyze the frequency of syntactic/semantic alternations in the corpus. We describe an automatic system for semantic role tagging trained on the corpus and discuss the effect on its performance of various types of information, including a comparison of full syntactic parsing with a flat representation and the contribution of the empty “trace” categories of the treebank.


Author(s): Elena Álvarez-Mellado, María Luisa Díez-Platas, Pablo Ruiz-Fabo, Helena Bermúdez, Salvador Ros, ...

Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (i.e. journalistic text) and the general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to a corpus of Spanish medieval documents and we propose a novel humanist-friendly TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.


2020, Vol 48 (1), pp. 1-46
Author(s): Michael Bender, Marcus Müller

This article contains a comparative study of heuristic textual practices in various scientific disciplines. By this we mean formulation practices with which new knowledge is generated in institutionally influenced routines and connected to existing knowledge, e.g. ‘highlighting the relevance of a research topic’, ‘defining a concept’ or ‘supporting a statement argumentatively’. The aim is to find out to what extent such textual practices occur in different scientific disciplines and how they are distributed and combined. Furthermore, we study the effects domain-specific contexts have on heuristic textual practices. The data basis of our study is a corpus of 65 dissertations from the 13 different faculties of the TU Darmstadt. In the pilot study we report here, we examined the introductory chapters of the dissertations. Methodologically, it is an annotation study: based on the current state of research on the subject, we derived a basic annotation scheme, which we developed and refined in a collaborative process of guideline creation. Our study ties in with socio-pragmatic research on text production and formulation routines in the sciences. It is theoretically informed by research in the philosophy of science on heuristics; methodologically, we contribute to the scientific debate on collaborative annotation procedures.


2010, Vol 34 (4), pp. 749-801
Author(s): Anna Bugaeva

This paper explores the polyfunctionality, grammaticalization, and typological relevance of applicatives in Ainu. Applicatives are derived by the valency-increasing prefixes which are generally defined here as instrumental e-, dative ko-, and locative o-. The referential range of the respective constructions stretches over several semantic roles and the exact role is attributed to the interaction between the semantics of the prefix and verb. The typologically unusual properties of Ainu applicatives include the ability of e- applicatives to add the roles of Theme and Content, the ability of the so-called unaccusative intransitives to host applicative prefixes e- and ko-, the possibility of e-ko- and ko-e- double applicatives, the absence of non-applicative paraphrases for some applicatives, and the possibility of applicative object incorporation.


2021, Vol 13 (11), pp. 2208
Author(s): Yi Yang, Zongxu Pan, Yuxin Hu, Chibiao Ding

Ship detection is a significant and challenging task in remote sensing. Owing to their speed and accuracy, deep learning methods are now widely applied in ship detection. In ship detection, targets are typically arbitrarily oriented and have large aspect ratios. To take full advantage of these features and improve speed and accuracy on the basis of deep learning methods, this article proposes an anchor-free method for ship detection, referred to as CPS-Det, that uses rotatable bounding boxes. The main improvements of CPS-Det, as well as the contributions of this article, are as follows. First, an anchor-free deep learning network is used to improve speed with fewer parameters. Second, an annotation method based on an oblique rectangular frame is proposed, which addresses the loss anomalies that can arise when periodic angles and bounded coordinates are combined with the regression calculation. For this annotation scheme, a scheme for calculating Angle Loss is proposed, which makes the angle loss near the boundary value more accurate and greatly improves the accuracy of angle prediction. Third, the centerness calculation of feature points is optimized so that the center weight distribution of each point is suitable for rotation detection. Finally, a scheme combining centerness and positive sample screening is proposed, and its effectiveness in ship detection is demonstrated. Experiments on the public remote sensing dataset HRSC2016 show the effectiveness of our approach.
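The loss anomaly near the angle boundary mentioned above can be illustrated with a generic sketch. This is not the Angle Loss proposed in the paper, whose formula the abstract does not give; it only contrasts a naive absolute difference with a wrap-around difference, assuming angles in degrees and a box orientation that repeats every 180 degrees. Both function names are hypothetical.

```python
def naive_angle_l1(pred_deg, target_deg):
    """Plain absolute difference; ignores that angles wrap around."""
    return abs(pred_deg - target_deg)

def periodic_angle_l1(pred_deg, target_deg, period=180.0):
    """Absolute difference measured the short way around the period.

    The 180-degree default is an assumption: a rectangle rotated by
    180 degrees covers the same region.
    """
    diff = (pred_deg - target_deg) % period
    return min(diff, period - diff)

# Two nearly identical orientations on opposite sides of the boundary.
pred, target = 89.0, -89.0
print(naive_angle_l1(pred, target))     # 178.0
print(periodic_angle_l1(pred, target))  # 2.0
```

The naive value penalizes a prediction that is only 2 degrees off as if it were almost maximally wrong, which is the kind of boundary behaviour a dedicated angle loss is meant to avoid.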

