annotation study
Recently Published Documents


TOTAL DOCUMENTS: 16 (FIVE YEARS: 4)

H-INDEX: 5 (FIVE YEARS: 1)

Author(s):  
Cristina Grisot ◽  
Joanna Blochowiak

Abstract In this paper, we carried out two experimental studies to investigate whether verbal tenses, in their perspectival usages, give access to the speaker’s perspective. Study 1 is an annotation study in which annotators evaluated corpus excerpts as expressing situations or narrating events in a subjective or objective way. We manipulated access to the verbal tense: half of the annotators saw the tense of the verbs, and half saw only infinitive forms of the verbs. Study 2 is a self-paced reading experiment in which we examined how native speakers of French process utterances with the Passé Simple when it is preceded by aujourd’hui (semantic incompatibility solved pragmatically by perspective-taking), hier (semantic and pragmatic compatibility) and en ce moment (semantic incompatibility which cannot be solved pragmatically). The results of Study 1 suggest that the subjective interpretation of an utterance is not triggered by its verbal tense. Study 2 questions the idea that perspective-taking is a component of the speaker’s subjectivity. In general, our experimental findings do not support the hypothesis that verbal tenses give access to the speaker’s subjectivity, a theoretical hypothesis which has never before been directly tested experimentally.


2021 ◽  
Author(s):  
Cristina Grisot ◽  
Joanna Blochowiak

Abstract This study investigates the role of non-linguistic biases in the obligatory (verb tenses) and optional (discourse connectives) linguistic marking for inferring temporal relations at the sentence and the text genre levels. Specifically, we formulated and tested several assumptions: (1) the linguistic cueing assumption (verb tenses inform language users about the temporal relation), (2) the implicitness assumption (highly expected relations need not be overtly marked), (3) the specialized connective assumption (specialized connectives are more efficient than underspecified ones), (4) the text genre assumption (language users’ expectations of temporal relations are linked to the text genre), and (5) the text status assumption (information in translated texts tends to be more explicit than in original texts). We carried out an annotation study of a bilingual corpus (French–English) belonging to two different text genres: literary and journalistic. Our results challenge the implicitness and the text status assumptions while confirming the linguistic cueing and the text genre assumptions. So, we put forth an alternative view, according to which language users have equal expectations about all three types of temporal relations and are oriented to one relation or the other by linguistic cueing (obligatory and optional marking) as well as text genre.


Author(s):  
Judy Hong ◽  
Anahita Davoudi ◽  
Shun Yu ◽  
Danielle L. Mowery

Abstract Background: Age and time information stored within the histories of clinical notes can provide valuable insights for assessing a patient’s disease risk, understanding disease progression, and studying therapeutic outcomes. However, details of age and temporally-specified clinical events are not well captured, consistently codified, and readily available to research databases for study. Methods: We expanded upon existing annotation schemes to capture additional age and temporal information, conducted an annotation study to validate our expanded schema, and developed a prototypical, rule-based Named Entity Recognizer to extract our novel clinical named entities (NE). The annotation study was conducted on 138 discharge summaries from the pre-annotated 2014 ShARe/CLEF eHealth Challenge corpus. In addition to existing NE classes (TIMEX3, SUBJECT_CLASS, DISEASE_DISORDER), our schema proposes 3 additional NEs (AGE, PROCEDURE, OTHER_EVENTS). We also propose new attributes, e.g., “degree_relation” which captures the degree of biological relation for subjects annotated under SUBJECT_CLASS. As a proof of concept, we applied the schema to 49 H&P notes to encode pertinent history information for a lung cancer cohort study. Results: An abundance of information was captured under the new OTHER_EVENTS, PROCEDURE and AGE classes, with 23%, 10% and 8% of all annotated NEs belonging to the above classes, respectively. We observed high inter-annotator agreement of >80% for AGE and TIMEX3; the automated NLP system achieved F1 scores of 86% (AGE) and 86% (TIMEX3). Age and temporally-specified mentions within past medical, family, surgical, and social histories were common in our lung cancer data set; annotation is ongoing to support this translational research study. Conclusions: Our annotation schema and NLP system can encode historical events from clinical notes to support clinical and translational research studies.


2020 ◽  
Vol 34 (05) ◽  
pp. 8775-8782
Author(s):  
Claudia Schulz ◽  
Damir Juric

A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology, in particular whether the close relationship of semantically similar medical terms is encoded in these embeddings. To date, only small datasets for testing medical term similarity are available, which does not allow conclusions to be drawn about the generalisability of embeddings to the enormous number of medical terms used by doctors. We present multiple automatically created large-scale medical term similarity datasets and confirm their high quality in an annotation study with doctors. We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques. Our results show that current embeddings are limited in their ability to adequately encode medical terms. The novel datasets thus form a challenging new benchmark for the development of medical embeddings able to accurately represent the whole medical terminology.


2018 ◽  
Vol 37 (6) ◽  
pp. 1245-1259 ◽  
Author(s):  
Negin Mirriahi ◽  
Srećko Joksimović ◽  
Dragan Gašević ◽  
Shane Dawson

2018 ◽  
Vol 42 (1-2) ◽  
pp. 1-8
Author(s):  
Yasemin Dincer ◽  
Julian Schulz ◽  
Sandra Wilson ◽  
Christoph Marschall ◽  
Monika Y. Cohen ◽  
...  

Abstract Next-generation sequencing (NGS) technologies in clinical diagnostics open vast opportunities through the ability to sequence all genes simultaneously at a cost and speed that is superior to traditional sequencing approaches. On the other hand, the practical implementation of NGS in routine diagnostics involves a variety of challenges, which need to be overcome. Among these are the generation, analysis and storage of large amounts of data, strict control of sequencing performance, validation of results, interpretation of detected variants and reporting. Here, we outline the Multiple Integration and Data Annotation Study, an approach for data integration in clinical diagnostics based on genotype-phenotype correlations. MIDAS aims to accelerate NGS data analysis and to enhance the validity of the results by computer-based variant prioritization using the clinical data of the patient. In this context, we present the MIDAS case reports of one patient with intellectual disability caused by a novel de novo loss-of-function variant in the GATAD2B gene [NM_020699.3: c.1426G>T (p.Glu476*)] identified by trio whole-exome sequencing, as well as two cardiac disease patients with severe phenotype and multiple variants in genes linked to cardiac arrhythmogenic disorders analyzed with multi-gene panel sequencing. Based on the data collected in the MIDAS cohort, the MIDAS software will be tested and optimized. Moreover, the MIDAS software concept can be extended modularly to include further data resources for improved data handling and interpretation in the broad field of diagnostics.


2017 ◽  
Vol 8 (2) ◽  
pp. 56-83 ◽  
Author(s):  
Merel C.J. Scholman ◽  
Vera Demberg

Examples and specifications occur frequently in text, but not much is known about how they function in discourse and how readers interpret them. Looking at how they’re annotated in existing discourse corpora, we find that annotators often disagree on these types of relations; specifically, there is disagreement about whether these relations are elaborative (additive) or argumentative (pragmatic causal). To investigate how readers interpret examples and specifications, we conducted a crowdsourced discourse annotation study. The results show that these relations can indeed have two functions: they can be used to both illustrate/specify a situation and serve as an argument for a claim. These findings suggest that examples and specifications can have multiple simultaneous readings. We discuss the implications of these results for discourse annotation. 


2017 ◽  
Vol 43 (1) ◽  
pp. 125-179 ◽  
Author(s):  
Ivan Habernal ◽  
Iryna Gurevych

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.


2013 ◽  
Vol 47 (4) ◽  
pp. 1261-1284 ◽  
Author(s):  
Ekaterina Shutova ◽  
Barry J. Devereux ◽  
Anna Korhonen
