Multiword Expressions in the Medical Domain: Who Carries the Domain-Specific Meaning

Author(s): Kristina Kocijan, Krešimir Šojat, Silvia Kurolt
2021
Author(s): Huseyin Denli, Hassan A Chughtai, Brian Hughes, Robert Gistri, Peng Xu

Abstract. Deep learning, particularly with transformer models, has recently delivered step-change capabilities in natural language processing applications such as question answering, query-based summarization, and language translation for general-purpose contexts. We have developed a geoscience-specific language processing solution using such models to enable geoscientists to perform rapid, fully quantitative, and automated analysis of large data corpora and gain insights. One of the key transformer-based models is BERT (Bidirectional Encoder Representations from Transformers), which is trained on a large amount of general-purpose text (e.g., Common Crawl). Using such a model for geoscience applications faces a number of challenges. One is the sparse presence of geoscience-specific vocabulary in general-purpose text (e.g., everyday language); another is geoscience jargon (domain-specific meanings of words). For example, salt is more likely to be associated with table salt in everyday language, but in geoscience it refers to a subsurface entity. To alleviate these challenges, we retrained a pre-trained BERT model on our 20M internal geoscientific records; we refer to the retrained model as GeoBERT. We fine-tuned GeoBERT for a number of tasks, including geoscience question answering and query-based summarization. BERT models are very large; for example, BERT-Large has 340M trained parameters. Geoscience language processing with these models, including GeoBERT, could incur substantial latency if the entire database were processed at every call of the model. To address this challenge, we developed a retriever-reader engine in which an embedding-based similarity search serves as a context retrieval step, narrowing the context for a given query before GeoBERT processes it. We built a solution integrating context retrieval and GeoBERT models. Benchmarks show that it effectively helps geologists identify answers and context for given questions. The prototype will also produce summaries at different levels of granularity for a given set of documents. We have also demonstrated that the domain-specific GeoBERT outperforms general-purpose BERT for geoscience applications.
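The context-retrieval step of a retriever-reader engine can be sketched as a brute-force cosine-similarity search over precomputed passage embeddings. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy three-dimensional vectors are invented stand-ins for embeddings from whatever encoder the real system uses, and `cosine` and `retrieve` are hypothetical helper names. In the described design, only the top-k passages returned here would be passed on to the reader model (GeoBERT).

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def retrieve(query_vec: Sequence[float], passage_vecs: List[Sequence[float]], k: int = 2) -> List[int]:
    """Return the indices of the k passages most similar to the query, best first."""
    ranked = sorted(
        range(len(passage_vecs)),
        key=lambda i: cosine(query_vec, passage_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: a real system would embed passages with a sentence encoder.
passages = ["salt dome trap", "table salt recipe", "seismic survey of salt body"]
passage_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]

top = retrieve(query_vec, passage_vecs, k=2)
print([passages[i] for i in top])  # ['salt dome trap', 'seismic survey of salt body']
```

A linear scan is used only to keep the sketch self-contained; at the scale of millions of records, an approximate nearest-neighbour index would normally replace it.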


2020, Vol 2
Author(s): Abeed Sarker, Yuan-Chi Yang, Mohammed Ali Al-Garadi, Aamir Abbas

As the volume of published medical research continues to grow rapidly, staying up to date with the best available research evidence on specific topics is becoming an increasingly challenging problem for medical experts and researchers. The current COVID-19 pandemic is a good example of a topic on which research evidence is rapidly evolving. Automatic query-focused text summarization approaches may help researchers swiftly review research evidence by presenting salient, query-relevant information from newly published articles in a condensed form. Typical medical text summarization approaches require domain knowledge, and the performance of such systems relies on resource-heavy, medical domain-specific knowledge sources and pre-processing methods (e.g., text classification) for deriving semantic information. Consequently, these systems are often difficult to customize, extend, or deploy quickly in low-resource settings, and they are often operationally slow. In this paper, we propose a fast and simple extractive summarization approach that can be easily deployed and run, and may thus help medical experts and researchers obtain fast access to the latest research evidence. At runtime, our system uses similarity measurements derived from pre-trained medical domain-specific word embeddings, together with simple features, rather than computationally expensive pre-processing and resource-heavy knowledge bases. Automatic evaluation using ROUGE (a summary evaluation tool) on a public dataset for evidence-based medicine shows that our system's performance, despite the simple implementation, is statistically comparable with the state of the art. Extrinsic manual evaluation on recently released COVID-19 articles demonstrates that the summarizer's performance is close to human agreement, which is generally low for extractive summarization.
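A query-focused extractive summarizer of the general kind described above can be sketched in a few lines. This is an illustrative assumption, not the authors' system: each sentence is represented by the mean of its word vectors, scored by cosine similarity to the query's mean vector plus a small position bonus (a hypothetical stand-in for the "simple features"), and the top-n sentences are returned in document order. The two-dimensional embeddings and the `position_weight` value are invented placeholders for pre-trained medical word embeddings and tuned weights.

```python
import math
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip trailing punctuation."""
    return [t.strip(".,;:!?") for t in text.lower().split()]


def mean_vector(tokens: Sequence[str], emb: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of in-vocabulary tokens; zero vector if none are known."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def summarize(query: str, sentences: List[str], emb: Dict[str, List[float]],
              dim: int, n: int = 2, position_weight: float = 0.1) -> List[str]:
    """Select n sentences by query similarity plus a small early-position bonus,
    returned in their original document order."""
    q = mean_vector(tokenize(query), emb, dim)
    scores = []
    for i, s in enumerate(sentences):
        sim = cosine(q, mean_vector(tokenize(s), emb, dim))
        position_bonus = 1.0 / (i + 1)  # hypothetical feature: earlier sentences score higher
        scores.append(sim + position_weight * position_bonus)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


# Hypothetical toy embeddings and article sentences.
emb = {"vaccine": [1.0, 0.0], "efficacy": [0.9, 0.1], "trial": [0.7, 0.3],
       "hospital": [0.1, 0.9], "staffing": [0.0, 1.0]}
sentences = [
    "Hospital staffing was strained.",
    "The vaccine trial reported high efficacy.",
    "Staffing shortages affected hospital care.",
    "Efficacy of the vaccine waned over time.",
]
summary = summarize("vaccine efficacy", sentences, emb, dim=2, n=2)
print(summary)
```

Returning the chosen sentences in document order, rather than by score, is a common choice in extractive summarization because it keeps the summary readable.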


10.29007/4kf5, 2018
Author(s): Antonio Moreno-Ortiz, Chantal Pérez-Hernández, Cristian Gómez-Pascual

This paper is a first attempt at designing a procedure to derive a domain-specific lexicon (both single words and multiword expressions) from an opinion corpus of specialized language. We use a corpus of running-shoe reviews, compiled for this particular purpose, as a case study. The main goal is to obtain a first approximation to the task of automatically extracting domain-specific expressions of sentiment for use by our sentiment analysis software, Lingmotif.


2008, Vol 67 (2), pp. 71-83
Author(s): Yolanda A. Métrailler, Ester Reijnen, Cornelia Kneser, Klaus Opwis

This study compared individuals with pairs in a scientific problem-solving task. Participants interacted with a virtual psychological laboratory called Virtue to reason about a visual search theory. To this end, they created hypotheses, designed experiments, and analyzed and interpreted the results of their experiments in order to discover which of five possible factors affected the visual search process. Before and after their interaction with Virtue, participants took a test measuring theoretical and methodological knowledge. In addition, process data reflecting participants' experimental activities, as well as verbal data, were collected. The results showed a significant increase in knowledge that was equal for both groups. We found differences between individuals and pairs in the evaluation of hypotheses in the process data, and in descriptive and explanatory statements in the verbal data. Interacting with Virtue helped all students improve their domain-specific and domain-general psychological knowledge.


2008, Vol 16 (3), pp. 112-115
Author(s): Stephan Bongard, Volker Hodapp, Sonja Rohrmann

Abstract. Our unit investigates the relationship between emotional processes (experience, expression, and coping), their physiological correlates, and possible health outcomes. We study domain-specific anger expression behavior and the associated cardiovascular load, and found, for example, that open anger expression at work in particular is associated with higher blood pressure. Furthermore, we demonstrated that women may be predisposed to developing certain mental disorders because of their higher disgust sensitivity. We also showed that suppressing negative emotions leads to increased physiological stress responses, which result in a higher risk of cardiovascular disease. We could show that relaxation, as well as musical activity such as singing in a choir, increases the local immune parameter immunoglobulin A. Finally, we are investigating connections between migrants' acculturation strategies and health, and found, for example, elevated cardiovascular stress responses in migrants who were highly adapted to German culture.


2009, Vol 25 (1), pp. 1-7
Author(s): Jörg-Tobias Kuhn, Heinz Holling

The present study explores the factorial structure and the degree of measurement invariance of 12 divergent thinking tests. In a large sample of German students (N = 1328), a three-factor model representing verbal, figural, and numerical divergent thinking was supported. Multigroup confirmatory factor analyses revealed that partial strong measurement invariance was tenable across gender and age groups as well as school types. Latent mean comparisons revealed significantly higher divergent thinking skills for females and for students in schools with higher mean IQ. Older students exhibited higher latent means on the verbal and figural factors, but not on the numerical factor. These results suggest that a domain-specific model of divergent thinking may be assumed, although further research is needed to elucidate the sources that negatively affect measurement invariance.


2020
Author(s): Jamie Buck, Rena Subotnik, Frank Worrell, Paula Olszewski-Kubilius, Chi Wang

2012
Author(s): Christine M. Szostak, Mark A. Pitt, Laura C. Dilley
