Designing a Concept-Mining Model for the Extraction of Medical Information in Spanish

Author(s):  
Olga Acosta ◽  
César Aguilar

This article sketches the development of a method for mining concepts from medical corpora in Spanish. The method is based on the approach formulated by Ananiadou and McNaught, who emphasize the need to create and use natural language processing (NLP) tools to extract information from large collections of documents such as PubMed (www.ncbi.nlm.nih.gov/pubmed/). This repository has enabled projects such as the GENIA corpus (www.geniaproject.org), the MEDIE search engine (www.nactem.ac.uk/medie/), which applies syntactic and semantic criteria to extract medical concepts, and the Open Biological and Biomedical Ontology project (http://obofoundry.org/), which focuses on the development of ontologies that provide an organized knowledge system for biomedicine. In particular, this proposal focuses on two objectives: (1) the extraction of specialized terms and (2) the identification of lexical-semantic relations, specifically hyponymy/hypernymy and meronymy.
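
As an illustration of objective (2), the sketch below applies Hearst-style lexical patterns to Spanish sentences to harvest hyponym/hypernym pairs. The patterns and the example sentence are assumptions for demonstration only, not the authors' actual rules or corpus.

```python
# Minimal sketch of pattern-based hyponymy/hypernymy extraction for Spanish text.
# Illustrative only; the patterns and example sentence are assumptions, not the
# authors' actual rules or corpus.
import re

# Hearst-style lexical patterns commonly adapted to Spanish.
PATTERNS = [
    (r"(?P<hypo>\w+) es un tipo de (?P<hyper>\w+)", "hyponym-of"),
    (r"(?P<hypo>\w+) es una clase de (?P<hyper>\w+)", "hyponym-of"),
    (r"(?P<hyper>\w+) tales como (?P<hypo>\w+)", "hyponym-of"),
]

def extract_relations(sentence):
    """Return (hyponym, relation, hypernym) triples matched in a sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            triples.append((m.group("hypo"), relation, m.group("hyper")))
    return triples

if __name__ == "__main__":
    text = "La hepatitis es un tipo de enfermedad hepática."
    print(extract_relations(text))  # [('hepatitis', 'hyponym-of', 'enfermedad')]
```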

2019 ◽  
Vol 13 (2) ◽  
pp. 159-165
Author(s):  
Manik Sharma ◽  
Gurvinder Singh ◽  
Rajinder Singh

Background: For almost every domain, a tremendous amount of data is accessible online and offline. Billions of users post their views and opinions daily through applications such as WhatsApp, Facebook, Twitter, blogs, Instagram, etc. Objective: These reviews are valuable for the progress of a venture, community, state, and even nation. However, this momentous amount of information is useful only if it is collectively and effectively mined. Methodology: Opinion mining is used to extract the thoughts, expressions, emotions, criticism, and appraisals from the data posted by different people. It is one of the prevailing research areas that draws on techniques from natural language processing. Here, an amalgamated approach has been employed to mine online reviews. Results: To improve the results of a genetic algorithm-based opinion mining patent, a hybrid genetic algorithm and ontology-based 3-tier natural language processing framework named GAO_NLP_OM has been designed. The first tier is used for preprocessing and correction of the sentences. The middle tier is composed of a genetic algorithm-based searching module, an ontology for English sentences, base words for the review, and a complete set of English words with items and their features. The genetic algorithm is used to expedite the polarity mining process. The last tier is responsible for semantic, discourse, and feature summarization. Furthermore, the use of an ontology assists in building a more accurate opinion mining model. Conclusion: GAO_NLP_OM is expected to improve the performance of the genetic algorithm-based opinion mining patent. The amalgamation of a genetic algorithm, an ontology, and natural language processing seems to produce faster and more precise results. The proposed framework is able to mine simple as well as compound sentences. However, affirmative-preceded interrogative, hidden-feature, and mixed-language sentences remain a challenge for the proposed framework.
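
As a hedged sketch of the genetic-algorithm component described above, the toy example below evolves word-polarity weights that maximize classification accuracy on a handful of labeled reviews. The lexicon, reviews, and GA parameters are invented for illustration; this is not the GAO_NLP_OM implementation.

```python
# Illustrative sketch of using a genetic algorithm to tune word-polarity weights
# for opinion mining. The tiny lexicon and reviews are invented for demonstration.
import random

VOCAB = ["good", "great", "bad", "poor", "not"]
REVIEWS = [
    ("good product great battery", 1),
    ("bad screen poor support", -1),
    ("not good", -1),
    ("great value", 1),
]

def predict(weights, text):
    score = sum(weights[VOCAB.index(w)] for w in text.split() if w in VOCAB)
    return 1 if score >= 0 else -1

def fitness(weights):
    return sum(predict(weights, text) == label for text, label in REVIEWS)

def evolve(pop_size=20, generations=50, mutation_rate=0.2):
    population = [[random.uniform(-1, 1) for _ in VOCAB] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(VOCAB))      # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:        # small random mutation
                child[random.randrange(len(VOCAB))] += random.gauss(0, 0.5)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print("fitness:", fitness(best), "weights:", dict(zip(VOCAB, best)))
```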


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
George Mastorakos ◽  
Aditya Khurana ◽  
Ming Huang ◽  
Sunyang Fu ◽  
Ahmad P. Tafti ◽  
...  

Background. Patients increasingly use asynchronous communication platforms to converse with care teams. Natural language processing (NLP) to classify content and automate triage of these messages has great potential to enhance clinical efficiency. We characterize the contents of a corpus of patient-generated portal messages using NLP methods. We aim to demonstrate descriptive analyses of patient text that can contribute to the development of future, more sophisticated NLP applications. Methods. We collected approximately 3,000 portal messages from the cardiology, dermatology, and gastroenterology departments at Mayo Clinic. After labeling these messages as either Active Symptom, Logistical, Prescription, or Update, we used named entity recognition (NER) to identify medical concepts based on the UMLS library. We hierarchically analyzed the distribution of these messages in terms of departments, message types, medical concepts, and the keywords therein. Results. Active Symptom and Logistical content types comprised approximately 67% of the message cohort. The “Findings” medical concept had the largest number of keywords across all groupings of content types and departments. “Anatomical Sites” and “Disorders” keywords were more prevalent in Active Symptom messages, while “Drugs” keywords were most prevalent in Prescription messages. Logistical messages tended to have lower proportions of “Anatomical Sites,” “Disorders,” “Drugs,” and “Findings” keywords compared with other message content types. Conclusions. This descriptive corpus analysis sheds light on the content and foci of portal messages. Insight into the content and differences among message themes can inform the development of more robust NLP models.
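
The snippet below is a deliberately simplified stand-in for the UMLS-based concept tagging described above: it maps message text to semantic groups through exact dictionary lookup. The term dictionary and example message are invented; a production pipeline would rely on a full UMLS-backed NER tool.

```python
# Simplified sketch of tagging portal-message text with UMLS-style semantic groups.
# The tiny term dictionary and example message are invented; a real pipeline would
# use a full UMLS-backed NER tool rather than exact string lookup.
TERM_TO_GROUP = {
    "chest pain": "Findings",
    "rash": "Disorders",
    "elbow": "Anatomical Sites",
    "lisinopril": "Drugs",
}

def tag_message(message):
    """Return (term, semantic_group) pairs found in a message (case-insensitive)."""
    text = message.lower()
    return [(term, group) for term, group in TERM_TO_GROUP.items() if term in text]

if __name__ == "__main__":
    msg = "I have had chest pain and a rash near my elbow since starting lisinopril."
    print(tag_message(msg))
```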


2018 ◽  
Vol 25 (6) ◽  
pp. 726-733
Author(s):  
Maria S. Karyaeva ◽  
Pavel I. Braslavski ◽  
Valery A. Sokolov

The ability to identify semantic relations between words has made the word2vec model widely used in NLP tasks. The idea behind word2vec is the simple premise that two words reach a higher similarity if they occur in similar contexts. Each word is represented as a vector, so vectors with close coordinates can be interpreted as similar words. This allows semantic relations (synonymy, hypernymy/hyponymy, and other relations) to be established by automatic extraction. Manual extraction of semantic relations is a time-consuming and biased task that requires expert assistance. Unfortunately, the word2vec model provides an associative list of words that does not consist of related words only. In this paper, we present additional criteria that may help solve this problem. Observations and experiments with well-known characteristics, such as word frequency and position in the associative list, may improve results for the extraction of semantic relations for the Russian language using word embeddings. In the experiments, a word2vec model trained on the Flibusta collection is used, and pairs from Wiktionary serve as examples of semantic relations. Semantically related words are useful for thesauri, ontologies, and intelligent systems for natural language processing.
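
The sketch below illustrates the kind of filtering discussed above: neighbors from a word2vec associative list are kept only if they are frequent enough in the corpus and close to the top of the list. The toy corpus and thresholds are assumptions (gensim 4.x API), not the models or data used in the paper.

```python
# Minimal sketch of filtering a word2vec associative list by corpus frequency and
# list position, in the spirit of the criteria discussed in the abstract.
from gensim.models import Word2Vec

corpus = [
    ["dog", "is", "a", "domestic", "animal"],
    ["cat", "is", "a", "small", "animal"],
    ["dog", "and", "cat", "are", "pets"],
    ["animal", "shelters", "house", "dog", "and", "cat"],
]

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=1)

def filtered_neighbors(word, topn=5, min_count=2, max_rank=3):
    """Keep only neighbors that are frequent enough and near the top of the list."""
    candidates = model.wv.most_similar(word, topn=topn)
    kept = []
    for rank, (neighbor, similarity) in enumerate(candidates):
        if rank < max_rank and model.wv.get_vecattr(neighbor, "count") >= min_count:
            kept.append((neighbor, round(similarity, 3)))
    return kept

if __name__ == "__main__":
    print(filtered_neighbors("dog"))
```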


Author(s):  
Kaan Ant ◽  
Ugur Sogukpinar ◽  
Mehmet Fatih Amasyali

The use of databases containing semantic relationships between words is becoming increasingly widespread as a way to make natural language processing applications more effective. Unlike the bag-of-words approach, semantic spaces give the distances between words, but they do not express the types of relations. In this study, it is shown how semantic spaces can be used to find the type of relationship, and this is compared with the template method. According to results obtained on a very large scale, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location, made_of, and non-relational types.
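
A minimal sketch of one way to infer a relation type from a semantic space is shown below: the vector offset of a candidate pair is compared with prototype offsets averaged over labeled training pairs. The toy embeddings, pairs, and relation labels are invented for illustration and do not reproduce the study's method or data.

```python
# Sketch of classifying the relation type of a word pair from a semantic space by
# comparing its vector offset with prototype offsets of labeled training pairs.
import numpy as np

EMB = {  # toy 3-dimensional "semantic space"
    "dog": np.array([1.0, 0.2, 0.0]),
    "animal": np.array([1.0, 0.9, 0.0]),
    "hot": np.array([0.0, 0.3, 1.0]),
    "cold": np.array([0.0, 0.3, -1.0]),
    "big": np.array([0.2, 0.4, 0.9]),
    "small": np.array([0.2, 0.4, -0.9]),
}
TRAIN = [(("dog", "animal"), "is_a"), (("hot", "cold"), "opposite")]

def offset(pair):
    a, b = pair
    return EMB[b] - EMB[a]

def prototypes(train):
    protos = {}
    for pair, rel in train:
        protos.setdefault(rel, []).append(offset(pair))
    return {rel: np.mean(vectors, axis=0) for rel, vectors in protos.items()}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify(pair, protos):
    return max(protos, key=lambda rel: cosine(offset(pair), protos[rel]))

if __name__ == "__main__":
    protos = prototypes(TRAIN)
    print(classify(("big", "small"), protos))  # expected: "opposite"
```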


2019 ◽  
Author(s):  
Jiang Han ◽  
Ken Chen ◽  
Lei Fang ◽  
Shaodian Zhang ◽  
Fei Wang ◽  
...  

BACKGROUND The growing interest in observational trials using patient data from electronic medical records poses challenges to both the efficiency and quality of clinical data collection and management. Even with the help of electronic data capture systems and electronic case report forms (eCRFs), the manual data entry process followed by chart review is still time consuming. OBJECTIVE To facilitate the data entry process, we developed a natural language processing–driven medical information extraction system (NLP-MIES) based on the i2b2 reference standard. We aimed to evaluate whether the NLP-MIES–based eCRF application could improve the accuracy and efficiency of the data entry process. METHODS We conducted a randomized controlled field experiment, and 24 eligible participants were recruited (12 in the manual group and 12 in the NLP-MIES–supported group). We simulated the real-world eCRF completion process using our system and compared the performance of data entry on two research topics, pediatric congenital heart disease and pneumonia. RESULTS For the congenital heart disease condition, the NLP-MIES–supported group increased accuracy by 15% (95% CI 4%-120%, P=.03) and reduced elapsed time by 33% (95% CI 22%-42%, P<.001) compared with the manual group. For the pneumonia condition, the NLP-MIES–supported group increased accuracy by 18% (95% CI 6%-32%, P=.008) and reduced elapsed time by 31% (95% CI 19%-41%, P<.001). CONCLUSIONS Our system could improve both the accuracy and efficiency of the data entry process.
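
As a hedged illustration of reporting a relative improvement with a confidence interval, the snippet below bootstraps a CI for the accuracy gain between two simulated groups of 12 participants each. The simulated scores are invented and do not reproduce the study's data or its statistical analysis.

```python
# Hedged sketch: relative accuracy improvement between two groups with a bootstrap
# confidence interval. Simulated scores only; not the study's data or methodology.
import random

random.seed(0)
manual = [random.gauss(0.70, 0.08) for _ in range(12)]    # 12 manual-group scores
assisted = [random.gauss(0.82, 0.06) for _ in range(12)]  # 12 NLP-assisted scores

def relative_improvement(a, b):
    return (sum(b) / len(b) - sum(a) / len(a)) / (sum(a) / len(a))

def bootstrap_ci(a, b, n_boot=5000, alpha=0.05):
    stats = []
    for _ in range(n_boot):
        resampled_a = [random.choice(a) for _ in a]
        resampled_b = [random.choice(b) for _ in b]
        stats.append(relative_improvement(resampled_a, resampled_b))
    stats.sort()
    low = stats[int(alpha / 2 * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high

if __name__ == "__main__":
    print("improvement:", round(relative_improvement(manual, assisted), 3))
    print("95% CI:", tuple(round(x, 3) for x in bootstrap_ci(manual, assisted)))
```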


2017 ◽  
Vol 1 (2) ◽  
pp. 89 ◽  
Author(s):  
Azam Orooji ◽  
Mostafa Langarizadeh

It is estimated that each year many people, most of whom are teenagers and young adults, die by suicide worldwide. Suicide receives special attention, with many countries developing national strategies for prevention. Since more medical information is available as text, preventing the growing trend of suicide in communities requires analyzing various textual resources, such as patient records, information on the web, or questionnaires. For this purpose, this study systematically reviews recent studies on the use of natural language processing techniques concerning the health of people who have completed suicide or are at risk. After electronically searching the PubMed and ScienceDirect databases and screening the articles by two reviewers, 21 articles matched the inclusion criteria. This study revealed that, if a suitable data set is available, natural language processing techniques are well suited for various types of suicide-related research.


Author(s):  
Rachel Gorman ◽  
Pierre Maret ◽  
Alexandra Creighton ◽  
Bushra Kundi ◽  
Fabrice Muhlenbach ◽  
...  

Human rights monitoring for people with disabilities is in urgent need of disability data that are shared and available to local and international disability stakeholders (e.g., advocacy groups). Our aim is to use a Wikibase for editing, integrating, and storing structured disability-related data, and to develop a Natural Language Processing (NLP)-enabled multilingual search engine to tap into the Wikibase data. In this paper, we describe the first phase of the project.


Author(s):  
Jia Zeng ◽  
Christian X. Cruz-Pico ◽  
Turçin Saridogan ◽  
Md Abu Shufean ◽  
Michael Kahle ◽  
...  

PURPOSE Despite advances in molecular therapeutics, few anticancer agents achieve durable responses. Rational combinations of two or more anticancer drugs have the potential to achieve a synergistic effect and overcome drug resistance, enhancing antitumor efficacy. A publicly accessible biomedical literature search engine dedicated to this domain will facilitate knowledge discovery and reduce manual search and review. METHODS We developed RetriLite, an information retrieval and extraction framework that leverages natural language processing and a domain-specific knowledgebase to computationally identify highly relevant papers and extract key information. The modular architecture enables RetriLite to benefit from synergizing information retrieval and natural language processing techniques while remaining flexible to customization. We customized the application and created an informatics pipeline that strategically identifies papers describing the efficacy of combination therapies in clinical or preclinical studies. RESULTS In a small pilot study, RetriLite achieved an F1 score of 0.93. A more extensive validation experiment was conducted to determine agents that have enhanced antitumor efficacy in vitro or in vivo with poly (ADP-ribose) polymerase inhibitors: 95.9% of the papers determined to be relevant by our application were true positives, and the application's feature of distinguishing a clinical paper from a preclinical paper achieved an accuracy of 97.6%. An interobserver assessment was conducted, which resulted in 100% concordance. The data derived from the informatics pipeline have also been made accessible to the public via a dedicated online search engine with an intuitive user interface. CONCLUSION RetriLite is a framework that can be applied to establish domain-specific information retrieval and extraction systems. The extensive, high-quality metadata tags and keyword highlighting help information seekers discover knowledge in the combination therapy domain more effectively and efficiently.
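
The sketch below illustrates, in a highly simplified form, the kind of triage such a pipeline performs: scoring abstracts against domain cue terms and labeling relevant ones as clinical or preclinical. The cue lists and example abstract are assumptions for demonstration, not RetriLite's actual knowledgebase or logic.

```python
# Illustrative sketch of a keyword-scored relevance filter that also distinguishes
# clinical from preclinical abstracts. Cue terms and the example are invented.
RELEVANCE_CUES = {"combination", "synergy", "synergistic", "olaparib", "parp"}
CLINICAL_CUES = {"patients", "phase i", "phase ii", "trial"}
PRECLINICAL_CUES = {"xenograft", "cell line", "in vitro", "mice"}

def score(text, cues):
    text = text.lower()
    return sum(cue in text for cue in cues)

def triage(abstract, min_hits=2):
    """Return (is_relevant, study_type) for a single abstract."""
    relevant = score(abstract, RELEVANCE_CUES) >= min_hits
    study_type = None
    if relevant:
        clinical = score(abstract, CLINICAL_CUES)
        preclinical = score(abstract, PRECLINICAL_CUES)
        study_type = "clinical" if clinical >= preclinical else "preclinical"
    return relevant, study_type

if __name__ == "__main__":
    example = ("Olaparib in combination with temozolomide showed synergistic "
               "activity in xenograft models and cell line panels.")
    print(triage(example))  # (True, 'preclinical')
```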


Author(s):  
Oana Frunza ◽  
Diana Inkpen

This book chapter presents several natural language processing (NLP) and machine learning (ML) techniques that can help achieve better medical practice by extracting relevant medical information from the wealth of textual data. The chapter describes three major tasks: building intelligent tools that can help in clinical decision making, tools that can automatically identify relevant medical information in the life-science literature, and tools that can extract semantic relations between medical concepts. Besides introducing and describing these tasks, methodological settings accompanied by representative results obtained on real-life data sets are presented.
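
For the third task, the minimal sketch below trains a bag-of-words classifier to label the relation expressed between two medical concepts in a sentence. The miniature training set and labels are invented for illustration; real systems use richer features and far more data.

```python
# Minimal sketch of supervised relation classification between medical concepts
# using a bag-of-words model. The training sentences and labels are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "DRUG relieves the symptoms of DISEASE",
    "DRUG is indicated for the treatment of DISEASE",
    "DRUG may cause DISEASE in some patients",
    "DISEASE has been reported as a side effect of DRUG",
]
train_labels = ["treats", "treats", "causes", "causes"]

# Unigram and bigram counts feed a simple logistic regression classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_sentences, train_labels)

test = "DRUG is an effective treatment of DISEASE"
print(model.predict([test])[0])  # expected: "treats" with this toy data
```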

