ENTITY EXTRACTION FROM UNSTRUCTURED MEDICAL TEXT

Author(s):  
G Deepank ◽  
R Tharun Raj ◽  
Aditya Verma

Electronic medical records are rich data repositories loaded with valuable patient information. As artificial intelligence and machine learning become ever more popular in medicine, the ways to integrate them continue to evolve. One such way is processing the clinical notes and records maintained by doctors and other medical professionals. Natural language processing can ingest this data and read more deeply into it, at scale, than any human could. Deep learning techniques such as entity extraction, which identifies and returns key data elements from an electronic medical record, together with question-answering models such as BERT, can be applied to these records to create bespoke and efficient treatment plans that support a swift recovery.
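The core task this abstract describes, pulling key data elements out of free-text notes, can be illustrated with a deliberately simple dictionary matcher. This is only a sketch: production systems use trained models such as BERT-based named-entity recognizers, and the terms, labels, and note text below are invented for the example.

```python
import re

# Toy term dictionary standing in for a trained NER model.
# All terms, labels, and the note text are invented examples.
MEDICAL_TERMS = {
    "hypertension": "CONDITION",
    "metformin": "MEDICATION",
    "500 mg": "DOSAGE",
}

def extract_entities(note: str) -> list[tuple[str, str]]:
    """Return (surface form, entity type) pairs found in the note."""
    found = []
    for term, label in MEDICAL_TERMS.items():
        if re.search(re.escape(term), note, flags=re.IGNORECASE):
            found.append((term, label))
    return found

note = "Patient with hypertension, started on metformin 500 mg daily."
print(extract_entities(note))
```

A trained model replaces the fixed dictionary with learned, context-sensitive span detection, but the input/output shape — note in, labeled spans out — is the same.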

Author(s):  
Alaa Altheneyan ◽  
Mohamed El Bachir Menai

Paraphrase identification is a natural language processing (NLP) problem that involves determining whether two text segments have the same meaning. Various NLP applications rely on a solution to this problem, including automatic plagiarism detection, text summarization, machine translation (MT), and question answering. The methods for identifying paraphrases found in the literature fall into two main classes: similarity-based methods and classification methods. This paper presents a critical study and evaluation of existing methods for paraphrase identification and their application to automatic plagiarism detection. It presents the classes of paraphrase phenomena, the main methods, and the sets of features used by each particular method. All the methods and features are enumerated in a table for easy comparison, and their performances on benchmark corpora are likewise compared in tables. Automatic plagiarism detection is presented as an application of paraphrase identification, and the performances on benchmark corpora of existing plagiarism detection systems able to detect paraphrases are compared and discussed. The main outcome of this study is the identification of word overlap, structural representations, and MT measures as the feature subsets that lead to the best performance results for support vector machines in both paraphrase identification and plagiarism detection. The performance results achieved by deep learning techniques highlight that these techniques are the most promising research direction in this field.
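The word-overlap features the study highlights can be sketched as a minimal similarity-based method: Jaccard overlap between the two segments' word sets, plus a decision threshold. The threshold value and the example pair are illustrative choices, not taken from the paper.

```python
def word_overlap(a: str, b: str) -> float:
    """Jaccard word overlap between two text segments."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def is_paraphrase(a: str, b: str, threshold: float = 0.5) -> bool:
    """Similarity-based decision: paraphrase if overlap exceeds a threshold."""
    return word_overlap(a, b) >= threshold

pair = ("the cat sat on the mat", "the cat is on the mat")
print(word_overlap(*pair))  # 4 shared words out of 6 distinct words
```

Classification methods differ in that they feed such scores (alongside structural and MT-based features) into a trained classifier such as an SVM, rather than applying a fixed threshold.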


Author(s):  
Samuele Martinelli ◽  
Gloria Gonella ◽  
Dario Bertolino

Over the decades, natural language processing (NLP) has expanded its range of tasks, from document classification to automatic text summarization, sentiment analysis, text mining, machine translation, automatic question answering, and others. In 2018, T. Young described NLP as a theory-motivated range of computational techniques for the automatic analysis and representation of human language. Outside and before AI, human language was studied by specialists from various disciplines: linguistics, philosophy, logic, and psychology. The aim of this work is to build a neural network that performs sentiment analysis on Italian reviews from a chatbot customer service. Sentiment analysis is a data mining process that identifies and extracts subjective information from text. It can help in understanding clients' sentiment toward a business product or service. At its simplest, it is a classification task that analyses a sentence and tells whether the underlying sentiment is positive or negative. The potential of deep learning techniques has made this simple classification task evolve into newer, more complex forms of sentiment analysis, e.g. intent analysis and contextual semantic search.
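The binary classification task described above can be sketched with a toy lexicon-based classifier. The paper trains a neural network on Italian reviews; the English lexicon and review texts here are invented stand-ins that keep the sketch self-contained while showing the same input/output contract.

```python
# Invented sentiment lexicons; a neural model learns this signal
# from labeled data instead of using fixed word lists.
POSITIVE = {"good", "great", "helpful", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "useless", "terrible", "rude"}

def classify(review: str) -> str:
    """Label a review positive or negative by counting lexicon hits."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"

print(classify("great and helpful service"))
print(classify("slow and useless chatbot"))
```

A neural network replaces the hand-built lexicon with learned word representations, which is what lets the task evolve toward intent analysis and contextual semantic search.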


Author(s):  
Faith Wavinya Mutinda ◽  
Shuntaro Yada ◽  
Shoko Wakamiya ◽  
Eiji Aramaki

Abstract
Background: Semantic textual similarity (STS) captures the degree of semantic similarity between texts. It plays an important role in many natural language processing applications such as text summarization, question answering, machine translation, information retrieval, dialog systems, plagiarism detection, and query ranking. STS has been widely studied in the general English domain. However, there exist few resources for STS tasks in the clinical domain and in languages other than English, such as Japanese.
Objective: The objective of this study is to capture semantic similarity between Japanese clinical texts (Japanese clinical STS) by creating a publicly available Japanese dataset.
Materials: We created two datasets for Japanese clinical STS: (1) Japanese case reports (CR dataset) and (2) Japanese electronic medical records (EMR dataset). The CR dataset was created from publicly available case reports extracted from the CiNii database. The EMR dataset was created from Japanese electronic medical records.
Methods: We used an approach based on bidirectional encoder representations from transformers (BERT) to capture the semantic similarity between clinical domain texts. BERT is a popular approach for transfer learning and has proven effective in achieving high accuracy on small datasets. We implemented two Japanese pretrained BERT models: a general Japanese BERT, pretrained on Japanese Wikipedia texts, and a clinical Japanese BERT, pretrained on Japanese clinical texts.
Results: The BERT models performed well in capturing semantic similarity in our datasets. The general Japanese BERT outperformed the clinical Japanese BERT and achieved high correlations with human scores (0.904 on the CR dataset and 0.875 on the EMR dataset). It was unexpected that the general Japanese BERT outperformed the clinical Japanese BERT on the clinical domain datasets. This could be because the general Japanese BERT is pretrained on a wider range of texts than the clinical Japanese BERT.
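The STS scoring step can be sketched as cosine similarity between sentence embeddings. A real system would obtain the embeddings from a pretrained BERT encoder; the four-dimensional vectors below are made-up placeholders for illustration.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional "sentence embeddings"; a real STS system
# would produce these with a pretrained (e.g. Japanese) BERT encoder.
emb_a = [0.2, 0.8, 0.1, 0.4]
emb_b = [0.3, 0.7, 0.0, 0.5]
print(round(cosine_similarity(emb_a, emb_b), 3))
```

Correlating such model-produced similarity scores against human judgments is what yields the 0.904 and 0.875 figures reported above.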


2021 ◽  
Author(s):  
Chun-chao Lo ◽  
Shubo Tian ◽  
Yuchuan Tao ◽  
Jie Hao ◽  
Jinfeng Zhang

Most queries submitted to a literature search engine can be more precisely written as sentences to give the search engine more specific information. Sentence queries should be more effective, in principle, than short queries with small numbers of keywords. Querying with full sentences is also a key step in question-answering and citation recommendation systems. Despite the considerable progress in natural language processing (NLP) in recent years, using sentence queries on current search engines does not yield satisfactory results. In this study, we developed a deep learning-based method for sentence queries, called DeepSenSe, using citation data available in full-text articles obtained from PubMed Central (PMC). A large amount of labeled data was generated from millions of matched citing sentences and cited articles, making it possible to train quality predictive models using modern deep learning techniques. A two-stage approach was designed: in the first stage we used a modified BM25 algorithm to obtain the top 1000 relevant articles; the second stage involved re-ranking the relevant articles using DeepSenSe. We tested our method using a large number of sentences extracted from real scientific articles in PMC. Our method performed substantially better than PubMed and Google Scholar for sentence queries.
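The first stage of the two-stage design can be sketched as a minimal BM25 ranker over a toy corpus. The documents and query below are invented, and the second stage — re-ranking the candidates with the trained DeepSenSe model — is not reproduced here.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with standard BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)              # term frequency in this document
        score = 0.0
        for term in query.lower().split():
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "deep learning for sentence retrieval",
    "citation data from pubmed central",
    "bm25 ranking baseline",
]
scores = bm25_scores("sentence retrieval", docs)
print(scores.index(max(scores)))
```

In the paper's pipeline, the top 1000 articles from this kind of first-stage scoring are handed to the deep model, which can weigh the full sentence query rather than bag-of-words term matches.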


AI Magazine ◽  
2019 ◽  
Vol 40 (3) ◽  
pp. 67-78
Author(s):  
Guy Barash ◽  
Mauricio Castillo-Effen ◽  
Niyati Chhaya ◽  
Peter Clark ◽  
Huáscar Espinoza ◽  
...  

The workshop program of the Association for the Advancement of Artificial Intelligence’s 33rd Conference on Artificial Intelligence (AAAI-19) was held in Honolulu, Hawaii, on Sunday and Monday, January 27–28, 2019. There were sixteen workshops in the program: Affective Content Analysis: Modeling Affect-in-Action, Agile Robotics for Industrial Automation Competition, Artificial Intelligence for Cyber Security, Artificial Intelligence Safety, Dialog System Technology Challenge, Engineering Dependable and Secure Machine Learning Systems, Games and Simulations for Artificial Intelligence, Health Intelligence, Knowledge Extraction from Games, Network Interpretability for Deep Learning, Plan, Activity, and Intent Recognition, Reasoning and Learning for Human-Machine Dialogues, Reasoning for Complex Question Answering, Recommender Systems Meet Natural Language Processing, Reinforcement Learning in Games, and Reproducible AI. This report contains brief summaries of all the workshops that were held.

