scholarly journals Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes

JAMIA Open ◽  
2019 ◽  
Vol 2 (1) ◽  
pp. 139-149 ◽  
Author(s):  
Meijian Guan ◽  
Samuel Cho ◽  
Robin Petro ◽  
Wei Zhang ◽  
Boris Pasche ◽  
...  

Abstract Objectives Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who have undergone a clinical next generation sequencing (NGS) testing in Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN) namely, gated recurrent unit, long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi) were applied to classify documents to the treatment-change and no-treatment-change groups. Further, we compared the performances of RNNs to 5 machine learning algorithms including Naive Bayes, K-nearest Neighbor, Support Vector Machine for classification, Random forest, and Logistic Regression. Results Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embedding can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. Discussion and Conclusion NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.

Author(s):  
Anurag Langan

Grading student answers is a tedious and time-consuming task. A study had found that almost on average around 25% of a teacher's time is spent in scoring the answer sheets of students. This time could be utilized in much better ways if computer technology could be used to score answers. This system will aim to grade student answers using the various Natural Language processing techniques and Machine Learning algorithms available today.


Author(s):  
Dr. K. Suresh

The current way of checking answer scripts is hectic for the college. They need to manually check the answers and allocate the marks to the students. Our proposed system uses Machine Learning and Natural Language Processing techniques to beat this. Machine learning algorithms use computational methods to find out directly from data without hopping on predetermined rules. NLP algorithms identify specific entities within the text, explore for key elements during a document, run a contextual search for synonyms and detect misspelled words or similar entries, and more. Our algorithm performs similarity checking and also the number of words associated with the question exactly matched between two documents. It also checks whether the grammar is correctly used or not within the student's answer. Our proposed system performs text extraction and evaluation of marks by applying Machine Learning and Natural Language Processing techniques.


2021 ◽  
Vol 10 (5) ◽  
pp. 2857-2865
Author(s):  
Moanda Diana Pholo ◽  
Yskandar Hamam ◽  
Abdel Baset Khalaf ◽  
Chunling Du

Available literature reports several lymphoma cases misdiagnosed as tuberculosis, especially in countries with a heavy TB burden. This frequent misdiagnosis is due to the fact that the two diseases can present with similar symptoms. The present study therefore aims to analyse and explore TB as well as lymphoma case reports using Natural Language Processing tools and evaluate the use of machine learning to differentiate between the two diseases. As a starting point in the study, case reports were collected for each disease using web scraping. Natural language processing tools and text clustering were then used to explore the created dataset. Finally, six machine learning algorithms were trained and tested on the collected data, which contained 765 lymphoma and 546 tuberculosis case reports. Each method was evaluated using various performance metrics. The results indicated that the multi-layer perceptron model achieved the best accuracy (93.1%), recall (91.9%) and precision score (93.7%), thus outperforming other algorithms in terms of correctly classifying the different case reports.


2021 ◽  
Vol 5 (2 (113)) ◽  
pp. 55-65
Author(s):  
Aigerim Yerimbetova ◽  
Madina Tussupova ◽  
Madina Sambetbayeva ◽  
Mussa Turdalyuly ◽  
Bakzhan Sakenov

This research is aimed at identifying the parts of speech for the Kazakh and Turkish languages in an information retrieval system. The proposed algorithms are based on machine learning techniques. In this paper, we consider the binary classification of words according to parts of speech. We decided to take the most popular machine learning algorithms. In this paper, the following approaches and well-known machine learning algorithms are studied and considered. We defined 7 dictionaries and tagged 135 million words in Kazakh and 9 dictionaries and 50 million words in the Turkish language. The main problem considered in the paper is to create algorithms for the execution of dictionaries of the so-called Link Grammar Parser (LGP) system, in particular for the Kazakh and Turkish languages, using machine learning techniques. The focus of the research is on the review and comparison of machine learning algorithms and methods that have accomplished results on various natural language processing tasks such as grammatical categories determination. For the operation of the LGP system, a dictionary is created in which a connector for each word is indicated – the type of connection that can be created using this word. The authors considered methods of filling in LGP dictionaries using machine learning.  The complexities of natural language processing, however, do not exclude the possibility of identifying narrower tasks that can already be solved algorithmically: for example, determining parts of speech or splitting texts into logical groups. However, some features of natural languages significantly reduce the effectiveness of these solutions. Thus, taking into account all word forms for each word in the Kazakh and Turkish languages increases the complexity of text processing by an order of magnitude


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

ObjectivesUnstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations.To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.MethodsDatabases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.ResultsNineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.ConclusionNLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.


Sign in / Sign up

Export Citation Format

Share Document