Natural Language Processing based Medical Needs Extraction for Breast Cancer Patients from Question and Answer Services (Preprint)

JMIR Cancer ◽  
10.2196/32005 ◽  
2021 ◽  
Author(s):  
Masaru Kamba ◽  
Masae Manabe ◽  
Shoko Wakamiya ◽  
Shuntaro Yada ◽  
Eiji Aramaki ◽  
...  

BACKGROUND A large number of patient narratives are now available on various web services. On web question and answer (QA) services, patient questions often relate to medical needs, so we expect these questions to provide clues for understanding those needs. OBJECTIVE This study aims to extract patient needs and classify them into thematic categories. Clarifying patients' needs would be a first step toward solving social issues faced by cancer patients. METHODS The material of this study is patient question texts containing the keyword “breast cancer” in the Yahoo! Japan QA service, Yahoo! Chiebukuro, which contains over 60,000 questions on cancer. First, we convert each question text into a vector representation; then, the relevance between patient needs and existing cancer needs categories is calculated based on cosine similarity. RESULTS The proportion of correct classifications with our proposed method is approximately 70%. The classification results reveal the variety and the number of needs. CONCLUSIONS The proposed method has various clinical applications, such as detecting signals of drug side effects and identifying the unmet needs of cancer patients. Revealing these needs is important for satisfying the medical needs of cancer patients.
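
The matching step described in METHODS (vectorize a question, then score it against each needs category by cosine similarity) might look roughly like the following minimal sketch. The abstract does not say how the vectors are built or which categories are used, so the bag-of-words `vectorize` function, the `CATEGORIES` dictionary, and the English example question are all hypothetical stand-ins, not the authors' implementation.

```python
import math
from collections import Counter


def vectorize(text):
    """Toy bag-of-words vector (token -> count).

    Assumption: the study's actual vector representation is not given
    in the abstract; this is only a stand-in for illustration.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def classify(question, categories):
    """Return the best-scoring category name and all similarity scores."""
    q = vectorize(question)
    scores = {
        name: cosine_similarity(q, vectorize(description))
        for name, description in categories.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical needs categories, each described by a few keywords.
CATEGORIES = {
    "treatment": "treatment surgery chemotherapy drug side effects",
    "financial": "cost insurance money payment work",
    "emotional": "anxiety fear worry family support",
}

label, scores = classify("I worry about chemotherapy side effects", CATEGORIES)
print(label, scores)
```

In this sketch a question is assigned to the single most similar category; a real system could instead keep every category above a similarity threshold, since one question may express several needs.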


JAMIA Open ◽  
2019 ◽  
Vol 2 (1) ◽  
pp. 139-149 ◽  
Author(s):  
Meijian Guan ◽  
Samuel Cho ◽  
Robin Petro ◽  
Wei Zhang ◽  
Boris Pasche ◽  
...  

Abstract Objectives Natural language processing (NLP) and machine learning approaches were used to build classifiers that identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods For our data analyses, we obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who had undergone clinical next-generation sequencing (NGS) testing at Wake Forest Baptist Comprehensive Cancer Center. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN), namely gated recurrent unit, long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs with that of 5 machine learning algorithms: Naive Bayes, K-nearest Neighbor, Support Vector Machine for classification, Random Forest, and Logistic Regression. Results Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embedding can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. Discussion and Conclusion NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
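
The four metrics used to compare the classifiers (accuracy, precision, recall, and F1 score) can be computed for a binary task such as treatment-change versus no-treatment-change as in the sketch below. The function name `binary_metrics` and the label vectors are invented for illustration; they are not data or code from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classifier.

    `positive` marks the class of interest (here, hypothetically,
    1 = treatment change and 0 = no treatment change).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for eight documents (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the two groups are imbalanced, because a classifier that always predicts the majority class can still score a high accuracy.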

