scholarly journals Tweet Classification Toward Twitter-Based Disease Surveillance: New Data, Methods, and Evaluations (Preprint)

2018 ◽  
Author(s):  
Shoko Wakamiya ◽  
Mizuki Morita ◽  
Yoshinobu Kano ◽  
Tomoko Ohkuma ◽  
Eiji Aramaki

BACKGROUND The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Then, participants classify each tweet into 1 of the 2 categories: those containing a patient’s symptom and those that do not. OBJECTIVE This study aimed to present the results of groups participating in a Japanese subtask, English subtask, and Chinese subtask along with discussions, to clarify the issues that need to be resolved in the field of medical NLP. METHODS In summary, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In total, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using the exact match accuracy, F-measure based on precision and recall, and Hamming loss. RESULTS The best system achieved exactly 0.880 match accuracy, 0.920 F-measure, and 0.019 Hamming loss. The averages of match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032, respectively. CONCLUSIONS This paper presented and discussed the performance of systems participating in the NTCIR-13 MedWeb task. As the MedWeb task settings can be formalized as the factualization of text, the achievement of this task could be directly applied to practical clinical applications.

2017 ◽  
Vol 5 (12) ◽  
pp. 287-289
Author(s):  
Aditya . ◽  
◽  
◽  
◽  
hare . ◽  
...  

2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

ObjectivesUnstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations.To undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.MethodsDatabases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.ResultsNineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers.ConclusionNLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.


2021 ◽  
Author(s):  
Arash Maghsoudi ◽  
Sara Nowakowski ◽  
Ritwick Agrawal ◽  
Amir Sharafkhaneh ◽  
Sadaf Aram ◽  
...  

BACKGROUND The COVID-19 pandemic has imposed additional stress on population health that may result in a higher incidence of insomnia. In this study, we hypothesized that using natural language processing (NLP) to explore social media would help to identify the mental health condition of the population experiencing insomnia after the outbreak of COVID-19. OBJECTIVE In this study, we hypothesized that using natural language processing (NLP) to explore social media would help to identify the mental health condition of the population experiencing insomnia after the outbreak of COVID-19. METHODS We designed a pre-post retrospective study using public social media content from Twitter. We categorized tweets based on time into two intervals: prepandemic (01/01/2019 to 01/01/2020) and pandemic (01/01/2020 to 01/01/2021). We used NLP to analyze polarity (positive/negative) and intensity of emotions and also users’ tweets psychological states in terms of sadness, anxiety and anger by counting the words related to these categories in each tweet. Additionally, we performed temporal analysis to examine the effect of time on the users’ insomnia experience. RESULTS We extracted 268,803 tweets containing the word insomnia (prepandemic, 123,293 and pandemic, 145,510). The odds of negative tweets (OR, 1.31; 95% CI, 1.29-1.33), anger (OR, 1.19; 95% CI, 1.16-1.21), and anxiety (OR, 1.24; 95% CI: 1.21-1.26) were higher during the pandemic compared to prepandemic. The likelihood of negative tweets after midnight was higher than for other daily intevals, comprising approximately 60% of all negative insomnia-related tweets in 2020 and 2021 collectively. CONCLUSIONS Twitter users shared more negative tweets about insomnia during the pandemic than during the year before. Also, more anger and anxiety-related content were disseminated during the pandemic on the social media platform. Future studies using an NLP framework could assess tweets about other psychological distress, habit changes, weight gain due to inactivity, and the effect of viral infection on sleep.


2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources, in order to assist decision makers. Social media is important in this respect, however, to make sense of the textual information it provides and be able to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report that Macro- and Micro-averaged F_{1\ }scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human labelled data. Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.


2021 ◽  
Author(s):  
Vishal Dey ◽  
Peter Krasniak ◽  
Minh Nguyen ◽  
Clara Lee ◽  
Xia Ning

BACKGROUND A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. OBJECTIVE The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. METHODS We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. RESULTS Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. Our pipeline also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. CONCLUSIONS Our study could inspire future studies on the suggested symptoms and factors of BII. Our study provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses. CLINICALTRIAL


Clinical parsing is useful in medical domain .Clinical narratives are difficult to understand as it is in unstructured format .Medical Natural language processing systems are used to make these clinical narratives in readable format. Clinical Parser is the combination of natural language processing and medical lexicon .For making clinical narrative understandable parsing technique is used .In this paper we are discussing about constituency parser for clinical narratives, which is based on phrase structured grammar. This parser convert unstructured clinical narratives into structured report. This paper focus on clinical sentences which is in unstructured format after parsing convert into structured format. For each sentence recall ,precision and bracketing f- measure are calculated .


Sign in / Sign up

Export Citation Format

Share Document