A Hybrid Approach to Sentiment Sentence Classification in Suicide Notes

2012 ◽  
Vol 5s1 ◽  
pp. BII.S8961 ◽  
Author(s):  
Sunghwan Sohn ◽  
Manabu Torii ◽  
Dingcheng Li ◽  
Kavishwar Wagholikar ◽  
Stephen Wu ◽  
...  

This paper describes the sentiment classification system developed by the Mayo Clinic team for the 2011 I2B2/VA/Cincinnati Natural Language Processing (NLP) Challenge. The sentiment classification task is to assign any pertinent emotion to each sentence in suicide notes. We implemented three systems trained on suicide notes provided by the I2B2 challenge organizer: a machine learning system, a rule-based system, and a combination of the two. Our machine learning system was trained on re-annotated data in which apparently inconsistent emotion assignments were adjusted. The machine learning methods (RIPPER and multinomial Naïve Bayes classifiers), the manual pattern matching rules, and the combination of the two systems were then tested for determining the emotions within sentences. The combination of the machine learning and rule-based systems performed best, producing a micro-average F-score of 0.5640.
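The micro-average F-score reported above pools true positives, false positives, and false negatives over all sentences and emotion labels before computing precision and recall. The following is a minimal sketch of that calculation, not the challenge's official scorer. The emotion labels and predictions are hypothetical, and combining the two systems by taking the union of their labels is an assumption made for illustration only, since the abstract does not state how the systems were combined.

```python
def micro_f1(gold, pred):
    """Micro-averaged F-score for multi-label sentence classification.

    gold, pred: lists of sets of labels, one set per sentence.
    """
    # Pool counts over every sentence and every label before averaging.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and system outputs for four sentences.
gold = [{"love"}, {"guilt", "hopelessness"}, set(), {"instructions"}]
ml_pred = [{"love"}, {"guilt"}, set(), set()]
rule_pred = [set(), {"hopelessness"}, {"blame"}, {"instructions"}]

# Assumed combination strategy: union of the labels from both systems.
hybrid_pred = [m | r for m, r in zip(ml_pred, rule_pred)]

print(micro_f1(gold, ml_pred))
print(micro_f1(gold, hybrid_pred))
```

In this toy data the union raises recall at the cost of one extra false positive, which is the usual trade-off when a high-precision rule set is layered onto a statistical classifier.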

2018 ◽  
Author(s):  
Sunyang Fu ◽  
Lester Y Leung ◽  
Yanshan Wang ◽  
Anne-Olivia Raulli ◽  
David F Kallmes ◽  
...  

BACKGROUND Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be because of vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by offering preventative treatment plans. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases from electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports. OBJECTIVE This study aimed to develop NLP systems to determine individuals with incidentally discovered SBIs from neuroimaging reports at 2 sites: Mayo Clinic and Tufts Medical Center. METHODS Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open source NLP pipeline MedTagger, developed by Mayo Clinic. Features for rule-based systems, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models adopted convolutional neural network (CNN), random forest, support vector machine, and logistic regression. The performance of the NLP algorithm was compared with a manually created gold standard. RESULTS A total of 5 reports were removed due to invalid scan types. The interannotator agreements across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance of predicting SBI with an accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. 
The CNN achieved the best score on predicting white matter disease (WMD) with an accuracy, sensitivity, specificity, PPV, and NPV of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively. CONCLUSIONS We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested a high feasibility of detecting SBIs and WMDs from EHRs using NLP.
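The abstract above mentions pointwise mutual information (PMI) as the way significant words were selected for the rule-based system. As a rough illustration of the idea, and not of the authors' actual feature pipeline, the sketch below scores each word by PMI = log2(P(word, class) / (P(word) P(class))), with probabilities estimated from document counts. The function name `pmi_scores`, the toy reports, and the labels are all hypothetical.

```python
import math
from collections import Counter


def pmi_scores(docs, labels, target=1):
    """PMI between a word's presence in a report and the target label.

    docs: list of token lists; labels: list of class labels.
    Only words that occur in at least one target-class document are scored.
    """
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    word_count = Counter()   # documents containing the word
    joint_count = Counter()  # target-class documents containing the word
    for tokens, y in zip(docs, labels):
        for w in set(tokens):
            word_count[w] += 1
            if y == target:
                joint_count[w] += 1
    scores = {}
    for w, c in joint_count.items():
        p_joint = c / n
        p_word = word_count[w] / n
        p_target = n_target / n
        scores[w] = math.log2(p_joint / (p_word * p_target))
    return scores


# Hypothetical tokenized report snippets; label 1 marks the class of interest.
docs = [
    ["chronic", "infarct"],
    ["old", "lacunar", "infarct"],
    ["normal", "study"],
    ["no", "acute", "infarct"],
]
labels = [1, 1, 0, 0]

scores = pmi_scores(docs, labels)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, round(score, 3))
```

Words that appear only in target-class reports score highest, while a word such as "infarct" that also appears in a negated context scores lower, which hints at why PMI-selected terms still need pattern rules around them.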


2021 ◽  
Author(s):  
Abul Hasan ◽  
Mark Levene ◽  
David Weston ◽  
Renate Fromson ◽  
Nicolas Koslover ◽  
...  

BACKGROUND The COVID-19 pandemic has created a pressing need for integrating information from disparate sources in order to assist decision makers. Social media is important in this respect; however, to make sense of the textual information it provides and to automate the processing of large amounts of data, natural language processing methods are needed. Social media posts are often noisy, yet they may provide valuable insights regarding the severity and prevalence of the disease in the population. In particular, machine learning techniques for triage and diagnosis could allow for a better understanding of what social media may offer in this respect. OBJECTIVE This study aims to develop an end-to-end natural language processing pipeline for triage and diagnosis of COVID-19 from patient-authored social media posts, in order to provide researchers and other interested parties with additional information on the symptoms, severity and prevalence of the disease. METHODS The text processing pipeline first extracts COVID-19 symptoms and related concepts such as severity, duration, negations, and body parts from patients’ posts using conditional random fields. An unsupervised rule-based algorithm is then applied to establish relations between concepts in the next step of the pipeline. The extracted concepts and relations are subsequently used to construct two different vector representations of each post. These vectors are applied separately to build support vector machine learning models to triage patients into three categories and diagnose them for COVID-19. RESULTS We report macro- and micro-averaged F1 scores in the range of 71-96% and 61-87%, respectively, for the triage and diagnosis of COVID-19, when the models are trained on human-labelled data. 
Our experimental results indicate that similar performance can be achieved when the models are trained using predicted labels from concept extraction and rule-based classifiers, thus yielding end-to-end machine learning. Also, we highlight important features uncovered by our diagnostic machine learning models and compare them with the most frequent symptoms revealed in another COVID-19 dataset. In particular, we found that the most important features are not always the most frequent ones. CONCLUSIONS Our preliminary results show that it is possible to automatically triage and diagnose patients for COVID-19 from natural language narratives using a machine learning pipeline, in order to provide additional information on the severity and prevalence of the disease through the eyes of social media.


2009 ◽  
Vol 16 (4) ◽  
pp. 571-575 ◽  
Author(s):  
L. C. Childs ◽  
R. Enelow ◽  
L. Simonsen ◽  
N. H. Heintzelman ◽  
K. M. Kowalski ◽  
...  

Author(s):  
Mosammat Tahnin Tariq ◽  
Aidin Massahi ◽  
Rajib Saha ◽  
Mohammed Hadi

Events such as surges in demand or lane blockages can create queue spillbacks even during off-peak periods, resulting in delays and spillbacks to upstream intersections. To address this issue, some transportation agencies have started implementing processes to change signal timings in real time based on traffic signal engineers’ observations of incident and traffic conditions at the intersections upstream and downstream of the congested locations. Decisions to change the signal timing are governed by many factors, such as queue length, conditions of the main and side streets, potential of traffic spilling back to upstream intersections, the importance of upstream cross streets, and the potential of the queue backing up to a freeway ramp. This paper investigates and assesses automating the process of updating the signal timing plans during non-recurrent conditions by capturing the history of the responses of the traffic signal engineers to non-recurrent conditions and utilizing this experience to train a machine learning model. A combination of recursive partitioning and regression decision tree (RPART) and fuzzy rule-based system (FRBS) is utilized in this study to deal with the vagueness and uncertainty of human decisions. Comparing the decisions made based on the resulting fuzzy rules from applying the methodology with previously recorded expert decisions for a project case study indicates accurate recommendations for shifts in the green phases of traffic signals. The simulation results indicate that changing the green times based on the output of the fuzzy rules decreased delays caused by lane blockages or demand surge.


2020 ◽  
pp. 1-22 ◽  
Author(s):  
D. Sykes ◽  
A. Grivas ◽  
C. Grover ◽  
R. Tobin ◽  
C. Sudlow ◽  
...  

Abstract Using natural language processing, it is possible to extract structured information from raw text in the electronic health record (EHR) at reasonably high accuracy. However, the accurate distinction between negated and non-negated mentions of clinical terms remains a challenge. EHR text includes cases where diseases are stated not to be present or only hypothesised, meaning a disease can be mentioned in a report when it is not being reported as present. This makes tasks such as document classification and summarisation more difficult. We have developed the rule-based EdIE-R-Neg, part of an existing text mining pipeline called EdIE-R (Edinburgh Information Extraction for Radiology reports; https://www.ltg.ed.ac.uk/software/edie-r/), developed to process brain imaging reports, and two machine learning approaches: one using a bidirectional long short-term memory network and another using a feedforward neural network. These were developed on data from the Edinburgh Stroke Study (ESS) and tested on data from routine reports from NHS Tayside (Tayside). Both datasets consist of written reports from medical scans. These models are compared with two existing rule-based models: pyConText (Harkema et al. 2009. Journal of Biomedical Informatics 42(5), 839–851), a Python implementation of a generalisation of NegEx, and NegBio (Peng et al. 2017. NegBio: A high-performance tool for negation and uncertainty detection in radiology reports. arXiv e-prints, p. arXiv:1712.05898), which identifies negation scopes through patterns applied to a syntactic representation of the sentence. On both the test set of the dataset from which our models were developed and the largely similar Tayside test set, the neural network models and our custom-built rule-based system outperformed the existing methods. 
EdIE-R-Neg scored highest on F1 score, particularly on the test set of the Tayside dataset, from which no development data were used in these experiments, showing the power of custom-built rule-based systems for negation detection on datasets of this size. The performance gap of the machine learning models to EdIE-R-Neg on the Tayside test set was reduced through adding development Tayside data into the ESS training set, demonstrating the adaptability of the neural network models.
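To make the idea of rule-based negation detection concrete, here is a deliberately simplified sketch in the spirit of trigger-and-window approaches. It is not the EdIE-R-Neg, NegEx, pyConText, or NegBio implementation. The trigger list, the five-token window, and the function name `is_negated` are all assumptions chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny trigger list; real systems use far larger ones.
NEGATION_TRIGGERS = ("no", "not", "without", "no evidence of")
# Assumed scope: a trigger negates terms within this many following tokens.
WINDOW = 5


def is_negated(sentence, term):
    """Return True if `term` occurs shortly after a negation trigger."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    term_tokens = term.lower().split()
    k = len(term_tokens)
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == term_tokens:
            # Look back over a fixed window of tokens before the term.
            preceding = " ".join(tokens[max(0, i - WINDOW):i])
            for trigger in NEGATION_TRIGGERS:
                if re.search(rf"\b{re.escape(trigger)}\b", preceding):
                    return True
    return False


print(is_negated("No evidence of acute infarct.", "infarct"))
print(is_negated("There is an old infarct in the left hemisphere.", "infarct"))
```

A fixed window like this over-extends negation in sentences where the scope ends early, which is one reason fuller rule sets and syntax-based scope detection exist.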


2019 ◽  
Vol 26 (11) ◽  
pp. 1218-1226 ◽  
Author(s):  
Long Chen ◽  
Yu Gu ◽  
Xin Ji ◽  
Chao Lou ◽  
Zhiyong Sun ◽  
...  

Abstract Objective Identifying patients who meet selection criteria for clinical trials is typically challenging and time-consuming. In this article, we describe our clinical natural language processing (NLP) system to automatically assess patients’ eligibility based on their longitudinal medical records. This work was part of the 2018 National NLP Clinical Challenges (n2c2) Shared-Task and Workshop on Cohort Selection for Clinical Trials. Materials and Methods The authors developed an integrated rule-based clinical NLP system which employs a generic rule-based framework plugged in with lexical-, syntactic- and meta-level, task-specific knowledge inputs. In addition, the authors implemented and evaluated a general clinical NLP (cNLP) system which is built with the Unified Medical Language System and Unstructured Information Management Architecture. Results and Discussion The systems were evaluated as part of the 2018 n2c2-1 challenge, and the authors’ rule-based system obtained an F-measure of 0.9028, ranking fourth at the challenge with less than 1% difference from the best system. While the general cNLP system did not perform as well as the rule-based system, it did establish its own advantages and potential in extracting clinical concepts. Conclusion Our results indicate that a well-designed rule-based clinical NLP system is capable of achieving good performance on cohort selection even with a small training data set. In addition, the investigation of a Unified Medical Language System-based general cNLP system suggests that a hybrid system combining these 2 approaches holds promise for surpassing state-of-the-art performance.


1996 ◽  
Vol 8 (5) ◽  
pp. 454-458
Author(s):  
Kenichi Matsuura ◽  
Yukinori Kakazu

Distributed problem-solving systems have some valuable features, such as fault tolerance and robustness. Such a system solves problems by search guided by an objective function. Distributed rule-based problem-solving systems are considered to be of the same type; that is to say, the set of rules and the objective function exist separately within the system. However, in distributed rule-based systems, the set of rules should hold the objective function: the system should have a set of rules only, and the objective function should exist within that set of rules. In this paper, our objective is to acquire the objective function of a distributed rule-based system. A rule generation mechanism analyzes given examples and acquires problem-solving strategies as a set of rules. In this way, the set of rules derived from a class of examples in the domain represents the objective function of that class. Therefore, a solution using those rules keeps the same features as the examples, provided the problem belongs to the class of examples that generated the set of rules. A system implemented on this theory has been applied to the domain of the traveling salesman problem, and it generated a set of rules that held the objective function of its domain.


2020 ◽  
Vol 22 (3) ◽  
pp. 517-532
Author(s):  
Gabriele De Luca ◽  
Marko Beck

This paper tackles the issue of analyst bias in comparative political analyses of political discourse by leveraging data and machine learning over human prior knowledge. The case studied is the characterization of the issue of migration in the Croatian political discourse, which was chosen arbitrarily. We developed a machine-learning system that identifies the most prominent features in the Croatian political discourse with regard to migration; our interest was solely in comparative political analysis within political science. This system does not rely on human judgement on the part of the researchers and can thus be considered “objective”, short of possible sampling or selection bias. It is replicable: given the same dataset and algorithm, any scientist should reach the same conclusions. This result was achieved by creating a text corpus from news items and press releases extracted from the websites of Croatian political parties currently represented in the Parliament. The available and collected data consist of public announcements, mainly from IDS (Istarski Demokratski Sabor / Istrian Democratic Assembly), SDSS (Samostalna Demokratska Srpska Stranka / Independent Democratic Serb Party) and HSLS (Hrvatska Socijalno Liberalna Stranka / Croatian Social Liberal Party). The analyzed data suggest three dominant phrases of the research process. All political parties had a similar political stance on the issues identified. The three most significant phrases were determined. The first phrase is related to the words “Demography” and “Reduction”, and the findings suggest that most analyzed articles relate to migration of Croatian citizens in connection with economic hardship of some kind. The second phrase is related to the words “Border” and “Croatia-Serbia”, which strongly indicates a relation to migration; it concerns inter-Balkan migration, mostly connected with the consequences of the Croatian War of Independence in the 1990s, and is of most interest to SDSS, a Serb minority party in Croatia. The third phrase is related to the Marrakesh Agreement (Global Compact for Safe, Orderly and Regular Migration), where most of the analyzed data show that parties have a constructive but ambivalent stance towards migration from third countries. Research conducted on the available data shows that widespread international migration is not the focus of most Croatian political parties, while topics of and interest in inter-Balkan and Croatian economic/political migration dominate the Croatian political spectrum.

