scholarly journals Identification of Adverse Drug Event–Related Japanese Articles: Natural Language Processing Analysis (Preprint)

2020 ◽  
Author(s):  
Shogo Ujiie ◽  
Shuntaro Yada ◽  
Shoko Wakamiya ◽  
Eiji Aramaki

BACKGROUND Medical articles covering adverse drug events (ADEs) are systematically reported by pharmaceutical companies for drug safety information purposes. Although policies governing reporting to regulatory bodies vary among countries and regions, all medical article reporting may be categorized as precision or recall based. Recall-based reporting, which is implemented in Japan, requires the reporting of any possible ADE. Therefore, recall-based reporting can introduce numerous false negatives or substantial amounts of noise, a problem that is difficult to address using limited manual labor. OBJECTIVE Our aim was to develop an automated system that could identify ADE-related medical articles, support recall-based reporting, and alleviate manual labor in Japanese pharmaceutical companies. METHODS Using medical articles as input, our system based on natural language processing applies document-level classification to extract articles containing ADEs (replacing manual labor in the first screening) and sentence-level classification to extract sentences within those articles that imply ADEs (thus supporting experts in the second screening). We used 509 Japanese medical articles annotated by a medical engineer to evaluate the performance of the proposed system. RESULTS Document-level classification yielded an F1 of 0.903. Sentence-level classification yielded an F1 of 0.413. These were averages of fivefold cross-validations. CONCLUSIONS A simple automated system may alleviate the manual labor involved in screening drug safety–related medical articles in pharmaceutical companies. After improving the accuracy of the sentence-level classification by considering a wider context, we intend to apply this system toward real-world postmarketing surveillance.

10.2196/22661 ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. e22661
Author(s):  
Shogo Ujiie ◽  
Shuntaro Yada ◽  
Shoko Wakamiya ◽  
Eiji Aramaki

Background Medical articles covering adverse drug events (ADEs) are systematically reported by pharmaceutical companies for drug safety information purposes. Although policies governing reporting to regulatory bodies vary among countries and regions, all medical article reporting may be categorized as precision or recall based. Recall-based reporting, which is implemented in Japan, requires the reporting of any possible ADE. Therefore, recall-based reporting can introduce numerous false negatives or substantial amounts of noise, a problem that is difficult to address using limited manual labor. Objective Our aim was to develop an automated system that could identify ADE-related medical articles, support recall-based reporting, and alleviate manual labor in Japanese pharmaceutical companies. Methods Using medical articles as input, our system based on natural language processing applies document-level classification to extract articles containing ADEs (replacing manual labor in the first screening) and sentence-level classification to extract sentences within those articles that imply ADEs (thus supporting experts in the second screening). We used 509 Japanese medical articles annotated by a medical engineer to evaluate the performance of the proposed system. Results Document-level classification yielded an F1 of 0.903. Sentence-level classification yielded an F1 of 0.413. These were averages of fivefold cross-validations. Conclusions A simple automated system may alleviate the manual labor involved in screening drug safety–related medical articles in pharmaceutical companies. After improving the accuracy of the sentence-level classification by considering a wider context, we intend to apply this system toward real-world postmarketing surveillance.


2018 ◽  
Author(s):  
Αγγελική-Σπυριδούλα Βλαχοστέργιου

Τα τελευταία χρόνια έχει παρατηρηθεί μια αύξηση του αριθμού των προσπαθειών για την αυτόματη αναγνώριση και κατηγοριοποίηση του ανθρωπίνου συναισθήματος χρησιμοποιώντας σήματα φυσιολογίας, σήματα από το πρόσωπο, τη φωνή, καθώς επίσης και προσωπικές ερμηνείες από κείμενα μεγάλων κοινωνικών δεδομένων. Αρκετοί είναι οι τομείς της έρευνας που θα μπορούσαν να επωφεληθούν από αυτά τα συστήματα: διαδραστικά συστήματα διδασκαλίας, τα οποία να επιτρέπουν στους εκπαιδευτικούς να γνωρίζουν το άγχος των φοιτητών, πρόληψη των ατυχημάτων (π.χ. εντοπισμός της κόπωσης του οδηγού), στρατιωτικά ομαδικά καθήκοντα που χαρακτηρίζονται από μεγάλης διάρκειας περιόδους άγχους και πίεσης και εφαρμογές στον τομέα της Υγείας για την έγκαιρη διάγνωση νευροεκφυλιστικών νόσων (π.χ. νόσος του Πάρκινσον), όπου η εκδήλωση των συμπτωμάτων συμβαίνει πολλά χρόνια μετά την έναρξη του νευροεκφυλισμού.Ωστόσο, παρά τις μέχρι τώρα ερευνητικές προσπάθειες, δεν έχει επιτευχθεί ο μακροπρόθεσμος στόχος της δημιουργίας ενός ισχυρού πλαισίου αναγνώρισης του εξεταζόμενου τομέα έρευνας που να βασίζεται στην ανάλυση και στην ερμηνεία του. Δεν υπάρχει καμία αμφιβολία ότι η δημιουργία του συναισθήματος (affect production) επηρεάζεται από το εκάστοτε πλαίσιο που λαμβάνει χώρα τη δεδομένη στιγμή, όπως το έργο στο οποίο υποβάλλεται ο χρήστης, τα άτομα που αλληλεπιδρούν με το χρήστη, η ταυτότητα αλλά και η εκφραστικότητά τους. Η οποιαδήποτε λοιπόν συμπληρωματική μορφή πληροφορίας πλαισίου αναφορικά με τον εξεταζόμενο τομέα έρευνας μας βοηθά ώστε να απαντήσουμε στο ερώτημα: τί είναι πιθανότερο να συμβεί, εκτρέποντας έτσι τον ταξινομητή από τις πιθανότερες/σχετικές κατηγορίες. Χωρίς το πλαίσιο, ακόμη και οι άνθρωποι μπορεί να παρερμηνεύουν τις παρατηρούμενες εκφράσεις του. Έτσι, με την αντιμετώπιση των προκλήσεων υπό το πρίσμα της αναγνώρισης του συναισθήματος υπό συγκεκριμένο πλαίσιο (context-aware affect analysis), δηλαδή με την καλύτερη μελέτη των πληροφοριών πλαισίου, με την ερμηνεία του σε συγκεκριμένους τομείς εφαρμογών, την αναπαράστασή του, τη μοντελοποίησή του, μπορούμε να προσεγγίσουμε καλύτερα την αναγνώριση του συναισθήματος σε πραγματικό χρόνο. Αντίστοιχα, στον τομέα των προσωπικών ερμηνειών από το κείμενο (Sentiment Analysis) αλλά και γενικότερα στον τομέα της Φυσικής Γλώσσας (Natural Language Processing (NLP)) η συνεισφορά του πλαισίου έγκειται στην καλύτερη αναγνώριση, ερμηνεία και επεξεργασία των απόψεων (opinions) και συναισθημάτων (sentiments) σε κείμενα, τα οποία εξετάζονται σε επίπεδο κειμένου (document-level), προτάσεων sentence-level και χαρακτηριστικών (aspect-level) αντίστοιχα. Στην περίπτωση αυτή, λαμβάνονται υπόψιν η σημασιολογία, οι γνωστικές και οι συναισθηματικές πληροφορίες των υποκειμενικών απαντήσεων των ατόμων. Ειδικότερα, στον τομέα αυτό, η συνεισφορά μας έγκειται στην εκπαίδευση ισχυρών αναπαραστάσεις χαρακτηριστικών από μη επισημειωμένα δεδομένα με τη χρήση Νευρωνικών Δικτύων και συγκεκριμένα με τη χρήση Ανταγωνιστικά Παραγωγικών Μοντέλων (GANs), η χρήση των οποίων έχει επιδείξει εντυπωσιακά αποτελέσματα στον τομέα της Όρασης Υπολογιστών. Η πρωτοτυπία της συγκεριμένης μεθόδου έγκειται στον τρόπο υλοποίησης του μοντέλου, στην επιλογή των υπερπαραμετρων, στη χρήση μη επιβλεπόμενης μάθησης και στην πειραματική επικύρωση του προτεινόμενου μοντέλου σε σώματα κειμένου που προέρχονται από διαφορετικές πηγές αναφορικά με το είδος τους και την έκτασή τους.


2018 ◽  
Vol 10 ◽  
pp. 117822261879286 ◽  
Author(s):  
Glen Coppersmith ◽  
Ryan Leary ◽  
Patrick Crutchley ◽  
Alex Fine

Suicide is among the 10 most common causes of death, as assessed by the World Health Organization. For every death by suicide, an estimated 138 people’s lives are meaningfully affected, and almost any other statistic around suicide deaths is equally alarming. The pervasiveness of social media—and the near-ubiquity of mobile devices used to access social media networks—offers new types of data for understanding the behavior of those who (attempt to) take their own lives and suggests new possibilities for preventive intervention. We demonstrate the feasibility of using social media data to detect those at risk for suicide. Specifically, we use natural language processing and machine learning (specifically deep learning) techniques to detect quantifiable signals around suicide attempts, and describe designs for an automated system for estimating suicide risk, usable by those without specialized mental health training (eg, a primary care doctor). We also discuss the ethical use of such technology and examine privacy implications. Currently, this technology is only used for intervention for individuals who have “opted in” for the analysis and intervention, but the technology enables scalable screening for suicide risk, potentially identifying many people who are at risk preventively and prior to any engagement with a health care system. This raises a significant cultural question about the trade-off between privacy and prevention—we have potentially life-saving technology that is currently reaching only a fraction of the possible people at risk because of respect for their privacy. Is the current trade-off between privacy and prevention the right one?


2017 ◽  
Vol 11 (03) ◽  
pp. 345-371
Author(s):  
Avani Chandurkar ◽  
Ajay Bansal

With the inception of the World Wide Web, the amount of data present on the Internet is tremendous. This makes the task of navigating through this enormous amount of data quite difficult for the user. As users struggle to navigate through this wealth of information, the need for the development of an automated system that can extract the required information becomes urgent. This paper presents a Question Answering system to ease the process of information retrieval. Question Answering systems have been around for quite some time and are a sub-field of information retrieval and natural language processing. The task of any Question Answering system is to seek an answer to a free form factual question. The difficulty of pinpointing and verifying the precise answer makes question answering more challenging than simple information retrieval done by search engines. The research objective of this paper is to develop a novel approach to Question Answering based on a composition of conventional approaches of Information Retrieval (IR) and Natural Language processing (NLP). The focus is on using a structured and annotated knowledge base instead of an unstructured one. The knowledge base used here is DBpedia and the final system is evaluated on the Text REtrieval Conference (TREC) 2004 questions dataset.


2019 ◽  
Vol 25 (4) ◽  
pp. 467-482 ◽  
Author(s):  
Aarne Talman ◽  
Anssi Yli-Jyrä ◽  
Jörg Tiedemann

AbstractSentence-level representations are necessary for various natural language processing tasks. Recurrent neural networks have proven to be very effective in learning distributed representations and can be trained efficiently on natural language inference tasks. We build on top of one such model and propose a hierarchy of bidirectional LSTM and max pooling layers that implements an iterative refinement strategy and yields state of the art results on the SciTail dataset as well as strong results for Stanford Natural Language Inference and Multi-Genre Natural Language Inference. We can show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks. Furthermore, our model beats the InferSent model in 8 out of 10 recently published SentEval probing tasks designed to evaluate sentence embeddings’ ability to capture some of the important linguistic properties of sentences.


2021 ◽  
Author(s):  
Priya B ◽  
Nandhini J.M ◽  
Gnanasekaran T

Natural Language processing (NLP) dealing with Artificial Intelligence concept is a subfield of Computer Science, enabling computers to understand and process human language. Natural Language Processing being a part of artificial intelligence provides understanding of human language by computers for the purpose of extracting information or insights and create meaningful response. It involves creating algorithms that transform text in to words labeling With the emerging advancements in Machine learning and Deep Learning, NLP can contributed a lot towards health sector, education, agriculture and so on. This paper summarizes the various aspects of NLP along with case studies associated with Health Sector for Voice Automated System, prediction of Diabetes Millets, Crop Detection technique in Agriculture Sector.


Author(s):  
Prof. P. Y. Pawar

This project was primarily aimed to create an automated system for solving captcha’s automatically. CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Human Apart) are the Internet’s first line of defence against automated account creation and service abuse. This paper presents unCaptcha, an automates system that can solve Captcha’s most difficult auditory challenges with high success rate using Deep Learning and Natural Language processing. There are four types of Captcha’s Audio Captcha,Text based captcha, Image captcha,Maths-solver captcha.


2020 ◽  
Vol 10 (4) ◽  
pp. 286
Author(s):  
Tak Sung Heo ◽  
Yu Seop Kim ◽  
Jeong Myeong Choi ◽  
Yeong Seok Jeong ◽  
Soo Young Seo ◽  
...  

Brain magnetic resonance imaging (MRI) is useful for predicting the outcome of patients with acute ischemic stroke (AIS). Although deep learning (DL) using brain MRI with certain image biomarkers has shown satisfactory results in predicting poor outcomes, no study has assessed the usefulness of natural language processing (NLP)-based machine learning (ML) algorithms using brain MRI free-text reports of AIS patients. Therefore, we aimed to assess whether NLP-based ML algorithms using brain MRI text reports could predict poor outcomes in AIS patients. This study included only English text reports of brain MRIs examined during admission of AIS patients. Poor outcome was defined as a modified Rankin Scale score of 3–6, and the data were captured by trained nurses and physicians. We only included MRI text report of the first MRI scan during the admission. The text dataset was randomly divided into a training and test dataset with a 7:3 ratio. Text was vectorized to word, sentence, and document levels. In the word level approach, which did not consider the sequence of words, and the “bag-of-words” model was used to reflect the number of repetitions of text token. The “sent2vec” method was used in the sensation-level approach considering the sequence of words, and the word embedding was used in the document level approach. In addition to conventional ML algorithms, DL algorithms such as the convolutional neural network (CNN), long short-term memory, and multilayer perceptron were used to predict poor outcomes using 5-fold cross-validation and grid search techniques. The performance of each ML classifier was compared with the area under the receiver operating characteristic (AUROC) curve. Among 1840 subjects with AIS, 645 patients (35.1%) had a poor outcome 3 months after the stroke onset. Random forest was the best classifier (0.782 of AUROC) using a word-level approach. Overall, the document-level approach exhibited better performance than did the word- or sentence-level approaches. Among all the ML classifiers, the multi-CNN algorithm demonstrated the best classification performance (0.805), followed by the CNN (0.799) algorithm. When predicting future clinical outcomes using NLP-based ML of radiology free-text reports of brain MRI, DL algorithms showed superior performance over the other ML algorithms. In particular, the prediction of poor outcomes in document-level NLP DL was improved more by multi-CNN and CNN than by recurrent neural network-based algorithms. NLP-based DL algorithms can be used as an important digital marker for unstructured electronic health record data DL prediction.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Sunny X. Tang ◽  
Reno Kriz ◽  
Sunghye Cho ◽  
Suh Jung Park ◽  
Jenna Harowitz ◽  
...  

AbstractComputerized natural language processing (NLP) allows for objective and sensitive detection of speech disturbance, a hallmark of schizophrenia spectrum disorders (SSD). We explored several methods for characterizing speech changes in SSD (n = 20) compared to healthy control (HC) participants (n = 11) and approached linguistic phenotyping on three levels: individual words, parts-of-speech (POS), and sentence-level coherence. NLP features were compared with a clinical gold standard, the Scale for the Assessment of Thought, Language and Communication (TLC). We utilized Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art embedding algorithm incorporating bidirectional context. Through the POS approach, we found that SSD used more pronouns but fewer adverbs, adjectives, and determiners (e.g., “the,” “a,”). Analysis of individual word usage was notable for more frequent use of first-person singular pronouns among individuals with SSD and first-person plural pronouns among HC. There was a striking increase in incomplete words among SSD. Sentence-level analysis using BERT reflected increased tangentiality among SSD with greater sentence embedding distances. The SSD sample had low speech disturbance on average and there was no difference in group means for TLC scores. However, NLP measures of language disturbance appear to be sensitive to these subclinical differences and showed greater ability to discriminate between HC and SSD than a model based on clinical ratings alone. These intriguing exploratory results from a small sample prompt further inquiry into NLP methods for characterizing language disturbance in SSD and suggest that NLP measures may yield clinically relevant and informative biomarkers.


Sign in / Sign up

Export Citation Format

Share Document