Predicting Drug Indications and Side Effects Using Deep Learning and Transfer Learning

In biology, text mining is commonly used because it can uncover unknown relationships among medicines, phenotypes and syndromes from large volumes of information. Enhanced Topic modeling with Improved Predict drug Indications and Side effects using Topic modelling and Natural language processing (ETP-IPISTON) has been employed to predict drug-phenotype and drug-side effect associations. Initially, corpus documents are collected from the literature data, and the topics in the data are modeled using logistic Linear Discriminative Analysis (LDA) and Bi-directional Long Short-Term Memory-Conditional Random Field (BILSTM-CRF). From the sentences in the literature data, a dependency graph is constructed to discover the relations between genes and drugs. The effect of the drug on the phenotype rule is identified by the Gene Regulation Score (GRS), which creates the drug-topic probability matrix. The probability matrix and a syntactic distance measure are processed by Classification and Regression Tree (CART), Naïve Bayes (NB), logistic regression and Convolutional Neural Network (CNN) classifiers to estimate drug-gene and drug-side effect associations. Besides the literature data, social media offers promising resources with a massive volume of data that can be useful for predicting drug-phenotype and drug-side effect associations. In this paper, drug information covering genes, diseases and side effects is therefore extracted from social media such as Twitter, Facebook and LinkedIn and used together with the literature data to provide more relevant disease and drug relations. In addition, topic modeling with transfer learning is introduced to consider the element categories, the probability of overlapping elements and the deep contextual significance of a text for better modeling of topics. Topic modeling with transfer learning shares as much knowledge as possible between the literature data and the social media information. The topics from social media and literature data are used to create the drug-topic matrix. The probability matrix and the syntactic distance measure are given as input to CART, NB, logistic regression and CNN to estimate drug-gene and drug-side effect associations. The proposed work is named Enhanced Topic Modeling with Transfer Learning-IPISTON (ETPTL-IPISTON). The simulation findings show that ETPTL-IPISTON is more efficient than the traditional methods.


A Bilingual Comparison of Sentiment and Topics for a Product Event on Twitter

Information Systems Frontiers ◽

10.1007/s10796-021-10169-x ◽

2021 ◽

Author(s):

Irina Wedel ◽

Michael Palk ◽

Stefan Voß

Keyword(s):

Social Media ◽

Language Processing ◽

Topic Modeling ◽

New Product ◽

Business Value ◽

Data Driven ◽

New Product Introduction ◽

Social Media Analytics ◽

Product Introduction ◽

Textual Data

Abstract: Social media enable companies to assess consumers’ opinions, complaints and needs. The systematic and data-driven analysis of social media to generate business value is summarized under the term Social Media Analytics, which includes statistical, network-based and language-based approaches. We focus on textual data and investigate which conversation topics arise on Twitter during a new product introduction and what the overall sentiment is during and after the event. The analysis via Natural Language Processing tools is conducted in two languages and four different countries, such that cultural differences in tonality and customer needs can be identified for the product. Different methods of sentiment analysis and topic modeling are compared to assess their usability in social media and in the respective languages, English and German. Furthermore, we illustrate the importance of preprocessing steps when applying these methods and identify relevant product insights.


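Análise de discursos em notícias sobre homofobia, racismo e sexismo em comentários de portais brasileiros de notícias [Discourse analysis of news about homophobia, racism and sexism in comments on Brazilian news portals]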

10.14210/cotb.v12.p467-474 ◽

2021 ◽

Author(s):

Lucas Rodrigues ◽

Antonio Jacob Junior ◽

Fábio Lobato

Keyword(s):

Social Media ◽

Natural Language Processing ◽

Sentiment Analysis ◽

Data Visualization ◽

Language Processing ◽

Topic Modeling ◽

Hate Speech ◽

Psychological Impact ◽

Internet Service ◽

General Law

Posts with defamatory content or hate speech are constantly found on social media. The results for readers are numerous, not restricted only to the psychological impact, but also to the growth of this social phenomenon. With the General Law on the Protection of Personal Data and the Marco Civil da Internet, service providers became responsible for the content in their platforms. Considering the importance of this issue, this paper aims to analyze the content published (news and comments) on the G1 News Portal with techniques based on data visualization and Natural Language Processing, such as sentiment analysis and topic modeling. The results show that even with most of the comments being neutral or negative and classified or not as hate speech, the majority of them were accepted by the users.


Learning predictive models of drug side-effect relationships from distributed representations of literature-derived semantic predications

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy077 ◽

2018 ◽

Vol 25 (10) ◽

pp. 1339-1350 ◽

Cited By ~ 5

Author(s):

Justin Mower ◽

Devika Subramanian ◽

Trevor Cohen

Keyword(s):

Machine Learning ◽

Language Processing ◽

Side Effect ◽

Cross Validation ◽

Processing System ◽

Biomedical Literature ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Drug Side Effect ◽

Natural Language Processing System

Abstract. Objective: The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring. Methods: Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database. Results: The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions. Discussion and Conclusion: Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples.
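
The composition step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how drug and side-effect embeddings are combined, so plain concatenation is an assumption here, and the vectors and names (`concept_vectors`, `compose_pair`, `drug_a`, `nausea`) are hypothetical toy values rather than anything taken from the paper.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings; the paper derives its vectors from SemRep triples.
concept_vectors: Dict[str, Vector] = {
    "drug_a": [0.2, -0.1, 0.7],
    "drug_b": [0.5, 0.4, -0.3],
    "nausea": [0.1, 0.9, 0.0],
}


def compose_pair(drug: str, effect: str, vectors: Dict[str, Vector]) -> Vector:
    """Build one pair vector by concatenation (an assumed composition rule)."""
    return vectors[drug] + vectors[effect]


# Each drug/side-effect pair becomes one feature row for a supervised classifier.
pairs = [("drug_a", "nausea"), ("drug_b", "nausea")]
features = [compose_pair(drug, effect, concept_vectors) for drug, effect in pairs]
```

Each row of `features` could then be passed, together with a known label, to any supervised learner; other composition rules (for example element-wise products) would slot into `compose_pair` in the same way.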


Building the process-drug–side effect network to discover the relationship between biological Processes and side effects

BMC Bioinformatics ◽

10.1186/1471-2105-12-s2-s2 ◽

2011 ◽

Vol 12 (S2) ◽

Cited By ~ 36

Author(s):

Sejoon Lee ◽

Kwang H Lee ◽

Min Song ◽

Doheon Lee

Keyword(s):

Side Effects ◽

Side Effect ◽

Biological Processes ◽

Drug Side Effect ◽

The Relationship


Ternion: An Autonomous Model for Fake News Detection

Applied Sciences ◽

10.3390/app11199292 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9292

Author(s):

Noman Islam ◽

Asadullah Shaikh ◽

Asma Qaiser ◽

Yousef Asiri ◽

Sultan Almakdi ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Support Vector Machine ◽

Logistic Regression ◽

Language Processing ◽

Negative Impact ◽

Machine Learning Techniques ◽

Support Vector ◽

Fake News ◽

Processing Techniques

In recent years, the consumption of social media content to keep up with global news and to verify its authenticity has become a considerable challenge. Social media enables us to easily access news anywhere, anytime, but it also gives rise to the spread of fake news, thereby delivering false information. This also has a negative impact on society. Therefore, it is necessary to determine whether or not news spreading over social media is real. This will allow for confusion among social media users to be avoided, and it is important in ensuring positive social development. This paper proposes a novel solution by detecting the authenticity of news through natural language processing techniques. Specifically, this paper proposes a novel scheme comprising three steps, namely, stance detection, author credibility verification, and machine learning-based classification, to verify the authenticity of news. In the last stage of the proposed pipeline, several machine learning techniques are applied, such as decision trees, random forest, logistic regression, and support vector machine (SVM) algorithms. For this study, the fake news dataset was taken from Kaggle. The experimental results show an accuracy of 93.15%, precision of 92.65%, recall of 95.71%, and F1-score of 94.15% for the support vector machine algorithm. The SVM is better than the second best classifier, i.e., logistic regression, by 6.82%.
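
As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall quoted in the abstract; the function name `f1_score` is only an illustrative helper, not code from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the SVM classifier, in percent.
reported_precision = 92.65
reported_recall = 95.71

f1 = f1_score(reported_precision, reported_recall)
print(f"Recomputed F1: {f1:.2f}%")
```

The recomputed value agrees with the reported 94.15% to within the rounding of the published precision and recall.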


Transfer Learning with Social Media Content in the Ride-Hailing Domain by Using a Hybrid Machine Learning Architecture

Electronics ◽

10.3390/electronics11020189 ◽

2022 ◽

Vol 11 (2) ◽

pp. 189

Author(s):

Álvaro de Pablo ◽

Oscar Araque ◽

Carlos A. Iglesias

Keyword(s):

Machine Learning ◽

Social Media ◽

Transfer Learning ◽

Topic Modeling ◽

Media Content ◽

Learning From Data ◽

Modeling Techniques ◽

Hybrid Machine ◽

Vector Representations ◽

Google Play

The analysis of the content of posts written on social media has established an important line of research in recent years. The study of these texts, as well as their relationship with each other and their dependence on the platform on which they are written, enables the behavior analysis of users and their opinions with respect to different domains. In this work, a hybrid machine learning-based system has been developed to classify texts using topic modeling techniques and different word-vector representations, as well as traditional text representations. The system has been trained with ride-hailing posts extracted from Reddit, showing promising performance. Then, the generated models have been tested with data extracted from other sources such as Twitter and Google Play, classifying these texts without retraining any models and thus performing Transfer Learning. The obtained results show that our proposed architecture is effective when performing Transfer Learning from data-rich domains and applying them to other sources.


Discovering biological processes and side effects relationship using the process-drug-side effect network

Proceedings of the ACM fourth international workshop on Data and text mining in biomedical informatics - DTMBIO '10 ◽

10.1145/1871871.1871878 ◽

2010 ◽

Author(s):

Sejoon Lee ◽

Min Song ◽

Doheon Lee

Keyword(s):

Side Effects ◽

Side Effect ◽

Biological Processes ◽

Drug Side Effect


Predicting new drug indications from network analysis

International Journal of Modern Physics C ◽

10.1142/s0129183117501182 ◽

2017 ◽

Vol 28 (09) ◽

pp. 1750118

Author(s):

Yousoff Effendy Mohd Ali ◽

Kiam Heong Kwa ◽

Kurunathan Ratnavelu

Keyword(s):

Side Effects ◽

Network Analysis ◽

Side Effect ◽

Drug Repositioning ◽

Centrality Measures ◽

Drug Side Effect ◽

Alternative Approach ◽

Therapeutic Properties ◽

Basic Hypothesis ◽

Optimum Threshold

This work adapts centrality measures commonly used in social network analysis to identify drugs with better positions in the drug-side effect network and the drug-indication network for the purpose of drug repositioning. Our basic hypothesis is that drugs having similar phenotypic profiles, such as side effects, may also share similar therapeutic properties based on related mechanisms of action, and vice versa. The networks were constructed from Side Effect Resource (SIDER) 4.1, which contains 1430 unique drugs with side effects and 1437 unique drugs with indications. Within the giant components of these networks, drugs were ranked based on their centrality scores, whereby 18 prominent drugs from the drug-side effect network and 15 prominent drugs from the drug-indication network were identified. Indications and side effects of prominent drugs were deduced from the profiles of their neighbors in the networks and compared to existing clinical studies, while an optimum threshold of similarity among drugs was sought. The threshold can then be utilized for predicting indications and side effects of all drugs. Similarities of drugs were measured by the extent to which they share phenotypic profiles and neighbors. To improve the likelihood of accurate predictions, only profiles such as side effects of common or very common frequencies were considered. In summary, our work is an attempt to offer an alternative approach to drug repositioning using centrality measures commonly used for analyzing social networks.
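
The idea of linking drugs by shared profiles and then ranking them by centrality can be sketched as follows. The abstract names neither the similarity measure nor the specific centrality measures, so Jaccard similarity and degree centrality are assumptions chosen for illustration, and the drug names, profiles and the 0.5 threshold are hypothetical toy values rather than SIDER data.

```python
from itertools import combinations
from typing import Dict, Set

# Hypothetical toy profiles; the paper builds its networks from SIDER 4.1.
side_effects: Dict[str, Set[str]] = {
    "drug_a": {"nausea", "headache", "rash"},
    "drug_b": {"nausea", "headache"},
    "drug_c": {"dizziness"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of the combined profile that two drugs have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(profiles: Dict[str, Set[str]], threshold: float) -> Dict[str, Set[str]]:
    """Link two drugs when their profile similarity reaches the threshold."""
    neighbors: Dict[str, Set[str]] = {drug: set() for drug in profiles}
    for d1, d2 in combinations(profiles, 2):
        if jaccard(profiles[d1], profiles[d2]) >= threshold:
            neighbors[d1].add(d2)
            neighbors[d2].add(d1)
    return neighbors


def degree_centrality(neighbors: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the other drugs that each drug is linked to."""
    n = len(neighbors)
    return {d: (len(ns) / (n - 1) if n > 1 else 0.0) for d, ns in neighbors.items()}


network = similarity_network(side_effects, threshold=0.5)
centrality = degree_centrality(network)
```

In this toy network only `drug_a` and `drug_b` are linked, so a side effect known for one but not the other (here `rash`) would be the kind of candidate prediction the neighbor-based deduction in the abstract refers to.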


Unfolding determinants of COVID-19 vaccine acceptance in China (Preprint)

10.2196/preprints.26089 ◽

2020 ◽

Author(s):

Fulian Yin ◽

Zhaoliang Wu ◽

Xinyu Xia ◽

Meiqi Ji ◽

Yanyan Wang ◽

...

Keyword(s):

Social Media ◽

Public Opinion ◽

Side Effects ◽

Language Processing ◽

Large Scale ◽

Chinese Government ◽

Vaccine Acceptance ◽

Reproduction Ratio ◽

Inactivated Vaccines ◽

Chinese Social Media

BACKGROUND: China is at the forefront of global efforts to develop COVID-19 vaccines and has five fast-tracked candidates in final-stage, large-scale human clinical trials. Layered on top of public engagement, making an informed and judicious choice is a catch-22 for the Chinese government in the context of COVID-19 vaccination promotion. OBJECTIVE: In this study, public opinions in China are analyzed via public dialogues on Chinese social media, based on which the views of Chinese netizens on COVID-19 vaccines and vaccination are investigated. We recommend strategies for promoting vaccination programs in the most populous country based on an in-depth understanding of the challenges in risk communication and social mobilization. METHODS: We proposed a novel emotional dynamics model, SRS/I, to analyze the opinion transmission paradigms on Chinese social media. Coupled with meta-analysis and natural language processing (NLP) techniques, the emotion polarity of individual opinions is examined in context. RESULTS: We collected more than 1.75 million Weibo messages about COVID-19 vaccines from January to October 2020. According to the public opinion reproduction ratio (R_0), the dynamic propagation of those messages can be classified into three stages: the Ferment period (R_0 = 1.1360), the Evolution period (R_0 = 2.8278) and the Transmission period (R_0 = 3.0729). Significantly, topics on COVID-19 vaccine acceptance in China, such as price and side effects, are emerging from the landscape of public opinion transmission. From September to October, 18.3% of people held that the vaccine price is high, receiving 38.1% of “likes,” while 35.9% regarded it as inexpensive, with 25.0% of “likes.” Netizens’ emotional polarity on side effects is another aspect of our research: we found 47.7% positive and 31.9% negative comments. We also found that inactivated vaccines aroused much more heated discussion than any other type of vaccine. They account for 53% of discussions across all vaccine types, 42% of forwards, 56% of comments, and 49% of likes. CONCLUSIONS: Most Chinese hold that the vaccine is cheaper than previously thought, while some claim they could not afford it for their entire family. The Chinese are inclined to become more positive about side effects over time and are proud of China’s development of vaccines. Nevertheless, they have a collective misunderstanding about inactivated vaccines, insisting that inactivated vaccines are safer than other vaccines. Reflecting on those collective responses, the unfolding determinants of COVID-19 vaccine acceptance provide illuminating benchmarks for vaccine-promoting policymaking.


Occupants’ Satisfaction with LEED- and Non-LEED-Certified Apartments Using Social Media Data

10.31219/osf.io/8q4zt ◽

2021 ◽

Author(s):

Xingtong Guo ◽

Kyumin Lee ◽

Zhe Wang ◽

Shichao Liu

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Topic Modeling ◽

System Development ◽

Residential Buildings ◽

Statistical Significance ◽

Environmental Design ◽

Online Reviews ◽

Star Rating

Leadership in Energy and Environmental Design (LEED) certified buildings aim to offer a sustainable and healthy built environment. Previous studies have shown mixed and inconsistent results on whether occupants in LEED-certified buildings are more satisfied than those in non-LEED-certified counterparts. Those studies, usually based on surveys or questionnaires for commercial buildings, were limited by sample size and pre-defined question structures. Since most people stay longer at home during the COVID-19 pandemic and the trend might continue in the post-pandemic era, assessing the satisfaction with LEED-certified residential buildings benefits future environmental design and certification system development. In this work, we propose a natural language processing-based approach for such assessment. The study collected 16,761 online reviews on 260 LEED-certified apartments and 180 non-LEED-certified apartments from social media, then applied topic modeling and sentiment analysis to evaluate occupants’ satisfaction. Based on topic modeling, we categorized online comments into three topic clusters: 1) location and transportation, 2) running cost, and 3) health and wellbeing. The subsequent sentiment analysis showed a statistically significant but small or negligible enhancement in satisfaction in LEED-certified apartments compared to non-LEED-certified ones across all three topic clusters. The “significant but small or negligible uptick” has also been found in online star rating and indoor environmental satisfaction. The only exception with a large effect size is lighting, which is significantly more satisfying in LEED-certified apartments. Nevertheless, the statistical significance in online star rating disappears when it is normalized by rent price and property house value.
