Semi-Supervised Generative Adversarial Network for Sentiment Analysis of Drug Reviews

2021 ◽  
Author(s):  
Cristóbal Colón-Ruiz

Sentiment analysis has become a very popular research topic and covers a wide range of domains such as economy, politics and health. In the pharmaceutical field, automated analysis of online user reviews provides information on the effectiveness and potential side effects of drugs, which could be used to improve pharmacovigilance systems. Deep learning approaches have revolutionized the field of Natural Language Processing (NLP), achieving state-of-the-art results in many tasks, such as sentiment analysis. These methods require large annotated datasets to train their models. However, in most real-world scenarios, obtaining high-quality labeled datasets is an expensive and time-consuming task, whereas unlabeled texts can generally be obtained easily. In this work, we propose a semi-supervised approach based on a Semi-Supervised Generative Adversarial Network (SSGAN) to address the lack of labeled data for the sentiment analysis of drug reviews and to improve the results provided by supervised approaches in this task. To evaluate the real contribution of this approach, we present a benchmark comparison between our semi-supervised approach and a supervised approach that uses a similar architecture but without the generative adversarial setting. Experimental results show better performance of the semi-supervised approach when annotated reviews make up less than ten percent of the training set, with a significant improvement in the classification of neutral reviews, the class with the fewest examples. To the best of our knowledge, this is the first study that applies an SSGAN to the sentiment classification of drug reviews. Our semi-supervised approach provides promising results for dealing with the shortage of annotated data, but there is still much room for improvement.
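As a rough, hedged sketch of the SSGAN idea described above (assumed layer sizes and a generic pre-encoded review input; not the authors' implementation), the central component is a discriminator that outputs the K sentiment classes plus one extra class for generator-produced reviews, so unlabeled reviews can contribute a real-vs-fake training signal:

```python
# Minimal sketch of an SSGAN discriminator for K-class sentiment analysis.
# Hypothetical sizes: reviews are assumed to be pre-encoded as 768-d vectors.
import torch
import torch.nn as nn

K = 3  # e.g., positive / neutral / negative (assumption)

class Discriminator(nn.Module):
    """K real sentiment classes plus one 'generated' class at index K."""
    def __init__(self, input_dim=768, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, K + 1)

    def forward(self, x):
        return self.head(self.body(x))

# Training mixes three signals:
#   1. cross-entropy on the few labeled reviews (classes 0..K-1),
#   2. unlabeled reviews pushed away from class K ("looks real"),
#   3. generator samples pushed towards class K ("looks fake").
```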


Author(s):  
Cara Murphy ◽  
John Kerekes

The classification of trace chemical residues through active spectroscopic sensing is challenging due to the lack of physics-based models that can accurately predict spectra. To overcome this challenge, we leveraged the field of domain adaptation to translate data from the simulated to the measured domain for training a classifier. We developed the first 1D conditional generative adversarial network (GAN) to perform spectrum-to-spectrum translation of reflectance signatures. We applied the 1D conditional GAN to a library of simulated spectra and quantified the improvement in classification accuracy on real data using the translated spectra for training the classifier. Using the GAN-translated library, the average classification accuracy increased from 0.622 to 0.723 on real chemical reflectance data, including data from chemicals not included in the GAN training set.
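A minimal sketch of what a 1D conditional generator for spectrum-to-spectrum translation could look like (layer counts, channel widths and kernel sizes are assumptions, not the authors' architecture):

```python
# Illustrative 1D conditional generator: maps a simulated reflectance
# spectrum to the measured domain, conditioned on the simulated input.
import torch
import torch.nn as nn

class SpectrumTranslator(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=5, padding=2),
        )

    def forward(self, simulated):
        # simulated: (batch, 1, num_wavelengths); the output keeps the same
        # shape, so the translation is conditioned on the input spectrum.
        return self.net(simulated)

fake_measured = SpectrumTranslator()(torch.randn(8, 1, 256))
```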


Author(s):  
Changshun Du ◽  
Lei Huang

Text sentiment analysis is one of the most important tasks in the field of public opinion monitoring, service evaluation and satisfaction analysis under network environments. Compared with traditional Natural Language Processing tools, convolutional neural networks can automatically learn useful features from sentences and improve the performance of the sentiment analysis model. However, the original convolutional neural network model ignores sentence structure information, which is very important for text sentiment analysis. In this paper, we add piece-wise pooling to the convolutional neural network, which allows the model to capture sentence structure, and the main features of different sentences are extracted to analyze the emotional tendencies of the text. At the same time, user feedback spans many different domains, and labeled data are scarce. In order to alleviate the sparsity of the data, this paper also uses a generative adversarial network to extract common features, so that the model can obtain the emotion-related features shared across domains, improving the model's generalization ability with less training data. Experiments on different datasets demonstrate the effectiveness of this method.
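To make the piece-wise pooling idea concrete, here is a small illustrative sketch (in the actual model the segment boundaries would come from sentence structure; here they are passed in explicitly, and the tensor sizes are arbitrary):

```python
# Piecewise max-pooling over a convolutional feature map: pool each segment
# separately, then concatenate, so segment-level structure is preserved.
import torch

def piecewise_max_pool(features, boundaries):
    """features: (batch, channels, seq_len); boundaries: cut points along seq_len."""
    pieces, start = [], 0
    for end in list(boundaries) + [features.size(2)]:
        pieces.append(features[:, :, start:end].max(dim=2).values)
        start = end
    return torch.cat(pieces, dim=1)  # (batch, channels * num_pieces)

pooled = piecewise_max_pool(torch.randn(4, 100, 30), boundaries=[10, 20])
```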


2019 ◽  
Vol 31 (6) ◽  
pp. 1709-1712
Author(s):  
Majlinda Axhiu

Besides the advantages of typical sentiment analysis, which focuses on predicting the positive or negative polarity of the given sentence(s), there are two main drawbacks to performing sentiment analysis at a higher level, namely at the sentence and document level. Firstly, obtaining the overall sentiment of a sentence or a paragraph may not lead to accurate and precise information: the polarity will be valid for a broader context and not for particular targets. Secondly, many sentences or paragraphs may have opposing polarities towards different targets, which makes it difficult or impossible to give an accurate overall polarity. The necessity of detecting aspect terms and their corresponding polarity gave rise to aspect-based sentiment analysis (ABSA). To meet the objectives of aspect-based sentiment analysis systems, the process can be summarized in three main tasks: aspect term extraction, aspect-term and opinion-word separation, and sentiment polarity classification. Most commonly, supervised learning approaches are used for ABSA. However, building tagged training and testing corpora for each language and each domain is highly time consuming and can often be achieved only manually. This is why we have used a semi-supervised model to design a language- and domain-independent system based on novel machine learning approaches, with which we focus on analyzing Albanian texts and making use of Albanian data in the digital world. In this approach, where we try to extract the aspects and the polarity of their corresponding opinions through almost unsupervised learning, the biggest challenge is to reach high accuracy in natural language processing. To achieve this, language-independent systems must take into consideration all the differences and similarities between languages. In this paper, our aim is to define the biggest challenges that appear in the Albanian language in comparison with English; after analyzing a certain amount of data, we have identified the following issues: inflections, negation, homonyms, dialects, irony, sarcasm and the presence of stop-words in aspect terms. This is not an exhaustive list of language issues, since we have selected and discussed only the ones that have the greatest impact on the process of extracting the aspect terms and opinions and can strongly affect the accuracy of the final polarity classification of the texts.
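A skeletal, hypothetical outline of the three ABSA tasks mentioned above; the function bodies are placeholder heuristics shown only to make the pipeline concrete, not the system described in the paper:

```python
# Three-stage ABSA pipeline: (1) aspect term extraction,
# (2) aspect-term / opinion-word separation, (3) polarity classification.
from typing import Dict, List

def extract_aspect_terms(tokens: List[str]) -> List[str]:
    """Task 1: propose candidate aspect terms (placeholder: lowercase word tokens)."""
    return [t for t in tokens if t.isalpha() and t.islower()]

def pair_aspects_with_opinions(tokens: List[str], aspects: List[str]) -> Dict[str, str]:
    """Task 2: separate aspects from opinion words and pair them
    (placeholder: take the token immediately before each aspect)."""
    pairs = {}
    for aspect in aspects:
        i = tokens.index(aspect)
        pairs[aspect] = tokens[i - 1] if i > 0 else ""
    return pairs

def classify_polarity(opinion: str, lexicon: Dict[str, str]) -> str:
    """Task 3: assign a polarity via the opinion word
    (placeholder lexicon lookup; negation, dialects and irony are ignored here)."""
    return lexicon.get(opinion, "neutral")
```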


Information ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 249
Author(s):  
Xin Jin ◽  
Yuanwen Zou ◽  
Zhongbing Huang

The cell cycle is an important process in cellular life. In recent years, some image processing methods have been developed to determine the cell cycle stages of individual cells. However, in most of these methods, cells have to be segmented and their features extracted, and some important information may be lost during feature extraction, resulting in lower classification accuracy. Thus, we used a deep learning method to retain all cell features. To address the insufficient number of original images and their imbalanced distribution, we used the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) for data augmentation. At the same time, a residual network (ResNet), one of the most widely used deep learning classification networks, was used for image classification. With our method, cell cycle images were classified more effectively, reaching an accuracy of 83.88%; compared with the accuracy of 79.40% obtained in previous experiments, this is an increase of 4.48 percentage points. Another dataset was used to verify the effect of our model, and compared with the accuracy from previous results, our accuracy increased by 12.52%. The results showed that our new cell cycle image classification system based on WGAN-GP and ResNet is useful for the classification of imbalanced images. Moreover, our method could potentially address the low classification accuracy in biomedical images caused by insufficient numbers of original images and their imbalanced distribution.
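For reference, the gradient penalty term that distinguishes WGAN-GP from the original WGAN can be sketched as follows (standard formulation with an assumed penalty weight and 4D image batches; the paper's augmentation pipeline itself is not reproduced):

```python
# WGAN-GP gradient penalty: penalize the critic's gradient norm at points
# interpolated between real and generated images.
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interpolated = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interpolated)
    grads = torch.autograd.grad(outputs=scores, inputs=interpolated,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```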


Proceedings ◽  
2021 ◽  
Vol 77 (1) ◽  
pp. 17
Author(s):  
Andrea Giussani

In the last decade, advances in statistical modeling and computer science have boosted the production of machine-generated content in different fields: from language to image generation, the quality of the generated outputs is remarkably high, sometimes better than what a human being produces. Modern technological advances such as OpenAI's GPT-2 (and recently GPT-3) allow automated systems to dramatically alter reality with synthetic outputs, so that humans are not able to distinguish the real copy from its counterfeit. An example is given by an article entirely written by GPT-2, but many other examples exist. In the field of computer vision, Nvidia's Generative Adversarial Network, commonly known as StyleGAN (Karras et al. 2018), has become the de facto reference point for producing huge numbers of fake human face portraits; additionally, recent algorithms have been developed to create both musical scores and mathematical formulas. This presentation aims to bring participants up to date on the state-of-the-art results in this field: we will cover both GANs and language modeling with recent applications. The novelty here is that we apply a transformer-based machine learning technique, namely RoBERTa (Liu et al. 2019), to distinguishing human-produced from machine-produced text in the context of fake news detection. RoBERTa is a recent algorithm based on the well-known Bidirectional Encoder Representations from Transformers algorithm, known as BERT (Devlin et al. 2018); this is a bidirectional transformer for natural language processing developed by Google and pre-trained over a huge amount of unlabeled textual data to learn embeddings. We then use these representations as the input of our classifier to detect real versus machine-produced text. The application is demonstrated in the presentation.
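A hedged sketch of how a RoBERTa-based real-vs-machine text classifier could be set up with the Hugging Face transformers library (the checkpoint name, the two-label head, the label meaning, and the absence of fine-tuning are all assumptions; the presentation's actual training setup is not reproduced here):

```python
# Load a pretrained RoBERTa encoder with a 2-way classification head.
# The head is randomly initialized and would need fine-tuning on labeled
# human vs. machine-generated text before the probabilities mean anything.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

texts = ["An article written by a journalist.", "An article generated by GPT-2."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits       # shape: (2, 2)
probs = logits.softmax(dim=-1)           # assumed label order: 0 = human, 1 = machine
```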


Author(s):  
Neha Thomas ◽  
Susan Elias

Abstract— Detection of fake reviews and reviewers is currently a challenging problem in cyberspace, primarily because of the dynamic nature of the methods used to fake reviews. Several aspects have to be considered when analyzing reviews to classify them effectively as genuine or fake. Sentiment analysis, opinion mining and intent mining are fields of research that try to accomplish this goal through Natural Language Processing of the text content of the review. In this paper, an approach that evaluates review ratings along a timeline is presented. An Amazon dataset comprising ratings for a wide range of products was used for the analysis presented here; the analysis was carried out for an electronic product over a period of six years. The computed average rating helps to identify linear classifiers that define solution boundaries within the data space. This enables product-specific classification of review ratings, and suitable recommendations can also be generated automatically. The paper explains a methodology to evaluate average product ratings over time and presents the research outcomes using a novel classification tool. The proposed approach helps to determine the optimal point at which to distinguish between fake and genuine ratings for each product. Index Terms: Fake reviews, Fake Ratings, Product Ratings, Online Shopping, Amazon Dataset.
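As a toy illustration of the rating-timeline idea (the monthly window, the threshold, and the flagging rule are assumptions, not the paper's fitted linear classifiers):

```python
# Average a product's ratings per period and flag periods whose average
# deviates sharply from the product's overall average rating.
from statistics import mean

def flag_suspicious_periods(ratings_by_month, threshold=0.75):
    """ratings_by_month: list of lists of star ratings, one list per month."""
    overall = mean(r for month in ratings_by_month for r in month)
    flags = []
    for month in ratings_by_month:
        monthly_avg = mean(month) if month else overall
        flags.append(abs(monthly_avg - overall) > threshold)
    return flags

# A sudden burst of 5-star ratings in month 3 gets flagged.
print(flag_suspicious_periods([[4, 3, 4], [3, 4], [5, 5, 5, 5, 5], [4, 3]]))
# -> [False, False, True, False]
```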


2021 ◽  
Vol 17 (6) ◽  
pp. e1008981
Author(s):  
Yaniv Morgenstern ◽  
Frieder Hartmann ◽  
Filipp Schmidt ◽  
Henning Tiedemann ◽  
Eugen Prokott ◽  
...  

Shape is a defining feature of objects, and human observers can effortlessly compare shapes to determine how similar they are. Yet, to date, no image-computable model can predict how visually similar or different shapes appear. Such a model would be an invaluable tool for neuroscientists and could provide insights into computations underlying human shape perception. To address this need, we developed a model (‘ShapeComp’), based on over 100 shape features (e.g., area, compactness, Fourier descriptors). When trained to capture the variance in a database of >25,000 animal silhouettes, ShapeComp accurately predicts human shape similarity judgments between pairs of shapes without fitting any parameters to human data. To test the model, we created carefully selected arrays of complex novel shapes using a Generative Adversarial Network trained on the animal silhouettes, which we presented to observers in a wide range of tasks. Our findings show that incorporating multiple ShapeComp dimensions facilitates the prediction of human shape similarity across a small number of shapes, and also captures much of the variance in the multiple arrangements of many shapes. ShapeComp outperforms both conventional pixel-based metrics and state-of-the-art convolutional neural networks, and can also be used to generate perceptually uniform stimulus sets, making it a powerful tool for investigating shape and object representations in the human brain.
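A toy sketch of the feature-space approach behind ShapeComp: describe each silhouette by a few classical descriptors and compare shapes by a normalized distance in that space (ShapeComp itself uses over 100 features trained on animal silhouettes; the descriptors and normalization below are illustrative assumptions):

```python
# Compute a small shape-feature vector (area, perimeter, compactness) for a
# polygonal silhouette and compare two shapes by distance in feature space.
import numpy as np

def shape_features(polygon):
    """polygon: (N, 2) array of boundary points, closed implicitly."""
    x, y = polygon[:, 0], polygon[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))  # shoelace
    edges = np.diff(polygon, axis=0, append=polygon[:1])
    perimeter = np.sum(np.linalg.norm(edges, axis=1))
    compactness = 4 * np.pi * area / perimeter ** 2  # 1.0 for a circle
    return np.array([area, perimeter, compactness])

def shape_distance(a, b):
    fa, fb = shape_features(a), shape_features(b)
    return float(np.linalg.norm((fa - fb) / (np.abs(fa) + np.abs(fb) + 1e-9)))
```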


2020 ◽  
Vol 4 (Supplement_1) ◽  
Author(s):  
Lina Sulieman ◽  
Jing He ◽  
Robert Carroll ◽  
Lisa Bastarache ◽  
Andrea Ramirez

Abstract Electronic Health Records (EHR) contain rich data with which to identify and study diabetes. Many phenotype algorithms have been developed to identify research subjects with type 2 diabetes (T2D), but very few accurately identify type 1 diabetes (T1D) cases or rarer monogenic and atypical metabolic presentations. Polygenic risk scores (PRS) quantify disease risk using common genomic variants and work well for both T1D and T2D. In this study, we apply validated phenotyping algorithms to EHRs linked to a genomic biobank to understand the independent contribution of PRS to the classification of diabetes etiology and to generate additional novel markers that distinguish subtypes of diabetes in EHR data. Using a de-identified mirror of the medical center's electronic health record, we applied published algorithms for T1D and T2D to identify cases, and used natural language processing and chart review strategies to identify cases of maturity onset diabetes of the young (MODY) and other rarer presentations. This novel approach included additional data types such as medication sequencing, the ratio and temporality of insulin and non-insulin agents, clinical genetic testing, and ratios of diagnostic codes. Chart review was performed to validate etiology. To calculate PRS, we used genome-wide genotyping from our BioBank, the de-identified biobank linking EHR to genomic data, with coefficients of 65 published T1D SNPs and 76,996 T2D SNPs computed using PLINK in Caucasian subjects. In the dataset, we identified 82,238 cases of T2D but only 130 cases of T1D using the most cited published algorithms. Adding novel structured elements and natural language processing identified an additional 138 cases of T1D and distinguished 354 cases as MODY. Among over 90,000 subjects with genotyping data available, we included 72,624 Caucasian subjects, since the PRS coefficients were generated in Caucasian cohorts. Among those subjects, 248, 6,488, and 21 were identified as T1D, T2D, and MODY subjects, respectively, in our final PRS cohort. The T1D PRS discriminated significantly between cases and controls (Mann-Whitney p-value = 3.4e-17). The PRS for T2D did not significantly discriminate between cases and controls identified using the published algorithms. The atypical case count was too low to assess PRS discrimination. Calculation of the PRS was limited by the quality of the variants available for inclusion, and discrimination may improve in larger datasets. Additionally, blinded physician case review is ongoing to validate the novel classification scheme and provide a gold standard for machine learning approaches that can be applied in validation sets.
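Conceptually, a polygenic risk score is a weighted sum of effect-allele dosages, which can then be compared between cases and controls with a Mann-Whitney test. The sketch below uses random toy data and is not the study's PLINK-based pipeline:

```python
# PRS = dosage matrix (subjects x SNPs) times per-SNP published effect sizes,
# followed by a Mann-Whitney U comparison of case vs. control scores.
import numpy as np
from scipy.stats import mannwhitneyu

def polygenic_risk_score(dosages, effect_sizes):
    """dosages: (subjects, SNPs) effect-allele counts in {0,1,2};
    effect_sizes: per-SNP coefficients (e.g., published log odds ratios)."""
    return dosages @ effect_sizes

rng = np.random.default_rng(0)
dosages = rng.integers(0, 3, size=(500, 65))    # toy genotypes, 65 SNPs
weights = rng.normal(0.0, 0.2, size=65)         # toy coefficients
scores = polygenic_risk_score(dosages, weights)
is_case = rng.random(500) < 0.1                 # toy case labels
stat, p = mannwhitneyu(scores[is_case], scores[~is_case])
```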

