Sentiment Analysis using Multiple Word Embedding for Words

2020 ◽  
Vol 9 (1) ◽  
pp. 1689-1693

Users now express their opinions on many websites, such as e-commerce and dedicated review sites. Analyzing customers' opinions and responses is important for decision making, so researchers have worked on analyzing these reviews automatically using classical machine learning approaches such as the Support Vector Machine (SVM) and various modern deep neural networks. In these networks, words are represented by vectors called word embeddings. The required word embeddings are either taken from pre-trained Word2Vec or learned from the corpus of the main task, and each method has its drawbacks. Pre-trained word embeddings are learned from a large general corpus, so they are not task specific. Embeddings learned from the corpus of the main task, in contrast, do not reflect the true semantics. To deal with these problems, we propose an embedding developer model that produces task-specific word embeddings which also reflect true semantics. Task-specific word embeddings are generated from the given corpus using the embedding layer in Keras, which builds the embeddings by considering relationships between words in the window, while true semantics are taken from Word2Vec embeddings. The proposed model combines these two embeddings to generate word embeddings that are both semantically faithful and task specific. Result analysis shows that the proposed system works better on many benchmark datasets.
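The abstract does not say how the two embeddings are combined. As a minimal sketch, assuming simple per-word concatenation (one common choice, not necessarily the authors' method), the idea can be illustrated in plain Python. The function name `combine_embeddings`, the toy vectors, and the zero-vector fallback for out-of-vocabulary words are all hypothetical.

```python
from typing import Dict, List


def combine_embeddings(
    task_vecs: Dict[str, List[float]],
    pretrained_vecs: Dict[str, List[float]],
    pretrained_dim: int,
) -> Dict[str, List[float]]:
    """Concatenate each word's task-specific vector with its pre-trained vector."""
    combined = {}
    for word, task_vec in task_vecs.items():
        # Assumption: words missing from the pre-trained vocabulary get a zero vector.
        general_vec = pretrained_vecs.get(word, [0.0] * pretrained_dim)
        combined[word] = list(task_vec) + list(general_vec)
    return combined


# Toy vectors standing in for Keras-learned and Word2Vec embeddings.
task_vecs = {"good": [0.9, 0.1], "bad": [-0.8, 0.2]}
pretrained_vecs = {"good": [0.3, 0.5, 0.7]}
combined = combine_embeddings(task_vecs, pretrained_vecs, pretrained_dim=3)
```

Concatenation keeps both sources of information intact and leaves it to the downstream classifier to weigh them. Other combinations, such as averaging or a learned projection, would need vectors of matching dimension.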

Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1589
Author(s):  
Yongkeun Hwang ◽  
Yanghoon Kim ◽  
Kyomin Jung

Neural machine translation (NMT) is one of the text generation tasks that has achieved significant improvement with the rise of deep neural networks. However, language-specific problems such as handling the translation of honorifics have received little attention. In this paper, we propose a context-aware NMT to promote translation improvements of Korean honorifics. By exploiting information such as the relationship between speakers from the surrounding sentences, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, a context-aware post-editing (CAPE) technique is adopted to refine a set of inconsistent sentence-level honorific translations. To demonstrate the efficacy of the proposed method, honorific-labeled test data is required. Thus, we also design a heuristic that labels Korean sentences to distinguish between honorific and non-honorific styles. Experimental results show that our proposed method outperforms sentence-level NMT baselines both in overall translation quality and honorific translations.


2021 ◽  
Author(s):  
Guojun Huang ◽  
Cheng Wang ◽  
Xi Fu

Aims: Individualized patient profiling is instrumental for personalized management in hepatocellular carcinoma (HCC). This study built a model based on bidirectional deep neural networks (BiDNNs), an unsupervised machine-learning approach, to integrate multi-omics data and predict survival in HCC. Methods: DNA methylation and mRNA expression data for HCC samples from the TCGA database were integrated using BiDNNs. With optimal clusters as labels, a support vector machine model was developed to predict survival. Results: Using the BiDNN-based model, samples were clustered into two survival subgroups. The survival subgroup classification was an independent prognostic factor. BiDNNs were superior to multimodal autoencoders. Conclusion: This study constructed and validated a BiDNN-based model for predicting prognosis in HCC, with implications for individualized therapies in HCC.


2021 ◽  
Vol 13 (9) ◽  
pp. 239
Author(s):  
Danveer Rajpal ◽  
Akhil Ranjan Garg ◽  
Om Prakash Mahela ◽  
Hassan Haes Alhelou ◽  
Pierluigi Siano

Hindi is the official language of India and is used by a large population for public services such as postal, banking, judicial, and public survey services. Efficient management of these services needs language-based automation. The proposed model addresses the problem of handwritten Hindi character recognition using a machine learning approach. The pre-trained DCNN models InceptionV3-Net, VGG19-Net, and ResNet50 were used to extract salient features from the character images. The proposed work adopts a novel fusion approach: the DCNN-based features are fused with handcrafted features obtained from the bi-orthogonal discrete wavelet transform. The feature size was reduced by Principal Component Analysis. The hybrid features were evaluated with two popular classifiers, Multi-Layer Perceptron (MLP) and Support Vector Machine (SVM). The recognition cost was reduced by 84.37%. The model achieved precision, recall, and F1-measure scores of 98.78%, 98.67%, and 98.69%, respectively, with an overall recognition accuracy of 98.73%.
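For readers unfamiliar with the reported metrics, the following sketch shows how precision, recall, and F1-measure are derived from prediction counts for a single class. The function name and the example counts are hypothetical, and the abstract does not state how the per-class scores were averaged across characters.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one character class.
p, r, f = precision_recall_f1(tp=6, fp=2, fn=4)
```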


Author(s):  
Abdullah Talha Kabakus

As a natural consequence of offering many advantages to their users, social media platforms have become a part of daily lives. Recent studies emphasize the necessity of an automated way of detecting the offensive posts in social media since these ‘toxic’ posts have become pervasive. To this end, a novel toxic post detection approach based on Deep Neural Networks was proposed within this study. Given that several word embedding methods exist, we shed light on which word embedding method produces better results when employed with the five most common types of deep neural networks, namely,  , , , , and a combination of  and . To this end, the word vectors for the given comments were obtained through four different methods, namely, () , () , () , and () the  layer of deep neural networks. Eventually, a total of twenty benchmark models were proposed and both trained and evaluated on a gold standard dataset which consists of  tweets. According to the experimental result, the best , , was obtained on the proposed  model without employing pre-trained word vectors which outperformed the state-of-the-art works and implies the effective embedding ability of s. Other key findings obtained through the conducted experiments are that the models, that constructed word embeddings through the  layers, obtained higher s and converged much faster than the models that utilized pre-trained word vectors.


2020 ◽  
Vol 34 (10) ◽  
pp. 13907-13908
Author(s):  
Edoardo Savini ◽  
Cornelia Caragea

Sarcasm detection plays an important role in natural language processing as it has been considered one of the most challenging subtasks in sentiment analysis and opinion mining applications. Our work aims to detect sarcasm in social media sites and discussion forums, exploiting the potential of deep neural networks and multi-task learning. Specifically, relying on the strong correlation between sarcasm and (implied negative) sentiment, we explore a multi-task learning framework that uses sentiment classification as an auxiliary task to inform the main task of sarcasm detection. Our proposed model outperforms many previous baseline methods on an existing large dataset annotated with sarcasm.
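A common way to let an auxiliary task inform a main task is to train on a weighted sum of the two losses. The sketch below illustrates that idea only. The abstract does not describe the authors' actual objective, so the weighting scheme, the `aux_weight` parameter, and the function names are assumptions.

```python
import math
from typing import List


def binary_cross_entropy(y_true: List[int], y_prob: List[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def multitask_loss(sarcasm_loss: float, sentiment_loss: float, aux_weight: float = 0.5) -> float:
    """Main-task loss plus a down-weighted auxiliary-task loss (assumed form)."""
    return sarcasm_loss + aux_weight * sentiment_loss


# Hypothetical predictions for two examples on each task.
sarcasm = binary_cross_entropy([1, 0], [0.8, 0.3])
sentiment = binary_cross_entropy([0, 0], [0.4, 0.1])
total = multitask_loss(sarcasm, sentiment, aux_weight=0.5)
```

Setting `aux_weight` to 0 recovers single-task training, which gives a simple baseline for checking whether the auxiliary sentiment signal helps.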


Author(s):  
Guangxu Xun ◽  
Yaliang Li ◽  
Wayne Xin Zhao ◽  
Jing Gao ◽  
Aidong Zhang

Conventional correlated topic models are able to capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been proven to be able to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be directly calculated in the word embedding space, for example, via cosine values. In this paper, we propose a novel correlated topic model using word embeddings. The proposed model enables us to exploit the additional word-level correlation information in word embeddings and directly model topic correlation in the continuous word embedding space. In the model, words in documents are replaced with meaningful word embeddings, topics are modeled as multivariate Gaussian distributions over the word embeddings and topic correlations are learned among the continuous Gaussian topics. A Gibbs sampling solution with data augmentation is given to perform inference. We evaluate our model on the 20 Newsgroups dataset and the Reuters-21578 dataset qualitatively and quantitatively. The experimental results show the effectiveness of our proposed model.
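The cosine measure mentioned above can be computed directly from two embedding vectors. This is a minimal sketch with toy two-dimensional vectors. The function name is illustrative, and real embeddings would have far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy vectors: parallel vectors score near 1, orthogonal vectors score 0.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```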


Author(s):  
Ensaf Hussein Mohamed ◽  
Mohammed ElSaid Moussa ◽  
Mohamed Hassan Haggag

Sentiment analysis (SA) is a technique that lets people in fields such as business, economy, research, government, and politics learn about people’s opinions, which greatly affects decision-making. SA techniques are classified into lexicon-based techniques, machine learning techniques, and hybrids of the two. Each approach has its limitations and drawbacks: the machine learning approach depends on manual feature extraction, while the lexicon-based approach relies on sentiment lexicons that are usually unscalable, unreliable, and manually annotated by human experts. Word-embedding techniques are now commonly used in SA classification. Currently, Word2Vec and GloVe are among the most accurate and usable word embedding techniques, and they can transform words into meaningful semantic vectors. However, these techniques ignore the sentiment information of texts and require a huge corpus of texts for training and generating accurate vectors, which are used as inputs to deep learning models. In this paper, we propose an enhanced ensemble classifier framework. Our framework is based on our previously published lexicon-based method, bag-of-words, and pre-trained word embeddings. First, the sentence is preprocessed by removing stop-words, POS tagging, stemming and lemmatization, and shortening exaggerated words. Second, the processed sentence is passed to three modules, namely our previous lexicon-based method (Sum Votes), a bag-of-words module, and a semantic module (Word2Vec and GloVe), which produce feature vectors. Finally, these feature vectors are fed into 11 different classifiers. The proposed framework is tested and evaluated on four datasets with five different lexicons. The experimental results show that our proposed model outperforms both the previous lexicon-based method and the machine learning methods individually.


2021 ◽  
pp. 233-252
Author(s):  
Upendar Rao Rayala ◽  
Karthick Seshadri

Sentiment analysis is perceived to be a multi-disciplinary research domain composed of machine learning, artificial intelligence, deep learning, image processing, and social networks. Sentiment analysis can be used to determine opinions of the public about products and to find the customers' interest and their feedback through social networks. To perform any natural language processing task, the input text/comments should be represented in a numerical form. Word embeddings represent the given text/sentences/words as a vector that can be employed in performing subsequent natural language processing tasks. In this chapter, the authors discuss different techniques that can improve the performance of sentiment analysis using concepts and techniques like traditional word embeddings, sentiment embeddings, emoticons, lexicons, and neural networks. This chapter also traces the evolution of word embedding techniques with a chronological discussion of the recent research advancements in word embedding techniques.


Author(s):  
M. B. Shete

Abstract: In the world of technology, there are various areas in which companies may adopt technologies that support decision-making. Artificial Intelligence is the most innovative of these and is generally used to help companies and institutions with business strategies, organizational aspects, and people management. Recently, attention has increasingly been paid to Human Resources (HR), since professional excellence and skills represent a growth factor and a real competitive advantage for organizations. After being introduced in sales and marketing departments, artificial intelligence is also starting to guide employee-related decisions within HR management. The purpose is to support decisions that are based not on subjective aspects but on objective data analysis. The objective of this work is to analyze how objective factors influence employee attrition, to identify the main causes that contribute to a worker's decision to leave a company, and to be able to predict whether a particular employee will leave the company. The proposed model, an algorithm for predicting employee attrition in any industry, is tested on an actual dataset with almost 150 samples. With this algorithm, the best results are obtained across all experimental parameters. It shows the best recall rate, which measures the ability of a classifier to find all the true positives, alongside its overall false positive rate. The presented results will help in identifying the behavior of employees who may leave over the coming period. Experimental results show that the logistic regression approach can reach up to 86% accuracy compared with the other approaches.
Several algorithms can be used for processing the data, including K-Nearest Neighbour, logistic regression, decision tree, random forest, and Support Vector Machine.
Keywords: Employee Attrition, Machine Learning, Support Vector Machine (SVM), KNN (K-Nearest Neighbour)


2020 ◽  
Vol 23 (4) ◽  
pp. 274-284 ◽  
Author(s):  
Jingang Che ◽  
Lei Chen ◽  
Zi-Han Guo ◽  
Shuaiqun Wang ◽  
Aorigele

Background: Identification of drug-target interaction is essential in drug discovery. It is beneficial to predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are prompt and low-cost compared with traditional wet experiments. Methods: In this study, we investigated this problem in a different way. According to KEGG, drugs were classified into several groups based on their target proteins. A multi-label classification model was presented to assign drugs into correct target groups. To make full use of the known drug properties, five networks were constructed, each of which represented drug associations in one property. A powerful network embedding method, Mashup, was adopted to extract drug features from above-mentioned networks, based on which several machine learning algorithms, including RAndom k-labELsets (RAKEL) algorithm, Label Powerset (LP) algorithm and Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded the accuracy of 0.839, exact match of 0.816 and hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the network model with multiple networks was found to be superior to the one with a single network and classic model, indicating the superiority of the proposed model.
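The exact match and hamming loss figures are standard multi-label metrics. The sketch below shows how each is computed from binary label matrices. The toy labels are hypothetical and unrelated to the paper's data, and the multi-label "accuracy" reported above is not reproduced because the abstract does not define it.

```python
from typing import List

LabelMatrix = List[List[int]]


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of samples whose whole label set is predicted correctly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)


# Two hypothetical drugs, three target groups each (1 = drug belongs to the group).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]
loss = hamming_loss(y_true, y_pred)
match = exact_match(y_true, y_pred)
```

Exact match is the stricter of the two: a single wrong label makes the whole sample count as a miss, whereas hamming loss penalizes only the affected label slot.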

