Sentiment Analysis using Multiple Word Embedding for Words

Users now express their opinions on many websites, such as e-commerce and dedicated review sites. Analyzing customers' opinions and responses is important for decision making, so researchers have worked on analyzing these reviews automatically, using classical machine learning approaches such as the Support Vector Machine (SVM) as well as various modern deep neural networks. For these networks, words are represented by vectors called word embeddings. The required word embeddings are either taken from pre-trained Word2Vec or learned from the corpus of the given main task, but each method has its demerits. Pre-trained word embeddings are learned from a large general corpus, so they are not task specific, while embeddings learned from the corpus of the main task do not reflect the true semantics. To deal with these problems, we propose an embedding developer model that develops task-specific word embeddings which also reflect true semantics. Task-specific word embeddings are generated from the given corpus using the embedding layer in Keras, which builds the embeddings by considering relationships between words in the window, while true semantics are taken from Word2Vec embeddings. The proposed model combines these two embeddings to generate word embeddings that are both task specific and semantically faithful. Result analysis shows that the proposed system works better on many benchmark datasets.
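
The abstract does not say how the two embeddings are combined. As a minimal sketch, assuming simple concatenation (one common choice, not necessarily the authors' method), the idea can be illustrated with toy vectors. The dictionaries, the function name `combine_embeddings`, and the zero-fill rule for missing words are all hypothetical.

```python
# Hypothetical toy vectors. In practice the first table would come from a
# trained Keras Embedding layer and the second from a pre-trained Word2Vec model.
task_specific = {"good": [0.9, 0.1], "bad": [-0.8, 0.2]}
pretrained = {"good": [0.3, 0.5, 0.1], "bad": [0.2, -0.4, 0.6]}


def combine_embeddings(word, task_vecs, pre_vecs):
    """Concatenate the task-specific and pre-trained vectors for one word.

    Assumption: a word missing from either table is zero-filled there.
    """
    task_dim = len(next(iter(task_vecs.values())))
    pre_dim = len(next(iter(pre_vecs.values())))
    task_part = task_vecs.get(word, [0.0] * task_dim)
    pre_part = pre_vecs.get(word, [0.0] * pre_dim)
    return task_part + pre_part  # list concatenation, not element-wise sum


combined_good = combine_embeddings("good", task_specific, pretrained)
combined_unknown = combine_embeddings("unseen", task_specific, pretrained)
```

Concatenation keeps both sources of information intact and leaves it to the downstream classifier to weight them; averaging or a learned projection would be alternative designs.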

Context-Aware Neural Machine Translation for Korean Honorific Expressions

Electronics, 2021, Vol 10 (13), pp. 1589. DOI: 10.3390/electronics10131589

Author(s): Yongkeun Hwang, Yanghoon Kim, Kyomin Jung

Keyword(s): Machine Translation, Deep Neural Networks, Contextual Information, Context Aware, Neural Machine Translation, Translation Quality, Sentence Level, Proposed Model, The Given, The Relationship

Neural machine translation (NMT) is one of the text generation tasks which has achieved significant improvement with the rise of deep neural networks. However, language-specific problems such as handling the translation of honorifics received little attention. In this paper, we propose a context-aware NMT to promote translation improvements of Korean honorifics. By exploiting the information such as the relationship between speakers from the surrounding sentences, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, a context-aware post-editing (CAPE) technique is adopted to refine a set of inconsistent sentence-level honorific translations. To demonstrate the efficacy of the proposed method, honorific-labeled test data is required. Thus, we also design a heuristic that labels Korean sentences to distinguish between honorific and non-honorific styles. Experimental results show that our proposed method outperforms sentence-level NMT baselines both in overall translation quality and honorific translations.

Bidirectional deep neural networks to integrate RNA and DNA data for predicting outcome for patients with hepatocellular carcinoma

Future Oncology, 2021. DOI: 10.2217/fon-2021-0659

Author(s): Guojun Huang, Cheng Wang, Xi Fu

Keyword(s): Hepatocellular Carcinoma, Neural Networks, Deep Neural Networks, Support Vector, Machine Model, mRNA Expression Data, Machine Learning Approach, Patient Profiling, Predicting Outcome, Subgroup Classification

Aims: Individualized patient profiling is instrumental for personalized management in hepatocellular carcinoma (HCC). This study built a model based on bidirectional deep neural networks (BiDNNs), an unsupervised machine-learning approach, to integrate multi-omics data and predict survival in HCC. Methods: DNA methylation and mRNA expression data for HCC samples from the TCGA database were integrated using BiDNNs. With optimal clusters as labels, a support vector machine model was developed to predict survival. Results: Using the BiDNN-based model, samples were clustered into two survival subgroups. The survival subgroup classification was an independent prognostic factor. BiDNNs were superior to multimodal autoencoders. Conclusion: This study constructed and validated a BiDNN-based model for predicting prognosis in HCC, with implications for individualized therapies in HCC.

A Fusion-Based Hybrid-Feature Approach for Recognition of Unconstrained Offline Handwritten Hindi Characters

Future Internet, 2021, Vol 13 (9), pp. 239. DOI: 10.3390/fi13090239

Author(s): Danveer Rajpal, Akhil Ranjan Garg, Om Prakash Mahela, Hassan Haes Alhelou, Pierluigi Siano

Keyword(s): Character Recognition, Large Population, Principal Component, Support Vector, Discrete Wavelet, Hybrid Features, Efficient Management, Novel Approach, Proposed Model, Machine Learning Approach

Hindi is the official language of India and is used by a large population for several public services such as postal, banking, judiciary, and public surveys. Efficient management of these services needs language-based automation. The proposed model addresses the problem of handwritten Hindi character recognition using a machine learning approach. The pre-trained DCNN models InceptionV3-Net, VGG19-Net, and ResNet50 were used to extract salient features from the characters' images. A novel fusion approach is adopted in the proposed work: the DCNN-based features are fused with handcrafted features obtained from the bi-orthogonal discrete wavelet transform. The feature size was reduced by Principal Component Analysis. The hybrid features were examined with two popular classifiers, the Multi-Layer Perceptron (MLP) and the Support Vector Machine (SVM). The recognition cost was reduced by 84.37%. The model achieved significant scores for precision, recall, and F1-measure (98.78%, 98.67%, and 98.69%, respectively), with an overall recognition accuracy of 98.73%.

Towards the Importance of the Type of Deep Neural Network and Employment of Pre-trained Word Vectors for Toxicity Detection: An Experimental Study

Journal of Web Engineering, 2021. DOI: 10.13052/jwe1540-9589.2082

Author(s): Abdullah Talha Kabakus

Keyword(s): Neural Networks, Social Media, Deep Neural Networks, Word Embedding, Experimental Result, Daily Lives, Gold Standard Dataset, Detection Approach, Proposed Model, Social Media Platforms

As a natural consequence of offering many advantages to their users, social media platforms have become a part of daily lives. Recent studies emphasize the necessity of an automated way of detecting the offensive posts in social media since these ‘toxic’ posts have become pervasive. To this end, a novel toxic post detection approach based on Deep Neural Networks was proposed within this study. Given that several word embedding methods exist, we shed light on which word embedding method produces better results when employed with the five most common types of deep neural networks, namely, , , , , and a combination of and . To this end, the word vectors for the given comments were obtained through four different methods, namely, () , () , () , and () the layer of deep neural networks. Eventually, a total of twenty benchmark models were proposed and both trained and evaluated on a gold standard dataset which consists of tweets. According to the experimental result, the best , , was obtained on the proposed model without employing pre-trained word vectors which outperformed the state-of-the-art works and implies the effective embedding ability of s. Other key findings obtained through the conducted experiments are that the models, that constructed word embeddings through the layers, obtained higher s and converged much faster than the models that utilized pre-trained word vectors.

A Multi-Task Learning Approach to Sarcasm Detection (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence, 2020, Vol 34 (10), pp. 13907-13908. DOI: 10.1609/aaai.v34i10.7226

Author(s): Edoardo Savini, Cornelia Caragea

Keyword(s): Language Processing, Deep Neural Networks, Opinion Mining, Main Task, Discussion Forums, Large Dataset, Learning Framework, Task Learning, Proposed Model, Negative Sentiment

Sarcasm detection plays an important role in natural language processing as it has been considered one of the most challenging subtasks in sentiment analysis and opinion mining applications. Our work aims to detect sarcasm in social media sites and discussion forums, exploiting the potential of deep neural networks and multi-task learning. Specifically, relying on the strong correlation between sarcasm and (implied negative) sentiment, we explore a multi-task learning framework that uses sentiment classification as an auxiliary task to inform the main task of sarcasm detection. Our proposed model outperforms many previous baseline methods on an existing large dataset annotated with sarcasm.
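
A multi-task setup of this kind is often trained by minimizing a weighted sum of the main-task and auxiliary-task losses. The sketch below illustrates that idea for a single example with binary labels. The weight `aux_weight`, the function names, and the use of binary cross-entropy are assumptions for illustration, since the abstract does not specify the objective.

```python
import math


def binary_cross_entropy(prob, label):
    """Cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(p_sarcasm, y_sarcasm, p_sentiment, y_sentiment, aux_weight=0.5):
    """Main-task loss plus a down-weighted auxiliary sentiment loss.

    aux_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    main = binary_cross_entropy(p_sarcasm, y_sarcasm)
    auxiliary = binary_cross_entropy(p_sentiment, y_sentiment)
    return main + aux_weight * auxiliary


loss = multitask_loss(p_sarcasm=0.8, y_sarcasm=1, p_sentiment=0.3, y_sentiment=0)
```

Setting `aux_weight` to 0 recovers single-task training on sarcasm alone, which makes the contribution of the sentiment signal easy to ablate.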

A Correlated Topic Model Using Word Embeddings

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017. DOI: 10.24963/ijcai.2017/588. Cited by: ~20

Author(s): Guangxu Xun, Yaliang Li, Wayne Xin Zhao, Jing Gao, Aidong Zhang

Keyword(s): Data Augmentation, Topic Model, Semantic Relatedness, Word Embedding, Word Embeddings, Word Level, Logistic Normal Distribution, Proposed Model, Correlation Information, Correlated Topic Model

Conventional correlated topic models are able to capture correlation structure among latent topics by replacing the Dirichlet prior with the logistic normal distribution. Word embeddings have been proven to be able to capture semantic regularities in language. Therefore, the semantic relatedness and correlations between words can be directly calculated in the word embedding space, for example, via cosine values. In this paper, we propose a novel correlated topic model using word embeddings. The proposed model enables us to exploit the additional word-level correlation information in word embeddings and directly model topic correlation in the continuous word embedding space. In the model, words in documents are replaced with meaningful word embeddings, topics are modeled as multivariate Gaussian distributions over the word embeddings and topic correlations are learned among the continuous Gaussian topics. A Gibbs sampling solution with data augmentation is given to perform inference. We evaluate our model on the 20 Newsgroups dataset and the Reuters-21578 dataset qualitatively and quantitatively. The experimental results show the effectiveness of our proposed model.
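
The cosine measure mentioned above can be computed directly from two embedding vectors. The following is a small self-contained sketch. The three toy vectors are invented for illustration and do not come from any trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: "cat" and "dog" point in similar directions.
cat = [0.9, 0.8, 0.1]
dog = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

sim_cat_dog = cosine_similarity(cat, dog)
sim_cat_car = cosine_similarity(cat, car)
```

With these toy values, `sim_cat_dog` is larger than `sim_cat_car`, which is the kind of word-level relatedness signal the abstract refers to.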

An Enhanced Sentiment Analysis Framework Based on Pre-Trained Word Embedding

International Journal of Computational Intelligence and Applications, 2020, Vol 19 (04), pp. 2050031. DOI: 10.1142/s1469026820500315. Cited by: ~1

Author(s): Ensaf Hussein Mohamed, Mohammed ElSaid Moussa, Mohamed Hassan Haggag

Keyword(s): Machine Learning, Sentiment Analysis, Ensemble Classifier, Word Embedding, Machine Learning Techniques, Bag Of Words, POS Tagging, Learning Techniques, Proposed Model, Machine Learning Approach

Sentiment analysis (SA) is a technique that lets people in fields such as business, economy, research, government, and politics learn about people's opinions, which greatly affects decision-making. SA techniques are classified into lexicon-based techniques, machine learning techniques, and hybrids of the two. Each approach has its limitations and drawbacks: the machine learning approach depends on manual feature extraction, while the lexicon-based approach relies on sentiment lexicons that are usually unscalable, unreliable, and manually annotated by human experts. Nowadays, word-embedding techniques are commonly used in SA classification. Currently, Word2Vec and GloVe are among the most accurate and usable word embedding techniques, transforming words into meaningful semantic vectors. However, these techniques ignore the sentiment information of texts and require a huge text corpus for training and generating accurate vectors, which are used as inputs to deep learning models. In this paper, we propose an enhanced ensemble classifier framework based on our previously published lexicon-based method, bag-of-words, and pre-trained word embeddings. First, the sentence is preprocessed by removing stop-words, POS tagging, stemming and lemmatization, and shortening exaggerated words. Second, the processed sentence is passed to three modules, our previous lexicon-based method (Sum Votes), a bag-of-words module, and a semantic module (Word2Vec and GloVe), which produce feature vectors. Finally, these feature vectors are fed into 11 different classifiers. The proposed framework is tested and evaluated on four datasets with five different lexicons; the experimental results show that our proposed model outperforms the previous lexicon-based and machine learning methods individually.
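
One of the preprocessing steps listed above, shortening exaggerated words, can be sketched with a regular expression. The rule used here (collapse any run of three or more identical word characters to two) is an assumption chosen for illustration, as the abstract does not state the exact rule. The function name is hypothetical.

```python
import re

# A word character followed by two or more repeats of itself (a run of 3+).
_ELONGATED = re.compile(r"(\w)\1{2,}")


def shorten_exaggerated(text):
    """Collapse runs of 3+ identical characters to exactly two.

    Assumed rule: keeping two characters preserves legitimate doubles
    such as the "oo" in "good".
    """
    return _ELONGATED.sub(r"\1\1", text)


cleaned = shorten_exaggerated("soooo goooood")
```

Words that still contain a spurious double letter after this step (for example "soo") would need a dictionary lookup or spell-checker to resolve, which is outside this sketch.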

Word Embedding Techniques for Sentiment Analyzers

2021, pp. 233-252. DOI: 10.4018/978-1-7998-8061-5.ch013

Author(s): Upendar Rao Rayala, Karthick Seshadri

Keyword(s): Social Networks, Natural Language Processing, Natural Language, Sentiment Analysis, Language Processing, Word Embedding, Word Embeddings, The Public, The Given, Research Domain

Sentiment analysis is perceived to be a multi-disciplinary research domain composed of machine learning, artificial intelligence, deep learning, image processing, and social networks. Sentiment analysis can be used to determine opinions of the public about products and to find the customers' interest and their feedback through social networks. To perform any natural language processing task, the input text/comments should be represented in a numerical form. Word embeddings represent the given text/sentences/words as a vector that can be employed in performing subsequent natural language processing tasks. In this chapter, the authors discuss different techniques that can improve the performance of sentiment analysis using concepts and techniques like traditional word embeddings, sentiment embeddings, emoticons, lexicons, and neural networks. This chapter also traces the evolution of word embedding techniques with a chronological discussion of the recent research advancements in word embedding techniques.

Prediction of Employee Attrition Using Machine Learning Approach

International Journal for Research in Applied Science and Engineering Technology, 2021, Vol 9 (8), pp. 2479-2483. DOI: 10.22214/ijraset.2021.37796

Author(s): M. B. Shete

Keyword(s): Machine Learning, Support Vector Machine, False Positive Rate, Time Trial, Support Vector, Development Factor, Proposed Model, Machine Learning Approach, Employee Attrition, Positive Rate

Abstract: In the world of technology, there are various areas in which companies may adopt technologies that support decision-making. Artificial intelligence is the most creative of these advancements and is generally used to help companies and institutions with business strategy, organizational aspects, and people management. Recently, increasing attention has been paid to Human Resources (HR), since professional excellence and skills represent a growth factor and a genuine competitive advantage for organizations. After having been introduced into sales and marketing departments, artificial intelligence is also beginning to guide employee-related decisions within HR management. The purpose is to support decisions that are based not on subjective aspects but on objective data analysis. The objective of this work is to analyze how objective factors influence employee attrition, to identify the main causes that contribute to a worker's decision to leave a company, and to be able to predict whether a particular employee will leave the company. After testing, the proposed model for predicting employee attrition in any industry is evaluated on an actual dataset with almost 150 samples. This algorithm produces the best results across all experimental parameters. It shows the best recall rate, which measures the ability of a classifier to find all the true positives, and achieves a generally low false positive rate. The presented results will help in identifying the behavior of employees who may leave in the near future. Experimental results reveal that the logistic regression approach can reach up to 86% accuracy, higher than the other approaches. Several algorithms can be used for processing the data, including K-Nearest Neighbour, logistic regression, decision tree, random forest, and Support Vector Machine.
Keywords: Employee Attrition, Machine Learning, Support Vector Machine (SVM), KNN (K-Nearest Neighbour)
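
The true positive rate (recall) and false positive rate referred to in this abstract are both derived from the confusion matrix of a binary classifier. A minimal sketch follows. The label vectors are invented for illustration (1 = employee left, 0 = employee stayed) and are not the paper's data.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # recall
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical ground truth and predictions for six employees.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
recall, false_positive_rate = tpr_fpr(y_true, y_pred)
```

A good attrition classifier aims for a high recall (few leavers missed) together with a low false positive rate (few stayers flagged).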

Drug Target Group Prediction with Multiple Drug Networks

Combinatorial Chemistry & High Throughput Screening, 2020, Vol 23 (4), pp. 274-284. DOI: 10.2174/1386207322666190702103927. Cited by: ~12

Author(s): Jingang Che, Lei Chen, Zi-Han Guo, Shuaiqun Wang, Aorigele

Keyword(s): Drug Target, Low Cost, Machine Learning Algorithms, Classification Model, Support Vector, Multiple Drug, Property A, Multiple Networks, Proposed Model, The One

Background: Identification of drug-target interactions is essential in drug discovery. It helps to predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are prompt and low-cost compared with traditional wet experiments. Methods: In this study, we investigated this problem in a different way. According to KEGG, drugs were classified into several groups based on their target proteins. A multi-label classification model was presented to assign drugs to the correct target groups. To make full use of the known drug properties, five networks were constructed, each of which represented drug associations in one property. A powerful network embedding method, Mashup, was adopted to extract drug features from the above-mentioned networks, based on which several machine learning algorithms, including the RAndom k-labELsets (RAKEL) algorithm, the Label Powerset (LP) algorithm and the Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded an accuracy of 0.839, an exact match of 0.816 and a hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the network model with multiple networks was found to be superior to the one with a single network and to the classic model, indicating the superiority of the proposed model.
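
Two of the multi-label metrics reported above, exact match and hamming loss, have simple standard definitions that can be sketched as follows. The label matrices are invented for illustration (rows are drugs, columns are target groups) and are not the paper's data. The multi-label "accuracy" is omitted because its exact definition is not given in the abstract.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        1
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
        if t != p
    )
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of samples whose whole label set is predicted correctly."""
    hits = sum(1 for t_row, p_row in zip(y_true, y_pred) if t_row == p_row)
    return hits / len(y_true)


# Hypothetical labels: 2 drugs, 3 target groups each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]

h_loss = hamming_loss(y_true, y_pred)
e_match = exact_match(y_true, y_pred)
```

Exact match is the stricter of the two: a single wrong label makes the whole sample count as a miss, whereas hamming loss penalizes only the individual wrong slot.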
