Cross Lingual Sentiment Analysis: A Clustering-Based Bee Colony Instance Selection and Target-Based Feature Weighting Approach

The lack of sentiment resources in resource-poor languages poses challenges for machine-learning-based sentiment analysis. Cross-lingual and semi-supervised learning approaches are the most common ways to overcome this issue. However, the performance of existing methods degrades due to the poor quality of translated resources, data sparseness and, more specifically, language divergence. We propose an integrated learning model that uses a semi-supervised and an ensemble model while utilizing the available sentiment resources to tackle issues related to language divergence. Additionally, to reduce the impact of translation errors and handle the instance selection problem, we propose a clustering-based bee-colony sample selection method for the optimal selection of the most distinguishing features representing the target data. To evaluate the proposed model, various experiments are conducted on an English-Arabic cross-lingual data set. Simulation results demonstrate that the proposed model outperforms the baseline approaches in terms of classification performance. Furthermore, the statistical outcomes indicate the advantages of the proposed training data sampling and target-based feature selection in reducing the negative effect of translation errors. These results show that the proposed approach achieves performance close to that of in-language supervised models.


Sentiment analysis of MOOC reviews via ALBERT-BiLSTM model

MATEC Web of Conferences ◽ 10.1051/matecconf/202133605008 ◽ 2021 ◽ Vol 336 ◽ pp. 05008

Author(s): Cheng Wang ◽ Sirui Huang ◽ Ya Zhou

Keyword(s): Sentiment Analysis ◽ Online Courses ◽ Massive Open Online Courses ◽ Analysis Model ◽ Accuracy Rate ◽ Massive Open Online ◽ Data Set ◽ Proposed Model ◽ Contextual Feature

Accurately exploring the sentiment information in comments on Massive Open Online Courses (MOOCs) plays an important role in improving curricular quality and promoting the sustainable development of MOOC platforms. At present, most sentiment analyses of MOOC comments are broad in scope, while relatively little attention is paid to finer-grained issues such as polysemous words and familiar words that have acquired new meanings. This results in a low accuracy rate for sentiment analysis models used to identify the genuine sentiment tendency of course comments. For this reason, this paper proposes an ALBERT-BiLSTM model for sentiment analysis of MOOC comments. First, ALBERT is used to dynamically generate word vectors. Second, contextual feature vectors are obtained through the forward and backward passes of a BiLSTM, together with an attention mechanism that calculates the weight of different words in a sentence. Finally, the BiLSTM output vectors are fed into Softmax to classify sentiments and predict the sentiment tendency. The experiment was performed on a genuine data set of MOOC comments. The results show that the proposed model achieves a higher accuracy rate than existing models.
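The last two steps described above (attention weights over the BiLSTM outputs, then Softmax classification) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the dot-product scoring, the names `attention_pool` and `classify`, and all numeric values are assumptions chosen for illustration, and the ALBERT and BiLSTM stages are replaced by hand-written vectors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each time step by softmax(h_t . query) and sum the states."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights

def classify(pooled, class_weights):
    """Linear layer followed by softmax over the sentiment classes."""
    logits = [sum(w_i * p_i for w_i, p_i in zip(w, pooled)) for w in class_weights]
    return softmax(logits)

# Hypothetical BiLSTM outputs for a three-word comment
# (forward and backward states concatenated, four dimensions in total).
hidden_states = [[0.1, 0.3, -0.2, 0.0], [0.9, -0.4, 0.5, 0.2], [0.0, 0.1, 0.1, -0.1]]
query = [1.0, 0.0, 1.0, 0.0]               # hypothetical learned attention vector
class_weights = [[1.0, 0.0, 1.0, 0.0],     # hypothetical "positive" row
                 [-1.0, 0.0, -1.0, 0.0]]   # hypothetical "negative" row

pooled, weights = attention_pool(hidden_states, query)
probs = classify(pooled, class_weights)    # [P(positive), P(negative)]
```

In this toy input the second word receives the largest attention weight, so its state dominates the pooled vector that the classifier sees.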


Deep Persian sentiment analysis: Cross-lingual training for low-resource languages

Journal of Information Science ◽ 10.1177/0165551520962781 ◽ 2020 ◽ pp. 016555152096278

Author(s): Rouzbeh Ghasemi ◽ Seyed Arad Ashrafi Asli ◽ Saeedeh Momtazi

Keyword(s): Natural Language Processing ◽ Natural Language ◽ Sentiment Analysis ◽ Language Processing ◽ Training Data ◽ Target Language ◽ Low Resource ◽ Proposed Model ◽ Significant Difference ◽ Cross Lingual

With the advent of deep neural models in natural language processing tasks, having a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is a challenging issue in many low-resource languages. This problem results in a significant difference between the accuracy of available natural language processing tools for low-resource languages and that of tools for resource-rich languages. To address this problem for sentiment analysis in Persian, we propose a cross-lingual deep learning framework that benefits from the available English training data. We deploy cross-lingual embeddings to cast sentiment analysis as a transfer learning problem, in which a model is transferred from a resource-rich language to low-resource ones. Our model is flexible enough to use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on the English Amazon dataset and the Persian Digikala dataset, using two different embedding models and four different classification networks, show the superiority of the proposed model over state-of-the-art monolingual techniques. In our experiments, the performance of Persian sentiment analysis improves by 22% with static embeddings and by 9% with dynamic embeddings. Our proposed model is general and language-independent; that is, it can be used for any low-resource language once a cross-lingual embedding is available for the source–target language pair. Moreover, when word-aligned cross-lingual embeddings are used, the only data required for a reliable cross-lingual embedding is a bilingual dictionary, which is available between almost all languages and English as a potential source language.
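The transfer idea above can be illustrated with a toy sketch: a classifier is fitted on English data only and then applied to target-language text, relying on both languages sharing one embedding space. Everything here is an assumption made for illustration: the two-dimensional vectors, the transliterated Persian words, and the nearest-centroid classifier, which stands in for the deep classification networks used in the paper.

```python
def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def embed(sentence, table):
    """Average the vectors of known words; table maps word -> vector."""
    vectors = [table[w] for w in sentence.split() if w in table]
    if not vectors:
        raise ValueError("no known words in sentence")
    return average(vectors)

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical word-aligned embeddings: translations sit close together in one shared space.
en_vectors = {"good": [0.9, 0.1], "great": [0.8, 0.2], "bad": [0.1, 0.9], "awful": [0.2, 0.8]}
fa_vectors = {"khub": [0.85, 0.15], "mozakhraf": [0.15, 0.85]}  # transliterated, toy values

# Train on labeled English data only: one centroid per sentiment class.
train = [("good great", "pos"), ("great", "pos"), ("bad awful", "neg"), ("awful", "neg")]
centroids = {
    label: average([embed(text, en_vectors) for text, y in train if y == label])
    for label in ("pos", "neg")
}

def predict(sentence, table):
    """Assign the class whose English-trained centroid is nearest."""
    point = embed(sentence, table)
    return min(centroids, key=lambda label: squared_distance(point, centroids[label]))

# No Persian labels were used, yet Persian input can be classified.
persian_prediction = predict("khub", fa_vectors)
```

The classifier never sees the target language during training; only the embedding table changes at prediction time, which is why the quality of the cross-lingual alignment matters so much.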


Amikacin Pharmacokinetics To Optimize Dosing in Neonates with Perinatal Asphyxia Treated with Hypothermia

Antimicrobial Agents and Chemotherapy ◽ 10.1128/aac.01282-17 ◽ 2017 ◽ Vol 61 (12) ◽ Cited By ~ 10

Author(s): Sinziana Cristea ◽ Anne Smits ◽ Aida Kulo ◽ Catherijne A. J. Knibbe ◽ Mirjam van Weissenbruch ◽ ...

Keyword(s): Perinatal Asphyxia ◽ Volume Of Distribution ◽ Stochastic Simulations ◽ Therapeutic Drug ◽ Published Data ◽ Data Set ◽ Dosing Interval ◽ Proposed Model ◽ Dosing Regimens ◽ The Impact

Aminoglycoside pharmacokinetics (PK) is expected to change in neonates with perinatal asphyxia treated with therapeutic hypothermia (PATH). Several amikacin dosing guidelines have been proposed for treating neonates with (suspected) septicemia; however, none provide adjustments for cases of PATH. Therefore, we aimed to quantify the differences in amikacin PK between neonates with and without PATH to propose suitable dosing recommendations. Based on amikacin therapeutic drug monitoring data collected retrospectively from neonates with PATH, combined with a published data set, we assessed the impact of PATH on amikacin PK by using population modeling. Monte Carlo and stochastic simulations were performed to establish amikacin exposures in neonates with PATH after dosing according to the current guidelines and according to proposed model-derived dosing guidelines. Amikacin clearance was decreased by 40.6% in neonates with PATH, with no changes in volume of distribution. Simulations showed that increasing the dosing interval by 12 h decreases the percentage of neonates reaching toxic trough levels (>5 mg/liter) from 40–76% to 14–25%, while still reaching efficacy targets compared to the results of current dosing regimens. Based on this study, a 12-h increase in the amikacin dosing interval in neonates with PATH is proposed to correct for the reduced clearance, yielding safe and effective exposures. As amikacin is renally excreted, further studies into other renally excreted drugs may be required, as their clearance may also be impaired.
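The link between reduced clearance, trough concentration and dosing interval can be illustrated with a textbook one-compartment calculation. This is a deliberately simplified sketch and not the population model used in the study: the one-compartment IV bolus assumption, the function name `steady_state_trough`, and the dose, volume, baseline clearance and 24 h baseline interval are all hypothetical. Only the 40.6% clearance reduction and the 12-h interval extension come from the abstract.

```python
import math

def steady_state_trough(dose_mg, clearance_l_per_h, volume_l, interval_h):
    """Steady-state trough (mg/L) for repeated IV bolus doses in a
    one-compartment model with first-order elimination."""
    k = clearance_l_per_h / volume_l      # elimination rate constant (1/h)
    decay = math.exp(-k * interval_h)     # fraction remaining after one interval
    return (dose_mg / volume_l) * decay / (1.0 - decay)

reference_cl = 0.15                       # L/h, hypothetical baseline clearance
path_cl = reference_cl * (1 - 0.406)      # 40.6% lower clearance, as reported above
dose, volume = 45.0, 1.5                  # mg and L, hypothetical

trough_reference = steady_state_trough(dose, reference_cl, volume, 24)
trough_path = steady_state_trough(dose, path_cl, volume, 24)
trough_extended = steady_state_trough(dose, path_cl, volume, 24 + 12)
```

With the volume of distribution held fixed, lower clearance slows elimination and raises the trough at an unchanged interval, while lengthening the interval gives the drug more time to clear and lowers the trough again.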


Assessing Regression-Based Sentiment Analysis Techniques in Financial Texts

10.5753/eniac.2019.9329 ◽ 2019 ◽ Cited By ~ 1

Author(s): Taynan Ferreira ◽ Francisco Paiva ◽ Roberto Silva ◽ Angel Paula ◽ Anna Costa ◽ ...

Keyword(s): Sentiment Analysis ◽ Feature Representation ◽ Support Vector ◽ Data Set ◽ Feature Representations ◽ Textual Data ◽ Enormous Amount ◽ Financial Domain ◽ Classification Tasks ◽ The Impact

Sentiment analysis (SA) is growing in importance due to the enormous amount of opinionated textual data available today. Most research has investigated different models, feature representations and hyperparameters in SA classification tasks. However, few studies have evaluated the impact of these choices on regression SA tasks. In this paper, we conduct such an assessment on a financial-domain data set by investigating different feature representations and hyperparameters in two important models: Support Vector Regression (SVR) and Convolutional Neural Networks (CNN). We conclude by presenting the most relevant feature representations and hyperparameters and how they affect outcomes on a regression SA task.


Multivariate generalized linear mixed models for continuous bounded outcomes: Analyzing the body fat percentage data

Statistical Methods in Medical Research ◽ 10.1177/09622802211043276 ◽ 2021 ◽ pp. 096228022110432

Author(s): Ricardo R Petterle ◽ Henrique A Laureano ◽ Guilherme P da Silva ◽ Wagner H Bonat

Keyword(s): Maximum Likelihood ◽ Body Fat ◽ The Body ◽ Model Parameters ◽ Body Fat Percentage ◽ Fat Percentage ◽ Data Set ◽ Proposed Model ◽ Computational Implementation ◽ The Impact

We propose a multivariate regression model to handle multiple continuous bounded outcomes. We adopt the maximum likelihood approach for parameter estimation and inference. The model is specified by the product of univariate probability distributions, and the correlation between the response variables is obtained through the correlation matrix of the random intercepts. For modeling continuous bounded variables on the interval (0, 1), we consider the beta and unit gamma distributions. The main advantage of the proposed model is that we can easily combine different marginal distributions for the response variable vector. The computational implementation is performed using Template Model Builder, which combines the Laplace approximation with automatic differentiation. Therefore, the proposed approach allows us to estimate the model parameters quickly and efficiently. We conduct a simulation study to evaluate the computational implementation and the properties of the maximum likelihood estimators under different scenarios. Moreover, we investigate the impact of distribution misspecification in the proposed model. Our model was motivated by a data set with multiple continuous bounded outcomes, which refer to the body fat percentage measured at five regions of the body. Simulation studies and data analysis show that the proposed model provides a general and rich framework to deal with multiple continuous bounded outcomes.
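The phrase "product of univariate probability distributions" can be made concrete with a small sketch of the conditional log-likelihood for one subject. The abstract does not state the parameterization or link function, so the mean-precision form of the beta density, the logit link, and the names `beta_logpdf` and `conditional_loglik` are assumptions. The sketch also stops short of the marginal likelihood, which requires integrating over the correlated random intercepts (the step the abstract attributes to the Laplace approximation in Template Model Builder).

```python
import math

def inv_logit(eta):
    """Map a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_logpdf(y, mu, phi):
    """Log-density of a beta distribution with mean mu in (0, 1) and precision phi > 0."""
    a = mu * phi
    b = (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)

def conditional_loglik(ys, fixed_effects, random_intercepts, phis):
    """Sum of univariate beta log-densities across outcomes, given the random
    intercepts (the log of a product of univariate densities)."""
    total = 0.0
    for y, beta0, u, phi in zip(ys, fixed_effects, random_intercepts, phis):
        mu = inv_logit(beta0 + u)
        total += beta_logpdf(y, mu, phi)
    return total

# Hypothetical body fat proportions at two body regions for one subject.
loglik = conditional_loglik(
    ys=[0.22, 0.31],
    fixed_effects=[-1.2, -0.8],     # hypothetical intercepts on the logit scale
    random_intercepts=[0.1, 0.05],  # hypothetical subject-level deviations
    phis=[40.0, 40.0],              # hypothetical precision parameters
)
```

Swapping `beta_logpdf` for another density on (0, 1) for one of the outcomes would leave the rest of the function unchanged, which mirrors the flexibility in marginal distributions that the abstract highlights.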


The Impact of Firm Characteristics and IT Governance on IT Material Weaknesses

Journal of Organizational and End User Computing ◽ 10.4018/joeuc.2018040105 ◽ 2018 ◽ Vol 30 (2) ◽ pp. 88-111

Author(s): Peiqin Zhang ◽ Kexin Zhao ◽ Ram L. Kumar

Keyword(s): Organizational Performance ◽ It Governance ◽ Secondary Data ◽ End Users ◽ Firm Characteristics ◽ Data Set ◽ Proposed Model ◽ Material Weaknesses ◽ General Material ◽ The Impact

Accurate and timely reporting of organizational performance is becoming increasingly important and highly regulated. However, organizations face a variety of challenges in seeking to provide accurate and reliable information due to the existence of IT control problems. Hence, it is important for end users, including auditors and managers, to understand how to manage IT material weaknesses (ITMWs). While there is extensive accounting research on general material weaknesses (MWs), ITMWs are under-researched. This article identifies key firm characteristics that appear to be related to ITMWs. In addition, the authors suggest that IT governance may help firms mitigate such problems. To gain a deeper understanding of IT governance effects, this article proposes a model which includes an innovative construct, ITGOV, operationalized using secondary data. The authors empirically validate the proposed model based on a data set of 1,112 firms. Their study illustrates the differences between ITMWs and general MWs. These results can also support end-user computing by offering insights into better management of ITMWs.


The impact of semantics on aspect level opinion mining

PeerJ Computer Science ◽ 10.7717/peerj-cs.558 ◽ 2021 ◽ Vol 7 ◽ pp. e558

Author(s): Eman M. Aboelela ◽ Walaa Gad ◽ Rasha Ismail

Keyword(s): Sentiment Analysis ◽ Semantic Similarity ◽ Opinion Mining ◽ Online Shopping ◽ Experimental Results ◽ Proposed Model ◽ Online Comments ◽ The Impact ◽ F Measure ◽ The Web

Many users now prefer online shopping to purchase items from the web. Shopping websites allow customers to submit comments and provide feedback on the products they have purchased. Opinion mining and sentiment analysis are used to analyze product comments to help sellers and purchasers decide whether or not to buy products. However, the nature of online comments affects the performance of the opinion mining process because they may contain negation words or aspects unrelated to the product. To address these problems, a semantic-based aspect level opinion mining (SALOM) model is proposed. SALOM extracts the product aspects based on semantic similarity and classifies the comments. The proposed model considers negation words and other types of product aspects, such as aspects’ synonyms, hyponyms, and hypernyms, to improve classification accuracy. Three different datasets are used to evaluate the proposed SALOM. The experimental results are promising in terms of precision, recall, and F-measure. The performance reaches 94.8% precision, 93% recall, and 92.6% F-measure.
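To show why negation words matter for comment classification, here is a minimal lexicon-based sketch in which a negation word flips the polarity of the next opinion word. It is not the SALOM model: the tiny lexicon, the negation list, and the function name `polarity` are hypothetical, and SALOM's semantic-similarity aspect extraction is not represented.

```python
NEGATIONS = {"not", "no", "never"}                         # hypothetical negation list
LEXICON = {"good": 1, "great": 1, "bad": -1, "poor": -1}   # hypothetical opinion lexicon

def polarity(tokens):
    """Sum opinion-word scores, flipping the sign of the first opinion word
    that follows a negation word."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in LEXICON:
            value = LEXICON[tok]
            score += -value if negate else value
            negate = False  # a negation applies to one opinion word only
    return score

plain = polarity("the battery is great".split())
negated = polarity("the screen is not good".split())
```

Without the negation flag, "not good" would be scored as positive, which is the kind of error the abstract attributes to the nature of online comments.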


Sentiment Analysis Using XLM-R Transformer and Zero-shot Transfer Learning on Resource-poor Indian Language

ACM Transactions on Asian and Low-Resource Language Information Processing ◽ 10.1145/3461764 ◽ 2021 ◽ Vol 20 (5) ◽ pp. 1-13

Author(s): Akshi Kumar ◽ Victor Hugo C. Albuquerque

Keyword(s): Sentiment Analysis ◽ Transfer Learning ◽ State Of The Art ◽ Classification Model ◽ Indian Language ◽ Sentence Level ◽ Proposed Model ◽ Resource Poor ◽ Linguistic Challenges ◽ Cross Lingual

Sentiment analysis on social media relies on comprehending natural language and using a robust machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Cultural miscellanies, geographically limited trending topic hash-tags, access to aboriginal language keyboards, and conversational comfort in the native language compound the linguistic challenges of sentiment analysis. This research evaluates the performance of cross-lingual contextual word embeddings and zero-shot transfer learning in projecting predictions from resource-rich English to the resource-poor Hindi language. The cross-lingual XLM-RoBERTa classification model is trained and fine-tuned using the English-language benchmark SemEval 2017 Task 4A dataset, and zero-shot transfer learning is subsequently used to evaluate the classification model on two Hindi sentence-level sentiment analysis datasets, namely the IITP-Movie and IITP-Product review datasets. The proposed model compares favorably to state-of-the-art approaches, achieves an average accuracy of 60.93 across the two Hindi datasets, and gives an effective solution to sentence-level (tweet-level) sentiment analysis in a resource-poor scenario.


A Multi-Attention Network for Aspect-Level Sentiment Analysis

Future Internet ◽ 10.3390/fi11070157 ◽ 2019 ◽ Vol 11 (7) ◽ pp. 157 ◽ Cited By ~ 1

Author(s): Qiuyue Zhang ◽ Ran Lu

Keyword(s): Neural Networks ◽ Sentiment Analysis ◽ Specific Aspect ◽ Experimental Results ◽ Sequence Information ◽ Attention Network ◽ Attention Networks ◽ Recent Advances ◽ Proposed Model ◽ The Impact

Aspect-level sentiment analysis (ASA) aims to determine the sentiment polarity of a specific aspect term within a given sentence. Recent advances in attention mechanisms suggest that attention models are useful in ASA tasks and can help identify focus words. Combining attention mechanisms with neural networks is also a common approach. However, according to the latest research, such methods often fail to extract text representations efficiently and to achieve interaction between aspect terms and contexts. To address the complete ASA task, this paper proposes a Multi-Attention Network (MAN) model which adopts several attention networks. The model not only preprocesses data with Bidirectional Encoder Representations from Transformers (BERT) but also takes a number of further measures. First, the MAN model uses a modified partial Transformer to obtain hidden sequence information. Second, because words in different locations have different effects on aspect terms, we introduce location encoding to analyze the impact of distance in ASA tasks, and then obtain the influence of different words on aspect terms through a bidirectional attention network. Experimental results on three datasets show that the proposed model achieves consistently superior results.
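The idea that words at different distances from the aspect term should count differently can be sketched with a simple linear-decay weighting. The abstract does not give the exact location encoding used by the MAN model, so the formula below (weight = 1 - distance / sentence length), the function name `location_weights`, and the example sentence are assumptions for illustration only.

```python
def location_weights(num_tokens, aspect_start, aspect_end):
    """Linear decay: 1.0 on the aspect span, smaller as token distance grows.
    aspect_start and aspect_end are inclusive token indices."""
    weights = []
    for i in range(num_tokens):
        if i < aspect_start:
            distance = aspect_start - i
        elif i > aspect_end:
            distance = i - aspect_end
        else:
            distance = 0
        weights.append(1.0 - distance / num_tokens)
    return weights

tokens = ["the", "battery", "life", "is", "great", "but", "screen", "dims"]
# Aspect term "battery life" covers token indices 1 and 2.
weights = location_weights(len(tokens), aspect_start=1, aspect_end=2)
```

In this example "great" sits closer to "battery life" than "dims" does, so it receives the larger weight; such weights would typically scale the word representations before attention is applied.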


Multi-Layer Attention Approach for Aspect based Sentiment Analysis

10.5121/csit.2020.101410 ◽ 2020

Author(s): Xinzhi Ai ◽ Xiaoge Li ◽ Feixiong Hu ◽ Shuting Zhi ◽ Likun Hu

Keyword(s): Sentiment Analysis ◽ Semantic Information ◽ Short Term Memory ◽ Attention Mechanism ◽ Training Dataset ◽ Emotion Classification ◽ Data Set ◽ New Model ◽ Fine Grained ◽ Proposed Model

Aspect-level sentiment analysis is a typical fine-grained emotion classification task that assigns a sentiment polarity to each of the aspects in a review. To better handle this task, this paper puts forward a new model which applies a Long Short-Term Memory network that combines multiple attention with aspect context. The multiple attention mechanism (i.e., location attention, content attention and class attention) takes the factors of context location, content semantics and class balancing into consideration. The proposed model can therefore adaptively integrate location and semantic information between the aspect targets and their contexts into sentiment features, and overcome the model variance introduced by an imbalanced training dataset. In addition, the aspect context is encoded on both sides of the aspect target, so as to enhance the ability of the model to capture semantic information. The Multi-Attention mechanism (MATT) and Aspect Context (AC) allow our model to perform better when facing reviews with more complicated structures. The experimental results indicate that the accuracy of the new model reaches 80.6% and 75.1% on the two datasets in SemEval-2014 Task 4, respectively, 71.1% on the Twitter data set, and 81.6% on the Chinese automotive-domain dataset. Compared with some previous models for sentiment analysis, our model shows higher accuracy.
