Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs

Words are among the most essential elements for expressing sentiment in context, but they are not the only ones: syntactic relationships between words, morphology, punctuation, and other linguistic phenomena also matter. Treating words as isolated units causes many errors in sentiment analysis systems. A large body of research has produced sentiment dictionaries that contain only sentiment words. Some of these dictionaries address combinations of sentiment words, negators, and intensifiers, but almost none consider the heterogeneous effect of multiple linguistic phenomena co-occurring in a sentiment compound. To address this weakness, specifically the heterogeneous effect of multiple intensifiers, this research presents a sentiment dictionary built from an analysis of sentiment compounds comprising sentiment words, negators, and intensifiers. It accounts for the position of each intensifier relative to the sentiment word and assigns the intensifier a location-based coefficient. This increases the number of sentiment phrases the dictionary covers and improves the efficiency of the proposed dictionary-based sentiment analysis methods by up to 7% compared with the latest methods.
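The idea of a location-based coefficient for stacked intensifiers can be illustrated with a minimal sketch. The lexicon entries, the intensifier weights, and the geometric decay rule below are all hypothetical placeholders chosen for illustration; the abstract does not specify the actual coefficients or formula.

```python
# Toy lexicon and weights: hypothetical values, not taken from the paper.
SENTIMENT = {"good": 1.0, "bad": -1.0, "happy": 1.5}
INTENSIFIERS = {"very": 0.5, "really": 0.4, "slightly": -0.3}
NEGATORS = {"not", "never"}


def position_coefficient(distance, decay=0.5):
    """Assumed rule: an intensifier's effect halves with each extra step
    away from the sentiment word (distance 1 means directly adjacent)."""
    return decay ** (distance - 1)


def score_phrase(tokens):
    """Score a token list by combining each sentiment word with the
    contiguous run of negators and intensifiers that precedes it."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in SENTIMENT:
            continue
        boost = 0.0
        negated = False
        j = i - 1
        # Walk left over the modifiers attached to this sentiment word.
        while j >= 0 and (tokens[j] in INTENSIFIERS or tokens[j] in NEGATORS):
            if tokens[j] in NEGATORS:
                negated = not negated
            else:
                boost += INTENSIFIERS[tokens[j]] * position_coefficient(i - j)
            j -= 1
        value = SENTIMENT[tok] * (1.0 + boost)
        total += -value if negated else value
    return total
```

Under these assumed weights, "really very good" scores higher than "very good" because the second intensifier still contributes, but with a smaller coefficient than the adjacent one.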

COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis

IEEE Transactions on Computational Social Systems ◽

10.1109/tcss.2021.3051189 ◽

2021 ◽

pp. 1-13

Author(s):

Usman Naseem ◽

Imran Razzak ◽

Matloob Khushi ◽

Peter W. Eklund ◽

Jinman Kim

Keyword(s):

Sentiment Analysis ◽

Large Scale ◽

Data Set ◽

Twitter Data

On the combination of "off-the-shelf" sentiment analysis methods

Proceedings of the 31st Annual ACM Symposium on Applied Computing - SAC '16 ◽

10.1145/2851613.2851820 ◽

2016 ◽

Cited By ~ 4

Author(s):

Pollyanna Gonçalves ◽

Daniel Hasan Dalip ◽

Helen Costa ◽

Marcos André Gonçalves ◽

Fabrício Benevenuto

Keyword(s):

Sentiment Analysis ◽

Analysis Methods

Network-Driven Analysis Methods and their Application to Drug Discovery

Handbook of Research on Computational and Systems Biology ◽

10.4018/978-1-60960-491-2.ch013 ◽

2011 ◽

pp. 294-315

Author(s):

Daniel Ziemek ◽

Christoph Brockel

Keyword(s):

Drug Discovery ◽

Large Scale ◽

Quantitative Methods ◽

Primary Data ◽

Success Rates ◽

Parallel Methods ◽

Large Scale Data ◽

Analysis Methods ◽

Genome Scale

Drug discovery and development face tremendous challenges in finding promising intervention points for important diseases. Any therapeutic agent targeting such an intervention point must prove its efficacy and safety in patients. Success rates, measured from first-in-human studies to registration, average only around 10%. Over the last decade, a massive body of knowledge on biological systems has accumulated, and genome-scale primary data are being produced at an ever-increasing rate. In parallel, methods for using that knowledge have matured. This chapter presents some of the problems facing the pharmaceutical industry and elaborates on the current state of network-driven analysis methods. It focuses especially on semi-quantitative methods that are applicable to large-scale data analysis and points out their potential use in many relevant drug discovery challenges.

Dynamic Topic-Based Sentiment Analysis of Large-Scale Online News

Web Information Systems Engineering – WISE 2016 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-48743-4_1 ◽

2016 ◽

pp. 3-18 ◽

Cited By ~ 4

Author(s):

Peng Liu ◽

Jon Atle Gulla ◽

Lemei Zhang

Keyword(s):

Sentiment Analysis ◽

Large Scale ◽

Online News

Identifying significantly impacted pathways: a comprehensive review and assessment

Genome Biology ◽

10.1186/s13059-019-1790-4 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 26

Author(s):

Tuan-Minh Nguyen ◽

Adib Shafi ◽

Tin Nguyen ◽

Sorin Draghici

Keyword(s):

Pathway Analysis ◽

Null Hypothesis ◽

Large Scale ◽

Data Sets ◽

Actual Performance ◽

Large Scale Assessment ◽

Analysis Methods ◽

Biological Phenomena ◽

High Throughput Experiments

ABSTRACT Background: Many high-throughput experiments compare two phenotypes, such as disease vs. healthy, with the goal of understanding the underlying biological phenomena characterizing the given phenotype. Because of the importance of this type of analysis, more than 70 pathway analysis methods have been proposed so far. These fall into two main categories: non-topology-based (non-TB) and topology-based (TB). Although some review papers discuss this topic from different aspects, there is no systematic, large-scale assessment of such methods. Furthermore, the majority of pathway analysis approaches rely on the assumption that p values are uniformly distributed under the null hypothesis, which is often not true. Results: This article presents the most comprehensive comparative study of pathway analysis methods available to date. We compare the actual performance of 13 widely used pathway analysis methods in over 1085 analyses. These comparisons were performed using 2601 samples from 75 human disease data sets and 121 samples from 11 knockout mouse data sets. In addition, we investigate the extent to which each method is biased under the null hypothesis. Together, these data and results constitute a reliable benchmark against which future pathway analysis methods could and should be tested. Conclusion: Overall, the results show that no method is perfect. In general, TB methods appear to perform better than non-TB methods. This is somewhat expected, since TB methods take into consideration the structure of the pathway, which is meant to describe the underlying phenomena. We also discover that most, if not all, of the listed approaches are biased and can produce skewed results under the null.

Twitter analysis of the orthodontic patient experience with braces vs Invisalign

The Angle Orthodontist ◽

10.2319/062816-508.1 ◽

2016 ◽

Vol 87 (3) ◽

pp. 377-383 ◽

Cited By ~ 25

Author(s):

Daniel Noll ◽

Brendan Mahon ◽

Bhavna Shroff ◽

Caroline Carrico ◽

Steven J. Lindauer

Keyword(s):

Sentiment Analysis ◽

Patient Experience ◽

Large Scale ◽

Specific Content ◽

Significant Difference ◽

Data Collection Program ◽

Twitter Analysis ◽

Twitter Users ◽

Negative Sentiment ◽

Collection Program

ABSTRACT Objective: To compare the orthodontic patient experience of having braces with that of having Invisalign by means of a large-scale Twitter sentiment analysis. Materials and Methods: A custom data collection program was created that collected tweets containing the words “braces” or “Invisalign” for a period of 5 months. A hierarchical Naïve Bayes sentiment analysis classifier was developed to sort the tweets into five categories: positive, negative, neutral, advertisement, or not applicable. Each category was then analyzed for specific content. Results: A total of 419,363 tweets applicable to orthodontics were collected. Users posted significantly more positive tweets (61%) than negative tweets (39%; P ≤ .0001). There was no significant difference in the distribution of positive and negative sentiment between braces and Invisalign tweets (P = .4189). Positive orthodontics-related tweets often highlighted gratitude for a great smile, accompanied by selfies. Negative orthodontic tweets frequently focused on pain. Conclusion: Twitter users expressed more positive than negative sentiment about orthodontic treatment, with no significant difference in sentiment between braces and Invisalign tweets.

REVISITING MODULES (AND CENTERS) IN STAPHYLOCOCCUS AUREUS METABOLIC NETWORK WITH LINK CLUSTERING

Journal of Biological Systems ◽

10.1142/s021833901150032x ◽

2012 ◽

Vol 20 (01) ◽

pp. 57-66

Author(s):

DE-WU DING ◽

LONG YING

Keyword(s):

Staphylococcus Aureus ◽

Community Structure ◽

Metabolic Network ◽

Structure Analysis ◽

Large Scale ◽

Metabolic Networks ◽

Clustering Algorithm ◽

Functional Modules ◽

Analysis Methods

Community structure analysis methods are important tools for modeling and analyzing large-scale metabolic networks. However, traditional community structure methods mainly work by clustering nodes, so each metabolite belongs to only a single community, which limits their usefulness in the study of metabolic networks. In the present paper, we analyze the community structure and functional modules of the Staphylococcus aureus (S. aureus) metabolic network using a link clustering algorithm. We obtain 10 functional modules that offer better biological insight than the results of our previous study. We also evaluate the essentiality of nodes in the S. aureus metabolic network. We suggest that link clustering can identify functional modules and key metabolites in metabolic networks.
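Clustering links instead of nodes lets one metabolite sit in several communities, which a small sketch can show. The version below assumes a common formulation of link clustering (Jaccard similarity of inclusive neighborhoods for edges sharing a node, merged above a fixed threshold); the abstract does not state which similarity measure or cut-off the authors used, and the function name and threshold are hypothetical.

```python
from itertools import combinations


def link_communities(edges, threshold=0.5):
    """Group edges whose similarity meets the threshold and return the
    node set of each edge group. A node may appear in several groups."""
    # Inclusive neighborhood: each node's neighbors plus the node itself.
    neigh = {}
    for u, v in edges:
        neigh.setdefault(u, {u}).add(v)
        neigh.setdefault(v, {v}).add(u)

    edges = [tuple(sorted(e)) for e in edges]
    parent = {e: e for e in edges}

    def find(e):
        # Union-find lookup with path halving.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    # Index edges by the nodes they touch.
    incident = {}
    for e in edges:
        for n in e:
            incident.setdefault(n, []).append(e)

    # Compare every pair of edges that share a node k.
    for k, es in incident.items():
        for e1, e2 in combinations(es, 2):
            i = e1[0] if e1[1] == k else e1[1]
            j = e2[0] if e2[1] == k else e2[1]
            sim = len(neigh[i] & neigh[j]) / len(neigh[i] | neigh[j])
            if sim >= threshold:
                parent[find(e1)] = find(e2)

    groups = {}
    for e in edges:
        groups.setdefault(find(e), set()).update(e)
    return list(groups.values())
```

For two triangles joined at a single node, this sketch yields two communities that both contain the shared node, which is the overlap that node-based clustering cannot express.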

Towards enhancement of a lexicon-based approach for Saudi dialect sentiment analysis

Journal of Information Science ◽

10.1177/0165551516688143 ◽

2017 ◽

Vol 44 (2) ◽

pp. 184-202 ◽

Cited By ~ 12

Author(s):

Adel Assiri ◽

Ahmed Emam ◽

Hmood Al-Dossari

Keyword(s):

Sentiment Analysis ◽

Large Scale ◽

Training Data ◽

Linguistic Features ◽

Algorithm Development ◽

Domain Independence ◽

Low Performance

Sentiment analysis (SA) techniques are applied to assess aspects of language that are used to express feelings, evaluations and opinions in areas such as customer sentiment extraction. Most studies have focused on SA techniques for widely used languages such as English, but less attention has been paid to Arabic, particularly the Saudi dialect. Most Arabic SA studies have built systems using supervised approaches that are domain dependent; hence, they achieve low performance when applied to a new domain different from the learning domain, and they require manually labelled training data, which are usually difficult to obtain. In this article, we propose a novel lexicon-based algorithm for Saudi dialect SA that features domain independence. We created an annotated Saudi dialect dataset and built a large-scale lexicon for the Saudi dialect. Then, we developed our weighted lexicon-based algorithm. The proposed algorithm mines the associations between polarity and non-polarity words for the dataset and then weights these words based on their associations. During algorithm development, we also proposed novel rules for handling some linguistic features such as negation and supplication. Several experiments were performed to evaluate the performance of the proposed algorithm.

Economic and Mathematical Modelling of the Effectiveness of the National System for Combatting Cyber Fraud and Legalisation of Criminal Proceeds Based on Survival Analysis Methods

Scientific Bulletin of Mukachevo State University Series “Economics” ◽

10.52566/msu-econ.8(1).2021.144-153 ◽

2021 ◽

Vol 8 (1) ◽

pp. 144-153

Author(s):

Olha V. Kuzmenko ◽

Tetiana V. Dotsenko ◽

Liliia O. Skrynka

Keyword(s):

Survival Analysis ◽

Mathematical Modelling ◽

Money Laundering ◽

Large Scale ◽

Modern World ◽

Time Interval ◽

Negative Consequences ◽

National System ◽

Analysis Methods ◽

Temporal Measurement

In the modern world, the digitalisation of financial relations, the development of innovative technologies, and the emergence and use of cryptocurrencies for payments are increasing both the number and the sophistication of cyber frauds in the financial sector, and with them the illegal outflow of funds abroad. Ineffective decisions and inaction in counteracting these threats lead to large-scale negative consequences of both a financial and a social nature. The purpose of this study is to implement economic and mathematical modelling of the effectiveness of the national system for combatting cyber fraud and legalisation of criminal proceeds, based on survival analysis methods. The study provides a bibliometric analysis of publications on the effectiveness of combatting cyber fraud and the legalisation of illegal funds by building a bibliometric map of keywords with VOSviewer software. This identified 7 clusters of basic categories of cyber fraud analysis, and a visual map of the contextual-temporal dimension of research in publications of the Scopus database showed how researchers' focus has shifted over time. The paper examines the effectiveness of the national system for combatting cyber fraud and money laundering based on survival tables and analyses it using the Kaplan-Meier method. The study identified how the effectiveness of the national system for combatting cyber fraud and legalisation of criminal proceeds depends on the time interval after the discovery of violations. The practical value of the developed model is that it forms an analytical basis for further management decisions by the National Bank of Ukraine, the State Financial Monitoring Service, and the Security Service of Ukraine regarding the effectiveness of the national system for combatting cyber fraud and legalisation of criminal proceeds and the need to adjust it.
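The Kaplan-Meier method named in the abstract estimates the probability of remaining event-free past time t as the product of (1 - d_i / n_i) over the event times t_i up to t, where d_i is the number of events at t_i and n_i the number of cases still at risk. A minimal sketch follows; the function name, the input layout, and the framing of durations as time elapsed after a violation is discovered are assumptions for illustration, not details from the study.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until the event or until censoring, per case.
    observed:  True if the event occurred, False if the case was censored.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Collect all cases that end (by event or censoring) at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            removed += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Cases ending at t leave the risk set for later times.
        at_risk -= removed
    return curve
```

Censored cases lower the number at risk without producing a step in the curve, which is how the estimator uses cases whose outcome is not yet known.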