String Kernels for Polarity Classification: A Study Across Different Languages

HASKER: An efficient algorithm for string kernels. Application to polarity classification in various languages

Procedia Computer Science ◽

10.1016/j.procs.2017.08.207 ◽

2017 ◽

Vol 112 ◽

pp. 1755-1763 ◽

Cited By ~ 1

Author(s):

Marius Popescu ◽

Cristian Grozea ◽

Radu Tudor Ionescu

Keyword(s):

Efficient Algorithm ◽

String Kernels ◽

Polarity Classification

Single and Cross-domain Polarity Classification using String Kernels

10.18653/v1/e17-2089 ◽

2017 ◽

Cited By ~ 2

Author(s):

Rosa M. Giménez-Pérez ◽

Marc Franco-Salvador ◽

Paolo Rosso

Keyword(s):

String Kernels ◽

Cross Domain ◽

Polarity Classification

BUAP: Polarity Classification of Short Texts

10.3115/v1/s14-2023 ◽

2014 ◽

Author(s):

David Pinto ◽

Darnes Vilariño ◽

Saul Leon ◽

Miguel Jasso ◽

Cupertino Lucero

Keyword(s):

Polarity Classification

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life: knowing the polarity lets people find out whether a given document carries positive or negative sentiment, which helps them choose and make decisions. Sentiment analysis is usually done manually; therefore, an automatic sentiment classification process is needed. However, studies that discuss which feature extraction methods and learning models suit unstructured sentiment analysis, using the Amazon food review case, are rare. This research explores several feature extraction methods (bag of words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec) together with several machine learning models (Random Forest, SVM, KNN and Naïve Bayes) to find a combination of feature extraction and learning model that can add variety to the analysis of sentiment polarity. With document preparation that removes HTML tags, punctuation and special characters and applies Snowball stemming, TF-IDF combined with SVM proves suitable for polarity classification in unstructured sentiment analysis on the Amazon food review case, with a performance result of 87.3 percent.
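
The document preparation and TF-IDF steps named in this abstract can be illustrated with a minimal, dependency-free sketch. The function names `preprocess` and `tfidf`, the sample reviews, and the `ln(N / df) + 1` weighting are assumptions made for illustration, not the authors' code; Snowball stemming and the SVM classifier are left out to keep the example self-contained.

```python
import html
import math
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags such as <p> or <br/>
NON_ALPHA_RE = re.compile(r"[^a-z\s]")   # punctuation, digits, special characters


def preprocess(text):
    """Strip HTML tags, punctuation and special characters; return lowercase tokens."""
    text = html.unescape(TAG_RE.sub(" ", text)).lower()
    return NON_ALPHA_RE.sub(" ", text).split()


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * (ln(N / df) + 1)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * (math.log(n_docs / df[t]) + 1.0) for t, c in counts.items()}
        )
    return vectors


# Hypothetical review snippets, not taken from the Amazon food review data.
reviews = [
    "<p>Great taste, great price!</p>",
    "Terrible taste &amp; stale chips...",
]
vectors = tfidf(reviews)
```

Terms that occur in only one review, such as "great", receive a higher weight than terms shared by both, such as "taste". The resulting sparse vectors are what a classifier like an SVM would be trained on.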

Accelerating Legacy String Kernels via Bounded Automata Learning

Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems ◽

10.1145/3373376.3378503 ◽

2020 ◽

Author(s):

Kevin Angstadt ◽

Jean-Baptiste Jeannin ◽

Westley Weimer

Keyword(s):

String Kernels ◽

Automata Learning

Two New Large Corpora for Vietnamese Aspect-based Sentiment Analysis at Sentence Level

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3446678 ◽

2021 ◽

Vol 20 (4) ◽

pp. 1-22

Author(s):

Dang Van Thin ◽

Ngan Luu-Thuy Nguyen ◽

Tri Minh Truong ◽

Lac Si Le ◽

Duy Tin Vo

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Network Architectures ◽

Low Resource ◽

The Neural Network ◽

Sentence Level ◽

Push Forward ◽

Polarity Classification ◽

Learning Architectures ◽

Single Approach

Aspect-based sentiment analysis has been studied in both research and industrial communities over recent years. For low-resource languages, standard benchmark corpora play an important role in the development of methods. In this article, we introduce two benchmark corpora with the largest sizes at sentence level for two tasks in Vietnamese: Aspect Category Detection and Aspect Polarity Classification. Our corpora are annotated with high inter-annotator agreement for the restaurant and hotel domains. The release of our corpora should push forward the low-resource language processing community. In addition, we deploy and compare the effectiveness of supervised learning methods with single-task and multi-task approaches based on deep learning architectures. Experimental results on our corpora show that the multi-task approach based on the BERT architecture outperforms the neural network architectures and the single-task approach. Our corpora and source code are published online.

FastSK: fast sequence analysis with gapped string kernels

Bioinformatics ◽

10.1093/bioinformatics/btaa817 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i857-i865

Author(s):

Derrick Blakely ◽

Eamon Collins ◽

Ritambhara Singh ◽

Andrew Norton ◽

Jack Lanchantin ◽

...

Keyword(s):

Sequence Analysis ◽

DNA Sequences ◽

English Language ◽

Computation Time ◽

Entity Recognition ◽

Supplementary Information ◽

Support Vector ◽

Homology Detection ◽

Scalable Algorithm ◽

String Kernels

Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task’s alphabet size.

Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and is performant on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average speedups in kernel computation of ∼100× and speedups of ∼800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines.

Availability and implementation: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK

Supplementary information: Supplementary data are available at Bioinformatics online.
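
One way to read the "independent counting operations over the possible mismatch positions" is sketched below. This is an exact, unnormalized toy version written for illustration only: it is not FastSK's implementation or API, it omits the Monte Carlo approximation, and the names `gapped_counts`, `gapped_kmer_kernel`, `g` (window length) and `m` (number of ignored positions) are assumptions.

```python
from collections import Counter
from itertools import combinations


def gapped_counts(seq, g, kept):
    """Count the length-g windows of seq after keeping only the positions in `kept`."""
    return Counter(
        tuple(seq[i + p] for p in kept)
        for i in range(len(seq) - g + 1)
    )


def gapped_kmer_kernel(x, y, g, m):
    """Sum, over every choice of m ignored positions, the number of matching window pairs."""
    total = 0
    # Each choice of kept positions is an independent counting problem.
    for kept in combinations(range(g), g - m):
        cx = gapped_counts(x, g, kept)
        cy = gapped_counts(y, g, kept)
        total += sum(n * cy[key] for key, n in cx.items())
    return total


# Toy DNA strings: they share no exact 3-mer, but match once one position is ignored.
exact = gapped_kmer_kernel("ACGT", "ACCT", g=3, m=0)
gapped = gapped_kmer_kernel("ACGT", "ACCT", g=3, m=1)
```

Because each choice of kept positions is handled separately, the outer loop could be sampled rather than enumerated, which is the kind of structure a Monte Carlo approximation can exploit.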

Polarity Classification of Arabic Sentiments

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2016070103 ◽

2016 ◽

Vol 11 (3) ◽

pp. 32-49 ◽

Cited By ~ 5

Author(s):

Mohammed N. Al-Kabi ◽

Heider A. Wahsheh ◽

Izzat M. Alsmadi

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Operating Characteristic ◽

Opinion Mining ◽

Online Social Network ◽

The Social ◽

Polarity Classification ◽

Arabic Sentiment Analysis ◽

Modern Standard

Sentiment Analysis/Opinion Mining is associated with social media and usually aims to automatically identify the polarities of the different points of view that social media users hold about different aspects of life. The polarity of a sentiment reflects the point of view of its author about a certain issue. This study aims to present a new method to identify the polarity of Arabic reviews and comments, whether they are written in Modern Standard Arabic (MSA) or one of the Arabic dialects, and whether or not they include emoticons. The proposed method is called Detection of Arabic Sentiment Analysis Polarity (DASAP). A modest dataset of Arabic comments, posts, and reviews is collected from online social network websites (i.e. Facebook, blogs, YouTube, and Twitter). This dataset is used to evaluate the effectiveness of the proposed method (DASAP). Receiver Operating Characteristic (ROC) prediction quality measurements are used to evaluate the effectiveness of DASAP on the collected dataset.

Semantic orientation for polarity classification in Spanish reviews

Expert Systems with Applications ◽

10.1016/j.eswa.2013.06.076 ◽

2013 ◽

Vol 40 (18) ◽

pp. 7250-7257 ◽

Cited By ~ 41

Author(s):

M. Dolores Molina-González ◽

Eugenio Martínez-Cámara ◽

María-Teresa Martín-Valdivia ◽

José M. Perea-Ortega

Keyword(s):

Semantic Orientation ◽

Polarity Classification

Polarity classification using structure-based vector representations of text

Decision Support Systems ◽

10.1016/j.dss.2015.04.002 ◽

2015 ◽

Vol 74 ◽

pp. 46-56 ◽

Cited By ~ 10

Author(s):

Alexander Hogenboom ◽

Flavius Frasincar ◽

Franciska de Jong ◽

Uzay Kaymak

Keyword(s):

Polarity Classification ◽

Vector Representations
