Sentiment Classification of News Text Data Using Intelligent Model

Frontiers in Psychology ◽

10.3389/fpsyg.2021.758967 ◽

2021 ◽

Vol 12 ◽

Author(s):

Shitao Zhang

Keyword(s):

Language Processing ◽

Dictionary Learning ◽

Learning Algorithm ◽

Sentiment Classification ◽

Training Data ◽

Text Data ◽

Discriminative Performance ◽

Traffic Jam ◽

Intelligent Model

Text sentiment classification is a fundamental sub-area of natural language processing. Sentiment classification algorithms are highly domain-dependent. For example, the phrase “traffic jam” expresses negative sentiment in the sentence “I was stuck in a traffic jam on the elevated for 2 h.” But in the domain of transportation, the phrase “traffic jam” in the sentence “Bread and water are essential terms in traffic jams” carries no sentiment. The most common method is to use domain-specific data samples to classify the text in that domain. However, text sentiment analysis based on machine learning relies on sufficient labeled training data. To address the sentiment classification of news text data with insufficient labeled news data, together with the domain adaptation of text sentiment classifiers, an intelligent model, the transfer learning discriminative dictionary learning algorithm (TLDDL), is proposed for cross-domain text sentiment classification. Based on the framework of dictionary learning, samples from the different domains are projected into a subspace, and a domain-invariant dictionary is built to connect the two domains. To improve the discriminative performance of the proposed algorithm, a discrimination-information-preserving term and a principal component analysis (PCA) term are combined into the objective function. Experiments are performed on three public text datasets. The experimental results show that the proposed algorithm improves the sentiment classification performance of texts in the target domain.

Concept of TF-IDF, Common Bag of Word and Word Embedding for Effective Sentiment Classification

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4582.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2198-2201

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Sentiment Classification ◽

Word Embedding ◽

Text Representation ◽

Human Beings ◽

Text Data

Sentiment classification is one of the best-known and most popular areas of machine learning and natural language processing. Such an algorithm is developed to understand the opinion of an entity much as a human being would. This research article presents an algorithm of that kind. Natural language processing concepts are used for text representation, and a novel word embedding model is then proposed for effective classification of the data. TF-IDF and common BoW representation models were also considered for representing the text data. The importance of these models is discussed in the respective sections. The proposed model is tested on the IMDB dataset; 50% of the data is used for training and 50% for testing, with three random shufflings of the dataset, to evaluate the model.
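As a rough illustration of the two baseline representations named in the abstract above, the following sketch builds bag-of-words counts and TF-IDF weights for a toy corpus. The weighting formula (tf = count / document length, idf = log(N / df)), the helper names, and the two example reviews are assumptions made for illustration; the paper may use a different variant.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Return raw term counts for one document, ordered by `vocabulary`."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(corpus):
    """Compute TF-IDF vectors for a list of non-empty tokenised documents.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    vocabulary = sorted({term for doc in corpus for term in doc})
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    vectors = []
    for doc in corpus:
        counts = bag_of_words(doc, vocabulary)
        vectors.append([
            (count / len(doc)) * math.log(n_docs / df[term])
            for count, term in zip(counts, vocabulary)
        ])
    return vocabulary, vectors


# Two made-up reviews standing in for IMDB-style text.
reviews = [
    "a great and moving film".split(),
    "a dull and boring film".split(),
]
vocab, vectors = tf_idf(reviews)
```

Terms that occur in every document (here "a", "and", "film") receive a weight of zero, which is the property that lets TF-IDF emphasise discriminative words such as "great" or "dull" over a plain bag-of-words count.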

Amrita-CEN-Senti-DB:Twitter Dataset for Sentimental Analysis and Application of Classical Machine Learning and Deep Learning

10.36227/techrxiv.12058968 ◽

2020 ◽

Author(s):

vinayakumar R

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Language Processing ◽

Learning Algorithm ◽

Support Vector ◽

Text Data ◽

Inverse Document Frequency ◽

Logistic Regression ◽

Document Frequency

Social media is a platform on which vast amounts of text are generated every day. The data is so large that it cannot be easily understood, which has paved the way for a new field in information technology: natural language processing. In this paper, the text data used for classification consists of tweets, which indicate a person’s state according to their sentiment: positive, negative, or neutral. Emotions are the way a person expresses feelings, and they have a strong influence on decision-making tasks. Here we propose text representations, namely Term Frequency-Inverse Document Frequency (TF-IDF) and Keras embedding, together with machine learning and deep learning algorithms for sentiment classification. Among these, the machine learning method Logistic Regression performs best when the number of features is limited; as the number of features increases, the Support Vector Machine (SVM), another machine learning algorithm, performs best, setting a benchmark accuracy of 75.8% for this dataset. For research purposes, the dataset has been made publicly available.

Application of Latent Dirichlet Allocation (LDA) for clustering financial tweets

E3S Web of Conferences ◽

10.1051/e3sconf/202129701071 ◽

2021 ◽

Vol 297 ◽

pp. 01071

Author(s):

Sifi Fatima-Zahrae ◽

Sabbar Wafae ◽

El Mzabi Amal

Keyword(s):

Language Processing ◽

Latent Dirichlet Allocation ◽

Sentiment Classification ◽

Research Areas ◽

Preprocessing Method ◽

Long Time ◽

Standard Text ◽

The Given ◽

Dirichlet Allocation

Sentiment classification is one of the hottest research areas among Natural Language Processing (NLP) topics. While it aims to detect the sentiment polarity and classification of a given opinion, it requires a large number of aspect extractions. However, extracting aspects takes human effort and a long time. To reduce this cost, the Latent Dirichlet Allocation (LDA) method has recently emerged. In this paper, an efficient preprocessing method for sentiment classification is presented and used to analyze users’ comments on the Twitter social network. For this purpose, different text preprocessing techniques have been applied to the dataset to obtain acceptably standardized text. Latent Dirichlet Allocation has then been applied to the data obtained from this fast and accurate preprocessing phase. Different sentiment analysis methods have been implemented, and their results have been compared and evaluated. The experimental results show that combining the preprocessing method of this paper with Latent Dirichlet Allocation achieves acceptable results compared to other basic methods.
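The abstract above does not list its preprocessing steps, so the following is only a minimal sketch of the kind of tweet clean-up that typically precedes topic modelling. The specific steps (lower-casing, stripping URLs and mentions, keeping hashtag words, dropping non-letters and stop words), the tiny stop-word list, and the example tweet are all assumptions made for illustration.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")

# Tiny illustrative stop-word list (hypothetical, not taken from the paper).
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "on", "in"}


def preprocess_tweet(text):
    """Normalise one tweet into a list of lower-case word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)  # drop digits, punctuation, cashtag signs
    return [tok for tok in text.split() if tok not in STOP_WORDS]


tokens = preprocess_tweet(
    "@trader The $AAPL rally is on! https://example.com #stocks"
)
# tokens == ["aapl", "rally", "stocks"]
```

Token lists of this form are what a topic model such as LDA would consume, usually after conversion to per-document word counts.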

Adapting SVM for data sparseness and imbalance: a case study in information extraction

Natural Language Engineering ◽

10.1017/s1351324908004968 ◽

2009 ◽

Vol 15 (2) ◽

pp. 241-271 ◽

Cited By ~ 31

Author(s):

YAOYONG LI ◽

KALINA BONTCHEVA ◽

HAMISH CUNNINGHAM

Keyword(s):

Active Learning ◽

Language Learning ◽

Information Extraction ◽

Language Processing ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Passive Learning ◽

Wide Range

Support Vector Machines (SVM) have been used successfully in many Natural Language Processing (NLP) tasks. The novel contribution of this paper is in investigating two techniques for making SVM more suitable for language learning tasks. Firstly, we propose an SVM with uneven margins (SVMUM) model to deal with the problem of imbalanced training data. Secondly, SVM active learning is employed in order to alleviate the difficulty in obtaining labelled training data. The algorithms are presented and evaluated on several Information Extraction (IE) tasks, where they achieved better performance than the standard SVM and the SVM with passive learning, respectively. Moreover, by combining SVMUM with the active learning algorithm, we achieve the best reported results on the seminars and jobs corpora, which are benchmark data sets used for evaluation and comparison of machine learning algorithms for IE. In addition, we also evaluate the token based classification framework for IE with three different entity tagging schemes. In comparison to previous methods dealing with the same problems, our methods are both effective and efficient, which are valuable features for real-world applications. Due to the similarity in the formulation of the learning problem for IE and for other NLP tasks, the two techniques are likely to be beneficial in a wide range of applications.
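To make the active learning idea concrete, here is a minimal sketch of one common selection rule: ask for labels on the unlabelled examples that lie closest to the current decision boundary. The abstract does not state which selection criterion the authors use, so this margin-based rule, the function names, and the toy linear model are assumptions made for illustration.

```python
def decision_value(weights, bias, features):
    """Signed score of a linear classifier: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def select_for_labelling(weights, bias, unlabelled, batch_size):
    """Return indices of the `batch_size` examples nearest the hyperplane.

    A small |w . x + b| means the current model is least certain about x,
    so labelling it is expected to be most informative.
    """
    ranked = sorted(
        range(len(unlabelled)),
        key=lambda i: abs(decision_value(weights, bias, unlabelled[i])),
    )
    return ranked[:batch_size]


# Hypothetical trained linear model and unlabelled pool.
weights, bias = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [0.5, 0.4], [0.0, 2.0], [1.0, 1.2]]
chosen = select_for_labelling(weights, bias, pool, batch_size=2)
# chosen == [1, 3]: the two points with the smallest absolute score
```

In a full loop, the chosen examples would be labelled by an annotator, added to the training set, and the classifier retrained before the next round of selection.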

Classification of Tourist Comment Using Word2vec and Random Forest Algorithm

Journal of Environmental Management and Tourism ◽

10.14505//jemt.v9.8(32).11 ◽

2019 ◽

Vol 9 (8) ◽

pp. 1725

Author(s):

Isra Nurul HABIBI ◽

Abba Suganda GIRSANG

Keyword(s):

Social Media ◽

Text Classification ◽

Learning Algorithm ◽

Training Data ◽

Machine Learning Algorithm ◽

Grouped Data ◽

Semantic Relationship ◽

Vector Representation ◽

Random Forest Algorithm

Text classification is one of the ways to classify sentences. The grouped data are comments from social media, with training data from sites that provide points/scores for each review, such as tripadvisor.co.id. The word2vec method is used to convert words into numbers so that a machine learning algorithm can be applied to classify the data. Word2vec is an unsupervised method that can use unlabeled data to convert a word into its vector representation, and it can also find the semantic relationship between words by measuring the distance between them. The goal of this paper is to allow the overall score/weight of a tourist destination to be determined quickly from comments on social media such as Twitter or Instagram. The experiment shows that the F1 score on data without removing stop words and eliminating training data gives a better result, 0.85.
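The notion of finding semantic relationships from vector distance can be sketched as follows. The abstract does not say which distance is used; cosine similarity is assumed here because it is a common choice for word vectors. The three-dimensional embeddings are made-up values, not output from a trained word2vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings; real word2vec vectors
# usually have a few hundred dimensions.
embeddings = {
    "beach": [0.9, 0.1, 0.0],
    "coast": [0.8, 0.2, 0.1],
    "queue": [0.0, 0.3, 0.9],
}


def most_similar(word):
    """Return the other vocabulary word whose vector is closest to `word`."""
    return max(
        (other for other in embeddings if other != word),
        key=lambda other: cosine_similarity(embeddings[word], embeddings[other]),
    )


nearest = most_similar("beach")  # "coast" with these toy vectors
```

For classification, per-word vectors like these are typically combined (for example by averaging over a comment) into one fixed-length feature vector before being passed to a classifier such as a random forest; that combination step is likewise an assumption, not a detail given in the abstract.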

Comparing ELM with SVM in the Field of Sentiment Classification of Social Media Text Data

Proceedings in Adaptation, Learning and Optimization - Proceedings of ELM 2018 ◽

10.1007/978-3-030-23307-5_36 ◽

2019 ◽

pp. 336-344

Author(s):

Zhihuan Chen ◽

Zhaoxia Wang ◽

Zhiping Lin ◽

Ting Yang

Keyword(s):

Social Media ◽

Sentiment Classification ◽

Text Data ◽

Social Media Text

Identification of Mimo dynamic system using inverse Mimo Neural Narx model

Science and Technology Development Journal ◽

10.32508/stdj.v16i2.1506 ◽

2013 ◽

Vol 16 (2) ◽

pp. 13-25

Author(s):

Anh Pham Huy Ho ◽

Nam Thanh Nguyen

Keyword(s):

Learning Algorithm ◽

Back Propagation ◽

Artificial Muscle ◽

Training Data ◽

The Novel ◽

Robot Arm ◽

Dynamic Inverse ◽

Intelligent Model ◽

First Time ◽

Narx Model

This paper investigates the application of the proposed neural MIMO NARX model to a nonlinear 2-axes pneumatic artificial muscle (PAM) robot arm in order to improve its performance in modeling and identification. The contact force variations and nonlinear coupling effects of both joints of the 2-axes PAM robot arm are modeled thoroughly through the novel dynamic inverse neural MIMO NARX model, exploiting experimental input-output training data. For the first time, the dynamic neural inverse MIMO NARX model of the 2-axes PAM robot arm has been investigated. The results show that the proposed dynamic intelligent model, trained by the Back Propagation learning algorithm, yields both good performance and accuracy. The novel dynamic neural MIMO NARX model proves efficient for modeling and identifying not only the 2-axes PAM robot arm but also other nonlinear dynamic systems.
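A NARX (nonlinear autoregressive with exogenous input) model predicts the current output from lagged outputs and lagged inputs, y(t) = f(y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)). The sketch below only shows how experimental input-output data might be arranged into training pairs for such a model. It uses scalar signals and a forward model for brevity; the lag orders, signal values, and function name are assumptions, and the paper's MIMO inverse model would use vector-valued signals with the roles of input and output exchanged.

```python
def narx_dataset(u, y, n_u, n_y):
    """Build (regressor, target) pairs for y(t) = f(past y, past u).

    Each regressor is [y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)].
    """
    start = max(n_u, n_y)  # first time step with a full history
    samples = []
    for t in range(start, len(y)):
        past_outputs = [y[t - k] for k in range(1, n_y + 1)]
        past_inputs = [u[t - k] for k in range(1, n_u + 1)]
        samples.append((past_outputs + past_inputs, y[t]))
    return samples


# Hypothetical recorded input (u) and measured output (y).
u = [0.0, 1.0, 1.0, 0.5, 0.0]
y = [0.0, 0.0, 0.4, 0.7, 0.6]
pairs = narx_dataset(u, y, n_u=2, n_y=2)
# pairs[0] == ([0.0, 0.0, 1.0, 0.0], 0.4)
```

Pairs of this shape can then be fed to any regressor, such as a feed-forward neural network trained with back propagation, to learn the mapping f.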

Extraction of Sea Ice Cover by Sentinel-1 SAR Based on SVM with Unsupervised Generation of Training Data

10.20944/preprints202005.0336.v1 ◽

2020 ◽

Author(s):

Xiaoming Li ◽

Yan Sun ◽

Qiang Zhang

Keyword(s):

Machine Learning ◽

Sea Ice ◽

Learning Algorithm ◽

Texture Features ◽

Open Water ◽

Ice Cover ◽

Training Data ◽

Support Vector ◽

Training Samples

In this paper, we focus on developing a novel method to extract sea ice cover (i.e., discrimination/classification of sea ice and open water) using Sentinel-1 (S1) cross-polarization (vertical-horizontal, VH or horizontal-vertical, HV) data in extra wide (EW) swath mode based on the machine learning algorithm support vector machine (SVM). The classification basis includes the S1 radar backscatter coefficients and texture features that are calculated from S1 data using the gray level co-occurrence matrix (GLCM). Different from previous methods where appropriate samples are manually selected to train the SVM to classify sea ice and open water, we propose a method of unsupervised generation of the training samples based on two GLCM texture features, i.e. entropy and homogeneity, that have contrasting characteristics on sea ice and open water. We eliminate most of the uncertainty of selecting training samples in machine learning and achieve automatic classification of sea ice and open water by using S1 EW data. The comparison shows good agreement between the SAR-derived sea ice cover using the proposed method and a visual inspection, of which the accuracy reaches approximately 90% - 95% based on a few cases. Besides this, compared with the analyzed sea ice cover data from the Ice Mapping System (IMS) based on 728 S1 EW images, the accuracy of extracted sea ice cover by using S1 data is more than 80%.
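The two GLCM texture features named in the abstract can be computed as in the sketch below. The definitions used here, entropy = -Σ p(i,j) ln p(i,j) and homogeneity = Σ p(i,j) / (1 + (i - j)²), are common textbook forms and are assumed rather than taken from the paper; the pixel offset, the number of gray levels, and the two tiny example patches are likewise made up for illustration.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalised, symmetric gray level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions (symmetric GLCM)
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def entropy(p):
    """Higher when co-occurrences are spread over many gray-level pairs."""
    return -sum(v * math.log(v) for row in p for v in row if v > 0)


def homogeneity(p):
    """Higher when neighbouring pixels have similar gray levels."""
    return sum(
        v / (1 + (i - j) ** 2)
        for i, row in enumerate(p)
        for j, v in enumerate(row)
    )


# Hypothetical 3x3 patches quantised to 4 gray levels.
smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
rough = [[0, 3, 1], [2, 0, 3], [1, 2, 0]]

smooth_stats = (entropy(glcm(smooth, 4)), homogeneity(glcm(smooth, 4)))
rough_stats = (entropy(glcm(rough, 4)), homogeneity(glcm(rough, 4)))
```

On these toy patches the uniform one has zero entropy and homogeneity 1, while the varied one has higher entropy and lower homogeneity. That opposing behaviour is what makes the pair of features usable for automatically picking confident training samples for the two classes.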

Deep Learning Based Truth Discovery Algorithm for Research the Genuineness of Given Text Corpus

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1112.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 605-611

Keyword(s):

Deep Learning ◽

Language Processing ◽

Learning Algorithm ◽

Multiple Sources ◽

Text Data ◽

Text Corpus ◽

Automatic Feature Extraction ◽

Deep Learning Algorithm ◽

Truth Discovery ◽

Improved Accuracy

A lot of research has gone into natural language processing and into state-of-the-art deep learning algorithms that help convert English text into a data structure without loss of meaning. The advent of neural networks for learning word representations as vectors has also helped greatly in revolutionizing automatic feature extraction from text corpora. Combining word embeddings with a deep learning algorithm such as a convolutional neural network has yielded better accuracy for text classification. In this era of the Internet of Things, with voluminous amounts of data overwhelming users, determining the veracity of the data is a very challenging task. There are many truth discovery algorithms in the literature that help resolve the conflicts that arise from multiple sources of data. These algorithms help estimate the trustworthiness of the data and the reliability of the sources. In this paper, a convolution-based truth discovery algorithm with multitasking is proposed to estimate the genuineness of the data in a given text corpus. The proposed algorithm has been tested by analysing the genuineness of the Quora questions dataset, and experimental results showed improved accuracy and speed over other existing approaches.
