Suicide Note Sentiment Classification: A Supervised Approach Augmented by Web Data

Objective To create a sentiment classification system for the Fifth i2b2/VA Challenge Track 2, which can identify thirteen subjective categories and two objective categories. Design We developed a hybrid system using Support Vector Machine (SVM) classifiers with augmented training data from the Internet. Our system consists of three types of classification-based systems: the first system uses spanning n-gram features for subjective categories, the second one uses bag-of-n-gram features for objective categories, and the third one uses pattern matching for infrequent or subtle emotion categories. The spanning n-gram features are selected by a feature selection algorithm that leverages emotional corpus from weblogs. Special normalization of objective sentences is generalized with shallow parsing and external web knowledge. We utilize three sources of web data: the weblog of LiveJournal which helps to improve the feature selection, the eBay List which assists in special normalization of information and instructions categories, and the suicide project web which provides unlabeled data with similar properties as suicide notes. Measurements The performance is evaluated by the overall micro-averaged precision, recall and F-measure. Result Our system achieved an overall micro-averaged F-measure of 0.59. Happiness_peacefulness had the highest F-measure of 0.81. We were ranked as the second best out of 26 competing teams. Conclusion Our results indicated that classifying fine-grained sentiments at sentence level is a non-trivial task. It is effective to divide categories into different groups according to their semantic properties. In addition, our system performance benefits from external knowledge extracted from publically available web data of other purposes; performance can be further enhanced when more training data is available.

Download Full-text

An Improved Intelligent Approach to Enhance the Sentiment Classifier for Knowledge Discovery Using Machine Learning

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910999200528114552 ◽

2020 ◽

Vol 10 (4) ◽

pp. 582-593

Author(s):

Midde Venkateswarlu Naik ◽

D. Vasumathi ◽

A.P. Siva Kumar

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Global Warming ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Optimization Technique ◽

Particle Swarm ◽

Sentiment Classification ◽

Support Vector ◽

Swarm Optimization

Aims: The proposed research work is on an evolutionary enhanced method for sentiment or emotion classification on unstructured review text in the big data field. The sentiment analysis plays a vital role for current generation of people for extracting valid decision points about any aspect such as movie ratings, education institute or politics ratings, etc. The proposed hybrid approach combined the optimal feature selection using Particle Swarm Optimization (PSO) and sentiment classification through Support Vector Machine (SVM). The current approach performance is evaluated with statistical measures, such as precision, recall, sensitivity, specificity, and was compared with the existing approaches. The earlier authors have achieved an accuracy of sentiment classifier in the English text up to 94% as of now. In the proposed scheme, an average accuracy of sentiment classifier on distinguishing datasets outperformed as 99% by tuning various parameters of SVM, such as constant c value and kernel gamma value in association with PSO optimization technique. The proposed method utilized three datasets, such as airline sentiment data, weather, and global warming datasets, that are publically available. The current experiment produced results that are trained and tested based on 10- Fold Cross-Validations (FCV) and confusion matrix for predicting sentiment classifier accuracy. Background: The sentiment analysis plays a vital role for current generation people for extracting valid decisions about any aspect such as movie rating, education institute or even politics ratings, etc. Sentiment Analysis (SA) or opinion mining has become fascinated scientifically as a research domain for the present environment. The key area is sentiment classification on semi-structured or unstructured data in distinguish languages, which has become a major research aspect. User-Generated Content [UGC] from distinguishing sources has been hiked significantly with rapid growth in a web environment. The huge user-generated data over social media provides substantial value for discovering hidden knowledge or correlations, patterns, and trends or sentiment extraction about any specific entity. SA is a computational analysis to determine the actual opinion of an entity which is expressed in terms of text. SA is also called as computation of emotional polarity expressed over social media as natural text in miscellaneous languages. Usually, the automatic superlative sentiment classifier model depends on feature selection and classification algorithms. Methods: The proposed work used Support vector machine as classification technique and particle swarm optimization technique as feature selection purpose. In this methodology, we tune various permutations and combination parameters in order to obtain expected desired results with kernel and without kernel technique for sentiment classification on three datasets, including airline, global warming, weather sentiment datasets, that are freely hosted for research practices. Results: In the proposed scheme, The proposed method has outperformed with 99.2% of average accuracy to classify the sentiment on different datasets, among other machine learning techniques. The attained high accuracy in classifying sentiment or opinion about review text proves superior effectiveness over existing sentiment classifiers. The current experiment produced results that are trained and tested based on 10- Fold Cross-Validations (FCV) and confusion matrix for predicting sentiment classifier accuracy. Conclusion: The objective of the research issue sentiment classifier accuracy has been hiked with the help of Kernel-based Support Vector Machine (SVM) based on parameter optimization. The optimal feature selection to classify sentiment or opinion towards review documents has been determined with the help of a particle swarm optimization approach. The proposed method utilized three datasets to simulate the results, such as airline sentiment data, weather sentiment data, and global warming data that are freely available datasets.

Download Full-text

A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model

10.1101/205047 ◽

2017 ◽

Cited By ~ 1

Author(s):

Manato Akiyama ◽

Kengo Sato ◽

Yasubumi Sakakibara

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Prediction Accuracy ◽

Secondary Structure Prediction ◽

Training Data ◽

Support Vector ◽

Rna Secondary Structure Prediction ◽

Fine Grained

AbstractMotivation: A popular approach for predicting RNA secondary structure is the thermodynamic nearest neighbor model that finds a thermodynamically most stable secondary structure with the minimum free energy (MFE). For further improvement, an alternative approach that is based on machine learning techniques has been developed. The machine learning based approach can employ a fine-grained model that includes much richer feature representations with the ability to fit the training data. Although a machine learning based fine-grained model achieved extremely high performance in prediction accuracy, a possibility of the risk of overfitting for such model has been reported.Results: In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning based weighted approach. Ourfine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed contexts of features that are trained by the structured support vector machine (SSVM) with the ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and heavy overfitting cannot be observed.Availability: The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.Contact:[email protected]

Download Full-text

Implementation of n-gram Methodology for Rotten Tomatoes Review Dataset Sentiment Analysis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2017010103 ◽

2017 ◽

Vol 7 (1) ◽

pp. 30-41 ◽

Cited By ~ 12

Author(s):

Prayag Tiwari ◽

Brojo Kishore Mishra ◽

Sachin Kumar ◽

Vivek Kumar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Maximum Entropy ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

N Gram ◽

F Measure ◽

Blog Posts

Sentiment Analysis intends to get the basic perspective of the content, which may be anything that holds a subjective supposition, for example, an online audit, Comments on Blog posts, film rating and so forth. These surveys and websites might be characterized into various extremity gatherings, for example, negative, positive, and unbiased keeping in mind the end goal to concentrate data from the info dataset. Supervised machine learning strategies group these reviews. In this paper, three distinctive machine learning calculations, for example, Support Vector Machine (SVM), Maximum Entropy (ME) and Naive Bayes (NB), have been considered for the arrangement of human conclusions. The exactness of various strategies is basically inspected keeping in mind the end goal to get to their execution on the premise of parameters, e.g. accuracy, review, f-measure, and precision.

Download Full-text

Detection Of Spam Comments On Instagram Using Complementary Naïve Bayes

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.47046 ◽

2019 ◽

Vol 13 (3) ◽

pp. 263

Author(s):

Nur Azizul Haqimi ◽

Nur Rokhman ◽

Sigit Priyanta

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Test Data ◽

Training Data ◽

Classification Method ◽

Support Vector ◽

Test Results ◽

Imbalanced Dataset ◽

Web Based ◽

F Measure

Instagram (IG) is a web-based and mobile social media application where users can share photos or videos with available features. Upload photos or videos with captions that contain an explanation of the photo or video that can reap spam comments. Comments on spam containing comments that are not relevant to the caption and photos. The problem that arises when identifying spam is non-spam comments are more dominant than spam comments so that it leads to the problem of the imbalanced dataset. A balanced dataset can influence the performance of a classification method. This is the focus of research related to the implementation of the CNB method in dealing with imbalance datasets for the detection of Instagram spam comments. The study used TF-IDF weighting with Support Vector Machine (SVM) as a comparison classification. Based on the test results with 2500 training data and 100 test data on the imbalanced dataset (25% spam and 75% non-spam), the CNB accuracy was 92%, precision 86% and f-measure 93%. Whereas SVM produces 87% accuracy, 79% precision, 88% f-measure. In conclusion, the CNB method is more suitable for detecting spam comments in cases of imbalanced datasets.

Download Full-text

Modified framework for sarcasm detection and classification in sentiment analysis

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i3.pp1175-1183 ◽

2019 ◽

Vol 13 (3) ◽

pp. 1175 ◽

Cited By ~ 1

Author(s):

Mohd Suhairi Md Suhaimin ◽

Mohd Hanafi Ahmad Hijazi ◽

Rayner Alfred ◽

Frans Coenen

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Classification Performance ◽

Sentiment Classification ◽

Support Vector ◽

Linear Support Vector Machine ◽

Social Media Data ◽

Classification Framework ◽

Media Data ◽

F Measure

<span>Sentiment analysis is directed at identifying people's opinions, beliefs, views and emotions in the context of the entities and attributes that appear in text. The presence of sarcasm, however, can significantly hamper sentiment analysis. In this paper a sentiment classification framework is presented that incorporates sarcasm detection. The framework was evaluated using a non-linear Support Vector Machine and Malay social media data. The results obtained demonstrated that the proposed sarcasm detection process could successfully detect the presence of sarcasm in that better sentiment classification performance was recorded. A best average F-measure score of 0.905 was recorded using the framework; a significantly better result than when sentiment classification was performed without sarcasm detection.</span>

Download Full-text

Penguin Search Optimization Based Feature Selection for Automated Opinion Mining

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2629.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 648-653

Keyword(s):

Feature Selection ◽

Language Processing ◽

Opinion Mining ◽

Sentiment Classification ◽

Support Vector ◽

Svm Classifier ◽

Np Hard ◽

Search Optimization ◽

Selection Approach ◽

Feature Selection Approach

Twitter sentiment analysis is a vital concept in determining the public opinions about products, services, events or personality. Analyzing the medical tweets on a specific topic can provide immense benefits in medical industry. However, the medical tweets require efficient feature selection approach to produce significantly accurate results. Penguin search optimization algorithm (PeSOA) has the ability to resolve NP-hard problems. This paper aims at developing an automated opinion mining framework by modeling the feature selection problem as NP-hard optimization problem and using PeSOA based feature selection approach to solve it. Initially, the medical tweets based on cancer and drugs keywords are extracted and pre-processed to filter the relevant informative tweets. Then the features are extracted based on the Natural Language Processing (NLP) concepts and the optimal features are selected using PeSOA whose results are fed as input to three baseline classifiers to achieve optimal and accurate sentiment classification. The experimental results obtained through MATLAB simulations on cancer and drug tweets using k-Nearest Neighbor (KNN), Naïve Bayes (NB) and Support Vector Machine (SVM) indicate that the proposed PeSOA feature selection based tweet opinion mining has improved the classification performance significantly. It shows that the PeSOA feature selection with the SVM classifier provides superior sentiment classification than the other classifiers

Download Full-text

A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720018400255 ◽

2018 ◽

Vol 16 (06) ◽

pp. 1840025 ◽

Cited By ~ 5

Author(s):

Manato Akiyama ◽

Kengo Sato ◽

Yasubumi Sakakibara

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Prediction Accuracy ◽

Secondary Structure Prediction ◽

Training Data ◽

Support Vector ◽

Rna Secondary Structure Prediction ◽

Fine Grained

A popular approach for predicting RNA secondary structure is the thermodynamic nearest-neighbor model that finds a thermodynamically most stable secondary structure with minimum free energy (MFE). For further improvement, an alternative approach that is based on machine learning techniques has been developed. The machine learning-based approach can employ a fine-grained model that includes much richer feature representations with the ability to fit the training data. Although a machine learning-based fine-grained model achieved extremely high performance in prediction accuracy, a possibility of the risk of overfitting for such a model has been reported. In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning-based weighted approach. Our fine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed contexts of features that are trained by the structured support vector machine (SSVM) with the [Formula: see text] regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and heavy overfitting cannot be observed. The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold .

Download Full-text

Learning from Web Data Using Adversarial Discriminative Neural Networks for Fine-Grained Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301273 ◽

2019 ◽

Vol 33 ◽

pp. 273-280 ◽

Cited By ~ 2

Author(s):

Xiaoxiao Sun ◽

Liyi Chen ◽

Jufeng Yang

Keyword(s):

Neural Networks ◽

Large Scale ◽

State Of The Art ◽

Training Data ◽

Web Data ◽

Fine Grained ◽

Learning Framework ◽

Attractive Option ◽

Public Datasets ◽

Noisy Labels

Fine-grained classification is absorbed in recognizing the subordinate categories of one field, which need a large number of labeled images, while it is expensive to label these images. Utilizing web data has been an attractive option to meet the demands of training data for convolutional neural networks (CNNs), especially when the well-labeled data is not enough. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has been conventionally addressed by reducing the noise level of web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss to advocate representation coherence between standard and web data. This is further encapsulated in a simple, scalable and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against the state-of-the-art methods.

Download Full-text

Sentiment classification using hybrid feature selection and ensemble classifier

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189738 ◽

2021 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Achin Jain ◽

Vanita Jain

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Information Gain ◽

Statistical Significance ◽

Sentiment Classification ◽

Ensemble Classification ◽

Support Vector ◽

Svm Classifier ◽

Feature Selection Technique ◽

Restaurant Reviews

This paper presents a Hybrid Feature Selection Technique for Sentiment Classification. We have used a Genetic Algorithm and a combination of existing Feature Selection methods, namely: Information Gain (IG), CHI Square (CHI), and GINI Index (GINI). First, we have obtained features from three different selection approaches as mentioned above and then performed the UNION SET Operation to extract the reduced feature set. Then, Genetic Algorithm is applied to optimize the feature set further. This paper also presents an Ensemble Approach based on the error rate obtained different domain datasets. To test our proposed Hybrid Feature Selection and Ensemble Classification approach, we have considered four Support Vector Machine (SVM) classifier variants. We have used UCI ML Datasets of three domains namely: IMDB Movie Review, Amazon Product Review and Yelp Restaurant Reviews. The experimental results show that our proposed approach performed best in all three domain datasets. Further, we also presented T-Test for Statistical Significance between classifiers and comparison is also done based on Precision, Recall, F1-Score, AUC and model execution time.

Download Full-text