Multilabel Classification for Emotion Analysis of Multilingual Tweets

Emotion analysis of text aims to detect and recognize the types of feelings expressed in text. Emotion analysis is the successor of sentiment analysis. The latter performs coarse-level analysis and classifies text into positive and negative categories, while the former performs fine-grained analysis and classifies text into specific emotion categories such as happiness, surprise, and anger. Analysis at the fine-grained level provides deeper insight than coarse-level analysis. In this paper, tweets are classified into the eight discrete basic emotions, namely joy, trust, fear, surprise, sadness, anticipation, anger, and disgust, specified in Plutchik’s wheel of emotions [1]. Tweets were collected in three languages: English and two Indian languages, Gujarati and Hindi. The collected tweets are related to Indian politics and are annotated manually. A supervised learning approach and a hybrid approach are used for classification. Supervised learning uses tf-idf features, while the hybrid approach uses primary and secondary features. Primary features are generated using tf-idf weighting, and two feature-generation algorithms are proposed that generate secondary features using the SenticNet resource. Multilabel classification is performed to assign tweets to emotion categories. Experimental results show the effectiveness of the hybrid approach.
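As a rough illustration of the supervised baseline, the sketch below trains a one-vs-rest classifier over Plutchik's eight emotions using tf-idf features. The tweets and labels are placeholders, and the secondary SenticNet-based features of the hybrid approach are not reproduced here.

```python
# Minimal sketch: multilabel emotion classification with tf-idf features.
# The tweets and labels below are placeholders; real training data would
# cover all eight Plutchik emotions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.pipeline import make_pipeline

EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "anticipation", "anger", "disgust"]

tweets = ["what a great result for the party",         # placeholder data
          "shocked and disappointed by the speech"]
labels = [["joy", "trust"], ["surprise", "sadness", "anger"]]

mlb = MultiLabelBinarizer(classes=EMOTIONS)
y = mlb.fit_transform(labels)

# tf-idf primary features feeding a one-vs-rest logistic regression
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(tweets, y)

pred = clf.predict(["angry about the new policy"])
print(mlb.inverse_transform(pred))
```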

2020 · Vol ahead-of-print (ahead-of-print) · Mukesh Kumar, Palak Rehan

Social media networks such as Twitter, Facebook, and WhatsApp are among the most commonly used media for sharing news and opinions and for staying in touch with peers. Messages on Twitter are limited to 140 characters. This has led users to create their own novel syntax in tweets to express more in fewer words. Free writing style, use of URLs, markup syntax, inappropriate punctuation, ungrammatical structures, abbreviations, and so on make it harder to mine useful information from them. For each tweet, we can get an explicit timestamp, the name of the user, the social network the user belongs to, or even the GPS coordinates if the tweet was created with a GPS-enabled mobile device. With these features, Twitter is by nature a good resource for detecting and analyzing real-time events happening around the world. By exploiting the speed and coverage of Twitter, we can detect events, i.e., sequences of important keywords being discussed, in a timely manner, which is useful in applications such as natural calamity relief support, earthquake relief support, product launches, and suspicious activity detection. Keyword detection from Twitter can be seen as a two-step process: detecting keywords in raw text form (words as posted by users) and keyword normalization (reforming users' unstructured words into complete, meaningful English words). In this paper, a keyword detection technique based on graphs, spanning trees, and the PageRank algorithm is proposed. A text normalization technique based on a hybrid approach using Levenshtein distance, the double metaphone algorithm, and dictionary mapping is proposed to process the unstructured keywords produced by the keyword detector. The proposed normalization technique is validated using the standard lexnorm 1.2 dataset. The proposed system is used to detect keywords from Twitter text posted in real time. The detected and normalized keywords are later validated against search engine results for event detection.
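A minimal sketch of the two-step idea follows: candidate keywords are ranked with PageRank over a word co-occurrence graph (networkx is assumed; the spanning-tree pruning step is omitted), and a noisy keyword is then normalized against a toy dictionary using Levenshtein distance plus a crude consonant-skeleton key standing in for double metaphone.

```python
# Hedged sketch of the two-step pipeline: (1) rank candidate keywords with
# PageRank over a word co-occurrence graph, (2) normalize a noisy token
# against a dictionary using edit distance plus a crude phonetic key
# (a stand-in for the double-metaphone step described in the paper).
import itertools
import networkx as nx

tweets = [["erthquake", "relief", "camp"],           # placeholder, pre-tokenized
          ["relief", "fund", "erthquake"],
          ["product", "launch", "today"]]

# Step 1: co-occurrence graph + PageRank (spanning-tree pruning omitted)
G = nx.Graph()
for tokens in tweets:
    for a, b in itertools.combinations(set(tokens), 2):
        w = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)
keywords = sorted(nx.pagerank(G, weight="weight").items(),
                  key=lambda kv: kv[1], reverse=True)[:3]

# Step 2: dictionary-based normalization of a detected noisy keyword
def levenshtein(s, t):
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def phonetic_key(word):                  # crude consonant skeleton, not metaphone
    return "".join(c for c in word if c not in "aeiou")

dictionary = ["earthquake", "relief", "product"]     # toy dictionary
def normalize(token):
    same_sound = [w for w in dictionary if phonetic_key(w) == phonetic_key(token)]
    pool = same_sound or dictionary
    return min(pool, key=lambda w: levenshtein(token, w))

print([k for k, _ in keywords], normalize("erthquake"))
```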


2019 · Vol 37 (1) · pp. 2-15 · Sudarsana Desul, Madurai Meenachi N., Thejas Venkatesh, Vijitha Gunta, Gowtham R., et al.

Purpose: Ontology of a domain mainly consists of a set of concepts and their semantic relations. It is typically constructed and maintained by using ontology editors with substantial human intervention. It is desirable to perform the task automatically, which has led to the development of ontology learning techniques. One of the main challenges of ontology learning from text is to identify key concepts in the documents. A wide range of techniques for key concept extraction has been proposed, but they suffer from low accuracy, poor performance, limited flexibility and applicability only to specific domains. The purpose of this study is to explore a new method to extract key concepts and to apply it to literature in the nuclear domain.
Design/methodology/approach: In this article, a novel method for key concept extraction is proposed and applied to documents from the nuclear domain. A hybrid approach was used, combining domain knowledge, syntactic and named-entity knowledge, and statistical methods. The performance of the developed method was evaluated against data obtained using two-out-of-three voting logic from three domain experts, on 120 documents retrieved from the SCOPUS database.
Findings: The work reported pertains to extracting concepts from the set of selected documents and aids the search for documents relating to given concepts. The results of a case study indicate that the developed method achieves better metrics than Text2Onto and CFinder. The described method is capable of extracting valid key concepts from a set of candidates containing long phrases.
Research limitations/implications: The present study is restricted to English-language literature and applied to documents from the nuclear domain. It has the potential to be extended to other domains.
Practical implications: The work carried out in the current study has the potential to lead to updating the International Nuclear Information System thesaurus for ontology in the nuclear domain, which can enable more efficient search methods.
Originality/value: This work is the first attempt to automatically extract key concepts from nuclear documents. The proposed approach addresses most of the problems that exist in current methods and thereby improves performance.
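A small illustrative sketch of the evaluation setup is given below: a gold-standard concept list is assembled with two-out-of-three voting over three placeholder expert annotations, and a candidate list is scored against it. The hybrid extraction method itself is not reproduced here.

```python
# Illustrative sketch of the evaluation setup: a gold standard is built with
# two-out-of-three voting over expert annotations, then a candidate list is
# scored against it. The concept sets below are placeholders.
from collections import Counter

expert_1 = {"nuclear reactor", "fuel assembly", "control rod"}
expert_2 = {"nuclear reactor", "control rod", "coolant loop"}
expert_3 = {"nuclear reactor", "fuel assembly", "coolant loop"}

votes = Counter()
for annotation in (expert_1, expert_2, expert_3):
    votes.update(annotation)
gold = {concept for concept, n in votes.items() if n >= 2}     # 2-of-3 voting

extracted = {"nuclear reactor", "fuel assembly", "moderator"}  # system output
tp = len(extracted & gold)
precision = tp / len(extracted)
recall = tp / len(gold)
f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
print(gold, round(precision, 2), round(recall, 2), round(f1, 2))
```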


2018 · Vol 7 (2.21) · pp. 319 · Saini Jacob Soman, P Swaminathan, R Anandan, K Kalaivani

With the increased use of online media for sharing views, sentiments and opinions about products, services, organizations and people, microblogging and social networking sites are acquiring huge popularity. Twitter, one of the biggest social media sites, is used by many people to share their life events, views and opinions about different areas and concepts. Sentiment analysis is the computational study of reviews, opinions, attitudes, views and people's emotions about different products, services, firms and topics, categorizing them as positive or negative. Sentiment analysis of tweets is a challenging task. This paper presents a critical review comparing the challenges associated with sentiment analysis of tweets in English versus Indian regional languages. Five Indian languages, namely Tamil, Malayalam, Telugu, Hindi and Bengali, are considered in this research, and several challenges associated with analyzing Twitter sentiments in these languages are identified and conceptualized as a framework through systematic review.


2021 · Vol 11 (19) · pp. 8872 · Iván G. Torre, Mónica Romero, Aitor Álvarez

Automatic speech recognition for patients with aphasia is a challenging task for which studies have been published in only a few languages. Understandably, the systems reported in the literature in this field show significantly lower performance than those focused on transcribing non-pathological clean speech. This is mainly due to the difficulty of recognizing less intelligible speech, as well as to the scarcity of annotated aphasic data. This work focuses on applying novel semi-supervised learning methods to the AphasiaBank dataset in order to deal with these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language, for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described, using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The interesting results obtained encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data for training recognition models is a challenging reality.
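As a generic illustration of the semi-supervised idea, the sketch below runs a pseudo-labelling (self-training) loop on toy numeric data standing in for speech features. It is not the paper's ASR pipeline, only the pattern of bootstrapping a seed model, trained on a small labelled set, with its own confident predictions.

```python
# Hedged illustration of pseudo-labelling: a seed model trained on a small
# labelled set classifies unlabelled data, and confident predictions are
# added back as training targets. Toy numeric data stands in for speech.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_labeled = np.repeat([0, 1], 20)
X_unlabeled = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])

model = LogisticRegression().fit(X_labeled, y_labeled)
for _ in range(3):                                  # a few self-training rounds
    proba = model.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) > 0.95            # confidence threshold
    X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
    y_aug = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
    model = LogisticRegression().fit(X_aug, y_aug)

print(f"{confident.mean():.0%} of unlabelled samples pseudo-labelled")
```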


2021 · Vol ahead-of-print (ahead-of-print) · Alejandra Segura Navarrete, Claudia Martinez-Araneda, Christian Vidal-Castro, Clemente Rubio-Manzano

Purpose: This paper describes the process used to create an emotion lexicon enriched with the emotional intensity of words, with the aim of improving the emotion analysis process in texts.
Design/methodology/approach: The process includes setting, preparation and labelling stages. In the first stage, a lexicon is selected. It must include a translation into the target language and labelling according to Plutchik's eight emotions. The second stage starts with the validation of the translations. Then, the lexicon is expanded with the synonyms of the emotion synsets of each word. In the labelling stage, the similarity of words is calculated and displayed using WordNet similarity.
Findings: The authors' approach shows better performance in identifying the predominant emotion for the selected corpus. The most relevant result is the improvement obtained in emotion analysis with a hybrid approach compared to a purist approach.
Research limitations/implications: The proposed lexicon can still be enriched by incorporating elements such as emojis, idioms and colloquial expressions.
Practical implications: This work is part of a research project that aids in solving problems in a digital society, such as detecting cyberbullying, abusive language and gender violence in texts or exercising parental control. Detection of depressive states in young people and children is also addressed.
Originality/value: This semi-automatic process can be applied to any language to generate an emotion lexicon. This resource will be available in a software tool that implements a crowdsourcing strategy, allowing intensities to be re-labelled and new words to be automatically incorporated into the lexicon.
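A minimal sketch of the expansion and labelling stages follows, assuming NLTK's WordNet as the lexical resource: a seed emotion word is expanded with the lemmas of its synsets, and each candidate receives an intensity proxy from its Wu-Palmer similarity to the seed. The translation and validation steps are not reproduced.

```python
# Minimal sketch: expand a seed emotion word with WordNet lemmas and use
# Wu-Palmer similarity to the seed synset as an intensity proxy.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def expand_and_score(seed_word):
    seed = wn.synsets(seed_word, pos=wn.NOUN)[0]
    scored = {}
    for synset in wn.synsets(seed_word, pos=wn.NOUN):
        for lemma in synset.lemma_names():
            word = lemma.replace("_", " ")
            sim = seed.wup_similarity(synset) or 0.0    # intensity proxy
            scored[word] = max(scored.get(word, 0.0), sim)
    return dict(sorted(scored.items(), key=lambda kv: kv[1], reverse=True))

print(expand_and_score("joy"))   # seed word itself scores 1.0; synonyms lower
```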


Ilkhom Izatovich Bakaev

The automatic processing of unstructured texts in natural languages is one of the pressing problems of computer-based text analysis and synthesis. Within this problem, the author singles out the task of text normalization, which usually involves processes such as tokenization, stemming, and lemmatization. Existing stemming algorithms are for the most part oriented towards synthetic languages with inflectional morphemes. The Uzbek language is an example of an agglutinative language, characterized by the polysemy of affixal and auxiliary morphemes. Although the Uzbek language differs greatly from, for example, English, which is successfully processed by stemming algorithms, there are virtually no examples of effective stemming algorithms for the Uzbek language; this question is therefore of scientific interest and defines the goal of this work. In the course of this research, the author solved the task of bringing given Uzbek-language texts, which at a preliminary stage were tokenized and cleared of stop words, into normal form. The author developed a method for normalizing Uzbek-language texts based on a stemming algorithm. The development of the stemming algorithm employed a hybrid approach combining an algorithmic method, a lexicon of linguistic rules, and a database of normal word forms of the Uzbek language. The precision of the proposed algorithm depends on the precision of the tokenization algorithm. At the same time, the article does not explore the question of finding the roots of paired words separated by spaces, as this task is solved at the tokenization stage. The algorithm can be integrated into various automated systems for machine translation, information extraction, data retrieval, etc.
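A toy sketch of the hybrid idea described above: agglutinative suffixes are stripped iteratively, and a candidate stem is accepted only if it appears in a database of normal word forms. The suffix list and word-form set are tiny illustrative samples, not the author's lexicon of rules.

```python
# Illustrative sketch of hybrid stemming for an agglutinative language:
# strip suffixes iteratively and accept a stem only if it is a known
# normal word form. Suffixes and word forms below are small samples.
SUFFIXES = ["lar", "ning", "ni", "da", "ga"]       # sample Uzbek affixes
NORMAL_FORMS = {"kitob", "maktab", "shahar"}       # sample word-form database

def stem(token):
    word = token.lower()
    changed = True
    while changed and word not in NORMAL_FORMS:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                changed = True
                break
    return word if word in NORMAL_FORMS else token  # fall back to surface form

print(stem("kitoblarda"))   # kitob + lar (plural) + da (locative) -> "kitob"
```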


2013 · pp. 125-140 · Salomi Papadima-Sophocleous

This study investigated whether and to what extent an English Language Voluntary Intensive Independent Catch-up Study (ELVIICS), a Self-Access Language Learning (SALL) programme, was effective in helping first-year Greek-Cypriot students fill the gaps in their English language learning and come closer to the required Common European Framework of Reference (CEFR) B1 level of language competence. It also examined students' perceptions of such learning. The students followed the ELVIICS at their own pace, time and place until they felt they had reached the targeted level. Analysis of the achievement test results revealed that students' language competence improved and reached the required level. Additional quantitative data revealed that students felt ELVIICS also helped them improve their self-confidence, computer skills and autonomous learning. Moreover, students claimed that ELVIICS assisted them in getting through and successfully completing their compulsory course.


2021 · Vol 25 (2) · pp. 169-178 · Changro Lee

Despite the popularity deep learning has been gaining, measuring the uncertainty of its results has not met expectations in many deep learning applications, including property valuation. In real-world tasks, however, not only predictions but also assurance of their certainty is demanded. In this study, supervised learning is combined with unsupervised learning to bridge this gap. A method based on principal component analysis, a popular tool of unsupervised learning, was developed and used to represent the uncertainty in property valuation. Then, a neural network, a representative algorithm for implementing supervised learning, was constructed and trained to predict land prices. Finally, the uncertainty measured using principal component analysis was incorporated into the price predicted by the neural network. This hybrid approach is shown to be likely to improve the credibility of the valuation work. The findings of this study are expected to generate interest in the integration of the two learning approaches, thereby promoting the rapid adoption of deep learning tools in the property valuation industry.
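A hedged sketch of how the two learning modes might be combined follows: an MLP (supervised) predicts prices on synthetic data, while the PCA reconstruction error of each property's features (unsupervised) serves as a simple uncertainty proxy. The authors' actual uncertainty formulation may differ.

```python
# Sketch: combine a supervised price model with an unsupervised uncertainty
# proxy. PCA reconstruction error flags atypical properties; data is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))                          # synthetic property features
y = X @ rng.normal(size=6) + rng.normal(0, 0.1, 500)   # synthetic land prices

Xs = StandardScaler().fit_transform(X)

# Supervised component: a small neural network price model
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(Xs, y)

# Unsupervised component: reconstruction error from a low-rank PCA
pca = PCA(n_components=3).fit(Xs)
reconstruction = pca.inverse_transform(pca.transform(Xs))
uncertainty = np.linalg.norm(Xs - reconstruction, axis=1)  # larger = more atypical

preds = model.predict(Xs)
print(f"price: {preds[0]:.2f}  uncertainty score: {uncertainty[0]:.2f}")
```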

