Multilabel Classification for Emotion Analysis of Multilingual Tweets

Emotion analysis of text aims to detect and recognize the types of feelings expressed in text. Emotion analysis is the successor of sentiment analysis. The latter performs coarse-level analysis and classifies text into positive and negative categories, while the former performs fine-grained analysis and classifies text into specific emotion categories such as happiness, surprise, and anger. Analysis at the fine-grained level provides deeper insight than coarse-level analysis. In this paper, tweets are classified into the eight discrete basic emotions, namely joy, trust, fear, surprise, sadness, anticipation, anger, and disgust, specified in Plutchik’s wheel of emotions [1]. Tweets were collected in three languages: English and two Indian languages, Gujarati and Hindi. The collected tweets are related to Indian politics and are annotated manually. A supervised learning approach and a hybrid approach are used for classification. Supervised learning uses tf-idf features, while the hybrid approach uses primary and secondary features. Primary features are generated using tf-idf weighting, and two feature-generation algorithms are proposed that generate secondary features using the SenticNet resource. Multilabel classification is performed to assign tweets to emotion categories. Experimental results show the effectiveness of the hybrid approach.
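As a rough illustration of the supervised baseline, the sketch below trains a one-vs-rest classifier over Plutchik's eight emotions using tf-idf features. The tweets and labels are placeholders, and the secondary SenticNet-based features of the hybrid approach are not reproduced here.

```python
# Minimal sketch: multilabel emotion classification with tf-idf features.
# The tweets and labels below are placeholders; real training data would
# cover all eight Plutchik emotions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.pipeline import make_pipeline

EMOTIONS = ["joy", "trust", "fear", "surprise",
            "sadness", "anticipation", "anger", "disgust"]

tweets = ["what a great result for the party",         # placeholder data
          "shocked and disappointed by the speech"]
labels = [["joy", "trust"], ["surprise", "sadness", "anger"]]

mlb = MultiLabelBinarizer(classes=EMOTIONS)
y = mlb.fit_transform(labels)

# tf-idf primary features feeding a one-vs-rest logistic regression
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(tweets, y)

pred = clf.predict(["angry about the new policy"])
print(mlb.inverse_transform(pred))
```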

2020 · Vol ahead-of-print (ahead-of-print) · Mukesh Kumar, Palak Rehan

Social media networks such as Twitter, Facebook, and WhatsApp are among the most commonly used media for sharing news and opinions and for staying in touch with peers. Messages on Twitter are limited to 140 characters. This has led users to create their own novel syntax in tweets to express more in fewer words. Free writing style, use of URLs, markup syntax, inappropriate punctuation, ungrammatical structures, abbreviations, and so on make it harder to mine useful information from them. For each tweet, we can get an explicit timestamp, the name of the user, the social network the user belongs to, or even the GPS coordinates if the tweet was created with a GPS-enabled mobile device. With these features, Twitter is by nature a good resource for detecting and analyzing real-time events happening around the world. By exploiting the speed and coverage of Twitter, we can detect events, i.e., sequences of important keywords being discussed, in a timely manner, which is useful in applications such as natural calamity relief support, earthquake relief support, product launches, and suspicious activity detection. Keyword detection from Twitter can be seen as a two-step process: detecting keywords in raw text form (words as posted by users) and keyword normalization (reforming users' unstructured words into complete, meaningful English words). In this paper, a keyword detection technique based on graphs, spanning trees, and the PageRank algorithm is proposed. A text normalization technique based on a hybrid approach using Levenshtein distance, the double metaphone algorithm, and dictionary mapping is proposed to process the unstructured keywords produced by the keyword detector. The proposed normalization technique is validated using the standard lexnorm 1.2 dataset. The proposed system is used to detect keywords from Twitter text posted in real time. The detected and normalized keywords are later validated against search engine results for event detection.
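A minimal sketch of the two-step idea follows: candidate keywords are ranked with PageRank over a word co-occurrence graph (networkx is assumed; the spanning-tree pruning step is omitted), and a noisy keyword is then normalized against a toy dictionary using Levenshtein distance plus a crude consonant-skeleton key standing in for double metaphone.

```python
# Hedged sketch of the two-step pipeline: (1) rank candidate keywords with
# PageRank over a word co-occurrence graph, (2) normalize a noisy token
# against a dictionary using edit distance plus a crude phonetic key
# (a stand-in for the double-metaphone step described in the paper).
import itertools
import networkx as nx

tweets = [["erthquake", "relief", "camp"],           # placeholder, pre-tokenized
          ["relief", "fund", "erthquake"],
          ["product", "launch", "today"]]

# Step 1: co-occurrence graph + PageRank (spanning-tree pruning omitted)
G = nx.Graph()
for tokens in tweets:
    for a, b in itertools.combinations(set(tokens), 2):
        w = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)
keywords = sorted(nx.pagerank(G, weight="weight").items(),
                  key=lambda kv: kv[1], reverse=True)[:3]

# Step 2: dictionary-based normalization of a detected noisy keyword
def levenshtein(s, t):
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def phonetic_key(word):                  # crude consonant skeleton, not metaphone
    return "".join(c for c in word if c not in "aeiou")

dictionary = ["earthquake", "relief", "product"]     # toy dictionary
def normalize(token):
    same_sound = [w for w in dictionary if phonetic_key(w) == phonetic_key(token)]
    pool = same_sound or dictionary
    return min(pool, key=lambda w: levenshtein(token, w))

print([k for k, _ in keywords], normalize("erthquake"))
```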


2019 · Vol 37 (1) · pp. 2-15 · Sudarsana Desul, Madurai Meenachi N., Thejas Venkatesh, Vijitha Gunta, Gowtham R., et al.

Purpose: Ontology of a domain mainly consists of a set of concepts and their semantic relations. It is typically constructed and maintained by using ontology editors with substantial human intervention. It is desirable to perform the task automatically, which has led to the development of ontology learning techniques. One of the main challenges of ontology learning from text is to identify key concepts in the documents. A wide range of techniques for key concept extraction has been proposed, but they suffer from low accuracy, poor performance, limited flexibility and applicability only to specific domains. The purpose of this study is to explore a new method to extract key concepts and to apply it to literature in the nuclear domain.
Design/methodology/approach: In this article, a novel method for key concept extraction is proposed and applied to documents from the nuclear domain. A hybrid approach was used, combining domain knowledge, syntactic and named-entity knowledge, and statistical methods. The performance of the developed method was evaluated against data obtained using two-out-of-three voting logic from three domain experts, on 120 documents retrieved from the SCOPUS database.
Findings: The work reported pertains to extracting concepts from the set of selected documents and aids the search for documents relating to given concepts. The results of a case study indicate that the developed method achieves better metrics than Text2Onto and CFinder. The described method is capable of extracting valid key concepts from a set of candidates containing long phrases.
Research limitations/implications: The present study is restricted to English-language literature and applied to documents from the nuclear domain. It has the potential to be extended to other domains.
Practical implications: The work carried out in the current study has the potential to lead to updating the International Nuclear Information System thesaurus for ontology in the nuclear domain, which can enable more efficient search methods.
Originality/value: This work is the first attempt to automatically extract key concepts from nuclear documents. The proposed approach addresses most of the problems that exist in current methods and thereby improves performance.
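A small illustrative sketch of the evaluation setup is given below: a gold-standard concept list is assembled with two-out-of-three voting over three placeholder expert annotations, and a candidate list is scored against it. The hybrid extraction method itself is not reproduced here.

```python
# Illustrative sketch of the evaluation setup: a gold standard is built with
# two-out-of-three voting over expert annotations, then a candidate list is
# scored against it. The concept sets below are placeholders.
from collections import Counter

expert_1 = {"nuclear reactor", "fuel assembly", "control rod"}
expert_2 = {"nuclear reactor", "control rod", "coolant loop"}
expert_3 = {"nuclear reactor", "fuel assembly", "coolant loop"}

votes = Counter()
for annotation in (expert_1, expert_2, expert_3):
    votes.update(annotation)
gold = {concept for concept, n in votes.items() if n >= 2}     # 2-of-3 voting

extracted = {"nuclear reactor", "fuel assembly", "moderator"}  # system output
tp = len(extracted & gold)
precision = tp / len(extracted)
recall = tp / len(gold)
f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
print(gold, round(precision, 2), round(recall, 2), round(f1, 2))
```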


2018 · Vol 7 (2.21) · pp. 319 · Saini Jacob Soman, P Swaminathan, R Anandan, K Kalaivani

With the increased use of online media for sharing views, sentiments and opinions about products, services, organizations and people, microblogging and social networking sites are acquiring huge popularity. Twitter, one of the biggest social media sites, is used by many people to share their life events, views and opinions about different areas and concepts. Sentiment analysis is the computational study of reviews, opinions, attitudes, views and people's emotions about different products, services, firms and topics, categorizing them as positive or negative. Sentiment analysis of tweets is a challenging task. This paper presents a critical review comparing the challenges associated with sentiment analysis of tweets in English versus Indian regional languages. Five Indian languages, namely Tamil, Malayalam, Telugu, Hindi and Bengali, are considered in this research, and several challenges associated with analyzing Twitter sentiments in these languages are identified and conceptualized as a framework through systematic review.


2021 · Vol 11 (19) · pp. 8872 · Iván G. Torre, Mónica Romero, Aitor Álvarez

Automatic speech recognition for patients with aphasia is a challenging task for which studies have been published in only a few languages. Understandably, the systems reported in the literature in this field show significantly lower performance than those focused on transcribing non-pathological clean speech. This is mainly due to the difficulty of recognizing less intelligible speech, as well as to the scarcity of annotated aphasic data. This work focuses on applying novel semi-supervised learning methods to the AphasiaBank dataset in order to deal with these two major issues, reporting improvements for the English language and providing the first benchmark for the Spanish language, for which less than one hour of transcribed aphasic speech was used for training. In addition, the influence of reinforcing the training and decoding processes with out-of-domain acoustic and text data is described, using different strategies and configurations to fine-tune the hyperparameters and the final recognition systems. The interesting results obtained encourage extending this technological approach to other languages and scenarios where the scarcity of annotated data for training recognition models is a challenging reality.
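As a generic illustration of the semi-supervised idea, the sketch below runs a pseudo-labelling (self-training) loop on toy numeric data standing in for speech features. It is not the paper's ASR pipeline, only the pattern of bootstrapping a seed model, trained on a small labelled set, with its own confident predictions.

```python
# Hedged illustration of pseudo-labelling: a seed model trained on a small
# labelled set classifies unlabelled data, and confident predictions are
# added back as training targets. Toy numeric data stands in for speech.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_labeled = np.repeat([0, 1], 20)
X_unlabeled = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])

model = LogisticRegression().fit(X_labeled, y_labeled)
for _ in range(3):                                  # a few self-training rounds
    proba = model.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) > 0.95            # confidence threshold
    X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
    y_aug = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
    model = LogisticRegression().fit(X_aug, y_aug)

print(f"{confident.mean():.0%} of unlabelled samples pseudo-labelled")
```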


2021 · Vol ahead-of-print (ahead-of-print) · Alejandra Segura Navarrete, Claudia Martinez-Araneda, Christian Vidal-Castro, Clemente Rubio-Manzano

Purpose: This paper describes the process used to create an emotion lexicon enriched with the emotional intensity of words, with the aim of improving the emotion analysis process in texts.
Design/methodology/approach: The process includes setting, preparation and labelling stages. In the first stage, a lexicon is selected. It must include a translation into the target language and labelling according to Plutchik's eight emotions. The second stage starts with the validation of the translations. Then, the lexicon is expanded with the synonyms of the emotion synsets of each word. In the labelling stage, the similarity of words is calculated and displayed using WordNet similarity.
Findings: The authors' approach shows better performance in identifying the predominant emotion for the selected corpus. The most relevant result is the improvement obtained in emotion analysis with a hybrid approach compared to a purist approach.
Research limitations/implications: The proposed lexicon can still be enriched by incorporating elements such as emojis, idioms and colloquial expressions.
Practical implications: This work is part of a research project that aids in solving problems in a digital society, such as detecting cyberbullying, abusive language and gender violence in texts or exercising parental control. Detection of depressive states in young people and children is also addressed.
Originality/value: This semi-automatic process can be applied to any language to generate an emotion lexicon. This resource will be available in a software tool that implements a crowdsourcing strategy, allowing intensities to be re-labelled and new words to be automatically incorporated into the lexicon.
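A minimal sketch of the expansion and labelling stages follows, assuming NLTK's WordNet as the lexical resource: a seed emotion word is expanded with the lemmas of its synsets, and each candidate receives an intensity proxy from its Wu-Palmer similarity to the seed. The translation and validation steps are not reproduced.

```python
# Minimal sketch: expand a seed emotion word with WordNet lemmas and use
# Wu-Palmer similarity to the seed synset as an intensity proxy.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def expand_and_score(seed_word):
    seed = wn.synsets(seed_word, pos=wn.NOUN)[0]
    scored = {}
    for synset in wn.synsets(seed_word, pos=wn.NOUN):
        for lemma in synset.lemma_names():
            word = lemma.replace("_", " ")
            sim = seed.wup_similarity(synset) or 0.0    # intensity proxy
            scored[word] = max(scored.get(word, 0.0), sim)
    return dict(sorted(scored.items(), key=lambda kv: kv[1], reverse=True))

print(expand_and_score("joy"))   # seed word itself scores 1.0; synonyms lower
```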


Ilkhom Izatovich Bakaev

The automatic processing of unstructured texts in natural languages is one of the pressing problems of computer-based text analysis and synthesis. Within this problem, the author singles out the task of text normalization, which usually involves processes such as tokenization, stemming, and lemmatization. Existing stemming algorithms are for the most part oriented towards synthetic languages with inflectional morphemes. The Uzbek language is an example of an agglutinative language, characterized by the polysemy of affixal and auxiliary morphemes. Although the Uzbek language differs greatly from, for example, English, which is successfully processed by stemming algorithms, there are virtually no examples of effective stemming algorithms for the Uzbek language; this question is therefore of scientific interest and defines the goal of this work. In the course of this research, the author solved the task of bringing given Uzbek-language texts, which at a preliminary stage were tokenized and cleared of stop words, into normal form. The author developed a method for normalizing Uzbek-language texts based on a stemming algorithm. The development of the stemming algorithm employed a hybrid approach combining an algorithmic method, a lexicon of linguistic rules, and a database of normal word forms of the Uzbek language. The precision of the proposed algorithm depends on the precision of the tokenization algorithm. At the same time, the article does not explore the question of finding the roots of paired words separated by spaces, as this task is solved at the tokenization stage. The algorithm can be integrated into various automated systems for machine translation, information extraction, data retrieval, etc.
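A toy sketch of the hybrid idea described above: agglutinative suffixes are stripped iteratively, and a candidate stem is accepted only if it appears in a database of normal word forms. The suffix list and word-form set are tiny illustrative samples, not the author's lexicon of rules.

```python
# Illustrative sketch of hybrid stemming for an agglutinative language:
# strip suffixes iteratively and accept a stem only if it is a known
# normal word form. Suffixes and word forms below are small samples.
SUFFIXES = ["lar", "ning", "ni", "da", "ga"]       # sample Uzbek affixes
NORMAL_FORMS = {"kitob", "maktab", "shahar"}       # sample word-form database

def stem(token):
    word = token.lower()
    changed = True
    while changed and word not in NORMAL_FORMS:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                changed = True
                break
    return word if word in NORMAL_FORMS else token  # fall back to surface form

print(stem("kitoblarda"))   # kitob + lar (plural) + da (locative) -> "kitob"
```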


2013 · pp. 125-140 · Salomi Papadima-Sophocleous

This study investigated whether and to what extent an English Language Voluntary Intensive Independent Catch-up Study (ELVIICS), a Self-Access Language Learning (SALL) programme, was effective in helping first-year Greek-Cypriot students fill the gaps in their English language learning and come closer to the required Common European Framework of Reference (CEFR) B1 level of language competence. It also examined students' perceptions of such learning. The students followed the ELVIICS at their own pace, time and place until they felt they had reached the targeted level. Analysis of the achievement test results revealed that students' language competence improved and reached the required level. Additional quantitative data revealed that students felt ELVIICS also helped them improve their self-confidence, computer skills and autonomous learning. Moreover, students claimed that ELVIICS assisted them in getting through and successfully completing their compulsory course.


2021 · Vol 25 (2) · pp. 169-178 · Changro Lee

Despite the popularity deep learning has been gaining, measuring the uncertainty of its results has not met expectations in many deep learning applications, including property valuation. In real-world tasks, however, not only predictions but also assurance of their certainty is demanded. In this study, supervised learning is combined with unsupervised learning to bridge this gap. A method based on principal component analysis, a popular tool of unsupervised learning, was developed and used to represent the uncertainty in property valuation. Then, a neural network, a representative algorithm for implementing supervised learning, was constructed and trained to predict land prices. Finally, the uncertainty measured using principal component analysis was incorporated into the price predicted by the neural network. This hybrid approach is shown to be likely to improve the credibility of the valuation work. The findings of this study are expected to generate interest in the integration of the two learning approaches, thereby promoting the rapid adoption of deep learning tools in the property valuation industry.
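A hedged sketch of how the two learning modes might be combined follows: an MLP (supervised) predicts prices on synthetic data, while the PCA reconstruction error of each property's features (unsupervised) serves as a simple uncertainty proxy. The authors' actual uncertainty formulation may differ.

```python
# Sketch: combine a supervised price model with an unsupervised uncertainty
# proxy. PCA reconstruction error flags atypical properties; data is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))                          # synthetic property features
y = X @ rng.normal(size=6) + rng.normal(0, 0.1, 500)   # synthetic land prices

Xs = StandardScaler().fit_transform(X)

# Supervised component: a small neural network price model
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(Xs, y)

# Unsupervised component: reconstruction error from a low-rank PCA
pca = PCA(n_components=3).fit(Xs)
reconstruction = pca.inverse_transform(pca.transform(Xs))
uncertainty = np.linalg.norm(Xs - reconstruction, axis=1)  # larger = more atypical

preds = model.predict(Xs)
print(f"price: {preds[0]:.2f}  uncertainty score: {uncertainty[0]:.2f}")
```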

