Leveraging writing systems changes for deep learning based Chinese affective analysis

2019, Vol 10 (11), pp. 3313-3325
Author(s): Rong Xiang, Qin Lu, Ying Jiao, Yufei Zheng, Wenhao Ying, ...

Abstract: Affective analysis of social media text is in great demand. Online text written in Chinese communities often contains mixed scripts, including major text written in Chinese, an ideograph-based writing system, and minor text using Latin letters, an alphabet-based writing system. This phenomenon is referred to as writing systems changes (WSCs). Past studies have shown that WSCs often reflect unfiltered, immediate emotions. However, the use of WSCs poses additional challenges for Natural Language Processing tasks because WSCs can break the syntax of the major text. In this work, we use WSCs as an effective feature in a hybrid deep learning model with an attention network. The WSC scripts are first identified by their encoding ranges. Then, the document representation of the major text is learned by a Long Short-Term Memory (LSTM) model, while the minor text is learned by a separate Convolutional Neural Network (CNN) model. To further highlight the WSC components, an attention mechanism is adopted to re-weight the feature vector before the classification layer. Experiments show that the proposed hybrid deep learning method, which better incorporates WSC features, improves performance over state-of-the-art classification models. The results indicate that WSCs can serve as effective information in affective analysis of social media text.
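The pipeline described in this abstract pairs an LSTM over the major (Chinese) text with a CNN over the minor (Latin-script) text and re-weights the concatenated features with attention before the classification layer. The paper's code is not reproduced here; the following is a minimal PyTorch sketch of that layout, in which the dimensions, the shared embedding table, and the simple sigmoid-gate attention are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class HybridWSCClassifier(nn.Module):
    """Illustrative hybrid model: LSTM over the major (Chinese) text,
    CNN over the minor (Latin-script) text, attention-style re-weighting
    of the concatenated features before the classification layer."""

    def __init__(self, vocab_size, emb_dim=128, hidden=128,
                 num_filters=64, kernel_size=3, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size, padding=1)
        feat_dim = hidden + num_filters
        self.attn = nn.Linear(feat_dim, feat_dim)   # simple feature re-weighting
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, major_ids, minor_ids):
        # Document representation of the major text: the LSTM's last hidden state
        _, (h_n, _) = self.lstm(self.embed(major_ids))
        major_vec = h_n[-1]                                 # (batch, hidden)

        # Minor (WSC) text encoded by a 1-D CNN with max-pooling over time
        minor_emb = self.embed(minor_ids).transpose(1, 2)   # (batch, emb, len)
        minor_vec = torch.relu(self.conv(minor_emb)).max(dim=2).values

        feats = torch.cat([major_vec, minor_vec], dim=1)
        weights = torch.sigmoid(self.attn(feats))           # attention-style gate
        return self.fc(feats * weights)

# Toy usage with random token ids
model = HybridWSCClassifier(vocab_size=5000)
major = torch.randint(1, 5000, (4, 50))   # batch of 4 "major text" sequences
minor = torch.randint(1, 5000, (4, 10))   # corresponding "minor text" spans
logits = model(major, minor)              # (4, 2)
```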

2020, Vol 29 (05), pp. 2050014
Author(s): Anupam Jamatia, Steve Durairaj Swamy, Björn Gambäck, Amitava Das, Swapan Debbarma

Sentiment analysis is a contextual analysis of text, identifying the social sentiment to better understand the source material. The article addresses sentiment analysis of an English-Hindi and English-Bengali code-mixed textual corpus collected from social media. Code-mixing is an amalgamation of multiple languages, which was previously associated mainly with spoken language. However, social media users also deploy it to communicate in ways that tend to be somewhat casual. The coarse nature of social media text poses challenges for many language processing applications. Here, the focus is on the lower predictive performance of traditional machine learners compared to their Deep Learning counterparts, including the contextual language representation model BERT (Bidirectional Encoder Representations from Transformers), on the task of extracting user sentiment from code-mixed texts. Three deep learners (a BiLSTM CNN, a Double BiLSTM and an Attention-based model) attained accuracy 20–60% greater than traditional approaches on code-mixed data, and were, for comparison, also tested on monolingual English data.
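The abstract above contrasts traditional learners with deep models, including BERT. As a minimal sketch of the BERT side of that comparison, the snippet below fine-tunes a multilingual BERT checkpoint for three-way sentiment with Hugging Face Transformers; the checkpoint name, label set, example text, and hyperparameters are all assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch of fine-tuning a BERT-style model for 3-way sentiment on
# code-mixed text. The checkpoint, labels and hyperparameters are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)  # negative / neutral / positive
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["yeh movie bahut achhi thi, loved it!"]    # invented code-mixed example
labels = torch.tensor([2])                          # hypothetical gold label

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)             # returns loss and logits
outputs.loss.backward()                             # one illustrative update step
optimizer.step()
print(outputs.logits.argmax(dim=-1))
```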


Author(s): Pushkar Dubey

Social networks are the main resource for gathering information about people's opinions on different topics, as users spend hours daily on social media sharing their views. Twitter is one such platform that is gaining popularity. It offers organizations a fast and effective way to analyze customers' perspectives on what is critical to success in the marketplace. Developing a program for sentiment analysis is one approach to computationally measuring customers' perceptions. We use natural language processing and machine learning concepts to create a model for this analysis. In this paper, we discuss how to build a model for analyzing tweets, trained using various NLP, machine learning and deep learning approaches.
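The abstract stays high-level about the models involved, so the snippet below sketches only the simplest traditional baseline such a study typically starts from: TF-IDF features with logistic regression in scikit-learn. The toy tweets and labels are invented purely for illustration.

```python
# A minimal traditional-ML baseline for tweet sentiment of the kind compared
# against deep models: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["great service, very happy", "worst support ever",
          "totally love it", "this product is terrible"]
labels = ["pos", "neg", "pos", "neg"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(tweets, labels)
print(clf.predict(["not happy with this"]))
```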


Information, 2020, Vol 11 (6), pp. 312
Author(s): Asma Baccouche, Sadaf Ahmed, Daniel Sierra-Sosa, Adel Elmaghraby

Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments in social media or fraudulent emails. However, an adequate strategy for filtering messages is difficult to achieve, as these messages resemble real communications. From the Natural Language Processing (NLP) perspective, Deep Learning models are a good alternative for classifying text after it has been preprocessed. In particular, Long Short-Term Memory (LSTM) networks are among the models that perform well on binary and multi-label text classification problems. In this paper, an approach merging two different data sources, one intended for spam classification in social media posts and the other for fraud classification in emails, is presented. We designed a multi-label LSTM model and trained it on the joint dataset, including text with common bigrams extracted from each independent dataset. The experimental results show that our proposed model is capable of identifying malicious text regardless of the source. The LSTM model trained on the merged dataset outperforms the models trained independently on each dataset.
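For the multi-label setup described above, a text can carry the spam label, the fraud label, both, or neither, which is usually handled with independent sigmoid outputs and a binary cross-entropy objective. The following is a minimal PyTorch sketch under that assumption; vocabulary size, sequence length, and layer sizes are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class MultiLabelLSTM(nn.Module):
    """Illustrative multi-label LSTM: sigmoid-style outputs so a text can be
    flagged as spam, fraud, both, or neither."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_labels)    # labels: [spam, fraud]

    def forward(self, token_ids):
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return self.out(h_n[-1])                    # raw logits per label

model = MultiLabelLSTM(vocab_size=20000)
x = torch.randint(1, 20000, (8, 60))                # batch of encoded texts
y = torch.randint(0, 2, (8, 2)).float()             # multi-hot targets
loss = nn.BCEWithLogitsLoss()(model(x), y)          # multi-label objective
loss.backward()
```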


2020, Vol 17 (6), pp. 935-946
Author(s): Jihene Younes, Hadhemi Achour, Emna Souissi, Ahmed Ferchichi

Language identification is an important task in natural language processing that consists of determining the language of a given text. It has attracted increasing interest from researchers over the past few years, especially for code-switching informal textual content. In this paper, we focus on the identification of the Romanized user-generated Tunisian dialect on the social web. We segment and annotate a corpus extracted from social media and propose a deep learning approach for the identification task. We use a Bidirectional Long Short-Term Memory neural network with Conditional Random Fields decoding (BLSTM-CRF). For word embeddings, we combine a word-character BLSTM vector representation and FastText embeddings, which take character n-gram features into account. The overall accuracy obtained is 98.65%.
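The token representation described above concatenates a character-level BLSTM encoding of each word with subword-aware (FastText-style) word embeddings before BLSTM tagging with CRF decoding. The sketch below shows that representation and the BiLSTM tagger in PyTorch; the CRF decoding layer is deliberately omitted to keep the sketch dependency-free, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class WordCharBiLSTMTagger(nn.Module):
    """Sketch of the representation described above: a character-level BiLSTM
    encoding of each word concatenated with a (FastText-style) word embedding,
    fed to a word-level BiLSTM. A CRF decoding layer would normally sit on top
    of the emission scores; it is omitted here for brevity."""
    def __init__(self, n_chars, n_words, n_tags,
                 char_dim=25, word_dim=100, hidden=100):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_dim, char_dim, batch_first=True,
                                 bidirectional=True)
        self.word_embed = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.word_lstm = nn.LSTM(word_dim + 2 * char_dim, hidden,
                                 batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden, n_tags)

    def forward(self, word_ids, char_ids):
        # char_ids: (batch, seq_len, max_word_len)
        b, s, c = char_ids.shape
        chars = self.char_embed(char_ids.view(b * s, c))
        _, (h_n, _) = self.char_lstm(chars)                  # (2, b*s, char_dim)
        char_repr = torch.cat([h_n[0], h_n[1]], dim=1).view(b, s, -1)
        tokens = torch.cat([self.word_embed(word_ids), char_repr], dim=2)
        out, _ = self.word_lstm(tokens)
        return self.emissions(out)                           # per-token tag scores

model = WordCharBiLSTMTagger(n_chars=100, n_words=30000, n_tags=4)
scores = model(torch.randint(1, 30000, (2, 12)),             # 2 sentences, 12 tokens
               torch.randint(1, 100, (2, 12, 15)))           # 15 chars per token
```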


Author(s): Gauri Jain, Manisha Sharma, Basant Agarwal

This article describes how spam detection in social media text is becoming increasingly important because of the exponential growth in spam volume over the network. The task is challenging, especially for text limited to a small number of characters, and effective spam detection requires a large number of informative features to be learned. The article proposes the use of a deep learning technique known as a convolutional neural network (CNN) for spam detection, with a semantic layer added on top of it; the resulting model is called a semantic convolutional neural network (SCNN). The semantic layer is built by training random word vectors with Word2vec to obtain semantically enriched word embeddings. WordNet and ConceptNet are used to find words similar to a given word when it is missing from the Word2vec vocabulary. The architecture is evaluated on two corpora: the SMS Spam dataset (UCI repository) and a Twitter dataset (tweets scraped from public live tweets). The authors' approach outperforms state-of-the-art results with 98.65% accuracy on the SMS Spam dataset and 94.40% accuracy on the Twitter dataset.
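The "semantic layer" idea above, looking up Word2vec vectors and backing off to WordNet when a word is missing, can be sketched with standard gensim and NLTK calls as below. The embedding file name is an assumption, the ConceptNet back-off is left out, and the CNN that would consume the resulting matrix is omitted for brevity.

```python
# Sketch of the semantic-layer idea: use Word2vec vectors where available and
# back off to a WordNet synonym that is in-vocabulary for missing words.
# The embedding file path is an assumption; nltk.download('wordnet') is needed once.
import numpy as np
from gensim.models import KeyedVectors
from nltk.corpus import wordnet as wn

w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                        binary=True)    # assumed local file

def semantic_vector(word):
    """Return the Word2vec vector, a WordNet-synonym vector, or zeros."""
    if word in w2v:
        return w2v[word]
    for synset in wn.synsets(word):                      # fall back to synonyms
        for lemma in synset.lemma_names():
            if lemma in w2v:
                return w2v[lemma]
    return np.zeros(w2v.vector_size, dtype=np.float32)   # unknown word

# A message becomes a (length x 300) matrix that a text CNN (as in SCNN)
# would convolve over; the CNN itself is omitted here.
message = "free entry claim your prize now".split()
matrix = np.stack([semantic_vector(w) for w in message])
print(matrix.shape)
```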


DOI: 10.29007/dlff, 2019
Author(s): Alena Fenogenova, Viktor Kazorin, Ilia Karpov, Tatyana Krylova

Automatic morphological analysis is one of the fundamental and significant tasks of NLP (Natural Language Processing). Because Internet texts have special features, ranging from normative texts (news, fiction, nonfiction) to less formal texts (such as blogs and posts from social networks), their morphological tagging has become a non-trivial and highly relevant task. In this paper we describe our experiments in tagging Internet texts, presenting our approach based on deep learning. A new social media test set was created, which allows our system to be compared with state-of-the-art open-source analyzers on social media text.
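Since the paper's contribution includes a social media test set for comparing taggers, a minimal scoring sketch is shown below. It assumes a tab-separated token-per-line format with blank lines between sentences; the actual format of the authors' test set is not specified in the abstract.

```python
# Minimal sketch of scoring a morphological tagger against a gold test set.
# A tab-separated "token<TAB>tag" one-token-per-line format is assumed here.
def read_tags(path):
    tags = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:                          # skip sentence-boundary blank lines
                tags.append(line.split("\t")[1])
    return tags

def token_accuracy(gold_path, pred_path):
    gold, pred = read_tags(gold_path), read_tags(pred_path)
    assert len(gold) == len(pred), "token counts must match"
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# print(token_accuracy("social_media_gold.conll", "tagger_output.conll"))
```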


2019, Vol 28 (3), pp. 399-408
Author(s): Anupam Jamatia, Amitava Das, Björn Gambäck

Abstract: This article addresses language identification at the word level in Indian social media corpora taken from Facebook, Twitter and WhatsApp posts that exhibit code-mixing between English and Hindi, English and Bengali, as well as a blend of both language pairs. Code-mixing is a fusion of multiple languages previously associated mainly with spoken language, but which social media users also deploy when communicating in ways that tend to be rather casual. The coarse nature of code-mixed social media text makes language identification challenging. Here, the performance of deep learning on this task is compared to feature-based learning, with two Recurrent Neural Network techniques, Long Short-Term Memory (LSTM) and bidirectional LSTM, being contrasted with a Conditional Random Fields (CRF) classifier. The results show the deep learners outscoring the CRF, with the bidirectional LSTM demonstrating the best language identification performance.
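The CRF side of the comparison above can be sketched with sklearn-crfsuite, as below. The orthographic feature template, hyperparameters, and the toy code-mixed sentence are assumptions for illustration, not the paper's configuration.

```python
# Sketch of a CRF baseline for word-level language identification.
import sklearn_crfsuite

def word_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "suffix3": w[-3:],
        "is_upper": w.isupper(),
        "is_digit": w.isdigit(),
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy code-mixed sentence with word-level language labels (invented example).
sents = [["I", "am", "khush", "today"]]
labels = [["en", "en", "hi", "en"]]

X = [[word_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, labels)
print(crf.predict(X))
```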


Author(s): Anto Arockia Rosaline R., Parvathi R.

Text analytics is the process of extracting high-quality information from text. A set of statistical, linguistic, and machine learning techniques is used to represent the information content of textual sources for purposes such as data analysis, research, or investigation. Text is the most common way of communicating on social media, and understanding it involves a variety of tasks, including text classification and handling slang and other languages. Traditional Natural Language Processing (NLP) techniques require extensive pre-processing to handle such text. When the word "Amazon" occurs in social media text, there should be a meaningful approach to determine whether it refers to the forest or the Kindle. Most of the time, NLP techniques fail to handle slang and spelling correctly. Messages on Twitter are so short that it is difficult to build semantic connections between them, and some messages, such as "Gud nite", do not contain any real words yet are still used for communication.
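The two failure points named above, slang such as "Gud nite" and ambiguous words such as "Amazon", are often attacked first with a normalization dictionary and simple context cues. The toy sketch below illustrates only that idea; the dictionary entries and keyword lists are invented for illustration.

```python
# Tiny illustration: slang normalization via a lookup table and a crude
# context-keyword check for the ambiguous word "Amazon".
SLANG = {"gud": "good", "nite": "night", "u": "you", "plz": "please"}

def normalize(text):
    return " ".join(SLANG.get(tok.lower(), tok) for tok in text.split())

FOREST_CUES = {"rainforest", "river", "trees", "brazil"}
PRODUCT_CUES = {"kindle", "order", "delivery", "prime"}

def amazon_sense(text):
    tokens = set(normalize(text).lower().split())
    if tokens & PRODUCT_CUES:
        return "company/product"
    if tokens & FOREST_CUES:
        return "rainforest"
    return "unknown"

print(normalize("Gud nite u all"))                      # -> "good night you all"
print(amazon_sense("my Amazon Kindle order arrived"))   # -> "company/product"
```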


Author(s): Sarojini Yarramsetti, Anvar Shathik J, Renisha P.S.

In this digital world, experience sharing, knowledge exploration, thought posting, and other related social activities are common to every individual, and social media networks such as Facebook and Twitter play a vital role in them. Many schemes for extracting sentiment features from social networks already exist, and researchers have worked in this domain for the last few years, but those efforts focus narrowly on estimating opinions and sentiment from the tweets and posts that users publish on social networks or other web media. Many social network platforms also allow users to post voice tweets and voice messages, and these voice messages may contain harmful as well as normal and important content. In this paper, a new methodology called the Intensive Deep Learning based Voice Estimation Principle (IDLVEP) is designed to identify voice message content and extract its features based on Natural Language Processing (NLP). Combining deep learning with NLP provides an efficient way to build a powerful data processing model for identifying sentiment features on social networks, and the hybrid approach supports both text-based and voice-based tweet sentiment estimation. The NLP component of IDLVEP extracts the content of an input voice message and produces raw text; based on that text, the deep learning component classifies messages as harmful or normal. Tweets raised by users are first divided into two categories, voice tweets and text tweets: voice tweets are handled by the NLP component, while text tweets, together with the text extracted from voice tweets, are handled by the deep learning component. Social networks have two faces, supporting development while also providing a channel for harmful content, so IDLVEP identifies harmful content in user tweets and removes it intelligently using the proposed classification strategies. This paper concentrates on identifying sentiment features from user tweets to provide a harm-free social network environment for society.
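The voice-tweet branch of IDLVEP amounts to a speech-to-text step followed by a harmful/normal text classifier. The sketch below wires those two stages together using the SpeechRecognition library for transcription and a trivial keyword lexicon standing in for the paper's deep learning classifier; the audio path and the lexicon are assumptions.

```python
# Pipeline sketch for the voice-tweet branch: transcribe audio to raw text,
# then classify it as harmful or normal. The classifier below is a trivial
# keyword placeholder standing in for the paper's deep learning model.
import speech_recognition as sr

def transcribe(audio_path):
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:        # WAV/AIFF/FLAC input
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)       # raw text of the voice tweet

HARMFUL_TERMS = {"attack", "threat", "abuse"}       # placeholder lexicon

def classify(text):
    """Stand-in for the deep learning classifier: flags harmful keywords."""
    return "harmful" if set(text.lower().split()) & HARMFUL_TERMS else "normal"

# tweet_text = transcribe("voice_tweet.wav")        # file path is an assumption
# print(classify(tweet_text))
```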


2021, Vol 2021 (1)
Author(s): Yanyang Guo, Hanzhou Wu, Xinpeng Zhang

Abstract: Social media plays an increasingly important role in providing information and social support to users. Because content spreads easily and is difficult to track on social networks, we are motivated to study ways of concealing sensitive messages in this channel with high confidentiality. In this paper, we design a steganographic visual story generation model that enables users to automatically post stego status updates on social media without any direct user intervention, and we use mutual-perceived joint attention (MPJA) to maintain the imperceptibility of the stego text. We demonstrate our approach on the visual storytelling (VIST) dataset and show that it yields high-quality steganographic texts. Since the proposed work realizes steganography by auto-generating visual stories using deep learning, it enables us to move steganography to real-world online social networks with intelligent steganographic bots.
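The paper's MPJA-based generation model is not reproduced here, but the general principle of steganography by text generation can be illustrated: each generation step chooses among ranked candidate words, and the choice encodes secret bits. The toy bigram "language model" below is invented solely to show that mechanism and has nothing to do with the authors' architecture.

```python
# Generic illustration of steganography by generation (not the MPJA model):
# at each step the next word is picked from the top-2 candidates of a toy
# language model, and that binary choice encodes one bit of the secret message.
TOY_LM = {
    "the":      ["children", "beach"],
    "children": ["played", "laughed"],
    "beach":    ["was", "glowed"],
    "was":      ["calm", "empty"],
    "glowed":   ["softly", "brightly"],
    "played":   ["outside", "together"],
}

def generate_stego(bits, start="the", length=4):
    words, i = [start], 0
    while len(words) < length and words[-1] in TOY_LM:
        candidates = TOY_LM[words[-1]]
        choice = bits[i] if i < len(bits) else 0     # bit selects rank 0 or 1
        words.append(candidates[choice])
        i += 1
    return " ".join(words)

def extract_bits(text):
    words = text.split()
    return [TOY_LM[w1].index(w2) for w1, w2 in zip(words, words[1:]) if w1 in TOY_LM]

stego = generate_stego([1, 0, 1])
print(stego, extract_bits(stego))    # hidden bits recoverable from word choices
```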

