Enhancement of Text Analysis Using Context-Aware Normalization of Social Media Informal Text

2021 ◽  
Vol 11 (17) ◽  
pp. 8172
Author(s):  
Jebran Khan ◽  
Sungchang Lee

We propose a generic social media Textual Variations Handler (TVH), independent of application and data variations, to deal with a wide range of noise in textual data generated by various social media (SM) applications and so enhance text analysis. The aim is to build an effective hybrid normalization technique that uses the information in noisy text in its intended form, instead of filtering it out, to analyze SM text better. The proposed TVH performs context-aware text normalization based on intended meaning to avoid wrong word substitutions. We integrate the TVH with state-of-the-art (SOTA) deep-learning-based text analysis methods to enhance their performance on noisy SM text data. In simulation, the proposed scheme shows promising improvement in the analysis of informal SM text in terms of precision, recall, accuracy, and F1-score.
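The idea of context-aware normalization can be illustrated with a toy example. The sketch below is not the authors' TVH; it assumes a small hypothetical lexicon (`CANDIDATES`) in which each informal token has one or more candidate expansions, each paired with context words that support that reading, and it picks the candidate whose context words overlap most with the neighbouring tokens.

```python
import re

# Hypothetical lexicon (an assumption for illustration): each informal token
# maps to candidate expansions, each with context words supporting it.
CANDIDATES = {
    "gr8": [("great", set())],
    "u": [("you", set())],
    "ur": [
        ("you are", {"late", "welcome", "right", "wrong", "going"}),
        ("your", {"phone", "car", "idea", "friend", "house"}),
    ],
}


def normalize(text, window=2):
    """Replace informal tokens, choosing the candidate whose context words
    overlap most with the tokens within `window` positions on either side."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    out = []
    for i, tok in enumerate(tokens):
        options = CANDIDATES.get(tok)
        if not options:
            out.append(tok)  # token is not in the lexicon: keep it unchanged
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        # max() keeps the first candidate on ties, so list order is the default.
        best = max(options, key=lambda opt: len(opt[1] & context))
        out.append(best[0])
    return " ".join(out)


print(normalize("ur phone is gr8"))
print(normalize("u know ur late"))
```

Here the same token "ur" is expanded differently depending on its neighbours, which is the kind of wrong-substitution problem a context-free lookup table cannot avoid.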

2019 ◽  
Vol 9 (11) ◽  
pp. 2347 ◽  
Author(s):  
Hannah Kim ◽  
Young-Seob Jeong

As the volume of textual data increases exponentially, it becomes more important to develop models that analyze text data automatically. Texts may carry various labels such as gender, age, country, sentiment, and so forth. Using such labels may benefit some industrial fields, so many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for the task of text classification and has shown quite successful results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively longer texts, and that our networks outperform other state-of-the-art deep learning models.


2018 ◽  
Vol 9 (2) ◽  
pp. 111-120
Author(s):  
Argha Roy ◽  
Shyamali Guria ◽  
Suman Halder ◽  
Sayani Banerjee ◽  
Sourav Mandal

Recently, the web has been crowded with growing volumes of text on every aspect of human life. It is difficult to rapidly access and analyze raw textual data from social media, blogs, feedback, reviews, etc., which receive textual input directly, and to make important decisions from it efficiently. This paper proposes an efficient method for summarizing tourists' reviews of a specific tourist spot in order to analyze their sentiment towards the place. A classification technique automatically arranges documents into predefined categories, and a summarization algorithm condenses the input so that the output retains the most significant concepts of the source documents. Finally, sentiment analysis is performed on the summarized opinions using NLP and text analysis techniques to show the overall sentiment about the spot. Interested tourists planning a visit therefore need not go through all the reviews; instead, they can read the summarized documents with the overall sentiment about the target place.


Author(s):  
Harshala Bhoir ◽  
K. Jayamalini

Visual sentiment analysis is the automatic recognition of positive and negative emotions from images, videos, graphics, stickers, etc. To estimate the polarity of the sentiment evoked by an image, most state-of-the-art works exploit the text that the user associates with a social post. However, such text is typically noisy because it is subjective and often includes text meant to maximize the diffusion of the post. The proposed system will employ an Objective Text description of images, automatically extracted from the visual content, rather than the classic Subjective Text provided by the user. It will extract three views of a social media image (a visual view, a subjective text view, and an objective text view) and assign a sentiment polarity of positive, negative, or neutral based on a hypothesis table.


Author(s):  
Sushila Sonare ◽  
Megha Kamble

Nowadays, it is very common for customers to share their thoughts about products and brands, and their experiences with them, on social media. Analysts collect these reviews and process them to extract meaningful information about the product. Social media spans all domains, so analysts obtain reviews from different social media platforms for almost every kind of thing. Sentiment analysis is applied to predict outcomes and obtain useful information, for example predicting whether a movie will be a blockbuster or the rating of a new launch. This type of prediction is helpful for customers buying goods or services in a competitive world. This paper focuses on e-commerce website reviews, which are normally in text form with some special characters and symbols (emojis). Each word in such text carries meaning in terms of context, emotion, and prior experience. These characteristics contribute to some of the features of text data used for prediction. The objective of this paper is to compile existing research on text analysis and emotion-based analysis. The open issues and challenges of document-based sentiment analysis are also discussed. The paper concludes by proposing a new approach to multi-class classification: ternary classification into positive, negative, and neutral classes is suggested, primarily for product-based text and emoji reviews on Twitter.


2019 ◽  
Vol 28 (08) ◽  
pp. 1950127 ◽  
Author(s):  
Pei Li ◽  
Ze Deng

Text classification is an important way to handle and organize textual data. Among existing methods of text classification, semi-supervised clustering is a mainstream technique. In the era of ‘Big data’, current semi-supervised clustering approaches for text classification generally do not apply to massive text data because of excessive costs in scalability and computing performance. To address this problem, this study proposes D-TESC, a scalable text classification algorithm for large-scale text collections, obtained by modifying TESC, a state-of-the-art centralized semi-supervised clustering approach for text classification. D-TESC processes textual data in a distributed manner to achieve high scalability. The experimental results indicate that D-TESC (1) has classification quality comparable to TESC, and (2) outperforms TESC by 7.2 times on average in terms of scalability when using eight CPU threads.


2021 ◽  
Author(s):  
Gabriele Scalia ◽  
Chiara Francalanci ◽  
Barbara Pernici

Information extracted from social media has proven to be very useful in the domain of emergency management. An important task in emergency management is rapid crisis mapping, which aims to produce timely and reliable maps of affected areas. During an emergency, the volume of emergency-related posts is typically large, but only a small fraction is relevant and helps rapid mapping effectively. Furthermore, posts are not useful for mapping purposes unless they are correctly geolocated and, on average, less than 2% of posts are natively georeferenced. This paper presents an algorithm, called CIME, that aims to identify and geolocate emergency-related posts that are relevant for mapping purposes. While native geocoordinates are most often missing, many posts contain geographical references in their metadata, such as texts or links, that CIME can use to filter and geolocate information. In addition, social media creates a social network, and each post can be enhanced with indirect information from the post’s network of relationships with other posts (for example, a retweet can be associated with other geographical references that are useful to geolocate the original tweet). To exploit all this information, CIME uses the concept of context, defined as the information characterizing a post both directly (the post’s metadata) and indirectly (the post’s network of relationships). The algorithm was evaluated on a recent major emergency event, demonstrating better performance than the state of the art in terms of total number of geolocated posts, geolocation accuracy, and relevance for rapid mapping.
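The notion of direct and indirect context described above can be sketched as follows. This is not the CIME algorithm; it is a minimal illustration that assumes each post already has either a place name extracted from its own metadata or nothing, and that relationships between posts (such as retweets) are given as pairs of post ids. The function name `geolocate` and the majority-vote rule are assumptions made for the example.

```python
from collections import Counter


def geolocate(posts, links):
    """posts: post id -> place name found in the post's own metadata, or None.
    links: list of (post_id, related_post_id) relationships, e.g. retweets.
    Returns post id -> (place, source), where source is 'direct' or 'indirect'."""
    located = {pid: (place, "direct") for pid, place in posts.items() if place}

    # Build an undirected neighbour map from the relationship pairs.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    for pid, place in posts.items():
        if place:
            continue  # already located from its own metadata
        # Borrow the most frequent place among directly located neighbours.
        votes = Counter(posts[n] for n in neighbours.get(pid, []) if posts.get(n))
        if votes:
            located[pid] = (votes.most_common(1)[0][0], "indirect")
    return located


posts = {"t1": None, "t2": "Genoa", "t3": "Genoa", "t4": "Milan", "t5": None}
links = [("t1", "t2"), ("t1", "t3"), ("t1", "t4")]
print(geolocate(posts, links))
```

Post `t1` has no geographical reference of its own but inherits one from related posts, while `t5`, which has neither direct nor indirect context, stays unlocated.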


Author(s):  
Ziqian Lin ◽  
Jie Feng ◽  
Ziyang Lu ◽  
Yong Li ◽  
Depeng Jin

Crowd flow prediction is of great importance in a wide range of applications, from urban planning and traffic control to public safety. It aims to predict the inflow (the traffic of crowds entering a region in a given time interval) and outflow (the traffic of crowds leaving a region for other places) of each region in the city, given the historical flow data. In this paper, we propose DeepSTN+, a deep learning-based convolutional model, to predict crowd flows in the metropolis. First, DeepSTN+ employs the ConvPlus structure to model the long-range spatial dependence among crowd flows in different regions. Further, PoI distributions and the time factor are combined to express the effect of location attributes, introducing prior knowledge of crowd movements. Finally, we propose an effective fusion mechanism to stabilize the training process, which further improves performance. Extensive experimental results on two real-life datasets demonstrate the superiority of our model: DeepSTN+ reduces the error of crowd flow prediction by approximately 8% to 13% compared with the state-of-the-art baselines.
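The inflow and outflow quantities defined above can be computed from raw movement data before any model is trained. The sketch below is not part of DeepSTN+; it assumes a simple hypothetical input format in which each person's trajectory is a list of region ids, one per time step, and it counts a move as outflow from the previous region and inflow to the current one.

```python
from collections import Counter


def crowd_flows(trajectories):
    """trajectories: list of per-person region sequences, one region per time step.
    Returns (inflow, outflow), each a Counter keyed by (time_step, region)."""
    inflow, outflow = Counter(), Counter()
    for path in trajectories:
        for t in range(1, len(path)):
            prev, cur = path[t - 1], path[t]
            if prev != cur:
                outflow[(t, prev)] += 1  # this person left `prev` during step t
                inflow[(t, cur)] += 1    # and entered `cur` during step t
    return inflow, outflow


trajectories = [
    ["A", "B", "B"],
    ["A", "A", "B"],
    ["C", "B", "A"],
]
inflow, outflow = crowd_flows(trajectories)
print(inflow[(1, "B")], outflow[(1, "A")])
```

Because every move leaves one region and enters another, total inflow and total outflow over all regions are equal at each time step, which is a useful sanity check on such data.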


2022 ◽  
Vol 9 ◽  
Author(s):  
Suleman Khan ◽  
Saqib Hakak ◽  
N. Deepa ◽  
B. Prabadevi ◽  
Kapal Dev ◽  
...  

Since the emergence of COVID-19 in December 2019, there have been numerous posts and news items about the pandemic in social media, traditional print, and electronic media. These sources carry information from both trusted and non-trusted medical sources, and news from these media spreads rapidly. Spreading deceptive information may lead to anxiety, unwanted exposure to medical remedies, and digital marketing tricks, and may even have deadly consequences. Therefore, a model for detecting fake news in the news pool is essential. In this work, a dataset that fuses COVID-19-related news from several social media and news sources is used for classification. First, preprocessing is performed on the dataset to remove unwanted text, and tokenization is carried out to extract tokens from the raw text. Next, feature selection is performed to avoid the computational overhead of processing all the features in the dataset, and linguistic and sentiment features are extracted for further processing. Finally, several state-of-the-art machine learning algorithms are trained to classify the COVID-19-related dataset and are evaluated using various metrics. The results show that the random forest classifier outperforms the other classifiers, with an accuracy of 88.50%.
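The first stages of the pipeline described above (preprocessing, tokenization, and feature selection) can be sketched with the standard library alone. This is an illustration, not the authors' implementation: the cleaning rules, the use of document frequency as the selection criterion, and all function names are assumptions made for the example. The resulting count vectors are the kind of input a classifier such as a random forest would be trained on; the classifier itself is not shown.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and strip URLs and non-letter characters."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.sub(r"[^a-z\s]", " ", text)


def tokenize(text):
    """Split the cleaned text into whitespace-separated tokens."""
    return preprocess(text).split()


def select_features(docs, top_k):
    """Keep the top_k tokens by document frequency (a simple stand-in
    for the feature selection step; the paper's criterion is not stated here)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Sort by frequency, then alphabetically, so the result is deterministic.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:top_k]]


def vectorize(doc, vocab):
    """Represent a document as token counts over the selected vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[tok] for tok in vocab]


docs = [
    "Miracle cure for COVID-19! http://x.example",
    "COVID-19 vaccine trial results",
    "Miracle cure sold online",
]
vocab = select_features(docs, top_k=3)
print(vocab, [vectorize(d, vocab) for d in docs])
```

Restricting the vocabulary to a few selected tokens is what keeps the feature vectors short, which is the computational saving the abstract attributes to feature selection.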

