An Experimental Study of Spammer Detection on Chinese Microblogs

With the development of Web 2.0, social media platforms such as Twitter and Sina Weibo have become essential channels for disseminating hot events. At the same time, because of the free policy of microblogging services, users can post user-generated content freely on these platforms. Accordingly, more and more hot events on microblogging platforms have been labeled as spammers. Spammers not only hurt the healthy development of social media but also introduce many economic and social problems. Therefore, governments and enterprises must distinguish whether a hot event on a microblogging platform is a spammer or a naturally developing event. In this paper, we focus on the hot event list on Sina Weibo and collect the relevant microblogs of each hot event to study methods for detecting spammers. Notably, we develop an integrated feature set consisting of user profile, user behavior, and user relationships to reflect the various factors that affect spammer detection. Then, we employ typical machine learning methods to conduct extensive experiments on detecting spammers. We use a real data set crawled from the most prominent Chinese microblogging platform, Sina Weibo, and evaluate the performance of 10 machine learning models with five sampling methods. The results across various metrics show that the Random Forest model combined with the over-sampling method achieves the best accuracy in detecting spammers and non-spammers.
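The abstract above does not say which over-sampling variant was used. As a minimal sketch, assuming plain random over-sampling (minority-class rows duplicated at random until the classes are the same size), the idea can be illustrated with the standard library alone. The function name `random_oversample` and the toy feature tuples are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    is as large as the largest class. Illustrative sketch only."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: (feature_a, feature_b) with 1 = spammer, 0 = non-spammer.
X = [(0.1, 5), (0.2, 7), (0.9, 300), (0.3, 4), (0.2, 6)]
y = [0, 0, 1, 0, 0]

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

In practice, resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.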


Exploring fake news identification using word and sentence embeddings

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189865 ◽

2021 ◽

pp. 1-8

Author(s):

V.T Priyanga ◽

J.P Sanjanasri ◽

Vijay Krishna Menon ◽

E.A Gopalakrishnan ◽

K.P Soman

Keyword(s):

Machine Learning ◽

Social Media ◽

Network Analysis ◽

Supervised Machine Learning ◽

Breeding Ground ◽

Fake News ◽

Data Set ◽

Highly Correlated ◽

Use Of Social Media ◽

The Liar

The widespread use of social media such as Facebook, Twitter, and WhatsApp has changed the way news is created and published; accessing news has become easy and inexpensive. However, the scale of usage and the inability to moderate content have made social media a breeding ground for the circulation of fake news. Fake news is deliberately created either to increase readership or to disrupt social order for political and commercial benefit. It is of paramount importance to identify and filter out fake news, especially in democratic societies. Most existing methods for detecting fake news involve traditional supervised machine learning, which has been quite ineffective. In this paper, we analyze word embedding features that can tell fake news apart from true news. We use the LIAR and ISOT data sets. We extract highly correlated news items from the entire data set using cosine similarity and other such metrics, in order to distinguish their domains based on central topics. We then employ auto-encoders to detect and differentiate between true and fake news, while also exploring their separability through network analysis.
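Cosine similarity, which the abstract above uses to find highly correlated news items, measures the angle between two embedding vectors and ignores their length. The following is a minimal sketch; the three vectors are hypothetical stand-ins for document embeddings and are not drawn from LIAR or ISOT.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.
    Returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical document embeddings (for example, averaged word vectors).
doc_a = [1.0, 0.0, 1.0]
doc_b = [2.0, 0.0, 2.0]   # same direction as doc_a, different length
doc_c = [0.0, 3.0, 0.0]   # orthogonal to doc_a

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

Here `sim_ab` is close to 1 because the two vectors point the same way, while `sim_ac` is 0 because the vectors share no non-zero component.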


Intelligent Detection of False Information in Arabic Tweets Utilizing Hybrid Harris Hawks Based Feature Selection and Machine Learning Models

Symmetry ◽

10.3390/sym13040556 ◽

2021 ◽

Vol 13 (4) ◽

pp. 556

Author(s):

Thaer Thaher ◽

Mahmoud Saheb ◽

Hamza Turabieh ◽

Hamouda Chantar

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Selection ◽

Language Processing ◽

User Profile ◽

Vital Role ◽

Classification Model ◽

Fake News ◽

False Information ◽

Social Media Platforms

Fake or false information on social media platforms is a significant challenge: it deliberately misleads users through rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn the attention of researchers seeking to provide a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets using Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. The Bag of Words (BoW) model is used with different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with the Term Frequency-Inverse Document Frequency (TF-IDF) model ranks best. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, thereby enhancing the learning model's performance for fake news detection. Interestingly, the proposed BHHO-LR model yields an improvement of 5% over previous work on the same dataset.
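The abstract above mentions several term-weighting schemes without giving their formulas. As a sketch of one common variant, assuming relative term frequency multiplied by an unsmoothed logarithmic inverse document frequency, TF-IDF weights for a bag-of-words model can be computed as follows. The function name `tfidf` and the toy token lists are hypothetical, and the paper's exact scheme may differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists.
    Returns one {term: weight} dict per document, using
    tf = count / document length and idf = log(N / document frequency)."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return weights


# Hypothetical tokenized tweets.
docs = [
    ["news", "fake", "alert"],
    ["news", "report"],
    ["news", "fake", "fake"],
]
weights = tfidf(docs)
```

A term that occurs in every document ("news" here) receives weight 0 under this variant, whereas a term concentrated in a few documents ("fake") is weighted up in the documents where it is frequent.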


A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

In this article, an innovative approach to sentiment analysis (SA) is presented. The proposed system handles Romanized or abbreviated text and spelling variations when performing sentiment analysis. A training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi into three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets; SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. Additionally, the proposed system can serve other cultural and social purposes, such as predicting or countering riots.


IMAGE BASED RECOGNITION OF DYNAMIC TRAFFIC SITUATIONS BY EVALUATING THE EXTERIOR SURROUNDING AND INTERIOR SPACE OF VEHICLES

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xl-3-w3-161-2015 ◽

2015 ◽

Vol XL-3/W3 ◽

pp. 161-168

Author(s):

A. Hanel ◽

H. Klöden ◽

L. Hoegner ◽

U. Stilla

Keyword(s):

Machine Learning ◽

Real Data ◽

Traffic Situation ◽

Dynamic Traffic ◽

Interior Space ◽

Data Set ◽

Road Users ◽

Vehicle Fleet ◽

New Strategies

Today, cameras mounted in vehicles are used to observe the driver as well as the objects around a vehicle. This article outlines a concept for image-based recognition of dynamic traffic situations. A dynamic traffic situation is described by road users and their intentions. Images are taken by a vehicle fleet and aggregated on a server. New strategies for machine learning are applied to these images iteratively whenever new data arrives on the server. The results of the learning process are models describing the traffic situation, which are transmitted back to the recording vehicles. Recognition is performed as a standalone function in the vehicles using the received models. This method can be expected to make the detection and classification of objects around the vehicles more reliable. In addition, predicting their actions over the next few seconds should be possible. As one example of how this concept is used, a method to recognize the illumination situation of a traffic scene is described. This makes it possible to handle the different appearances of objects depending on the illumination of the scene. Different illumination classes are defined to distinguish different illumination situations. Intensity-based features are extracted from the images and used by a classifier to assign an image to an illumination class. This method is being tested on a real data set of daytime and nighttime images. The illumination class is classified correctly for more than 80% of the images.


Identifying the factors affecting in the Intentions to continue using of the government social media

Journal of Economics and Administrative Sciences ◽

10.33095/jeas.v23i98.273 ◽

2017 ◽

Vol 23 (98) ◽

pp. 89

Author(s):

عبد العظيم دريفش جبار ◽

قاسم متعب جلود

Keyword(s):

Social Media ◽

Factors Affecting ◽

The Government

Abstract: Public-sector organizations around the world have been setting up their own sites at an accelerating pace, to serve as a window onto their audiences through which they can interpret what is on the audience's mind and turn it into actions that meet their constantly changing needs; Iraqi organizations have been no exception. Research into gratifying these needs has therefore become an urgent necessity, both to keep pace with those needs and to ensure users' engagement with the sites, which is vital to the continued well-being and prosperity of the organizations' sites. Accordingly, the present study attempts to address this aspect, drawing on uses and gratifications theory and the stimulus-organism-response framework. To this end, the two researchers developed a purpose-built questionnaire. The research sample selected its five independent dimensions, which constitute the gratifications sought from the sites in question; the two mediating variables were drawn from the information systems literature; and the dependent variable was defined to meet the requirements of the study. The questionnaire comprised eight dimensions covered by twenty-six items, used a five-point scale, and was tested for validity and reliability following standard practice. The results showed that the instrument was able to perform the task it was designed for. For the research sample of 152 students from the College of Administration and Economics at the University of Thi-Qar, visiting government organizations' social media sites to obtain information and consume their content, together with network effects and social interaction, was associated with flow experience; the same factors, with self-expression added and social interaction excluded, were associated with a sense of belonging. The results also showed that all the independent variables, representing the gratification factors, and the mediating variables, representing the internal state, had a significant effect on the dependent variable (intentions), with the exception of self-expression. The study closes with the main conclusions drawn from the theoretical review and the fieldwork, and offers recommendations to those concerned with managing information systems and to users, to help steer government social media sites and make them attractive to active users so as to ensure their prosperity.


Demographic and socio-economic factors affecting birth preparation and complication readiness (BPCR) practices in Nepal

Nepal Population Journal ◽

10.3126/npj.v18i17.26374 ◽

2018 ◽

Vol 18 (17) ◽

pp. 23-32

Author(s):

Sunil Kumar Acharya

Keyword(s):

Economic Status ◽

Economic Factors ◽

Data Set ◽

Complication Readiness ◽

Factors Affecting ◽

Logistics Regression ◽

Analysis Technique ◽

The Government ◽

Status Of Women ◽

Socio Economic Factors

BPCR practices among women in Nepal are still low, and a relatively high percentage of women do not prepare for birth and complications to the fullest extent. Research in developing countries shows that various demographic, social, and economic factors influence BPCR practices among pregnant women. This paper examines the likelihood of BPCR practices based on women's demographic, social, and economic status in Nepal. The NDHS 2011 data set was analyzed using bivariate logistic regression to examine the effects of these variables on BPCR practices in Nepal. The analysis shows high variations and gaps in BPCR practice by demographic, social, and economic status of women. In light of this finding, the study recommends that the government and other agencies implement appropriate policy and program measures to address the existing variations and gaps in BPCR practices among subgroups of women in Nepal. Further research on the existing barriers to BPCR practice needs to be conducted in Nepal, especially among women who are disadvantaged and marginalized.


Quantitative Methods for Analyzing Intimate Partner Violence in Microblogs: Observational Study

Journal of Medical Internet Research ◽

10.2196/15347 ◽

2020 ◽

Vol 22 (11) ◽

pp. e15347

Author(s):

Christopher Michael Homan ◽

J Nicolas Schrading ◽

Raymond W Ptucha ◽

Catherine Cerulli ◽

Cecilia Ovesdotter Alm

Keyword(s):

Machine Learning ◽

Social Media ◽

Intimate Partner Violence ◽

Language Processing ◽

Partner Violence ◽

Quantitative Methods ◽

Intimate Partner ◽

Support Vector ◽

Data Set ◽

Part Of Speech

Background: Social media is a rich, virtually untapped source of data on the dynamics of intimate partner violence, one that is both global in scale and intimate in detail. Objective: The aim of this study is to use machine learning and other computational methods to analyze social media data for the reasons victims give for staying in or leaving abusive relationships. Methods: Human annotation, part-of-speech tagging, and machine learning predictive models, including support vector machines, were used on a Twitter data set of 8767 #WhyIStayed and #WhyILeft tweets each. Results: Our methods explored whether we can analyze micronarratives that include details about victims, abusers, and other stakeholders, the actions that constitute abuse, and how the stakeholders respond. Conclusions: Our findings are consistent across various machine learning methods, which correspond to observations in the clinical literature, and affirm the relevance of natural language processing and machine learning for exploring issues of societal importance in social media.


Analysis on Present Mathematical Model for Predicting the Crop Production

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l7946.1091220 ◽

2020 ◽

Vol 9 (12) ◽

pp. 168-170

Keyword(s):

Mathematical Model ◽

Mathematical Models ◽

Real World ◽

Crop Production ◽

Ghg Emissions ◽

Data Set ◽

Factors Affecting ◽

The Government ◽

The Mathematical Model ◽

Government Website

India is a worldwide agricultural powerhouse. The future of agriculture-based products depends on crop production. A mathematical model can be characterized as a set of equations that represent the behavior of a system. By using a mathematical model in agriculture, we can predict crop production in a particular area. Various factors affect crops, such as rainfall, GHG emissions, temperature, urbanization, climate, and humidity. A mathematical model is a simplified representation of a real-world system. It describes the system using mathematical principles in the form of a condition or a set of conditions. When crop production needs to be increased, the mathematical model plays a major role and makes the work easier and more meaningful. Through the mathematical model, we predict crop production in upcoming years. AI, ML, and IoT play a major role in predicting the future of agriculture, but without mathematical models it is not possible to predict crop production accurately. Mathematical models play a major role in obtaining accurate results for real-world agricultural problems. Correlation analysis, multiple regression analysis, and fuzzy logic simulation standards have been used to build a grain production benefit model from crop production. Crop prediction is beneficial to farmers in analyzing crop management. Using the current agriculture data set available on the government website, we can build a mathematical model.


Concerns Expressed by Chinese Social Media Users During the COVID-19 Pandemic: Content Analysis of Sina Weibo Microblogging Data (Preprint)

10.2196/preprints.22152 ◽

2020 ◽

Author(s):

Junze Wang ◽

Ying Zhou ◽

Wei Zhang ◽

Richard Evans ◽

Chengyan Zhu

Keyword(s):

Social Media ◽

Latent Dirichlet Allocation ◽

Vaccine Development ◽

User Behavior ◽

Chinese Government ◽

Web Crawler ◽

Health Crisis ◽

User Behavior Analysis ◽

Sina Weibo ◽

Social Media Platforms

BACKGROUND: The COVID-19 pandemic has created a global health crisis that is affecting economies and societies worldwide. During times of uncertainty and unexpected change, people have turned to social media platforms as communication tools and primary information sources. Platforms such as Twitter and Sina Weibo have allowed communities to share discussion and emotional support; they also play important roles for individuals, governments, and organizations in exchanging information and expressing opinions. However, research that studies the main concerns expressed by social media users during the pandemic is limited. OBJECTIVE: The aim of this study was to examine the main concerns raised and discussed by citizens on Sina Weibo, the largest social media platform in China, during the COVID-19 pandemic. METHODS: We used a web crawler tool and a set of predefined search terms (New Coronavirus Pneumonia, New Coronavirus, and COVID-19) to investigate concerns raised by Sina Weibo users. Textual information and metadata (number of likes, comments, retweets, publishing time, and publishing location) of microblog posts published between December 1, 2019, and July 32, 2020, were collected. After segmenting the words of the collected text, we used a topic modeling technique, latent Dirichlet allocation (LDA), to identify the most common topics posted by users. We analyzed the emotional tendencies of the topics, calculated the proportional distribution of the topics, performed user behavior analysis on the topics using data collected from the number of likes, comments, and retweets, and studied the changes in user concerns and differences in participation between citizens living in different regions of mainland China. RESULTS: Based on the 203,191 eligible microblog posts collected, we identified 17 topics and grouped them into 8 themes. These topics were pandemic statistics, domestic epidemic, epidemics in other countries worldwide, COVID-19 treatments, medical resources, economic shock, quarantine and investigation, patients’ outcry for help, work and production resumption, psychological influence, joint prevention and control, material donation, epidemics in neighboring countries, vaccine development, fueling and saluting antiepidemic action, detection, and study resumption. The mean sentiment was positive for 11 topics and negative for 6 topics. The topic with the highest mean of retweets was domestic epidemic, while the topic with the highest mean of likes was quarantine and investigation. CONCLUSIONS: Concerns expressed by social media users are highly correlated with the evolution of the global pandemic. During the COVID-19 pandemic, social media has provided a platform for Chinese government departments and organizations to better understand public concerns and demands. Similarly, social media has provided channels to disseminate information about epidemic prevention and has influenced public attitudes and behaviors. Government departments, especially those related to health, can create appropriate policies in a timely manner through monitoring social media platforms to guide public opinion and behavior during epidemics.


Graph-Based Semi-Supervised Learning With Big Data

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch012 ◽

2020 ◽

pp. 214-244

Author(s):

Prithish Banerjee ◽

Mark Vere Culp ◽

Kenneth Joseph Ryan ◽

George Michailidis

Keyword(s):

Machine Learning ◽

Big Data ◽

Supervised Learning ◽

Prior Knowledge ◽

Linear Algebra ◽

Real Data ◽

Data Set ◽

Regression Problems ◽

Classification And Regression ◽

Empirical Demonstration

This chapter presents some popular graph-based semi-supervised approaches. These techniques apply to classification and regression problems and can be extended to big data problems using recently developed anchor graph enhancements. The background necessary for understanding this chapter includes linear algebra and optimization. No prior knowledge of machine learning methods is necessary. An empirical demonstration of these techniques is also provided on real data set benchmarks.
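The abstract above does not name the specific algorithms covered. As a rough illustration of the graph-based semi-supervised idea, assuming a basic iterative label-propagation scheme in which each unlabeled node repeatedly takes the weighted average of its neighbors' scores while labeled nodes stay fixed, a standard-library sketch could look like this. The function name `propagate_labels` and the four-node chain graph are hypothetical, and the anchor graph enhancements mentioned in the abstract are not shown.

```python
def propagate_labels(weights, labels, iterations=100):
    """weights: symmetric adjacency matrix as a list of lists.
    labels: a float score for each labeled node, None for unlabeled nodes.
    Returns one score per node; labeled nodes keep their given score."""
    n = len(labels)
    scores = [0.0 if y is None else float(y) for y in labels]
    for _ in range(iterations):
        new_scores = scores[:]
        for i in range(n):
            if labels[i] is not None:
                continue  # labeled nodes are clamped to their known value
            total = sum(weights[i])
            if total > 0:
                # Weighted average of the neighbors' current scores.
                new_scores[i] = sum(
                    w * s for w, s in zip(weights[i], scores)
                ) / total
        scores = new_scores
    return scores


# Hypothetical chain graph 0 - 1 - 2 - 3 with unit edge weights.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Node 0 is labeled +1, node 3 is labeled -1, nodes 1 and 2 are unlabeled.
y = [1.0, None, None, -1.0]

scores = propagate_labels(W, y)
```

On this chain the unlabeled scores settle between the two labeled endpoints, so node 1 ends up positive and node 2 negative; the sign of each score can then be read as the predicted class.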
