Bots Recognition in Social Networks Using the Random Forest Algorithm

Online social networks are an essential communication tool for millions of people in their everyday lives. However, online social networks also serve as an arena for information warfare. One tool of information warfare is bots, software designed to simulate the behaviour of real users in online social networks. The objective of this paper is to develop a model for recognizing bots in online social networks. To develop this model, the Random Forest machine-learning algorithm was used. Since machine-learning algorithms require as much data as possible, the Twitter online social network, which is regularly used in studies on bot recognition, was chosen for the problem. For training and testing the Random Forest algorithm, a Twitter account dataset was used that comprised more than 3,000 real users and more than 6,000 bots. During training and testing, the optimal hyper-parameters of the algorithm were determined, namely those at which the highest value of the F1 metric was reached. Python, which is frequently used for machine-learning problems, was chosen as the programming language for implementing these steps. To compare the developed model with other authors’ models, testing was carried out on two Twitter account datasets, each consisting of half bots and half real users. On these datasets, F1 scores of 0.973 and 0.923 were obtained, which are quite high compared with those reported in papers by other authors. As a result, this paper presents a high-accuracy model that can recognize bots in the Twitter online social network.
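
The selection step described in this abstract (keep the hyper-parameter setting that yields the highest F1) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the labels, the candidate names such as `n_trees=10`, and the helper `best_by_f1` are hypothetical, and the predictions are made-up values standing in for the output of trained Random Forest models.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive (bot) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_by_f1(y_true, candidates):
    """Return the name of the candidate whose predictions give the highest F1."""
    return max(candidates, key=lambda name: f1_score(y_true, candidates[name]))


# Hypothetical ground truth (1 = bot, 0 = real user) and hypothetical
# predictions from two hyper-parameter settings.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
candidates = {
    "n_trees=10": [1, 0, 1, 0, 1, 0, 1, 0],
    "n_trees=100": [1, 1, 1, 0, 0, 0, 1, 1],
}

for name, y_pred in candidates.items():
    print(name, round(f1_score(y_true, y_pred), 3))
print("selected:", best_by_f1(y_true, candidates))
```

In practice the candidate predictions would come from models evaluated on held-out accounts rather than from fixed lists.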

Machine Learning Algorithms in Fraud Detection: Case Study on Retail Consumer Financing Company

Asia Pacific Fraud Journal ◽

10.21532/apfjournal.v6i2.216 ◽

2021 ◽

Vol 6 (2) ◽

pp. 213

Author(s):

Nadya Intan Mustika ◽

Bagus Nenda ◽

Dona Ramadhan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Historical Data ◽

Learning Algorithm ◽

Fraud Detection ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Random Forest Algorithm ◽

Data Set

This study aims to implement a machine learning algorithm in detecting fraud based on historical data set in a retail consumer financing company. The outcome of machine learning is used as samples for the fraud detection team. Data analysis is performed through data processing, feature selection, hold-on methods, and accuracy testing. There are five machine learning methods applied in this study: Logistic Regression, K-Nearest Neighbor (KNN), Decision Tree, Random Forest, and Support Vector Machine (SVM). Historical data are divided into two groups: training data and test data. The results show that the Random Forest algorithm has the highest accuracy with a training score of 0.994999 and a test score of 0.745437. This means that the Random Forest algorithm is the most accurate method for detecting fraud. Further research is suggested to add more predictor variables to increase the accuracy value and apply this method to different financial institutions and different industries.
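
The two Random Forest scores reported above can be compared directly. The sketch below only computes the difference between the training and test scores; the function name `generalization_gap` is hypothetical, and reading a large gap as a possible sign of overfitting is a common rule of thumb rather than a claim made in the abstract.

```python
def generalization_gap(train_score, test_score):
    """Return how far the score drops from training data to held-out test data."""
    return train_score - test_score


# Scores reported in the abstract for the Random Forest model.
rf_train, rf_test = 0.994999, 0.745437

gap = generalization_gap(rf_train, rf_test)
print(f"train/test gap: {gap:.6f}")
```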

Implementing machine learning in bipolar diagnosis in China

Translational Psychiatry ◽

10.1038/s41398-019-0638-8 ◽

2019 ◽

Vol 9 (1) ◽

Author(s):

Yantao Ma ◽

Jun Ji ◽

Yun Huang ◽

Huimin Gao ◽

Zhiying Li ◽

...

Keyword(s):

Machine Learning ◽

Bipolar Disorder ◽

Random Forest ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm ◽

Linear Discriminant ◽

Cohort Data ◽

Selection Operator

Bipolar disorder (BPD) is often confused with major depression, and current diagnostic questionnaires are subjective and time intensive. The aim of this study was to develop a new Bipolar Diagnosis Checklist in Chinese (BDCC) by using machine learning to shorten the Affective Disorder Evaluation scale (ADE) based on an analysis of registered Chinese multisite cohort data. In order to evaluate the importance of each item of the ADE, a case-control study of 360 bipolar disorder (BPD) patients, 255 major depressive disorder (MDD) patients and 228 healthy (no psychiatric diagnosis) controls (HCs) was conducted, spanning 9 Chinese health facilities participating in the Comprehensive Assessment and Follow-up Descriptive Study on Bipolar Disorder (CAFÉ-BD). The BDCC was formed by selected items from the ADE according to their importance as calculated by a random forest machine learning algorithm. Five classical machine learning algorithms, namely, a random forest algorithm, support vector regression (SVR), the least absolute shrinkage and selection operator (LASSO), linear discriminant analysis (LDA) and logistic regression, were used to retrospectively analyze the aforementioned cohort data to shorten the ADE. Regarding the area under the receiver operating characteristic (ROC) curve (AUC), the BDCC had high AUCs of 0.948, 0.921, and 0.923 for the diagnosis of MDD, BPD, and HC, respectively, despite containing only 15% (17/113) of the items from the ADE. Traditional scales can be shortened using machine learning analysis. By shortening the ADE using a random forest algorithm, we generated the BDCC, which can be more easily applied in clinical practice to effectively enhance both BPD and MDD diagnosis.

Random Forest Algorithm to Investigate the Case of Acute Coronary Syndrome

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i2.3000 ◽

2021 ◽

Vol 5 (2) ◽

pp. 369-378

Author(s):

Eka Pandu Cynthia ◽

M. Afif Rizky A. ◽

Alwis Nazir ◽

Fadhilah Syafria

Keyword(s):

Machine Learning ◽

Acute Coronary Syndrome ◽

Random Forest ◽

Data Science ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Random Forest Algorithm ◽

Coronary Syndrome ◽

Use Of Data

This paper explains the use of the Random Forest algorithm (RFA) to investigate cases of Acute Coronary Syndrome (ACS). The objective of this study is to evaluate the use of data science techniques and machine learning algorithms in creating a model that can classify whether or not a case of acute coronary syndrome occurs. The research method follows the IBM Foundational Methodology for Data Science and includes: i) inventorying a dataset about ACS, ii) preprocessing the data in four sub-processes, i.e. requirements, collection, understanding, and preparation, iii) determining the RFA settings, i.e. the number "n" of trees that will form the forest, and building the trees of the random forest, and iv) evaluating the model and analysing the results using the Python programming language. In the experiments, learning was conducted using a random forest machine-learning algorithm with an n-estimator value of 100 and a maximum depth (max depth) of 4 for each tree, under learning scenarios of 70:30, 80:20, and 90:10 on 444 cases of acute coronary syndrome data. The results show that the 70:30 scenario model has the best results, with an accuracy value of 83.45%, a precision value of 85%, and a recall value of 92.4%. The conclusions were drawn by evaluating the experiment results with several statistical metrics (accuracy, precision, and recall) in each learning scenario on the 444 cases, using 10-fold cross-validation.
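
To make the learning scenarios and the three reported metrics concrete, the sketch below computes the train/test sizes for 444 cases under each split and derives accuracy, precision, and recall from confusion-matrix counts. It rests on assumptions not stated in the abstract: the test share is rounded up to a whole case, and the confusion counts in the example are invented for illustration (they are not the paper's results). The helper names `split_sizes` and `classification_metrics` are hypothetical.

```python
def split_sizes(n_cases, test_percent):
    """Return (train, test) sizes; the test share is rounded up (an assumption)."""
    test = -(-n_cases * test_percent // 100)  # integer ceiling division
    return n_cases - test, test


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Learning scenarios named in the abstract, applied to its 444 cases.
for label, test_percent in (("70:30", 30), ("80:20", 20), ("90:10", 10)):
    train, test = split_sizes(444, test_percent)
    print(f"{label}: train={train}, test={test}")

# Invented counts, only to show how the three metrics relate.
accuracy, precision, recall = classification_metrics(tp=8, fp=2, fn=1, tn=9)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```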

How migrants manifest their transnational identity through online social networks: comparative findings from a case of Koreans in Germany

Comparative Migration Studies ◽

10.1186/s40878-020-00218-w ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Sunyoung Park ◽

Lasse Gerrits

Keyword(s):

Social Networks ◽

Social Network ◽

Online Social Networks ◽

Online Social Network ◽

Point Of View ◽

Comparative Case Study ◽

Transnational Identity ◽

Transnational Identities ◽

Web 3.0 ◽

Situational Contexts

Although migration has long been an important topic in the social sciences, there is still a need for study of migrants’ unique and dynamic transnational identity, which heavily influences social integration in the host society. In Online Social Networks (OSNs), where contemporary migrants most actively communicate and share their stories, different challenges to migrants’ belonging and identity, and different ways of coping with or reconciling them, may exist. This paper aims to scrutinise how migrants manifest their belonging and identity via different technological types of online social networks, in order to understand the relations between online social networks and migrants’ multi-faceted transnational identity. The research introduces a comparative case study of an online social movement led by Koreans in Germany via their online communities, triggered by a German TV advertisement considered to stereotype East Asians from a white-supremacist point of view. Starting with virtual ethnography on three OSNs, each representing one internet generation (Web 1.0 to Web 3.0), a two-step Qualitative Data Analysis is carried out to examine how Korean migrants manifest their belonging and identity via their views on “who we are” and “who are others”. The analysis reveals how Korean migrants’ transnational identities differ according to their expectations of the audience and the members in each online social network, which indicates that the distinctive features of the online platform may encourage or discourage them in shaping transnational identity as a group identity. The paper concludes with two main points: first, current OSNs comprising different generational technologies play a significant role in understanding migrants’ dynamic social values and, in particular, their transnational identities. Second, the dynamics of migrants’ transnational identity engage diverse social and situational contexts.
(keywords: transnational identity, migrants’ online social networks, stereotyping migrants, technological evolution of online social network).

Quantifying Virality of Information in Online Social Networks

International Journal of Virtual Communities and Social Networking ◽

10.4018/jvcsn.2012010103 ◽

2012 ◽

Vol 4 (1) ◽

pp. 32-45 ◽

Cited By ~ 1

Author(s):

Abhishek Vaish ◽

Rajiv Krishna G. ◽

Akshay Saxena ◽

Dharmaprakash M. ◽

Utkarsh Goel

Keyword(s):

Social Networks ◽

Social Network ◽

Online Social Networks ◽

Online Social Network ◽

Critical Factors ◽

Asset Valuation ◽

Valuation Technique ◽

Information Item ◽

Viral Nature ◽

Over Time

The aim of this research is to propose a model through which the viral nature of an information item in an online social network can be quantified. Further, the authors propose an alternative technique for information asset valuation that accommodates virality, which not only complements the existing valuation system but also improves the accuracy of the results. They use a publicly available YouTube dataset to collect attributes and measure critical factors such as share-count, appreciation, user rating, controversiality, and comment rate. These variables are used with a proposed formula to obtain a viral index for each video on a given date. The authors then identify a conventional and a hybrid asset valuation technique to demonstrate how virality can be incorporated to provide accurate results. The research demonstrates the dependency of virality on critical social network factors. With the help of a second dataset, the authors determine the pattern that the virality of an information item follows over time.

Defining Inspection Techniques for Detecting Privacy Problems in Online Social Networks

Journal of Interactive Systems ◽

10.5753/jis.2019.552 ◽

2019 ◽

Vol 10 ◽

pp. 35

Author(s):

Andrey Rodrigues ◽

Natasha M. C. Valentim ◽

Eduardo Feitosa

Keyword(s):

Social Networks ◽

Social Network ◽

Empirical Study ◽

Qualitative Analysis ◽

Online Social Networks ◽

Online Social Network ◽

Effective Alternative ◽

Daily Lives ◽

Study Participants ◽

Inspection Techniques

In the last few years, Online Social Networks (OSNs) have experienced growth in the number of users, becoming an increasingly embedded part of people’s daily lives. Privacy expectations of OSNs are higher as more members start realizing the potential privacy problems they face by interacting with these systems. Inspection methods can be an effective alternative for addressing privacy problems because they detect possible defects that could be causing the system to behave in an undesirable way. Therefore, we proposed a set of privacy inspection techniques called PIT-OSN (Privacy Inspection Techniques for Online Social Network). This paper presents the description and evolution of PIT-OSN through the results of a preliminary empirical study. We discuss the quantitative and qualitative results and their impact on improving the techniques. Results indicate that our techniques help non-expert inspectors uncover privacy problems effectively, and that the study participants consider them easy to use and useful. Finally, the qualitative analysis helped us improve some technique steps that might have been unclear.

FLOOD MAPPING USING RANDOM FOREST AND IDENTIFYING THE ESSENTIAL CONDITIONING FACTORS; A CASE STUDY IN FREDERICTON, NEW BRUNSWICK, CANADA

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-3-2020-609-2020 ◽

2020 ◽

Vol V-3-2020 ◽

pp. 609-615 ◽

Cited By ~ 1

Author(s):

M. Esfandiari ◽

S. Jabari ◽

H. McGrath ◽

D. Coleman

Keyword(s):

Machine Learning ◽

Random Forest ◽

New Brunswick ◽

Urban Areas ◽

Learning Algorithm ◽

Satellite Image ◽

Machine Learning Algorithms ◽

Slope Aspect ◽

Flood Peak ◽

Conditioning Factors

Flooding is one of the most damaging natural hazards in urban areas around the world, including the city of Fredericton, New Brunswick, Canada. Fredericton was flooded in two consecutive years, 2018 and 2019. Due to the complicated behaviour of water when a river overflows its banks, estimating the flood extent is challenging. The issue becomes even more challenging when several different factors, such as the land texture or the surface flatness, affect the water flow with varying degrees of intensity. Recently, machine learning algorithms and statistical methods have been used in many research studies to generate flood susceptibility maps from topographical, hydrological, and geological conditioning factors. One of the major issues that researchers have faced is the complexity and the number of features that must be input to a machine-learning algorithm to produce acceptable results. In this research, we used Random Forest to model the 2018 flood in Fredericton and analyzed the effect of several combinations of 12 different flood conditioning factors. The factors were tested against a Sentinel-2 optical satellite image available around the flood peak day. The highest accuracy was obtained using only 5 factors, namely altitude, slope, aspect, distance from the river, and land-use/cover, with 97.57% overall accuracy and a 95.14% kappa coefficient.
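
Overall accuracy and the kappa coefficient, the two figures quoted above, are both derived from a confusion matrix. The sketch below shows the standard Cohen's kappa calculation, which corrects observed agreement for the agreement expected by chance. The 2x2 matrix is an invented flooded/not-flooded example, not data from the study, and `cohens_kappa` is a hypothetical helper name.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: reference, cols: predicted)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal (overall accuracy).
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Invented counts: [[flooded->flooded, flooded->dry], [dry->flooded, dry->dry]].
confusion = [[45, 5], [10, 40]]

overall_accuracy = (confusion[0][0] + confusion[1][1]) / 100
kappa = cohens_kappa(confusion)
print(f"overall accuracy={overall_accuracy:.2f}, kappa={kappa:.2f}")
```

Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is why the two values are usually reported together.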

Building Online Social Networks to Engage Female Students in Information Systems

International Journal of Web-Based Learning and Teaching Technologies ◽

10.4018/ijwltt.2015100103 ◽

2015 ◽

Vol 10 (4) ◽

pp. 33-51

Author(s):

Jaymeen R. Shah ◽

Hsun-Ming Lee

Keyword(s):

Social Networks ◽

Social Network ◽

Information Systems ◽

Online Social Networks ◽

Social Networking Sites ◽

Role Models ◽

Online Social Network ◽

Research Process ◽

Female Students ◽

The Social

During the next decade, enrollment growth in Information Systems (IS) related majors is unlikely to meet the predicted demand for qualified IS graduates. Gender imbalance in IS related programs makes the situation worse, as enrollment and retention of women in the IS major have been proportionately low compared to men. In recent years, the majority of high school and college students have integrated social networking sites into their daily lives and habitually use these sites. Providing female students access to role models via an online social network may enhance their motivation to continue as an IS major and pursue a career in the IS field. For this study, the authors follow the action research process – exploration of information systems development. In particular, a Facebook application was developed to build the social network connecting role models and students. Using the application, a basic framework is tested based on the gender of participants. The results suggest that it is necessary to have an adequate number of role models accessible to students, as female role models tend to select fewer students to develop relationships with and show a preference for female students. Female students likely prefer composite role models from a variety of sources. This pilot study yields valuable lessons for providing informal learning fostered by role modeling via online social networks. The Facebook application may be further expanded to enhance female students' interests in IS related careers.

Predicting Bank Operational Efficiency Using Machine Learning Algorithm: Comparative Study of Decision Tree, Random Forest, and Neural Networks

Advances in Fuzzy Systems ◽

10.1155/2020/8581202 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Peter Appiahene ◽

Yaw Marfo Missah ◽

Ussiph Najim

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Banking Sector ◽

Banking Industry ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Algorithm ◽

And Performance

The financial crisis that hit Ghana from 2015 to 2018 has raised various issues with respect to the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers’ confidence, efficiency and performance analysis in the banking industry has become a pressing issue, because stakeholders have to detect the underlying causes of inefficiencies within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks’ efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT) and its C5.0 algorithm provided the best predictive model. It had 100% accuracy in predicting the 134 holdout sample dataset (30% of banks) and a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally the neural network (86.6% accuracy) with a P value of 0.66. The study concluded that banks in Ghana can use the results of this study to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.
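
The accuracy percentages above can be translated back into approximate counts on the 134-branch holdout sample. The sketch below does that arithmetic; the resulting counts are inferred by rounding and are not figures reported in the abstract, and `correct_predictions` is a hypothetical helper name.

```python
def correct_predictions(accuracy, n_samples):
    """Approximate number of correct predictions implied by an accuracy value."""
    return round(accuracy * n_samples)


HOLDOUT = 134  # holdout sample size stated in the abstract
TOTAL = 444    # total number of bank branches (DMUs) stated in the abstract

# Accuracies reported in the abstract, as fractions.
reported = {"decision tree": 1.000, "random forest": 0.985, "neural network": 0.866}

counts = {name: correct_predictions(acc, HOLDOUT) for name, acc in reported.items()}
holdout_share = HOLDOUT / TOTAL

for name, n_correct in counts.items():
    print(f"{name}: about {n_correct} of {HOLDOUT} correct")
print(f"holdout share of all branches: {holdout_share:.1%}")
```

Under this reading, the gap between the decision tree and the random forest corresponds to roughly two misclassified branches in the holdout sample.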

Exploring Adversarial Attacks and Defences for Fake Twitter Account Detection

Technologies ◽

10.3390/technologies8040064 ◽

2020 ◽

Vol 8 (4) ◽

pp. 64

Author(s):

Panagiotis Kantartopoulos ◽

Nikolaos Pitropakis ◽

Alexios Mylonas ◽

Nicolas Kylilis

Keyword(s):

Machine Learning ◽

Social Networks ◽

Social Media ◽

Test Phase ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Fake News ◽

Specific Subject ◽

Learning Techniques ◽

Twitter Account

Social media has become very popular and important in people’s lives, as personal ideas, beliefs and opinions are expressed and shared through them. Unfortunately, social networks, and specifically Twitter, suffer from massive existence and perpetual creation of fake users. Their goal is to deceive other users employing various methods, or even create a stream of fake news and opinions in order to influence an idea upon a specific subject, thus impairing the platform’s integrity. As such, machine learning techniques have been widely used in social networks to address this type of threat by automatically identifying fake accounts. Nonetheless, threat actors update their arsenal and launch a range of sophisticated attacks to undermine this detection procedure, either during the training or test phase, rendering machine learning algorithms vulnerable to adversarial attacks. Our work examines the propagation of adversarial attacks in machine learning based detection for fake Twitter accounts, which is based on AdaBoost. Moreover, we propose and evaluate the use of k-NN as a countermeasure to remedy the effects of the adversarial attacks that we have implemented.
