Homophily-Based Link Prediction in The Facebook Online Social Network: A Rough Sets Approach

Abstract: Online social networks are highly dynamic and sparse. One of the main problems in analyzing these networks is predicting the existence of links between their users: the link prediction problem. Many studies have predicted links using a variety of techniques, such as decision trees and logistic regression. In this work, we illustrate the use of rough set theory to predict links in the Facebook social network based on homophilic features. Other supervised learning algorithms are also employed in our experiments and compared with the rough set classifier: naive Bayes, the J48 decision tree, support vector machine, logistic regression, and the multilayer perceptron neural network. Moreover, we studied the influence of the “common groups” and “common page likes” homophilic features on predicting friendship between Facebook users, and compared the Jaccard coefficient with the overlap coefficient as measures of similarity between users’ homophilic attributes. We conducted our experiments on two different datasets obtained from the Facebook online social network, where the users in each dataset live within the same geographical region. The results showed that the rough set classifier significantly outperformed the other classifiers in all experiments. The results also demonstrated that the common groups and common page likes features have a significant influence on predicting friendship between Facebook users. Finally, the results revealed that the overlap coefficient homophilic features gave better results than the Jaccard coefficient features.
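
The abstract compares two set-similarity measures but does not spell out how they were computed. As a minimal sketch using the standard textbook definitions, the Jaccard coefficient divides the size of the intersection by the size of the union, while the overlap coefficient divides it by the size of the smaller set. The user names and group memberships below are hypothetical and are not taken from the paper's datasets.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a: set, b: set) -> float:
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|) (0.0 when either set is empty)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical group memberships for two users (illustration only).
alice_groups = {"hiking", "photography", "chess"}
bob_groups = {"hiking", "chess", "cooking", "cycling", "film", "jazz"}

print("Jaccard:", jaccard(alice_groups, bob_groups))
print("Overlap:", overlap(alice_groups, bob_groups))
```

Here the two users share 2 groups out of 7 distinct groups, so the Jaccard coefficient is 2/7, while the overlap coefficient is 2/3 because the smaller set has 3 members. The overlap coefficient does not penalize a pair merely because one user belongs to many more groups than the other.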


Anchor Link Prediction in Online Social Network Using Graph Embedding and Binary Classification

Computational Collective Intelligence - Lecture Notes in Computer Science, 2020, pp. 229-240

DOI: 10.1007/978-3-030-63007-2_18

Author(s): Vang V. Le, Tin T. Tran, Phuong N. H. Pham, Vaclav Snasel

Keyword(s): Social Network, Link Prediction, Binary Classification, Online Social Network, Graph Embedding


A Novel Method to Dynamically Fix Threshold for Node Neighbourhood Based Link Prediction Techniques

International Journal of Information Technology and Web Engineering, 2020, Vol 15 (1), pp. 17-34

DOI: 10.4018/ijitwe.2020010102

Author(s): Anand Kumar Gupta, Neetu Sardana

Keyword(s): Social Network, Link Prediction, Binary Classification, Online Social Network, Threshold Value, Classification Problem, Similarity Score, The Social, Comparative Results, Prediction Techniques

The objective of an online social network is to amplify the flow of information among its users. This goal can be accomplished by maximizing interconnectivity among users through link prediction techniques. Existing link prediction techniques use various heuristics, such as similarity scores, to predict possible connections. Link prediction can be treated as a binary classification problem in which the two class outcomes are the presence and the absence of a connection. One of the challenges in this classification is choosing the threshold value. Because a social network is highly dynamic and each user has different features, it is difficult to choose a single static threshold that decides whether two unconnected users will form a link. This article proposes a novel technique, FIXT, that dynamically sets the threshold value for predicting new link formation. The article evaluates FIXT against six baseline techniques. The comparative results show that FIXT achieves accuracy of up to 93% and outperforms the baseline techniques.
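
To make the framing concrete, the following minimal sketch treats link prediction as binary classification with a single static threshold on a neighbourhood similarity score. It is not the FIXT method, whose details the abstract does not give. The Jaccard neighbourhood score, the function names, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard_score(graph: dict, u: str, v: str) -> float:
    """Jaccard similarity of the neighbour sets of u and v."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def predict_links(graph: dict, threshold: float) -> list:
    """Return unconnected pairs whose score is at or above a static threshold."""
    predicted = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already connected, nothing to predict
        if jaccard_score(graph, u, v) >= threshold:
            predicted.append((u, v))
    return predicted


# Hypothetical undirected graph stored as adjacency sets.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(predict_links(toy_graph, threshold=0.5))
print(predict_links(toy_graph, threshold=0.3))
```

Lowering the threshold from 0.5 to 0.3 turns one predicted link into three on this toy graph, which illustrates why a single fixed threshold is hard to choose for every user.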


Comparison of Support Vector Machine, Bayesian Logistic Regression, and Alternating Decision Tree Algorithms for Shallow Landslide Susceptibility Mapping along a Mountainous Road in the West of Iran

Applied Sciences, 2020, Vol 10 (15), pp. 5047, Cited By ~ 7

DOI: 10.3390/app10155047

Author(s): Viet-Ha Nhu, Danesh Zandi, Himan Shahabi, Kamran Chapi, Ataollah Shirzadi, ...

Keyword(s): Machine Learning, Support Vector Machine, Logistic Regression, Decision Tree, Shallow Landslide, Machine Learning Algorithms, Support Vector, Svm Algorithm, Alternating Decision Tree, Bayesian Logistic Regression

This paper applies and compares three machine learning algorithms, support vector machine (SVM), Bayesian logistic regression (BLR), and alternating decision tree (ADTree), to map landslide susceptibility along the mountainous road of the Salavat Abad saddle, Kurdistan province, Iran. We identified 66 shallow landslide locations from field surveys, recording them using the Global Positioning System (GPS), Google Earth imagery, and black-and-white aerial photographs (scale 1:20,000), and selected 19 landslide conditioning factors, which we then tested using the information gain ratio (IGR) technique. We checked the validity of the models using statistical metrics, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). We found that, although all three machine learning algorithms yielded excellent performance, the SVM algorithm (AUC = 0.984) slightly outperformed the BLR (AUC = 0.980) and ADTree (AUC = 0.977) algorithms. We observed that all three algorithms are useful and effective tools for identifying shallow landslide-prone areas, and that the BLR algorithm, like the SVM algorithm, can be used as a soft computing benchmark for checking the performance of models in future work.
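
Several of the validation metrics listed above can be derived from a binary confusion matrix. The sketch below uses the standard definitions of sensitivity, specificity, accuracy, and Cohen's kappa. The function name and the example counts are hypothetical and are not results from the paper. It assumes both classes occur in the data and that chance agreement is below 1, so no denominator is zero.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / total   # observed agreement

    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa corrects accuracy for the agreement a classifier would reach by chance, which is why it is often reported alongside accuracy when the classes are balanced by design.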


Performance of Logistic Regression and Support Vector Machines for Seismic Vulnerability Assessment and Mapping: A Case Study of the 12 September 2016 ML5.8 Gyeongju Earthquake, South Korea

Sustainability, 2019, Vol 11 (24), pp. 7038, Cited By ~ 4

DOI: 10.3390/su11247038

Author(s): Jihye Han, Soyoung Park, Seongheon Kim, Sanghun Son, Seonghyeok Lee, ...

Keyword(s): Logistic Regression, South Korea, Vulnerability Assessment, Seismic Vulnerability, Spatial Databases, Support Vector, Seismic Vulnerability Assessment, The Common, Gyeongju Earthquake

In this study, we performed seismic vulnerability assessment and mapping of the ML5.8 Gyeongju Earthquake in Gyeongju, South Korea, as a case study. We applied logistic regression (LR) and four kernel models based on the support vector machine (SVM) learning method to derive suitable models for assessing seismic vulnerabilities; the results of each model were then mapped and evaluated. Dependent variables were quantified using buildings damaged in the 9.12 Gyeongju Earthquake, and independent variables were constructed and used as spatial databases by selecting 15 sub-indicators related to earthquakes. Success and prediction rates were calculated using receiver operating characteristic (ROC) curves. The success rates of the models (LR, SVM models based on linear, polynomial, radial basis function, and sigmoid kernels) were 0.652, 0.649, 0.842, 0.998, and 0.630, respectively, and the prediction rates were 0.714, 0.651, 0.804, 0.919, and 0.629, respectively. Among the five models, RBF-SVM showed the highest performance. Seismic vulnerability maps were created for each of the five models and were graded as safe, low, moderate, high, or very high. Finally, we examined the distribution of building classes among the 23 administrative districts of Gyeongju. The common vulnerable regions among all five maps were Jungbu-dong and Hwangnam-dong, and the common safe region among all five maps was Gangdong-myeon.


Link prediction applied to an open large-scale online social network

Proceedings of the 21st ACM conference on Hypertext and hypermedia - HT '10, 2010, Cited By ~ 6

DOI: 10.1145/1810617.1810641

Author(s): Dan Corlette, Frank M. Shipman

Keyword(s): Social Network, Link Prediction, Large Scale, Online Social Network


Twitter Bots Detection Using Machine Learning Techniques

International Journal for Research in Applied Science and Engineering Technology, 2021, Vol 9 (VII), pp. 1536-1541

DOI: 10.22214/ijraset.2021.36637

Author(s): Deekshith S G

Keyword(s): Social Network, Online Social Network, Identity Theft, Machine Learning Techniques, Support Vector, Random Forest Algorithm, Critical Problem, Learning Techniques, The Social, Crucial Part

The social network, a crucial part of our lives, is plagued by online impersonation and fake accounts. Intruders mostly use fake profiles to carry out malicious activities in online social networks (OSNs), such as harming a person, identity theft, and privacy intrusion. Identifying whether an account is genuine or fake is therefore one of the critical problems in OSNs. In this paper, we propose several classification algorithms, including the Support Vector Machine, KNN, and Random Forest algorithms. The paper also compares these classification methods on a Spam User dataset in order to select the best one.


Heart Rate Variability-Derived Features Based on Deep Neural Network for Distinguishing Different Anaesthesia States

2020

DOI: 10.21203/rs.3.rs-41792/v2

Author(s): Jian Zhan, Zuo-xi Wu, Zhen-xin Duan, Gui-ying Yang, Zhi-yong Du, ...

Keyword(s): Neural Network, Heart Rate, Support Vector Machine, Logistic Regression, Decision Tree, Deep Neural Network, Low Frequency, Support Vector, Low Frequency Power, Frequency Power

Background: Estimating the depth of anaesthesia (DoA) is critical in modern anaesthetic practice. Multiple DoA monitors based on electroencephalograms (EEGs) have been widely used for DoA monitoring; however, these monitors may be inaccurate under certain conditions. In this work, the hypothesis that heart rate variability (HRV)-derived features based on a deep neural network can distinguish different anaesthesia states was investigated. Methods: A novel method of distinguishing different anaesthesia states was developed based on four HRV-derived time and frequency domain features combined with a deep neural network. Four features were extracted from an electrocardiogram, including the HRV high-frequency power, low-frequency power, high-to-low-frequency power ratio, and sample entropy. Next, these features were used as inputs for the deep neural network, which used the expert assessment of consciousness level as the reference output. Finally, the deep neural network was compared with the logistic regression, support vector machine, and decision tree models. The datasets of 23 anaesthesia patients were used to assess the proposed method. Results: The accuracies of the four models, in distinguishing the anaesthesia states, were 86.2% (logistic regression), 87.5% (support vector machine), 87.2% (decision tree), and 90.1% (deep neural network). The accuracy of deep neural network was higher than those of the logistic regression (p < 0.05), support vector machine (p < 0.05), and decision tree (p < 0.05) approaches. Our method outperformed the logistic regression, support vector machine, and decision tree methods. Conclusions: The incorporation of four HRV-derived time and frequency domain features and a deep neural network could accurately distinguish between different anaesthesia states; however, this study is a pilot of a feasibility study, providing a method to supplement DoA monitoring based on EEG features to improve the accuracy of DoA estimation.


IDENTIFIKASI JENIS IKAN MENGGUNAKAN MODEL HYBRID DEEP LEARNING DAN ALGORITMA KLASIFIKASI (Fish Species Identification Using a Hybrid Deep Learning Model and Classification Algorithms)

Sebatik, 2020, Vol 24 (2)

DOI: 10.46984/sebatik.v24i2.1057

Author(s): Anifuddin Azis

Keyword(s): Neural Networks, Support Vector Machine, Logistic Regression, Deep Learning, Random Forest, Decision Tree, Nearest Neighbor, Support Vector, K Nearest Neighbor, Data Output

Indonesia is the country with the second-greatest biodiversity in the world after Brazil. Indonesia has about 25,000 plant species and 400,000 species of animals and fish. An estimated 8,500 fish species live in Indonesian waters, or 45% of the world's species, of which roughly 7,000 are marine fish species. Determining the number of these species requires expertise in taxonomy. In practice, identifying a fish species is not easy, because it requires particular methods and equipment as well as taxonomic literature. Automated processing of video or images of aquatic ecosystem data has begun to be developed. In this development, detecting and identifying fish species is a challenge compared with detecting and identifying other objects. Deep learning methods, which have succeeded in classifying objects in images, can analyze data directly without dedicated feature extraction. Such systems have parameters, or weights, that serve both as feature extractors and as classifiers. The processed data produce an output that is expected to be as close as possible to the actual output data. A CNN is a deep learning architecture that can reduce the dimensionality of data without losing the characteristics or features of that data. This study develops a hybrid model that uses a CNN (Convolutional Neural Network) to extract features and several classification algorithms to identify fish species. The classification algorithms used in this study are Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbor (KNN), Random Forest, and Backpropagation.


A Rough Set and Cellular Genetic Fusion Algorithm for Acute Critical Disease Prediction

International Journal of Computers Communications & Control, 2020, Vol 15 (6)

DOI: 10.15837/ijccc.2020.6.3894

Author(s): Hongxin Wang, Lijing Jia, Heng Zhuang, Xueyan Li, Yuzhuo Zhao, ...

Keyword(s): Genetic Algorithm, Support Vector Machine, Logistic Regression, Rough Set, Rough Set Theory, Naive Bayes, Naïve Bayes, Support Vector, Disease Prediction, Fusion Algorithm

This study aims to address the problems of an overly broad set of medical indicators, a lack of retrospective research samples, insufficient depth of data mining, and low disease prediction accuracy. In this paper, we propose an intelligent screening algorithm that combines a genetic algorithm, cellular automata, and rough set theory. This algorithm can achieve high accuracy in predicting patient outcomes with a small number of indicators. We also compare it with the traditional genetic algorithm. We built the prediction model with 64 indicators based on the logistic regression (AUC 0.8628), support vector machine (AUC 0.5319), Naïve Bayes (AUC 0.7102), and AdaBoost (AUC 0.9095) algorithms. Using the cellular genetic algorithm for attribute screening not only effectively reduces the number of indicators but also achieves almost the same prediction accuracy with 8 indicators based on the logistic regression (AUC 0.8782), support vector machine (AUC 0.8525), Naïve Bayes (AUC 0.8408), and AdaBoost (AUC 0.8770) algorithms. Compared with the traditional scoring system, the predictive model established in this paper can more accurately predict rebleeding accidents based on physiological test indicators and continuous patient indicators.


Supervised shift k-means based machine learning approach for link prediction using inherent structural properties of large online social network

Computational Intelligence, 2020

DOI: 10.1111/coin.12372

Author(s): Praveen Kumar Bhanodia, Aditya Khamparia, Babita Pandey

Keyword(s): Machine Learning, Social Network, Structural Properties, Link Prediction, Online Social Network, Learning Approach, Machine Learning Approach
