Homophily-Based Link Prediction in The Facebook Online Social Network: A Rough Sets Approach

2015 ◽  
Vol 24 (4) ◽  
pp. 491-503 ◽  
Author(s):  
Islam Elkabani ◽  
Roa A. Aboo Khachfeh

Abstract: Online social networks are highly dynamic and sparse. One of the main problems in analyzing these networks is predicting the existence of links between their users: the link prediction problem. Many studies have predicted links using a variety of techniques, such as decision trees and logistic regression. In this work, we illustrate the use of rough set theory in predicting links over the Facebook social network based on homophilic features. Other supervised learning algorithms, namely naive Bayes, J48 decision tree, support vector machine, logistic regression, and multilayer perceptron neural network, are also employed in our experiments and compared with the rough set classifier. Moreover, we studied the influence of the “common groups” and “common page likes” homophilic features on predicting friendship between users of Facebook, and also studied the effect of using the Jaccard coefficient to measure the similarity between users’ homophilic attributes compared with using the overlap coefficient. We conducted our experiments on two different datasets obtained from the Facebook online social network, where users in each dataset live within the same geographical region. The results showed that the rough set classifier significantly outperformed the other classifiers in all experiments. The results also demonstrated that the common groups and the common page likes features have a significant influence on predicting friendship between users of Facebook. Finally, the results revealed that the overlap coefficient homophilic features provided better results than the Jaccard coefficient features.
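The two similarity measures compared in this abstract can be sketched as follows. This is a minimal illustration, assuming each user's groups (or page likes) are represented as a Python set; the function names and the sample memberships are hypothetical and not taken from the paper.

```python
def jaccard_coefficient(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|); defined here as 0.0 when either set is empty."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical group memberships for two users.
groups_u = {"hiking", "chess", "photography", "cooking"}
groups_v = {"chess", "cooking"}

print(jaccard_coefficient(groups_u, groups_v))  # 0.5
print(overlap_coefficient(groups_u, groups_v))  # 1.0
```

Because the overlap coefficient divides by the size of the smaller set, a user with few groups who shares all of them with another user scores 1.0, whereas the Jaccard coefficient penalizes the difference in set sizes.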

Author(s):  
Anand Kumar Gupta ◽  
Neetu Sardana

The objective of an online social network is to amplify the stream of information among its users. This goal can be accomplished by maximizing interconnectivity among users using link prediction techniques. Existing link prediction techniques use varied heuristics, such as similarity scores, to predict possible connections. Link prediction can be considered a binary classification problem in which the possible class outcomes are the presence and absence of a connection. One of the challenges in classification is deciding the threshold value. Since a social network is exceptionally dynamic in nature and each user possesses different features, it is difficult to choose a static, common threshold that decides whether two non-connected users will form a connection. This article proposes a novel technique, FIXT, that dynamically decides the threshold value for predicting the possibility of new link formation. The article evaluates the performance of FIXT against six baseline techniques. The comparative results show that FIXT achieves accuracy of up to 93% and outperforms the baseline techniques.
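To make the thresholding problem concrete, the sketch below treats link prediction as binary classification with a static, common threshold on a similarity score. It is not FIXT, whose details the abstract does not give; the common-neighbors score, the function names, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import combinations


def common_neighbors(adj: dict, u: str, v: str) -> int:
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def predict_links(adj: dict, threshold: int) -> list:
    """Return non-connected pairs whose score reaches the static threshold."""
    predicted = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already connected, nothing to predict
        if common_neighbors(adj, u, v) >= threshold:
            predicted.append((u, v))
    return predicted


# Hypothetical undirected friendship graph stored as adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

print(predict_links(adj, threshold=2))  # [('a', 'd')]
print(predict_links(adj, threshold=1))  # [('a', 'd'), ('b', 'e'), ('c', 'e')]
```

Moving the threshold from 2 to 1 triples the number of predicted links, which shows how sensitive the outcome is to a single global value.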


2020 ◽  
Vol 10 (15) ◽  
pp. 5047 ◽  
Author(s):  
Viet-Ha Nhu ◽  
Danesh Zandi ◽  
Himan Shahabi ◽  
Kamran Chapi ◽  
Ataollah Shirzadi ◽  
...  

This paper aims to apply and compare the performance of three machine learning algorithms, namely support vector machine (SVM), Bayesian logistic regression (BLR), and alternating decision tree (ADTree), to map landslide susceptibility along the mountainous road of the Salavat Abad saddle, Kurdistan province, Iran. Based on field surveys, we identified 66 shallow landslide locations by recording them with a global positioning system (GPS), Google Earth imagery, and black-and-white aerial photographs (scale 1:20,000). We also identified 19 landslide conditioning factors and tested them using the information gain ratio (IGR) technique. We checked the validity of the models using statistical metrics, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). We found that, although all three machine learning algorithms yielded excellent performance, the SVM algorithm (AUC = 0.984) slightly outperformed the BLR (AUC = 0.980) and ADTree (AUC = 0.977) algorithms. We observed not only that all three algorithms are useful and effective tools for identifying shallow landslide-prone areas, but also that the BLR algorithm can, like the SVM algorithm, serve as a soft computing benchmark for checking the performance of models in the future.
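Several of the validation metrics named above follow directly from a binary confusion matrix. The sketch below uses the standard definitions of sensitivity, specificity, accuracy, and Cohen's kappa; the function name and the counts are hypothetical and are not results from the paper. It assumes both classes are present, so no denominator is zero.

```python
def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard metrics from true/false positive and negative counts."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / n      # observed agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts: 50 landslide cells and 50 non-landslide cells.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
print({name: round(value, 3) for name, value in metrics.items()})
# {'sensitivity': 0.8, 'specificity': 0.9, 'accuracy': 0.85, 'kappa': 0.7}
```

Kappa discounts the 0.5 agreement that a random classifier would reach on this balanced sample, which is why it is lower than the raw accuracy.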


2019 ◽  
Vol 11 (24) ◽  
pp. 7038 ◽  
Author(s):  
Jihye Han ◽  
Soyoung Park ◽  
Seongheon Kim ◽  
Sanghun Son ◽  
Seonghyeok Lee ◽  
...  

In this study, we performed seismic vulnerability assessment and mapping of the ML5.8 Gyeongju Earthquake in Gyeongju, South Korea, as a case study. We applied logistic regression (LR) and four kernel models based on the support vector machine (SVM) learning method to derive suitable models for assessing seismic vulnerabilities; the results of each model were then mapped and evaluated. Dependent variables were quantified using buildings damaged in the 9.12 Gyeongju Earthquake, and independent variables were constructed and used as spatial databases by selecting 15 sub-indicators related to earthquakes. Success and prediction rates were calculated using receiver operating characteristic (ROC) curves. The success rates of the models (LR, SVM models based on linear, polynomial, radial basis function, and sigmoid kernels) were 0.652, 0.649, 0.842, 0.998, and 0.630, respectively, and the prediction rates were 0.714, 0.651, 0.804, 0.919, and 0.629, respectively. Among the five models, RBF-SVM showed the highest performance. Seismic vulnerability maps were created for each of the five models and were graded as safe, low, moderate, high, or very high. Finally, we examined the distribution of building classes among the 23 administrative districts of Gyeongju. The common vulnerable regions among all five maps were Jungbu-dong and Hwangnam-dong, and the common safe region among all five maps was Gangdong-myeon.
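The area under a ROC curve can be computed without plotting the curve, as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below assumes that the success and prediction rates reported above are such areas, computed on different subsets of the data; the function name and the sample scores are hypothetical.

```python
def roc_auc(scores: list, labels: list) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical vulnerability scores; label 1 marks a damaged building.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a small illustration; a rank-based formulation is the usual choice for large datasets.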


Author(s):  
Deekshith S G

The social network, a crucial part of our lives, is plagued by online impersonation and fake accounts. Fake profiles are mostly used by intruders to carry out malicious activities such as harming a person, identity theft, and privacy intrusion in online social networks (OSNs). Hence, identifying whether an account is genuine or fake is one of the critical problems in OSNs. In this paper, we propose several classification algorithms, such as support vector machine, KNN, and random forest. The paper also compares these classification methods on a Spam User dataset in order to select the best one.


2020 ◽  
Author(s):  
Jian Zhan ◽  
Zuo-xi Wu ◽  
Zhen-xin Duan ◽  
Gui-ying Yang ◽  
Zhi-yong Du ◽  
...  

Abstract Background: Estimating the depth of anaesthesia (DoA) is critical in modern anaesthetic practice. Multiple DoA monitors based on electroencephalograms (EEGs) have been widely used for DoA monitoring; however, these monitors may be inaccurate under certain conditions. In this work, the hypothesis that heart rate variability (HRV)-derived features based on a deep neural network can distinguish different anaesthesia states was investigated. Methods: A novel method of distinguishing different anaesthesia states was developed based on four HRV-derived time and frequency domain features combined with a deep neural network. Four features were extracted from an electrocardiogram, including the HRV high-frequency power, low-frequency power, high-to-low-frequency power ratio, and sample entropy. Next, these features were used as inputs for the deep neural network, which used the expert assessment of consciousness level as the reference output. Finally, the deep neural network was compared with the logistic regression, support vector machine, and decision tree models. The datasets of 23 anaesthesia patients were used to assess the proposed method. Results: The accuracies of the four models, in distinguishing the anaesthesia states, were 86.2% (logistic regression), 87.5% (support vector machine), 87.2% (decision tree), and 90.1% (deep neural network). The accuracy of deep neural network was higher than those of the logistic regression (p < 0.05), support vector machine (p < 0.05), and decision tree (p < 0.05) approaches. Our method outperformed the logistic regression, support vector machine, and decision tree methods. Conclusions: The incorporation of four HRV-derived time and frequency domain features and a deep neural network could accurately distinguish between different anaesthesia states; however, this study is a pilot of a feasibility study, providing a method to supplement DoA monitoring based on EEG features to improve the accuracy of DoA estimation.


Sebatik ◽  
2020 ◽  
Vol 24 (2) ◽  
Author(s):  
Anifuddin Azis

Indonesia is the country with the second-largest biodiversity in the world after Brazil. Indonesia has about 25,000 plant species and 400,000 species of animals and fish. An estimated 8,500 fish species live in Indonesian waters, or 45% of the species in the world, of which about 7,000 are marine fish species. Determining the number of these species requires expertise in taxonomy. In practice, identifying a fish species is not easy, because it requires particular methods and equipment as well as taxonomic literature. Automated processing of video or images of aquatic ecosystem data has begun to be developed. In this development, detecting and identifying fish species is a challenge compared with detecting and identifying other objects. Deep learning methods, which have been successful at classifying objects in images, can analyze data directly without any dedicated feature extraction. Such a system has parameters, or weights, that serve both as a feature extractor and as a classifier. The processed data produce an output that is expected to be as close as possible to the true output data. A CNN is a deep learning architecture that can reduce the dimensionality of data without losing the characteristics or features of the data. In this study, a hybrid model will be developed that uses a CNN (convolutional neural network) to extract features and several classification algorithms to identify fish species. The classification algorithms used in this study are logistic regression (LR), support vector machine (SVM), decision tree, k-nearest neighbor (KNN), random forest, and backpropagation.


Author(s):  
Hongxin Wang ◽  
Lijing Jia ◽  
Heng Zhuang ◽  
Xueyan Li ◽  
Yuzhuo Zhao ◽  
...  

This study aims to solve the problems of an overly broad scale of medical indicators, a lack of retrospective research samples, insufficient depth of data mining, and low disease prediction accuracy. In this paper, we propose an intelligent screening algorithm that combines a genetic algorithm, cellular automata, and rough set theory. This algorithm can achieve high accuracy in predicting patient outcomes with a small number of indicators. We also compare it with the traditional genetic algorithm. We built the prediction model with 64 indicators based on the logistic regression (AUC 0.8628), support vector machine (AUC 0.5319), Naïve Bayes (AUC 0.7102), and AdaBoost (AUC 0.9095) algorithms. Using the cellular genetic algorithm for attribute screening not only effectively reduces the number of indicators but also achieves almost the same prediction accuracy with 8 indicators based on the logistic regression (AUC 0.8782), support vector machine (AUC 0.8525), Naïve Bayes (AUC 0.8408), and AdaBoost (AUC 0.8770) algorithms. Compared with the traditional scoring system, the predictive model established in this paper can more accurately predict rebleeding accidents based on physiological test indicators and continuous patient indicators.

