A new algorithm for detecting communities in social networks based on content and structure information

2019 ◽  
Vol 16 (1) ◽  
pp. 79-93
Author(s):  
ELyazid Akachar ◽  
Brahim Ouhbi ◽  
Bouchra Frikh

Purpose The purpose of this paper is to present an algorithm for detecting communities in social networks. Design/methodology/approach The majority of existing methods for community detection in social networks rely on structural information and neglect content information. In this paper, the authors propose a novel approach that combines content and structure information to discover more meaningful communities in social networks. To integrate content information into the community detection process, the authors exploit the texts involved in social networks to identify the users’ topics of interest. These topics are detected using statistical and semantic measures, which allow the users to be divided into groups so that each group represents a distinct topic. The authors then perform link analysis within each group to discover the users who are highly interconnected (communities). Findings To validate the performance of the approach, the authors carried out a set of experiments on four real-life data sets and compared their method with classical methods that ignore content information. Originality/value The experimental results demonstrate that the quality of the community structure is improved when content and structure information are taken into account during community detection.
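
As a hedged illustration of the content-plus-structure idea described above, the sketch below first groups users by topic from their texts (here with TF-IDF and k-means, standing in for the paper's statistical and semantic measures, which are not reproduced) and then runs a standard community detection step inside each topic group. The function name, the number of topics and the input format are assumptions for illustration only.

# Sketch: group users by topic from their texts, then find densely
# connected communities inside each topic group. TF-IDF + k-means stand in
# for the paper's own statistical and semantic topic measures.
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from networkx.algorithms.community import greedy_modularity_communities

def content_structure_communities(graph, user_texts, n_topics=3):
    """graph: nx.Graph of users; user_texts: dict mapping user -> concatenated posts."""
    users = list(user_texts)
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(
        [user_texts[u] for u in users])
    topic_of = dict(zip(users, KMeans(n_clusters=n_topics, n_init=10).fit_predict(tfidf)))
    communities = []
    for topic in range(n_topics):
        members = [u for u in users if topic_of[u] == topic]
        sub = graph.subgraph(members)        # structural analysis within one topic group
        if sub.number_of_edges() > 0:
            communities.extend(greedy_modularity_communities(sub))
    return communities

Any topic model could be substituted for the k-means step without changing the overall two-stage structure.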

2021 ◽  
Vol 40 (1) ◽  
pp. 1597-1608
Author(s):  
Ilker Bekmezci ◽  
Murat Ermis ◽  
Egemen Berki Cimen

Social network analysis offers an understanding of our modern world, and it affords the ability to represent, analyze and even simulate complex structures. While an unweighted model can be used for online communities, trust or friendship networks should be analyzed with weighted models. To analyze social networks, it is essential to produce realistic social models. However, there are serious differences between social network models and real-life data in terms of their fundamental statistical parameters. In this paper, a genetic algorithm (GA)-based social network improvement method is proposed to produce social networks that are more similar to real-life data sets. It first creates a social model based on existing studies in the literature, and then improves the model with the proposed GA-based approach using the similarity of the average degree, the k-nearest neighbor, the clustering coefficient, the degree distribution and the link overlap. The proposed method can be used to model the structural and statistical properties of large-scale societies more realistically. The performance results show that the approach can reduce the dissimilarity between the created social networks and the real-life data sets in terms of their primary statistical properties. It has been shown that the proposed GA-based approach can be used effectively not only in unweighted networks but also in weighted networks.
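
As a hedged sketch of the kind of fitness such a GA might minimize, the function below scores a candidate graph by how far a few basic statistics deviate from those of a target real-life network. The particular statistics, their weighting and the genetic operators used in the paper are not reproduced; the Watts-Strogatz candidate in the usage lines is purely illustrative.

# Sketch: a GA-style dissimilarity (fitness) between a candidate graph and a
# target real-life graph over a few basic statistics. The statistics, weights
# and genetic operators in the paper itself may differ.
import networkx as nx

def dissimilarity(candidate, target):
    stats = []
    for g in (candidate, target):
        degrees = [d for _, d in g.degree()]
        stats.append((
            sum(degrees) / len(degrees),            # average degree
            nx.average_clustering(g),               # clustering coefficient
            nx.degree_assortativity_coefficient(g)  # neighbor-degree correlation
        ))
    return sum(abs(a - b) for a, b in zip(*stats))  # lower is better; a GA minimizes this

# Usage: score a generated small-world model against an observed network.
observed = nx.karate_club_graph()
candidate = nx.watts_strogatz_graph(observed.number_of_nodes(), 4, 0.3)
print(dissimilarity(candidate, observed))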


2017 ◽  
Vol 35 (4) ◽  
pp. 369-381 ◽  
Author(s):  
Jussi Vimpari ◽  
Seppo Junnila

Purpose Retail properties are a prime example of a property class where revenues determine the rent for the property owners. Estimating the value of new retail developments is challenging, as the initial revenues can vary significantly from the long-term revenue levels. Owners and tenants try to manage this problem by introducing different kinds of options, such as overage rent and extension rights, into the lease contracts. The purpose of this paper is to value these options through time for different types of retailers, using real-life data and a method that can be easily applied in practice. Design/methodology/approach This paper builds upon existing real option studies but has a strong practical focus, which has been identified as a challenge in the field. The paper presents simple mathematical equations for valuing overage rent and extension options. The equations capture the value related to uncertainty (volatility) that is missed by standard valuation practices. Findings The results indicate that overage and extension options can represent a significant proportion of a retail lease contract’s value and that their value is heavily time-dependent. The option values differ greatly between tenants, as the volatilities can have a large spread across tenants. The paper suggests that the applicability of option pricing theory and calculus should no longer be considered an insurmountable barrier; rather, a greater challenge for the practical adoption of the method is the availability of real-life data, which is a common problem in real option analysis. Practical implications The value of extension and overage options varies greatly between tenants. In general, the property owner can try to balance the positive effects of overage rents against the negative effects of tenant extensions. However, this study highlights that, as usual, relying on the “law of averages” can result in poor valuations in this context as well. Even the data used in this study provide valuable findings for the property owner, as it can be deduced that certain types of tenants have higher volatilities, and this should be acknowledged when valuing options within lease contracts. Originality/value Previous literature on this topic often takes the input data for the option valuation as given rather than identifying the real-life data available for the calculation. This is a common problem in real options valuation, and it appears to be one of the reasons why option valuation has not been widely used in practice. This study has used real-life data to assess the problem and, more importantly, has assessed the data across different types of tenants. The volatility spread between different types of tenants has not been discussed previously, even though it is of significant importance when using option pricing in practice.
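
The paper's own equations are not reproduced in this abstract; as a hedged illustration only, an overage rent clause can be sketched as a call option on tenant revenue above a breakpoint and valued with a standard Black-Scholes-style formula. The function, parameter names and all figures below are hypothetical.

# Sketch: an overage rent clause viewed as a call option on tenant revenue
# above a breakpoint, valued with a textbook Black-Scholes-style formula.
# This is not the paper's own equation; all figures are hypothetical.
from math import exp, log, sqrt
from statistics import NormalDist

def overage_option_value(revenue, breakpoint, overage_rate, volatility, years, rate=0.02):
    """Present value of the overage rent share of a call on revenue above the breakpoint."""
    d1 = (log(revenue / breakpoint) + (rate + 0.5 * volatility ** 2) * years) / (
        volatility * sqrt(years))
    d2 = d1 - volatility * sqrt(years)
    cdf = NormalDist().cdf
    call = revenue * cdf(d1) - breakpoint * exp(-rate * years) * cdf(d2)
    return overage_rate * call

# Hypothetical tenant: expected revenue 1.2x the breakpoint, 25% revenue
# volatility, a 5% overage rate on the excess, valued three years out.
print(overage_option_value(1_200_000, 1_000_000, 0.05, 0.25, 3))

Higher-volatility tenants produce larger option values in a sketch of this kind, which mirrors the paper's point that the volatility spread across tenant types matters for valuation.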


Author(s):  
Amany A. Naem ◽  
Neveen I. Ghali

Antlion Optimization (ALO) is one of the latest population-based optimization methods, and it has demonstrated good performance in a variety of applications. The ALO algorithm mimics the hunting mechanism of antlions preying on ants in nature. Community detection in social networks is crucial to understanding the structure of these networks. Identifying network communities can be viewed as a problem of clustering a set of nodes into communities. k-median clustering is one of the popular techniques that has been applied to clustering. The problem of clustering a network can be formalized as an optimization problem in which a quality objective function that captures the intuition of a cluster as a set of nodes with better internal connectivity than external connectivity is selected to be optimized. In this paper, a hybrid of antlion optimization and k-median clustering for solving the community detection problem is proposed, named K-median Modularity ALO. Experimental results on real-life networks show the ability of the hybrid antlion optimization and k-median method to successfully detect an optimized community structure using modularity as the objective function.
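
A minimal sketch of the modularity objective that such a hybrid would maximize is shown below; only the objective is illustrated, not the antlion or k-median search itself, and the two-community split of the benchmark graph is an arbitrary candidate solution.

# Sketch: modularity as the objective function a metaheuristic such as the
# proposed K-median Modularity ALO would maximize. The search procedure
# itself (antlion optimization + k-median) is not reproduced here.
import networkx as nx
from networkx.algorithms.community import modularity

def objective(graph, partition):
    """partition: a list of disjoint node sets covering the whole graph."""
    return modularity(graph, partition)

# Usage on a small benchmark graph with an arbitrary two-community candidate.
g = nx.karate_club_graph()
candidate = [set(range(0, 17)), set(range(17, 34))]
print(objective(g, candidate))  # higher values mean denser internal connectivity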


2008 ◽  
pp. 1231-1249
Author(s):  
Jaehoon Kim ◽  
Seong Park

Much of the research regarding streaming data has focused only on real-time querying and analysis of the recent data stream that fits in memory. However, as data stream mining, or tracking of past data streams, is often required, it becomes necessary to store large volumes of streaming data in stable storage. Moreover, as stable storage has restricted capacity, past data streams must be summarized. The summarization must be performed periodically because streaming data flows continuously, quickly, and endlessly. Therefore, in this paper, we propose an efficient periodic summarization method with flexible storage allocation. It improves the overall estimation error by flexibly adjusting the size of the summarized data for each local time section. Additionally, as the processing overhead of compression and the disk I/O cost of decompression can be important factors for quick summarization, we also consider setting the proper size of the data stream to be summarized at a time. Experimental results with artificial data sets as well as real-life data show that our flexible approach is more efficient than the existing fixed approach.
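
As a hedged sketch of the flexible-allocation idea, the function below divides a fixed storage budget across local time sections in proportion to each section's estimated error, so harder-to-summarize sections receive more space. The actual allocation rule, summary structure and error estimate used in the paper may differ; the function name and budget figures are illustrative.

# Sketch: flexible storage allocation across local time sections. Each
# section's summary size is proportional to its estimated error, under a
# fixed total budget. The paper's actual allocation rule may differ.
def allocate_summary_sizes(section_errors, total_budget, min_size=1):
    """section_errors: estimated per-section errors; returns per-section summary sizes."""
    total_error = sum(section_errors)
    if total_error == 0:
        return [total_budget // len(section_errors)] * len(section_errors)
    sizes = [max(min_size, round(total_budget * e / total_error))
             for e in section_errors]
    while sum(sizes) > total_budget:        # trim rounding overshoot, largest sections first
        sizes[sizes.index(max(sizes))] -= 1
    return sizes

# Usage: three time sections, the middle one being the hardest to summarize.
print(allocate_summary_sizes([0.2, 0.7, 0.1], total_budget=100))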


2019 ◽  
Vol 38 (2) ◽  
pp. 293-307
Author(s):  
Po-Yen Chen

Purpose This study attempts to use a new source of data collection, open government data sets, to identify potential academic social networks (ASNs) and define their collaboration patterns. The purpose of this paper is to propose a direction that may advance our current understanding of how or why ASNs are formed or motivated and how they influence research collaboration. Design/methodology/approach This study first reviews the open data sets in Taiwan, which is ranked first in the Global Open Data Index published by the Open Knowledge Foundation, to select the data sets that expose the government’s R&D activities. Then, based on a review of research collaboration theory, potential ASNs in those data sets are identified and further generalized into various collaboration patterns. A research collaboration framework is used to present these patterns. Findings Project-based social networks, learning-based social networks and institution-based social networks are identified and linked to various collaboration patterns. Their collaboration mechanisms, e.g., team composition, motivation, relationship, measurement, and benefit-cost, are also discussed and compared. Originality/value Traditionally, ASNs have usually been studied as co-authorship networks or co-inventorship networks due to the limitations of data collection. This study identifies ASNs that may be formed before co-authorship or co-inventorship networks are formally built up and that may influence the outcomes of research collaborations. This information allows researchers to dive deeply into the structure of ASNs and to resolve collaboration mechanisms.


2020 ◽  
Vol 34 (35) ◽  
pp. 2050408
Author(s):  
Sumit Gupta ◽  
Dhirendra Pratap Singh

Many real-life problems and application data can be represented with the help of graphs. As technology advances at a very fast rate, applications generate vast amounts of valuable data, and the size of the graphs that represent them increases accordingly. How to extract meaningful information from these data has become a hot research topic, and methodical algorithms are required to extract useful information from these raw data. These graphs are not random in nature; they exhibit relationships between their basic entities. Identifying communities based on these relationships improves the understanding of the applications represented by graphs. Community detection algorithms are one such solution: they divide the graph into small clusters in which nodes are densely connected within a cluster and sparsely connected across clusters. Over the last decade, many algorithms have been proposed, and they fall into two broad categories: non-overlapping and overlapping community detection algorithms. The goal of this paper is to offer a comparative analysis of the various community detection algorithms. We bring together the state-of-the-art community detection algorithms from these two classes into a single article, along with their accessible benchmark data sets. Finally, we present a comparison of these algorithms with respect to two parameters: time efficiency and how accurately the communities are detected.
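
As a hedged illustration of the two comparison criteria, the snippet below times two standard non-overlapping algorithms available in networkx on a small benchmark graph and scores each partition with modularity. The survey's own algorithm set, benchmark data sets and accuracy metric are not reproduced here.

# Sketch: comparing community detection algorithms on runtime and quality.
# Two standard networkx algorithms are used purely for illustration; the
# survey covers a much broader set of algorithms and benchmarks.
import time
import networkx as nx
from networkx.algorithms import community

g = nx.les_miserables_graph()
algorithms = {
    "greedy modularity": community.greedy_modularity_communities,
    "label propagation": community.label_propagation_communities,
}

for name, algorithm in algorithms.items():
    start = time.perf_counter()
    parts = [set(c) for c in algorithm(g)]
    elapsed = time.perf_counter() - start
    quality = community.modularity(g, parts)
    print(f"{name}: {len(parts)} communities, "
          f"modularity={quality:.3f}, time={elapsed * 1000:.1f} ms")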


2014 ◽  
Vol 22 (4) ◽  
pp. 358-370 ◽  
Author(s):  
John Haggerty ◽  
Sheryllynne Haggerty ◽  
Mark Taylor

Purpose – The purpose of this paper is to propose a novel approach that automates the visualisation of both quantitative data (the network) and qualitative data (the content) within emails to aid the triage of evidence during a forensics investigation. Email remains a key source of evidence during a digital investigation, and a forensics examiner may be required to triage and analyse large email data sets for evidence. Current practice utilises tools and techniques that require a manual trawl through such data, which is a time-consuming process. Design/methodology/approach – This paper applies the methodology to the Enron email corpus, and in particular one key suspect, to demonstrate the applicability of the approach. Resulting visualisations of network narratives are discussed to show how network narratives may be used to triage large evidence data sets. Findings – Using the network narrative approach enables a forensics examiner to quickly identify relevant evidence within large email data sets. Within the case study presented in this paper, the results identify key witnesses, other actors of interest to the investigation and potential sources of further evidence. Practical implications – The implications are for digital forensics examiners or for security investigations that involve email data. The approach posited in this paper demonstrates the triage and visualisation of email network narratives to aid an investigation and identify potential sources of electronic evidence. Originality/value – There are a number of network visualisation applications in use. However, none of these enable the combined visualisation of quantitative and qualitative data to provide a view of what the actors are discussing and how this shapes the network in email data sets.
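
A hedged sketch of the combined quantitative/qualitative view is given below: edges of a sender-recipient graph carry both message counts (the network) and the most frequent content words (the content). The message format, field names and keyword heuristic are assumptions for illustration and do not correspond to the authors' tool or the Enron corpus schema.

# Sketch: a combined quantitative/qualitative email view. Edges carry message
# counts plus frequent content words. The input format and keyword heuristic
# are illustrative assumptions, not the authors' tool.
from collections import Counter
import networkx as nx

def build_narrative_graph(messages, top_words=3):
    """messages: iterable of (sender, recipient, body) tuples."""
    g = nx.DiGraph()
    for sender, recipient, body in messages:
        if not g.has_edge(sender, recipient):
            g.add_edge(sender, recipient, count=0, words=Counter())
        edge = g[sender][recipient]
        edge["count"] += 1
        edge["words"].update(w.lower() for w in body.split() if len(w) > 3)
    for _, _, data in g.edges(data=True):
        data["keywords"] = [w for w, _ in data["words"].most_common(top_words)]
    return g

# Usage with two toy messages between hypothetical actors.
g = build_narrative_graph([
    ("alice", "bob", "please review the trading position report"),
    ("alice", "bob", "the trading desk needs the report today"),
])
print(g["alice"]["bob"]["count"], g["alice"]["bob"]["keywords"])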


2008 ◽  
Vol 20 (4) ◽  
pp. 1042-1064
Author(s):  
Maciej Pedzisz ◽  
Danilo P. Mandic

A homomorphic feedforward network (HFFN) for nonlinear adaptive filtering is introduced. This is achieved by a two-layer feedforward architecture with an exponential hidden layer and logarithmic preprocessing step. This way, the overall input-output relationship can be seen as a generalized Volterra model, or as a bank of homomorphic filters. Gradient-based learning for this architecture is introduced, together with some practical issues related to the choice of optimal learning parameters and weight initialization. The performance and convergence speed are verified by analysis and extensive simulations. For rigor, the simulations are conducted on artificial and real-life data, and the performances are compared against those obtained by a sigmoidal feedforward network (FFN) with identical topology. The proposed HFFN proved to be a viable alternative to FFNs, especially in the critical case of online learning on small- and medium-scale data sets.
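
One plausible reading of the architecture described above (logarithmic preprocessing followed by an exponential hidden layer and a linear output) is sketched below in NumPy. The exact scaling, offsets and gradient-based learning rule of the paper are not reproduced, and all shapes and values are illustrative.

# Sketch of a homomorphic feedforward pass: logarithmic preprocessing, a
# hidden layer with exponential activations, and a linear output. Exact
# scaling, offsets and the learning rule in the paper may differ.
import numpy as np

def hffn_forward(x, W1, b1, w2, b2, eps=1e-8):
    """x: positive-valued input vector (e.g. magnitudes of a signal window)."""
    z = np.log(np.abs(x) + eps)   # logarithmic preprocessing step
    h = np.exp(W1 @ z + b1)       # exponential hidden layer: products of input powers
    return w2 @ h + b2            # linear output combines the Volterra-like terms

# Toy usage: 4 inputs, 3 hidden units, random weights.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=4)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
w2, b2 = rng.normal(size=3), 0.0
print(hffn_forward(x, W1, b1, w2, b2))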


Author(s):  
Sanghamitra Bandyopadhyay ◽  
Ujjwal Maulik ◽  
Malay Kumar Pakhira

An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining a minimum-energy configuration with the searching capability of the K-means algorithm is proposed in this article. The clustering methodology is used to search for appropriate clusters in a multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points that are farther away from a cluster center have higher probabilities of migrating to other clusters than those which are closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated on artificial and real-life data sets.
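
A hedged sketch of the probabilistic reassignment step is given below: points are reassigned to clusters at random, with farther clusters less likely but still reachable while the temperature is high, and the temperature is cooled each iteration. The paper's actual energy function, cooling schedule and parameter values are not reproduced.

# Sketch: a simulated-annealing-flavored K-means step. Points migrate between
# clusters probabilistically (distant points migrate more readily), and the
# temperature cools each iteration. The paper's exact energy and schedule may differ.
import numpy as np

def sa_kmeans(X, k, iters=50, temp=1.0, cooling=0.9, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Boltzmann-style probabilities; subtracting the row minimum keeps them finite.
        probs = np.exp(-(dists - dists.min(axis=1, keepdims=True)) / max(temp, 1e-9))
        probs /= probs.sum(axis=1, keepdims=True)
        labels = np.array([rng.choice(k, p=p) for p in probs])
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
        temp *= cooling
    return labels, centers

# Toy usage on two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
print(sa_kmeans(X, k=2)[0])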


2014 ◽  
Vol 25 (3) ◽  
pp. 635-655 ◽  
Author(s):  
Sirikhorn Klindokmai ◽  
Peter Neech ◽  
Yue Wu ◽  
Udechukwu Ojiako ◽  
Max Chipulu ◽  
...  

Purpose – Virgin Atlantic Cargo is one of the largest air freight operators in the world. As part of a wider strategic development initiative, the company has identified forecasting accuracy as being of strategic importance to its operational efficiency, because accurate forecasts enable the company to have the right resources available at the right place and time. The purpose of this paper is to undertake an evaluation of the current month-to-date forecasting utilized by Virgin Atlantic Cargo. The study employed demand patterns drawn from historical data on chargeable weight over a seven-year period covering six of the company's routes. Design/methodology/approach – A case study is carried out in which forecasting models are compared using error accuracy measures. Data in the form of historical chargeable weight over a seven-year period covering six of the company's most profitable routes are employed in the study. For proprietary and privacy reasons, the data provided by the company have been sanitized. Findings – Preliminary analysis of the time series shows that air cargo chargeable weight could be difficult to forecast due to demand fluctuations, which appear extremely sensitive to external market and economic factors. Originality/value – The study contributes to the existing literature on air cargo forecasting and is therefore of interest to scholars examining the problem of overbooking, which is employed by air cargo operators to hedge against “no-show” bookings. However, the inability of air cargo operators to accurately predict cargo capacity that is unlikely to be used means that operators are unable to establish their revenue streams with any degree of certainty. The research methodology adopted is also predominantly discursive in that it employs a synthesis of the existing forecasting literature and real-life data for accuracy analysis.
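
The error accuracy measures used for such a model comparison are standard; as a hedged illustration, the snippet below computes MAPE and RMSE for two hypothetical month-to-date forecasts against actual chargeable weight. All figures are illustrative and are not Virgin Atlantic Cargo data.

# Sketch: comparing two forecasting models with standard error measures
# (MAPE and RMSE). All chargeable-weight figures are illustrative.
from math import sqrt

def mape(actual, forecast):
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

actual  = [820, 760, 905, 880, 990, 1020]   # monthly chargeable weight (tonnes)
model_a = [800, 780, 870, 900, 950, 1000]   # e.g. a seasonal naive forecast
model_b = [830, 750, 910, 860, 1005, 1040]  # e.g. an exponential smoothing forecast

for name, forecast in [("model A", model_a), ("model B", model_b)]:
    print(f"{name}: MAPE={mape(actual, forecast):.1f}%  RMSE={rmse(actual, forecast):.1f}")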

