On community structure validation in real networks

Author(s):  
Mirko Signorelli ◽  
Luisa Cutillo

Abstract: Community structure is a commonly observed feature of real networks. The term refers to the presence in a network of groups of nodes (communities) that feature high internal connectivity but are poorly connected to each other. Whereas the issue of community detection has been addressed in several works, the problem of validating a partition of nodes as a good community structure for a real network has received considerably less attention and remains an open issue. We propose a set of indices for community structure validation of network partitions that are based on a hypothesis testing procedure that assesses the distribution of links between and within communities. Using both simulations and real data, we illustrate how the proposed indices can be employed to compare the adequacy of different partitions of nodes as community structures in a given network, to assess whether two networks share the same or similar community structures, and to evaluate the performance of different network clustering algorithms.
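As a rough illustration of what "the distribution of links between and within communities" means, the sketch below computes the observed link density within and between the groups of a given partition. It is not the authors' validation index or test; the function name `link_densities` and the toy graph are assumptions made only for this example.

```python
from itertools import combinations


def link_densities(nodes, edges, community):
    """Return (within-community density, between-community density).

    nodes: iterable of node labels
    edges: iterable of undirected (u, v) pairs
    community: dict mapping each node to its community label
    """
    edge_set = {frozenset(e) for e in edges}
    within = [0, 0]   # [observed links, possible node pairs]
    between = [0, 0]
    for u, v in combinations(list(nodes), 2):
        bucket = within if community[u] == community[v] else between
        bucket[1] += 1
        if frozenset((u, v)) in edge_set:
            bucket[0] += 1
    # Guard against partitions with no within- or between-community pairs.
    dens_within = within[0] / within[1] if within[1] else 0.0
    dens_between = between[0] / between[1] if between[1] else 0.0
    return dens_within, dens_between


# Toy network (hypothetical): two triangles joined by a single link.
toy_nodes = [1, 2, 3, 4, 5, 6]
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
toy_partition = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(link_densities(toy_nodes, toy_edges, toy_partition))
```

A partition that reflects community structure should show a within-community density well above the between-community density; a formal validation would compare the two with a statistical test rather than by eye.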

2019 ◽  
Vol 33 (13) ◽  
pp. 1950164
Author(s):  
Qing-Feng Dong ◽  
Dian-Kun Chen ◽  
Ting Wang

At present, the detection of urban community structures is mainly based on existing administrative divisions and is performed using qualitative methods. The lack of quantitative methods makes it difficult to judge the rationality of urban community divisions. In this study, we used complex network association mining methods to detect a city's community structure from origin-destination (OD) data at the traffic analysis zone (TAZ) level, and successively assigned all the TAZs to different communities. Based on the community results, we calculated the community core degree of each TAZ within every community, and then calculated the Traffic Core Degree and Location Core Degree indicators of the community based on OD passenger flow and the spatial location relationship between communities. Finally, we analyzed the correlation among the three indicators to ensure the rationality of the community structure. We used the city of Zhengzhou in 2016 as an example case study. For Zhengzhou, we detected a total of six communities. We found a relatively low correlation between Traffic Core Degree and Location Core Degree. Within each group, the correlation between community core degree and Traffic Core Degree was higher than that between community core degree and Location Core Degree, indicating that the urban community structure is more reasonably based on traffic characteristics. The development of a quantitative approach for determining reasonable city community structures has important implications for transportation planning and industrial layout.


2016 ◽  
Vol 35 (2) ◽  
pp. 244-261 ◽  
Author(s):  
Frederic Guerrero-Solé

On November 9, 2014, the Catalan government called on Catalan people to participate in a straw poll about the independence of Catalonia from Spain. This article analyzes the use of Twitter between November 8 and 10, 2014. Drawing on a methodology developed by Guerrero-Solé, Corominas-Murtra, and Lopez-Gonzalez, this work examines the structure of the retweet overlap network (RON), formed by those users whose communities of retweeters have nonzero overlap, to detect the community structure of the network. The results show a high polarization of the resulting network and prove that the RON is a reliable method to determine network community structures and users’ political leaning in political discussions.
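The construction of a retweet overlap network can be sketched as follows, using only the definition given in the abstract: two users are linked when their sets of retweeters share at least one account. The cited methodology may weight or normalize the overlap differently; the function name `retweet_overlap_network`, the raw-count edge weight, and the toy accounts are assumptions for illustration.

```python
from itertools import combinations


def retweet_overlap_network(retweeters):
    """Build weighted edges between users whose retweeter sets overlap.

    retweeters: dict mapping a user to the set of accounts that retweeted them
    Returns a dict mapping (user_a, user_b) to the size of the overlap.
    """
    edges = {}
    for u, v in combinations(sorted(retweeters), 2):
        overlap = len(retweeters[u] & retweeters[v])
        if overlap > 0:  # keep only pairs with nonzero overlap
            edges[(u, v)] = overlap
    return edges


# Hypothetical toy data: account_c shares no retweeters with the others.
toy = {
    "account_a": {"r1", "r2", "r3"},
    "account_b": {"r2", "r3", "r4"},
    "account_c": {"r9"},
}
print(retweet_overlap_network(toy))  # {('account_a', 'account_b'): 2}
```

Any standard community detection method could then be run on the resulting weighted graph; which one the article uses is not stated in the abstract.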


2021 ◽  
Vol 66 (3) ◽  
pp. 7-21
Author(s):  
Mirosław Szreder

Increasing numbers of non-random errors are observed in contemporary sample surveying, in particular those resulting from nonresponse or faulty measurements (imprecise statistical observation). Until recently, the consequences of these kinds of errors have not been widely discussed in the context of the testing of hypotheses. Researchers focused almost entirely on sampling errors (random errors), whose magnitude decreases as the size of the random sample grows. In consequence, researchers who often use samples of very large sizes tend to overlook the influence random and non-random errors have on the results of their study. The aim of this paper is to present how non-random errors can affect the decision-making process based on the classical hypothesis testing procedure. Particular attention is devoted to cases in which researchers manage samples of large sizes. The study proved the thesis that samples of large sizes cause statistical tests to be more sensitive to non-random errors. Systematic errors, as a special case of non-random errors, increase the probability of making the wrong decision to reject a true hypothesis as the sample size grows. Supplementing the testing of hypotheses with the analysis of confidence intervals may in this context provide substantive support for the researcher in drawing accurate inferences.
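The effect of a systematic error on a large-sample test can be illustrated with a small calculation. The sketch below assumes a two-sided z-test of a true null hypothesis about a mean, with known standard deviation `sigma` and a constant measurement bias `bias` added to every observation; this setup and the function names are assumptions for illustration, not the paper's own examples. Under these assumptions the test statistic is normal with mean `bias * sqrt(n) / sigma` and unit variance, so the probability of wrongly rejecting the true hypothesis can be computed directly.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def false_rejection_prob(n, bias, sigma=1.0, z_crit=1.959964):
    """Probability that a two-sided z-test rejects a true null hypothesis
    when every observation carries the same systematic error `bias`."""
    shift = bias * math.sqrt(n) / sigma  # mean of the z statistic
    return (1.0 - normal_cdf(z_crit - shift)) + normal_cdf(-z_crit - shift)


# Hypothetical bias of 0.02 standard deviations, growing sample sizes.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(false_rejection_prob(n, bias=0.02), 3))
```

With no bias the rejection probability stays at the nominal 5% for every sample size; with even a small fixed bias it rises toward 1 as `n` grows, which is the sensitivity the abstract describes.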


2020 ◽  
Author(s):  
Wu Qu ◽  
Boliang Gao ◽  
Jie Wu ◽  
Min Jin ◽  
Jianxin Wang ◽  
...  

Abstract: Background. Microbial roles in element cycling and nutrient provision are crucial for mangrove ecosystems and serve as important regulators of climate change in the Earth's ecosystem. However, some key information about the spatiotemporal influences and the abiotic and biotic shaping factors for the microbial communities in mangrove sediments is still lacking. Methods. In this work, 22 sediment samples were collected across multiple spatiotemporal dimensions, including three locations, two depths, and four seasons, and the bacterial, archaeal, and fungal community structures in these samples were studied using amplicon sequencing. Results. The microbial community structures varied among samples from different depths and locations based on the results of LDA effect size analysis, principal coordinate analysis, the analysis of similarities, and permutational multivariate ANOVA. However, these microbial community structures were stable among the seasonal samples. Linear fitting models and the Mantel test showed that, among the 13 environmental factors measured in this study, the sediment particle size (PS) was the key abiotic shaping factor for the bacterial, archaeal, or fungal community structure. Besides PS, salinity and humidity were also significant impact factors according to the canonical correlation analysis (p ≤ 0.05). Co-occurrence networks demonstrated that the bacteria assigned to the phyla Ignavibacteriae, Proteobacteria, Bacteroidetes, Chloroflexi, and Acidobacteria were the key biotic factors shaping the bacterial community in mangrove sediments. Conclusions. This work showed variability along spatial dimensions and stability along the temporal dimension for the bacterial, archaeal, or fungal community structure, indicating that tropical mangrove sediments are versatile but stable environments. PS, as the key abiotic factor, could indirectly participate in material circulation in mangroves by influencing microbial community structures, along with salinity and humidity. The bacteria identified as key biotic factors were found to have abilities for photosynthesis, polysaccharide degradation, or nitrogen fixation; they are potential indicators for monitoring mangrove health, as well as crucial participants in the storage of mangrove blue carbon and the mitigation of climate warming. This study expanded the knowledge of the spatiotemporal variation, distribution, and regulation of microbial community structures in mangroves, thus further elucidating the microbial roles in mangrove management and climate regulation.


Author(s):  
Chunhua Ren ◽  
Linfu Sun

Abstract: The classic Fuzzy C-means (FCM) algorithm has limited clustering performance and is prone to misclassification of border points. This study offers a bi-directional FCM clustering ensemble approach that takes local information into account (LI_BIFCM) to overcome these challenges and increase clustering quality. First, various membership matrices are created after running FCM multiple times, based on the randomization of the initial cluster centers, and a vertical ensemble is performed using the maximum membership principle. Second, after each execution of FCM, multiple local membership matrices of the sample points are created using multiple K-nearest neighbors, and a horizontal ensemble is performed. Multiple horizontal ensembles can be created using multiple FCM clustering. Finally, the final clustering results are obtained by combining the vertical and horizontal clustering ensembles. Twelve data sets were chosen for testing from both synthetic and real data sources. The LI_BIFCM clustering performance outperformed four traditional clustering algorithms and three clustering ensemble algorithms in the experiments. Furthermore, the final clustering results have a weak correlation with the bi-directional cluster ensemble parameters, indicating that the suggested technique is robust.
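For readers unfamiliar with the building blocks named above, the sketch below shows the standard FCM membership update for fixed cluster centers and the maximum membership principle for turning memberships into hard labels. It covers only classic FCM, not the LI_BIFCM ensemble itself; the 1-D toy data, the function names, and the small `eps` added to distances to avoid division by zero are assumptions for illustration.

```python
def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update for 1-D points and fixed centers.

    u[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    where d_ij is the distance from point i to center j.
    """
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        # eps keeps distances strictly positive when a point sits on a center
        dists = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        rows.append(row)
    return rows


def max_membership_labels(memberships):
    """Assign each point to the cluster with the largest membership."""
    return [row.index(max(row)) for row in memberships]


# Hypothetical toy data: the last point lies midway between the two centers.
points = [0.0, 0.1, 0.9, 1.0, 0.5]
u = fcm_memberships(points, centers=[0.0, 1.0])
print(max_membership_labels(u))
print(u[-1])  # memberships of the border point
```

The midpoint receives equal membership in both clusters, so its hard label is decided by a tie-break; this is the kind of border-point ambiguity that an ensemble over several runs and local neighborhoods is meant to reduce.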


2003 ◽  
Vol 11 (4) ◽  
pp. 381-396 ◽  
Author(s):  
Joshua D. Clinton ◽  
Adam Meirowitz

Scholars of legislative studies typically use ideal point estimates from scaling procedures to test theories of legislative politics. We contend that theory and methods may be better integrated by directly incorporating maintained and to-be-tested hypotheses in the statistical model used to estimate legislator preferences. In this view of theory and estimation, formal modeling (1) provides auxiliary assumptions that serve as constraints in the estimation process, and (2) generates testable predictions. The estimation and hypothesis testing procedure uses roll call data to evaluate the validity of theoretically derived to-be-tested hypotheses in a world where maintained hypotheses are presumed true. We articulate the approach using the language of statistical inference (both frequentist and Bayesian). The approach is demonstrated in analyses of the well-studied Powell amendment to the federal aid-to-education bill in the 84th House and the Compromise of 1790 in the 1st House.


MAKILA ◽  
2019 ◽  
Vol 13 (1) ◽  
pp. 14-28
Author(s):  
Sitna Marasabessy ◽  
Bokiraiya Latuamury ◽  
Iskar Iskar ◽  
Christy C.V. Suhendy

Green open space covering at least 30% of the total area is a minimum requirement for an environmentally sustainable city. Pressure on green open space, especially the green belt area along the river border, tends to increase from year to year due to growth in the urban population. Therefore, this study aims to analyze people's perceptions of the role of green belt vegetation in the Wae Batu Gajah watershed in Ambon City. The research uses a descriptive method, which describes a situation based on facts in the field without applying any treatment to the object, with hypothesis testing carried out using the Chi-Square test. The results showed that the community's socio-economic parameters of age, formal education, and occupation had a significant influence on the understanding of the green border of the river. In contrast, gender and marital status had no significant effect on understanding of the green belt border. Formal education can influence attitudes and behavior through values, character, and an understanding of a problem that is built up in stages in a person. The type of work a person does, and how long they have done it, affects their mindset and behavior toward the environment. The poor have only two sources of income, salaries and informal business surpluses, which go toward basic needs.


2021 ◽  
pp. 1-18
Author(s):  
Angeliki Koutsimpela ◽  
Konstantinos D. Koutroumbas

Several well-known clustering algorithms have their own online counterparts, in order to deal effectively with the big data issue, as well as with the case where the data become available in a streaming fashion. However, very few of them follow the stochastic gradient descent philosophy, despite the fact that the latter enjoys certain practical advantages (such as the possibility of (a) running faster than their batch processing counterparts and (b) escaping from local minima of the associated cost function), while, in addition, strong theoretical convergence results have been established for it. In this paper, a novel stochastic gradient descent possibilistic clustering algorithm, called O-PCM2, is introduced. The algorithm is presented in detail and it is rigorously proved that the gradient of the associated cost function tends to zero in the L2 sense, based on general convergence results established for the family of stochastic gradient descent algorithms. Furthermore, an additional discussion is provided on the nature of the points where the algorithm may converge. Finally, the performance of the proposed algorithm is tested against other related algorithms, on the basis of both synthetic and real data sets.


Author(s):  
Alexander Troussov ◽  
Sergey Maruev ◽  
Sergey Vinogradov ◽  
Mikhail Zhizhin

Techno-social systems generate data that differ considerably from the data traditionally studied in social network analysis and other fields. In massive social networks, agents simultaneously participate in several contexts and in different communities. Network models of many real data sets from techno-social systems reflect the various dimensionalities and rationales of actors' actions and interactions. The data are inherently multidimensional, where “everything is deeply intertwingled”. The multidimensional nature of Big Data and the emergence of typical network characteristics in Big Data make it reasonable to address the challenges of structure detection in network models, including a) the development of novel methods for local overlapping clustering with outliers, b) with near-linear performance, c) preferably combined with the computation of the structural importance of nodes. In this chapter, the spreading connectivity based clustering method is introduced. The viability of the approach and its advantages are demonstrated on data from VK, the largest European social network.

