EMPIRICAL ANALYSIS OF THE CLUSTERING COEFFICIENT IN THE USER-OBJECT BIPARTITE NETWORKS

2013, Vol 24 (08), pp. 1350055
Author(s): JIANGUO LIU, LEI HOU, YI-LU ZHANG, WEN-JUN SONG, XUE PAN

The clustering coefficient of the bipartite network, C4, has been widely used to investigate the statistical properties of user-object systems. In this paper, we empirically analyze the evolution patterns of C4 for a nine-year MovieLens data set, where C4 is used to describe the diversity of user interest. First, we divide the MovieLens data set into fractions according to time intervals and calculate C4 for each fraction. The empirical results show that the diversity of user interest changes periodically with a period of one year: it reaches its smallest value in spring, increases to its maximum in autumn, and begins to decrease in winter. Furthermore, a null model is proposed for comparison with the empirical results. It is constructed as follows: each user selects each object with a tunable probability p, and the numbers of users and objects are equal to those of the real MovieLens data set. The comparison indicates that user activity has greatly influenced the structure of the user-object bipartite network, and that users with the same degree may have two totally different clustering coefficients. Conversely, the same clustering coefficient can correspond to different degrees. Therefore, the clustering coefficient needs to be considered together with the degree information when describing user selection activity.
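The null model and a per-user C4 can be sketched as follows. The abstract does not state the C4 formula, so this sketch assumes the square clustering coefficient of Lind et al., a common choice for bipartite networks; the function names `null_model` and `user_c4` are hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations


def null_model(n_users, n_objects, p, seed=None):
    """Each user selects each object independently with probability p."""
    rng = random.Random(seed)
    return {
        u: {o for o in range(n_objects) if rng.random() < p}
        for u in range(n_users)
    }


def user_c4(selections):
    """Square clustering of every user (assumed definition, Lind et al.).

    `selections` maps each user to the set of objects the user selected.
    """
    # Invert the mapping: object -> set of users who selected it.
    raters = {}
    for user, objects in selections.items():
        for obj in objects:
            raters.setdefault(obj, set()).add(user)

    c4 = {}
    for user, objects in selections.items():
        squares = 0    # observed 4-cycles through this user
        potential = 0  # observed plus possible 4-cycles
        for a, b in combinations(sorted(objects), 2):
            # Users other than `user` who selected both a and b.
            q = len(raters[a] & raters[b]) - 1
            squares += q
            # Objects lie on the same side, so they are never linked directly.
            potential += (len(raters[a]) - 1 - q) + (len(raters[b]) - 1 - q) + q
        c4[user] = squares / potential if potential else 0.0
    return c4


if __name__ == "__main__":
    network = null_model(n_users=50, n_objects=30, p=0.2, seed=1)
    values = user_c4(network)
    print(sum(values.values()) / len(values))  # average C4 of the null model
```

Sweeping `p` and comparing the resulting C4 values with those of the real data, at matched numbers of users and objects, mirrors the comparison the abstract describes.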

2019, Vol 30 (05), pp. 1950035
Author(s): Xiao-Lu Liu, Shu-Wei Jia, Yan Gu

User reputation, which measures a user's ability to give accurate assessments of various objects, is of great significance for online rating systems, which can be described by user-object bipartite networks. Clustering coefficients have been widely investigated to analyze the local structural properties of complex networks and, in this setting, the diversity of user interest. In this paper, we empirically analyze the relation between user reputation and the clustering property for user-object bipartite networks. Grouping users by reputation, the results for the MovieLens dataset show that both the average clustering coefficient and the standard deviation of the clustering coefficient decrease with user reputation. In the null model, by contrast, both quantities remain stable regardless of user reputation. This suggests that the user interest tends to be multiple and the diversity of the user interests is centralized for users with high reputation. Furthermore, we divide users into seven groups according to user degree and investigate the heterogeneity of rating behavior patterns. The results show that the relation between user reputation and clustering coefficient is obvious for small-degree users and weak for large-degree users, reflecting an important connection between user degree and collective rating behavior patterns. This work provides a further understanding of the intrinsic association between user collective behaviors and user reputation.


2012, Vol 23 (02), pp. 1250012
Author(s): QIANG GUO, RUI LENG, KERUI SHI, JIAN-GUO LIU

The clustering coefficient of user–object bipartite networks is presented to evaluate the overlap percentage of neighbors' rating lists, which can be used to measure interest correlations among neighbor sets. The collaborative filtering (CF) algorithm, which evaluates a given user's interests in terms of his/her friends' opinions, has become one of the most successful technologies for recommender systems. In this paper, in contrast to the object clustering coefficient, users' clustering coefficients of user–object bipartite networks are introduced to improve the user similarity measurement. Numerical results for the MovieLens and Netflix data sets show that users' clustering effects can enhance the algorithm's performance. For the MovieLens data set, the algorithmic accuracy, measured by the average ranking score, can be improved by 12.0%, and the diversity can be improved by 18.2%, reaching 0.649 when the recommendation list length equals 50. For the Netflix data set, the accuracy can be improved by 14.5% in the optimal case and the popularity can be reduced by 13.4% compared with the standard CF algorithm. Finally, we investigate the effect of sparsity on the performance. This work indicates that the user clustering coefficient is an effective factor for measuring user similarity, and that the statistical properties of user–object bipartite networks should be investigated to estimate users' tastes.


2021, Vol 14 (6), pp. 984-996
Author(s): Yixing Yang, Yixiang Fang, Maria E. Orlowska, Wenjie Zhang, Xuemin Lin

A bipartite network is a network with two disjoint vertex sets whose edges exist only between vertices from different sets. It has received much interest since it can be used to model the relationship between two different sets of objects in many applications (e.g., the relationship between users and items in E-commerce). In this paper, we study the problem of efficient bi-triangle counting for a large bipartite network, where a bi-triangle is a cycle with three vertices from one vertex set and three vertices from the other vertex set. Counting bi-triangles has many real applications, such as computing the transitivity coefficient and the clustering coefficient for bipartite networks. To enable efficient bi-triangle counting, we first develop a baseline algorithm relying on the observation that each bi-triangle can be considered as the join of three wedges, where a wedge is a path with two edges. Then, we propose a more sophisticated algorithm which regards a bi-triangle as the join of two super-wedges, where a super-wedge is a path with three edges. We further optimize the algorithm by ranking vertices according to their degrees. We have performed extensive experiments on both real and synthetic bipartite networks, the largest of which contains more than one billion edges, and the results show that the proposed solutions are up to five orders of magnitude faster than the baseline method.
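To make the definition concrete, the following brute-force sketch counts bi-triangles by joining three wedges over every triple of vertices on one side. It only illustrates the definition and the wedge-join idea; it is not the paper's baseline or optimized algorithm, and the function name `count_bitriangles` is hypothetical.

```python
from itertools import combinations


def count_bitriangles(adj):
    """Count 6-cycles with three vertices on each side of a bipartite graph.

    `adj` maps each left vertex to the set of its right neighbors.
    """
    total = 0
    for a, b, c in combinations(sorted(adj), 3):
        ab = adj[a] & adj[b]  # middle vertices of wedges a-x-b
        bc = adj[b] & adj[c]  # middle vertices of wedges b-y-c
        ca = adj[c] & adj[a]  # middle vertices of wedges c-z-a
        for x in ab:
            for y in bc:
                if y == x:
                    continue
                # Any z distinct from x and y closes the cycle a-x-b-y-c-z-a.
                total += len(ca - {x, y})
    return total


if __name__ == "__main__":
    hexagon = {"a": {"x", "z"}, "b": {"x", "y"}, "c": {"y", "z"}}
    print(count_bitriangles(hexagon))
```

Each bi-triangle is counted exactly once, because its three left vertices fix which right vertex sits between each pair. The cost grows with the cube of the left-side size, which is why this form is only suitable for small examples.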


2010, Vol 21 (07), pp. 891-901
Author(s): QIANG GUO, JIAN-GUO LIU

In this paper, a statistical property of the bipartite network, namely the clustering coefficient C4, is taken into account and embedded into the collaborative filtering (CF) algorithm to improve the algorithmic accuracy and diversity. In the improved CF algorithm, the user similarity is defined by the mass diffusion process, and we argue that the object clustering C4 of the bipartite network should be considered to improve the user similarity measurement. The statistical result shows that the clustering coefficient of the MovieLens data approximately follows a Poisson distribution. By considering the clustering effects of object nodes, the numerical simulation on a benchmark data set shows that the accuracy of the improved algorithm, measured by the average ranking score and precision, can be improved by 15.3% and 13.0%, respectively, in the optimal case. In addition, numerical results show that the improved algorithm can provide more diverse recommendation results; for example, when the recommendation list contains 20 objects, the diversity, measured by the Hamming distance, is improved by 28.7%. Since all real recommendation data evolve with time, this work may shed some light on adaptive recommendation algorithms based on the statistical properties of the user-object bipartite network.
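The diversity measure mentioned above can be illustrated with a short sketch. The abstract does not spell out the formula, so this assumes the definition commonly used in the recommender-systems literature: for two users with top-L lists sharing Q objects, the Hamming distance is 1 - Q/L, averaged over all user pairs. The function name `hamming_diversity` is hypothetical.

```python
from itertools import combinations


def hamming_diversity(rec_lists):
    """Mean pairwise Hamming distance between users' recommendation lists.

    `rec_lists` maps each user to a list of L recommended objects.
    All lists are assumed to have the same length L (an assumption of this sketch).
    """
    users = list(rec_lists)
    if len(users) < 2:
        raise ValueError("at least two users are needed")
    total = 0.0
    pairs = 0
    for u, v in combinations(users, 2):
        length = len(rec_lists[u])
        shared = len(set(rec_lists[u]) & set(rec_lists[v]))
        total += 1.0 - shared / length  # 0 for identical lists, 1 for disjoint lists
        pairs += 1
    return total / pairs


if __name__ == "__main__":
    lists = {"u1": [1, 2, 3, 4], "u2": [3, 4, 5, 6], "u3": [7, 8, 9, 10]}
    print(hamming_diversity(lists))
```

Under this definition, a value near 1 means users receive almost entirely different recommendations, and a value near 0 means everyone receives the same list.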


1994, Vol 144, pp. 139-141
Author(s): J. Rybák, V. Rušin, M. Rybanský

Fe XIV 530.3 nm coronal emission line observations have been used to estimate the rotation of the green solar corona. A homogeneous data set, created from measurements of the world-wide coronagraphic network, has been examined with the help of correlation analysis to reveal the averaged synodic rotation period as a function of latitude and time over the epoch from 1947 to 1991. The values of the synodic rotation period obtained for this epoch for the whole range of latitudes and for a latitude band of ±30° are 27.52±0.12 days and 26.95±0.21 days, respectively. A differential rotation of the green solar corona, with local period maxima around ±60° and a minimum of the rotation period at the equator, was confirmed. No clear cyclic variation of the rotation has been found for the examined epoch, but some monotonic trends for some time intervals are presented. A detailed investigation of the original data and their correlation functions has shown that the existence of sufficiently reliable tracers is not evident for the whole set of examined data. This should be taken into account in future, more precise estimations of the green corona rotation period.


Author(s): Mark Newman

A discussion of the most fundamental of network models, the configuration model, which is a random graph model of a network with a specified degree sequence. Following a definition of the model a number of basic properties are derived, including the probability of an edge, the expected number of multiedges, the excess degree distribution, the friendship paradox, and the clustering coefficient. This is followed by derivations of some more advanced properties including the condition for the existence of a giant component, the size of the giant component, the average size of a small component, and the expected diameter. Generating function methods for network models are also introduced and used to perform some more advanced calculations, such as the calculation of the distribution of the number of second neighbors of a node and the complete distribution of sizes of small components. The chapter ends with a brief discussion of extensions of the configuration model to directed networks, bipartite networks, networks with degree correlations, networks with high clustering, and networks with community structure, among other possibilities.
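The construction behind the configuration model can be sketched with the standard stub-matching procedure: every node receives as many edge stubs as its degree, and the stubs are paired uniformly at random. This is a minimal illustration and not code from the chapter; the function names are hypothetical. Self-loops and multiedges are allowed, as in the usual definition of the model.

```python
import random


def configuration_model(degrees, seed=None):
    """Pair edge stubs uniformly at random for a given degree sequence.

    Returns a list of edges (u, v); self-loops and multiedges may occur.
    """
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    # Node i contributes degrees[i] stubs.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list form an edge.
    return list(zip(stubs[::2], stubs[1::2]))


def realized_degrees(edges, n_nodes):
    """Degree of every node; a self-loop adds two to its node's degree."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


if __name__ == "__main__":
    sequence = [3, 2, 2, 2, 1]
    edges = configuration_model(sequence, seed=7)
    print(edges)
    print(realized_degrees(edges, len(sequence)))
```

Because any stub is equally likely to be matched with any other, the expected number of edges between nodes i and j with degrees k_i and k_j is approximately k_i k_j / (2m) for a large network with m edges, which is the edge probability the chapter summary refers to.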


2008, Vol 20 (5), pp. 1211-1238
Author(s): Gaby Schneider

Oscillatory correlograms are widely used to study neuronal activity that shows a joint periodic rhythm. In most cases, the statistical analysis of cross-correlation histograms (CCH) features is based on the null model of independent processes, and the resulting conclusions about the underlying processes remain qualitative. Therefore, we propose a spike train model for synchronous oscillatory firing activity that directly links characteristics of the CCH to parameters of the underlying processes. The model focuses particularly on asymmetric central peaks, which differ in slope and width on the two sides. Asymmetric peaks can be associated with phase offsets in the (sub-) millisecond range. These spatiotemporal firing patterns can be highly consistent across units yet invisible in the underlying processes. The proposed model includes a single temporal parameter that accounts for this peak asymmetry. The model provides approaches for the analysis of oscillatory correlograms, taking into account dependencies and nonstationarities in the underlying processes. In particular, the auto- and the cross-correlogram can be investigated in a joint analysis because they depend on the same spike train parameters. Particular temporal interactions such as the degree to which different units synchronize in a common oscillatory rhythm can also be investigated. The analysis is demonstrated by application to a simulated data set.


Author(s): Joshua Auld, Abolfazl (Kouros) Mohammadian, Marcelo Simas Oliveira, Jean Wolf, William Bachman

Research was undertaken to determine whether demographic characteristics of individual travelers could be derived from travel pattern information when no information about the individual was available. This question is relevant in the context of anonymously collected travel information, such as cell phone traces, when used for travel demand modeling. Determining the demographics of a traveler from such data could partially obviate the need for large-scale collection of travel survey data, depending on the purpose for which the data were to be used. This research complements methodologies used to identify activity stops, purposes, and mode types from raw trace data and presumes that such methods exist and are available. The paper documents the development of procedures for taking raw activity streams estimated from GPS trace data and converting these into activity travel pattern characteristics that are then combined with basic land use information and used to estimate various models of demographic characteristics. The work status, education level, age, and license possession of individuals and the presence of children in their households were all estimated successfully with substantial increases in performance versus null model expectations for both training and test data sets. The gender, household size, and number of vehicles proved more difficult to estimate, and performance was lower on the test data set; these aspects indicate overfitting in these models. Overall, the demographic models appear to have potential for characterizing anonymous data streams, which could extend the usability and applicability of such data sources to the travel demand context.


Author(s): J. Schachtschneider, C. Brenner

The development of automated and autonomous vehicles requires highly accurate long-term maps of the environment. Urban areas contain a large number of dynamic objects which change over time. Since permanent observation of the environment is impossible and there will always be a first-time visit of an unknown or changed area, a map of an urban environment needs to model such dynamics. In this work, we use LiDAR point clouds from a large long-term measurement campaign to investigate temporal changes. The data set was recorded along a 20 km route in Hannover, Germany, with a Mobile Mapping System over a period of one year in bi-weekly measurements. The data set covers a variety of different urban objects and areas, weather conditions and seasons. Based on this data set, we show how scene and seasonal effects influence the measurement likelihood, and that multi-temporal maps lead to the best positioning results.


2018
Author(s): Christelle Fraïsse, John J. Welch

Fitness interactions between mutations can influence a population’s evolution in many different ways. While epistatic effects are difficult to measure precisely, important information about the overall distribution is captured by the mean and variance of log fitnesses for individuals carrying different numbers of mutations. We derive predictions for these quantities from simple fitness landscapes, based on models of optimizing selection on quantitative traits. We also explore extensions to the models, including modular pleiotropy, variable effect sizes, mutational bias, and maladaptation of the wild-type. We illustrate our approach by reanalysing a large data set of mutant effects in a yeast snoRNA. Though characterized by some strong epistatic interactions, these data give a good overall fit to the non-epistatic null model, suggesting that epistasis might have little effect on the evolutionary dynamics in this system. We also show how the amount of epistasis depends on both the underlying fitness landscape and the distribution of mutations, and so it is expected to vary in consistent ways between new mutations, standing variation, and fixed mutations.

