A COMPLEX NETWORKS PERSPECTIVE ON COLLABORATIVE SOFTWARE ENGINEERING

2014 ◽  
Vol 17 (07n08) ◽  
pp. 1430001 ◽  
Author(s):  
MARCELO CATALDO ◽  
INGO SCHOLTES ◽  
GIUSEPPE VALETTO

Large collaborative software engineering projects are interesting examples of evolving complex systems. The complexity of these systems unfolds both in the evolving software structures and in the social dynamics and organization of the development teams. Owing to the adoption of Open Source practices and the increasing use of online support infrastructures, large-scale data sets covering both the social and the technical dimension of collaborative software engineering processes are increasingly becoming available. In the analysis of these data, a growing number of studies employ a network perspective, using methods and abstractions from network science to generate insights about software engineering processes. With this topical issue, featuring a collection of inspiring works in this area, we intend to give an overview of state-of-the-art research. We hope that this collection of articles will stimulate downstream applications of network-based data mining techniques in empirical software engineering.

2020 ◽  
Vol 1 (4) ◽  
pp. 1493-1509
Author(s):  
Christian Zingg ◽  
Vahan Nanumyan ◽  
Frank Schweitzer

To what extent is the citation rate of new papers influenced by the past social relations of their authors? To answer this question, we present a data-driven analysis of nine different physics journals. Our analysis is based on a two-layer network representation constructed from two large-scale data sets, INSPIREHEP and APS. The social layer contains authors as nodes and coauthorship relations as links. This allows us to quantify the social relations of each author prior to the publication of a new paper. The publication layer contains papers as nodes and citations between papers as links. This layer allows us to quantify scientific attention as measured by the change of the citation rate over time. We particularly study how this change correlates with the social relations of the papers' authors prior to publication. We find that on average the maximum value of the citation rate is reached sooner for authors who have either published more papers or who have had more coauthors in previous papers. We also find that for these authors the decay in the citation rate is faster, meaning that their papers are forgotten sooner.
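The two-layer representation described above can be sketched with plain Python dictionaries. This is a minimal illustration only: the toy records and their field layout are assumptions, not the authors' INSPIREHEP/APS pipeline.

```python
from collections import defaultdict

# Toy paper records: (paper_id, authors, ids of papers it cites) — a
# hypothetical format standing in for parsed INSPIREHEP/APS metadata.
records = [
    ("p1", ["alice", "bob"], []),
    ("p2", ["bob", "carol"], ["p1"]),
    ("p3", ["alice"], ["p1", "p2"]),
]

# Social layer: authors as nodes, coauthorship relations as links.
coauthors = defaultdict(set)
# Publication layer: papers as nodes, citation links (cited -> citing).
cited_by = defaultdict(set)
# Simple per-author feature prior to a new paper: number of papers so far.
papers_by = defaultdict(int)

for pid, authors, cited in records:
    for a in authors:
        coauthors[a] |= set(authors) - {a}
        papers_by[a] += 1
    for c in cited:
        cited_by[c].add(pid)
```

From these two layers one can read off, for each author, the coauthor count and paper count prior to any given publication, which are the quantities the study correlates with the subsequent citation-rate dynamics.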


1983 ◽  
Vol 38 ◽  
pp. 1-9
Author(s):  
Herbert F. Weisberg

We are now entering a new era of computing in political science. The first era was marked by punched-card technology. Initially, the most sophisticated analyses possible were frequency counts and tables produced on a counter-sorter, a machine that specialized in chewing up data cards. By the early 1960s, batch processing on large mainframe computers became the predominant mode of data analysis, with turnaround time of up to a week. By the late 1960s, turnaround time was cut down to a matter of a few minutes and OSIRIS and then SPSS (and more recently SAS) were developed as general-purpose data analysis packages for the social sciences. Even today, use of these packages in batch mode remains one of the most efficient means of processing large-scale data analysis.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the cluster number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm thus combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of clustering large-scale data. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of existing algorithms under both sequential and parallel conditions.
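The second phase, the Lloyd iteration, can be sketched in a few lines of plain Python. The covering-algorithm initialization is beyond this sketch, so the initial centers (and hence k) are passed in explicitly; that is an assumption for illustration, not the paper's CA.

```python
import math

def lloyd(points, centers, iters=20):
    """Lloyd iteration of K-means: assign each point to its nearest
    center, then move each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep the old center
        # if a cluster ends up empty.
        centers = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, clusters = lloyd(points, [(0, 0), (9, 9)])
```

In C-K-means, the CA phase would supply both the number of clusters and the initial centers that this iteration then refines.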


Author(s):  
Jun Huang ◽  
Linchuan Xu ◽  
Jing Wang ◽  
Lei Feng ◽  
Kenji Yamanishi

Existing multi-label learning (MLL) approaches mainly assume all the labels are observed and construct classification models with a fixed set of target labels (known labels). However, in some real applications, multiple latent labels may exist outside this set and hide in the data, especially for large-scale data sets. Discovering and exploring the latent labels hidden in the data may not only find interesting knowledge but also help us to build a more robust learning model. In this paper, a novel approach named DLCL (i.e., Discovering Latent Class Labels for MLL) is proposed which can not only discover the latent labels in the training data but also predict new instances with the latent and known labels simultaneously. Extensive experiments show a competitive performance of DLCL against other state-of-the-art MLL approaches.


2021 ◽  
Vol 27 (7) ◽  
pp. 667-692
Author(s):  
Lamia Berkani ◽  
Lylia Betit ◽  
Louiza Belarif

Clustering-based approaches have been demonstrated to be efficient and scalable to large-scale data sets. However, clustering-based recommender systems suffer from relatively low accuracy and coverage. To address these issues, we propose in this article an optimized multiview clustering approach for the recommendation of items in social networks. First, the selection of the initial medoids is optimized using the Bees Swarm Optimization algorithm (BSO) in order to generate better partitions (i.e. refining the quality of medoids according to the objective function). Then, multiview clustering (MV) is applied, where users are iteratively clustered from the views of both rating patterns and social information (i.e. friendships and trust). Finally, a framework is proposed for testing the different alternatives, namely: (1) the standard recommendation algorithms; (2) the clustering-based and the optimized clustering-based recommendation algorithms using BSO; and (3) the MV and the optimized MV (BSO-MV) algorithms. Experiments conducted on two real-world datasets demonstrate the effectiveness of the proposed BSO-MV algorithm in terms of accuracy, as it outperforms existing related approaches and baselines.
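The idea of optimizing initial medoids against a clustering objective can be illustrated with a crude stochastic search: random single-point swaps that keep the best objective value. This stand-in search, the distance function, and the toy data are assumptions for illustration; they are not the authors' BSO algorithm or objective.

```python
import random

def cost(points, medoids, dist):
    """Clustering objective: total distance of points to nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def medoid_search(points, k, dist, iters=200, seed=0):
    """Crude stochastic medoid selection: try random single-medoid swaps
    and keep any that lower the objective — a simple stand-in for a
    swarm-based search such as BSO."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    best = cost(points, medoids, dist)
    for _ in range(iters):
        cand = medoids[:]
        cand[rng.randrange(k)] = rng.choice(points)
        c = cost(points, cand, dist)
        if c < best and len(set(cand)) == k:
            medoids, best = cand, c
    return medoids, best

d = lambda a, b: abs(a - b)
points = [1, 2, 3, 50, 51, 52]
medoids, best = medoid_search(points, 2, d)
```

A swarm-based optimizer like BSO explores this same search space more systematically, sharing good candidate solutions among "bees" instead of relying on blind random swaps.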


Author(s):  
Pablo Fernández Velasco ◽  
Bastien Perroy ◽  
Roberto Casati

One of the chief features of this global crisis is that we find ourselves in a shifting landscape. The resulting disorientation extends beyond health research and into many domains of our individual and collective lives. We suffer from political disorientation (the need for a radical shift in economic thinking), from social disorientation (the rearrangement of social dynamics based on distancing measures), and from temporal disorientation (the warping of our sense of time during lockdown), to name but a few. This generalised state of disorientation has substantial effects on wellbeing and decision making. In this paper, we review the multiple dimensions of disorientation of the COVID-19 crisis and use state-of-the-art research on disorientation to gain insight into the social, psychological and political dynamics of the current pandemic. Just like standard, spatial cases of disorientation, the non-spatial forms of disorientation prevalent in the current crisis consist in the mismatch between our frames of reference and our immediate experience, and they result in anxiety, helplessness and isolation, but also in the possibility of re-orienting. The current crisis provides a unique environment in which to study non-spatial forms of disorientation. In turn, existing knowledge about spatial disorientation can shed light on the shifting landscape of the COVID-19 pandemic.

Key messages:
- Growing evidence suggests that the COVID-19 crisis has been disorienting across domains.
- Disorientation is a metacognitive feeling monitoring both spatial and non-spatial tasks.
- Temporal disorientation was fostered by the pandemic's counterintuitive temporality.
- Disorientation mitigation can facilitate new social and political frames of reference to emerge.


2014 ◽  
Vol 571-572 ◽  
pp. 497-501 ◽  
Author(s):  
Qi Lv ◽  
Wei Xie

Real-time log analysis over large-scale data is important for many applications; here, real-time refers to UI latency within 100 ms. Techniques that efficiently support real-time analysis over large log data sets are therefore desired. MongoDB provides good query performance, an aggregation framework, and a distributed architecture, which make it suitable for real-time data query and massive log analysis. In this paper, a novel implementation approach for an event-driven file log analyzer is presented, and the performance of query, scan, and aggregation operations over MongoDB, HBase, and MySQL is compared. Our experimental results show that HBase delivers the most balanced performance across all operations, while MongoDB provides sub-10 ms query speed in some operations, making it the most suitable for real-time applications.
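A typical MongoDB aggregation over log data, of the kind benchmarked above, can be illustrated by building the pipeline document itself. The log schema and field names here are hypothetical; with pymongo the pipeline would be passed to `db.logs.aggregate(pipeline)`.

```python
from datetime import datetime, timedelta

# Hypothetical log document schema:
# {"ts": <datetime>, "level": "ERROR", "source": "web-1", "msg": "..."}
since = datetime.utcnow() - timedelta(hours=1)

# Count ERROR-level events per source over the last hour,
# most affected sources first.
pipeline = [
    {"$match": {"level": "ERROR", "ts": {"$gte": since}}},
    {"$group": {"_id": "$source", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 10},
]
```

For query latencies in the range the paper targets, the `$match` stage would need a supporting index on the filtered fields (e.g. a compound index on `level` and `ts`), so that the aggregation does not scan the whole collection.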

