An Analysis of Design Process and Performance in Distributed Data Science Teams

Purpose – Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design, and psychology contradicts this assumption. This work examines the behavior of teams engaged in data science competitions. Crowdsourced competitions have seen increased use for software development and data science, and platforms often encourage teamwork between participants.

Design/methodology/approach – We specifically examine teams participating in data science competitions hosted by Kaggle. We analyze data provided by Kaggle to compare the effect of team size and interaction frequency on team performance. We also contextualize these results through a semantic analysis.

Findings – This work demonstrates that groups of individuals working independently may outperform interacting teams on average, but that small, interacting teams are more likely to win competitions. The semantic analysis revealed differences in forum participation, verb usage, and pronoun usage when comparing top- and bottom-performing teams.

Research limitations/implications – These results reveal a perplexing tension that must be explored further: true teams may experience better performance with higher cohesion, but nominal teams may perform even better on average with essentially no cohesion. Limitations of this research include not factoring in team member experience level and reliance on extant data.

Originality/value – These results are potentially of use to designers of crowdsourced data science competitions as well as managers and contributors to distributed software development projects.


An analysis of design process and performance in distributed data science teams

Team Performance Management ◽

10.1108/tpm-03-2019-0024 ◽

2019 ◽

Vol 25 (7/8) ◽

pp. 419-439

Author(s):

Torsten Maier ◽

Joanna DeFranco ◽

Christopher McComb

Keyword(s):

Software Development ◽

Data Science ◽

Team Member ◽

Semantic Analysis ◽

Distributed Data ◽

Team Size ◽

Distributed Software ◽

Content Type ◽

Crowdsourced Data ◽

And Performance

Purpose – Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design and psychology contradicts this assumption. This study aims to examine the behavior of teams engaged in data science competitions. Crowdsourced competitions have seen increased use for software development and data science, and platforms often encourage teamwork between participants.

Design/methodology/approach – We specifically examine the teams participating in data science competitions hosted by Kaggle. We analyze the data provided by Kaggle to compare the effect of team size and interaction frequency on team performance. We also contextualize these results through a semantic analysis.

Findings – This work demonstrates that groups of individuals working independently may outperform interacting teams on average, but that small, interacting teams are more likely to win competitions. The semantic analysis revealed differences in forum participation, verb usage and pronoun usage when comparing top- and bottom-performing teams.

Research limitations/implications – These results reveal a perplexing tension that must be explored further: true teams may experience better performance with higher cohesion, but nominal teams may perform even better on average with essentially no cohesion. Limitations of this research include not factoring in team member experience level and reliance on extant data.

Originality/value – These results are potentially of use to designers of crowdsourced data science competitions as well as managers and contributors to distributed software development projects.


Experiences from Measuring Learning and Performance in Large-Scale Distributed Software Development

Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM '16 ◽

10.1145/2961111.2962636 ◽

2016 ◽

Cited By ~ 4

Author(s):

Ricardo Britto ◽

Darja Šmite ◽

Lars-Ola Damm

Keyword(s):

Software Development ◽

Large Scale ◽

Distributed Software Development ◽

Distributed Software ◽

And Performance ◽

Learning And Performance


Success Factors of Distributed Software Development Projects in Israel

7th Conference on the Engineering of Computer Based Systems ◽

10.1145/3459960.3459976 ◽

2021 ◽

Author(s):

Meir Hahami ◽

David Raz

Keyword(s):

Software Development ◽

Success Factors ◽

Development Projects ◽

Distributed Software Development ◽

Distributed Software


Selected implications of a hyper-connected world on pavement engineering

International Journal of Pavement Research and Technology ◽

10.1007/s42947-020-6012-7 ◽

2020 ◽

Vol 13 (6) ◽

pp. 673-678

Author(s):

Wynand Jacobus van der Merwe Steyn

Keyword(s):

Data Science ◽

Industrial Revolution ◽

Cost Benefit ◽

Physical World ◽

Management Systems ◽

Life Data ◽

Pavement Engineering ◽

Potential Benefits ◽

And Performance ◽

Lifecycle Management

Abstract: The world is becoming a hyper-connected environment where an abundance of data from sensor networks can provide continuous information on the behaviour and performance of infrastructure. The last part of the 3rd Industrial Revolution (IR) and the start of the 4th IR gave rise to a world where this overabundance of sensors and the availability of wireless networks enable connections between people and infrastructure that were not practically comprehensible during the 20th century. 4IR supports the datafication of life, data science, big data, transportation evolution, optimization of logistics and supply chains and automation of various aspects of life, including vehicles and road infrastructure. The hyper-connected 4IR environment allows integration between the physical world and digital and intelligent engineering, increasingly serving as the primary lifecycle management system for engineering practitioners. With this background, the paper evaluates a few concepts of the hyper-connected pavement environment in a 4IR Digital Twin mode, with the emphasis on selected applications, implications, benefits and limitations. The hyper-connected world can and should be managed in the pavement realm to ensure that adequate and applicable data are collected regarding infrastructure, environment and users to enable a more efficient and effective transportation system. In this regard, and planning for future scenarios where the proliferation of data is a given, it is important that pavement engineers understand what is possible, evaluate the potential benefits, conduct cost/benefit evaluations, and implement appropriate solutions to ensure longevity and safety of pavement infrastructure.


Collaborative teaching of globally distributed software development

Proceeding of the 33rd international conference on Software engineering - ICSE '11 ◽

10.1145/1985793.1986049 ◽

2011 ◽

Author(s):

Stuart Faulk ◽

Michal Young ◽

David Weiss ◽

Lian Yu

Keyword(s):

Software Development ◽

Collaborative Teaching ◽

Distributed Software Development ◽

Distributed Software ◽

Globally Distributed


Distributed software development in an offshore outsourcing project: A case study of source code evolution and quality

Information and Software Technology ◽

10.1016/j.infsof.2015.12.005 ◽

2016 ◽

Vol 72 ◽

pp. 125-136 ◽

Cited By ~ 6

Author(s):

Ronald Jabangwe ◽

Darja Šmite ◽

Emil Hessbo

Keyword(s):

Software Development ◽

Source Code ◽

Offshore Outsourcing ◽

Distributed Software Development ◽

Distributed Software ◽

Code Evolution


Research on MES Domain Analysis Driven by Ontology

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.20-23.1084 ◽

2010 ◽

Vol 20-23 ◽

pp. 1084-1090 ◽

Cited By ~ 1

Author(s):

Wen Long

Keyword(s):

Control System ◽

Software Development ◽

Method Development ◽

Domain Analysis ◽

Manufacturing Execution System ◽

Development Method ◽

And Performance ◽

And Control ◽

Plan Management ◽

Manufacturing Execution

A Manufacturing Execution System (MES) links plan management and workshop control in an enterprise; it is an integrated management and control system for workshop production, oriented to the manufacturing process. To overcome the difficulties of traditional software development methods, component-based development of MES is adopted to improve the development efficiency and performance of MES, making it easier to reconstruct, reuse, extend and integrate, and ontology-driven MES domain analysis is investigated in detail. The feasibility and efficiency of ontology-driven MES domain analysis are shown by developing a pharmaceutical MES that was applied in a pharmaceutical manufacturing factory.


Using Data Analytics for Collaboration Patterns in Distributed Software Team Simulations: The Role of Dashboards in Visualizing Global Software Development Patterns

2016 IEEE 11th International Conference on Global Software Engineering Workshops (ICGSEW) ◽

10.1109/icgsew.2016.15 ◽

2016 ◽

Cited By ~ 1

Author(s):

Georgios A. Dafoulas ◽

Fatma C. Serce ◽

Kathleen Swigger ◽

Robert Brazile ◽

Ferda N. Alpaslan ◽

...

Keyword(s):

Software Development ◽

Data Analytics ◽

Global Software Development ◽

Distributed Software ◽

Development Patterns ◽

Collaboration Patterns ◽

Using Data ◽

Software Team


Optimizing the Self-Organizing Team Size Using a Genetic Algorithm in Agile Practices

Journal of Intelligent Systems ◽

10.1515/jisys-2018-0085 ◽

2018 ◽

Vol 29 (1) ◽

pp. 1151-1165

Author(s):

Wael Almadhoun ◽

Mohammad Hamdan

Keyword(s):

Genetic Algorithm ◽

Software Development ◽

Objective Function ◽

The Self ◽

Performance Characteristics ◽

Team Size ◽

Agile Practices ◽

Team Characteristics ◽

Agile Software ◽

Self Organizing

Abstract: In agile software processes, the issue of team size is an important one. In this work we look at how to find the optimal, or near-optimal, self-organizing team size using a genetic algorithm (GA) that considers team communication efforts. Communication, authority, roles, and learning are the team's performance characteristics, and the GA was developed according to these characteristics. A survey was used to evaluate the communication weight factors, which were qualitatively assessed and used in the algorithm's objective function. The GA experiments were performed in stages: the results of each stage were tested and compared with the previous results. The results show that self-organizing teams ranging in size from five to nine members scored higher. The model can be improved by adding other team characteristics, i.e., software development efforts and costs.


A review of the agile and geographically distributed software development

Information Technology and Computer Application Engineering ◽

10.1201/b15936-44 ◽

2013 ◽

pp. 191-194

Keyword(s):

Software Development ◽

Distributed Software Development ◽

Distributed Software ◽

Geographically Distributed ◽

Geographically Distributed Software Development
