An Analysis of Design Process and Performance in Distributed Data Science Teams

2019 ◽  
Author(s):  
Christopher McComb ◽  
Joanna DeFranco ◽  
Torsten Maier

Purpose – Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design, and psychology contradicts this assumption. This work examines the behavior of teams engaged in data science competitions. Crowdsourced competitions have seen increased use for software development and data science, and platforms often encourage teamwork between participants.
Design/methodology/approach – We specifically examine teams participating in data science competitions hosted by Kaggle. We analyze data provided by Kaggle to compare the effect of team size and interaction frequency on team performance. We also contextualize these results through a semantic analysis.
Findings – This work demonstrates that groups of individuals working independently may outperform interacting teams on average, but that small, interacting teams are more likely to win competitions. The semantic analysis revealed differences in forum participation, verb usage, and pronoun usage when comparing top- and bottom-performing teams.
Research limitations/implications – These results reveal a perplexing tension that must be explored further: true teams may experience better performance with higher cohesion, but nominal teams may perform even better on average with essentially no cohesion. Limitations of this research include its reliance on extant data and the fact that team member experience level was not factored in.
Originality/value – These results are potentially of use to designers of crowdsourced data science competitions as well as managers of and contributors to distributed software development projects.

2019 ◽  
Vol 25 (7/8) ◽  
pp. 419-439
Author(s):  
Torsten Maier ◽  
Joanna DeFranco ◽  
Christopher McComb

Purpose: Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design and psychology contradicts this assumption. This study aims to examine the behavior of teams engaged in data science competitions. Crowdsourced competitions have seen increased use for software development and data science, and platforms often encourage teamwork between participants.
Design/methodology/approach: We specifically examine the teams participating in data science competitions hosted by Kaggle. We analyze the data provided by Kaggle to compare the effect of team size and interaction frequency on team performance. We also contextualize these results through a semantic analysis.
Findings: This work demonstrates that groups of individuals working independently may outperform interacting teams on average, but that small, interacting teams are more likely to win competitions. The semantic analysis revealed differences in forum participation, verb usage and pronoun usage when comparing top- and bottom-performing teams.
Research limitations/implications: These results reveal a perplexing tension that must be explored further: true teams may experience better performance with higher cohesion, but nominal teams may perform even better on average with essentially no cohesion. Limitations of this research include not factoring in team member experience level and reliance on extant data.
Originality/value: These results are potentially of use to designers of crowdsourced data science competitions as well as managers and contributors to distributed software development projects.


2020 ◽  
Vol 13 (6) ◽  
pp. 673-678
Author(s):  
Wynand Jacobus van der Merwe Steyn

Abstract: The world is becoming a hyper-connected environment where an abundance of data from sensor networks can provide continuous information on the behaviour and performance of infrastructure. The last part of the 3rd Industrial Revolution (IR) and the start of the 4th IR (4IR) gave rise to a world where this overabundance of sensors and the availability of wireless networks enable connections between people and infrastructure that were not practically comprehensible during the 20th century. 4IR supports the datafication of life, data science, big data, transportation evolution, optimization of logistics and supply chains and automation of various aspects of life, including vehicles and road infrastructure. The hyper-connected 4IR environment allows integration between the physical world and digital and intelligent engineering, increasingly serving as the primary lifecycle management systems for engineering practitioners. With this background, the paper evaluates a few concepts of the hyper-connected pavement environment in a 4IR Digital Twin mode, with the emphasis on selected applications, implications, benefits and limitations. The hyper-connected world can and should be managed in the pavement realm to ensure that adequate and applicable data are collected regarding infrastructure, environment and users to enable a more efficient and effective transportation system. In this regard, and in planning for future scenarios where the proliferation of data is a given, it is important that pavement engineers understand what is possible, evaluate the potential benefits, conduct cost/benefit evaluations, and implement appropriate solutions to ensure longevity and safety of pavement infrastructure.


2010 ◽  
Vol 20-23 ◽  
pp. 1084-1090 ◽  
Author(s):  
Wen Long

A Manufacturing Execution System (MES) links plan management and workshop control in an enterprise; it is an integrated management and control system for workshop production, oriented to the manufacturing process. To overcome the difficulties of traditional software development methods, component-based development of MES is adopted to improve the development efficiency and performance of MES, making it more reconfigurable, reusable, extensible and easier to integrate, and ontology-driven MES domain analysis is investigated in detail. The feasibility and efficiency of ontology-driven MES domain analysis are shown through the development of a pharmaceutical MES that was applied in a pharmaceutical manufacturing factory.


2018 ◽  
Vol 29 (1) ◽  
pp. 1151-1165
Author(s):  
Wael Almadhoun ◽  
Mohammad Hamdan

Abstract: In agile software processes, the issue of team size is an important one. In this work we look at how to find the optimal, or near-optimal, self-organizing team size using a genetic algorithm (GA) that considers team communication efforts. Communication, authority, roles, and learning are the team’s performance characteristics, and the GA has been developed according to these characteristics. A survey was used to evaluate the communication weight factors, which were qualitatively assessed and used in the algorithm’s objective function. The GA experiments were performed in stages: each stage’s results were tested and compared with the previous results. The results show that self-organizing teams ranging in size from five to nine members scored higher. The model can be improved by adding other team characteristics, i.e. software development efforts and costs.

