Analyzing OSS Project Health with Heterogeneous Data Sources
Stakeholders in Open Source Software (OSS) projects need to determine whether a project is likely to be sustained long enough to justify their investment in it. In an OSS project context, there are typically several data sources and OSS processes relevant to determining project health indicators. However, even within one project these data sources are often technically and/or semantically heterogeneous, which makes data collection and analysis tedious and error-prone. In this paper, the authors propose and evaluate a framework for OSS data analysis (FOSSDA), which enables the efficient collection, integration, and analysis of data from heterogeneous sources. The major results of the empirical studies are: (a) the framework is useful for effectively integrating data from heterogeneous data sources, and (b) project health indicators based on integrated data analyses were found to be more accurate than those based on individual, non-integrated data sources.