GrimoireLab: A toolset for software development analytics

2021 ◽  
Vol 7 ◽  
pp. e601
Author(s):  
Santiago Dueñas ◽  
Valerio Cosentino ◽  
Jesus M. Gonzalez-Barahona ◽  
Alvaro del Castillo San Felix ◽  
Daniel Izquierdo-Cortazar ◽  
...  

Background: After many years of research on software repositories, the knowledge needed to build mature, reusable tools that perform data retrieval, storage, and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge.
Goal: To produce a reusable toolset supporting the most common tasks when retrieving, curating, and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or analyst most of the tasks that can be automated.
Method: Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, together with the main components of such a toolset. Then build those components and refine them incrementally using feedback from their use in commercial, community-based, and academic environments.
Results: GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis, and drill-down in the data. It is also modular, making it easy to support new kinds of data sources and analysis.
Conclusions: We present a mature toolset, widely tested in the field, that can help improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help reduce the effort of doing studies or providing services in this area, leading to advances in the reproducibility and comparison of results.
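As an illustration of the kind of common retrieval API the abstract describes, the following minimal sketch uses Perceval, GrimoireLab's data-retrieval component, to fetch commit items from a Git repository. The repository URL, local mirror path, and item field names are placeholders that may differ between Perceval versions; this is not a complete GrimoireLab pipeline.

```python
# Minimal sketch (not a full GrimoireLab pipeline): fetch commit metadata
# from a Git repository with Perceval. URL, path, and field names are
# placeholders and may vary between versions.
from perceval.backends.core.git import Git

repo = Git(uri="https://github.com/chaoss/grimoirelab-perceval",
           gitpath="/tmp/perceval.git")

# Every backend yields items wrapped in a common envelope (backend name,
# timestamp, unique identifier) around the raw retrieved data.
for item in repo.fetch():
    commit = item["data"]
    print(commit.get("commit"), commit.get("Author"), commit.get("message", "")[:60])
```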

2019 ◽  
Vol 20 (1) ◽  
pp. 106
Author(s):  
Yuva Naelana ◽  
S. Bekti Istiyanto

The Community-Based Total Sanitation Program (STBM) is a program launched by the Ministry of Health of the Republic of Indonesia. One of the pillars of the STBM, Open Defecation Free (ODF), remains an outstanding task for local governments. In contrast to other districts, in Tegal Regency the implementation of this program was regulated directly in the Regent's Regulation on the Regional Program for Community Empowerment (PDPM). The purpose of this study is to explore further how the PDPM has been implemented in the effort to make Tegal an Open Defecation Free district in 2019. The method used in this study is descriptive qualitative. The authors use two data sources, primary and secondary, through in-depth interviews with three informants and through documentation. The results show that so far the PDPM Jambanisasi (latrine construction) program has been considered successful in building public awareness of the importance of healthy sanitation. ODF was implemented through the three main components of STBM and triggering techniques to meet three expectations: right target, quality, and benefits. The PDPM Jambanisasi has succeeded in empowering communities in the health and economic fields through a community of sanitation entrepreneurs.


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Xiaobing Sun ◽  
Bin Li ◽  
Yucong Duan ◽  
Wei Shi ◽  
Xiangyue Liu

There are a large number of open source projects in software repositories for developers to reuse. During software development and maintenance, developers can leverage good interfaces from these open source projects and quickly establish the framework of a new project by reusing those interfaces. However, to reuse them, developers need to read many code files and learn which interfaces can be reused. To help developers better take advantage of the interfaces available in software repositories, we previously proposed an approach to automatically recommend interfaces by mining existing open source projects in software repositories. We mainly used the LDA (Latent Dirichlet Allocation) topic model to construct a Feature-Interface Graph for each software project and recommended interfaces based on that graph. In this paper, we improve our previous approach by clustering the recommended interfaces on the Feature-Interface Graph, which yields more accurate interface recommendations for developers. We evaluate the effectiveness of the improved approach, and the results show that it recommends more accurate interfaces for reuse, and does so more efficiently, than our previous work.
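The abstract above rests on topic modelling plus clustering; the sketch below is an illustrative analogue only, not the authors' implementation. It fits LDA over a few hypothetical interface descriptions and then clusters interfaces by their topic distributions, mirroring the idea of clustering recommendations on a Feature-Interface Graph. All corpus content and names are invented.

```python
# Illustrative sketch: topic-model interface descriptions with LDA, then
# cluster interfaces by their topic distributions. Data and names are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

interface_docs = {
    "HttpClient":  "send http request response headers timeout",
    "JsonParser":  "parse serialize json object field value",
    "FileStorage": "read write file stream path buffer",
}

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(interface_docs.values())

# Each interface gets a distribution over latent "feature" topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_dist = lda.fit_transform(X)

# Group interfaces with similar topic profiles; a query feature could then be
# matched against cluster centroids to recommend candidate interfaces.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topic_dist)
for name, cluster in zip(interface_docs, clusters):
    print(name, "-> cluster", cluster)
```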


Author(s):  
Sabrina T. Wong ◽  
Julia M. Langton ◽  
Alan Katz ◽  
Martin Fortin ◽  
Marshall Godwin ◽  
...  

Aim: To describe the process by which the 12 community-based primary health care (CBPHC) research teams worked together and fostered cross-jurisdictional collaboration, including collection of common indicators with the goal of using the same measures and data sources.
Background: A pan-Canadian mechanism for common measurement of the impact of primary care innovations across Canada is lacking. The Canadian Institutes for Health Research and its partners funded 12 teams to conduct research and collaborate on the development of a set of commonly collected indicators.
Methods: A working group representing the 12 teams was established. It undertook an iterative process to consider existing primary care indicators identified from the literature and by stakeholders. Indicators were agreed upon with the intention of addressing three objectives across the 12 teams: (1) describing the impact of improving access to CBPHC; (2) examining the impact of alternative models of chronic disease prevention and management in CBPHC; and (3) describing the structures and context that influence the implementation, delivery, cost, and potential for scale-up of CBPHC innovations.
Findings: Nineteen common indicators within the core dimensions of primary care were identified: access, comprehensiveness, coordination, effectiveness, and equity. We also agreed to collect data on health care costs and utilization within each team. Data sources include surveys, health administrative data, interviews, focus groups, and case studies. Collaboration across these teams sets the foundation for a unique opportunity for new knowledge generation, over and above any knowledge developed by any one team. Keys to success are each team's willingness to engage and commitment to working across teams, funding to support this collaboration, and distributed leadership across the working group. Reaching consensus on the collection of common indicators is challenging but achievable.


2020 ◽  
Author(s):  
James A. Fellows Yates ◽  
Aida Andrades Valtueña ◽  
Ashild J. Vågene ◽  
Becky Cribdon ◽  
Irina M. Velsko ◽  
...  

Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has risen dramatically in recent years, and tracking this data for reuse is particularly important for large-scale ecological and evolutionary studies of individual microbial taxa, microbial communities, and metagenomic assemblages. AncientMetagenomeDir (archived at https://doi.org/10.5281/zenodo.3980833) is a collection of indices of published genetic data deriving from ancient microbial samples that provides basic, standardised metadata and accession numbers to allow rapid data retrieval from online repositories. These collections are community-curated and span multiple sub-disciplines in order to ensure adequate breadth and consensus in metadata definitions, as well as longevity of the database. Internal guidelines and automated checks to facilitate compatibility with established sequence-read archives and term ontologies ensure consistency and interoperability for future meta-analyses. This collection will also assist in standardising metadata reporting for future ancient metagenomic studies.
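As a hedged example of how such standardised metadata and accession numbers could support rapid retrieval, the sketch below loads a community-curated sample table with pandas and extracts archive accessions for a single publication. The file name, column names, and DOI are assumptions made for illustration; consult the released tables for the actual schema.

```python
# Hedged sketch: filter an AncientMetagenomeDir-style sample table and collect
# accession numbers for reuse. File name, columns, and DOI are placeholders.
import pandas as pd

samples = pd.read_csv("ancientmetagenome-hostassociated_samples.tsv", sep="\t")

# Keep samples from one publication and gather their archive accessions
# (e.g. ENA/SRA codes) for downstream retrieval.
subset = samples[samples["publication_doi"] == "10.1000/example.doi"]
print(subset["archive_accession"].dropna().unique())
```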


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements were added, while with SDValidate, 13,000 erroneous RDF statements were removed from the knowledge base.
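To make the voting idea behind SDType concrete, here is a deliberately simplified, illustrative re-implementation (not the authors' code): every non-type property observed on a resource votes for candidate types according to the type distribution seen for that property elsewhere in the data set. The toy triples are invented, and the real SDType additionally weights properties by how discriminative they are.

```python
# Simplified SDType-style type inference by property-based voting.
from collections import Counter, defaultdict

# (subject, property, object) triples; "type" statements define known types.
triples = [
    ("Berlin", "locatedIn", "Germany"),
    ("Berlin", "population", "3600000"),
    ("Paris",  "locatedIn", "France"),
    ("Paris",  "population", "2100000"),
    ("Paris",  "type", "City"),
    ("France", "type", "Country"),
    ("Germany", "type", "Country"),
]

types = {s: o for s, p, o in triples if p == "type"}

# For each property, count the types of subjects that use it.
type_dist = defaultdict(Counter)
for s, p, o in triples:
    if p != "type" and s in types:
        type_dist[p][types[s]] += 1

def infer_type(resource):
    """Let each property of the resource vote with its observed type distribution."""
    votes = Counter()
    for s, p, o in triples:
        if s == resource and p != "type":
            dist = type_dist[p]
            total = sum(dist.values())
            for t, n in dist.items():
                votes[t] += n / total
    return votes.most_common(1)[0] if votes else None

# "Berlin" has no explicit type; its properties suggest "City".
print(infer_type("Berlin"))
```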


2021 ◽  
Author(s):  
Benjamin Moreno-Torres ◽  
Christoph Völker ◽  
Sabine Kruschwitz

Non-destructive testing (NDT) data in civil engineering is regularly used for scientific analysis. However, there is no uniform representation of the data yet, so analysing distributed data sets across different test objects is too difficult in most cases.

To overcome this, we present an approach for integrated management of distributed data sets based on Semantic Web technologies. The cornerstone of this approach is an ontology, a semantic knowledge representation of our domain. This NDT-CE ontology is later populated with the data sources. Using the properties and the relationships between concepts that the ontology contains, we make these data sets meaningful also for machines. Furthermore, the ontology can be used as a central interface for database access. Non-domain data sources can be integrated by linking them with the NDT ontology, making them directly available for generic use in terms of digitization. Based on an extensive literature review, we outline the possibilities that result for NDT in civil engineering, such as computer-aided sorting and analysis of measurement data, and the recognition and explanation of correlations.

A common knowledge representation and data access allows the scientific exploitation of existing data sources with data-based methods (such as image recognition, measurement uncertainty calculations, factor analysis, or material characterization) and simplifies bidirectional knowledge and data transfer between engineers and NDT specialists.
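As a hedged illustration of using an ontology as a central interface for data access, the sketch below builds a tiny RDF graph with rdflib and queries it with SPARQL. The namespace, class, and property names are invented for this example and are not taken from the actual NDT-CE ontology.

```python
# Hedged sketch: represent one NDT measurement as RDF and query it generically.
# All names under the example namespace are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF, XSD

NDT = Namespace("https://example.org/ndt-ce#")
g = Graph()
g.bind("ndt", NDT)

# Describe one ultrasonic measurement taken on a test object.
g.add((NDT.m1, RDF.type, NDT.UltrasonicMeasurement))
g.add((NDT.m1, NDT.onTestObject, NDT.bridgeDeck42))
g.add((NDT.m1, NDT.pulseVelocity, Literal(4230.0, datatype=XSD.double)))

# A generic SPARQL query works against any data source mapped to the ontology.
query = """
PREFIX ndt: <https://example.org/ndt-ce#>
SELECT ?m ?v WHERE {
  ?m a ndt:UltrasonicMeasurement ;
     ndt:pulseVelocity ?v .
}
"""
for row in g.query(query):
    print(row.m, float(row.v))
```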

