GrimoireLab: A toolset for software development analytics

2021 ◽  
Vol 7 ◽  
pp. e601
Author(s):  
Santiago Dueñas ◽  
Valerio Cosentino ◽  
Jesus M. Gonzalez-Barahona ◽  
Alvaro del Castillo San Felix ◽  
Daniel Izquierdo-Cortazar ◽  
...  

Background: After many years of research on software repositories, the knowledge needed to build mature, reusable tools that perform data retrieval, storage, and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge.
Goal: To produce a reusable toolset supporting the most common tasks when retrieving, curating, and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or analyst most of the tasks that can be automated.
Method: Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, together with the main components of such a toolset. Then build those components and refine them incrementally using feedback from their use in commercial, community-based, and academic environments.
Results: GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis, and drill-down in the data. It is also modular, making it easy to support new kinds of data sources and analysis.
Conclusions: We present a mature toolset, widely tested in the field, that can help improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help reduce the effort of doing studies or providing services in this area, leading to advances in the reproducibility and comparison of results.
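As an illustration of the kind of common retrieval API the abstract describes, the following minimal sketch uses Perceval, GrimoireLab's data-retrieval component, to fetch commit items from a Git repository. The repository URL, local mirror path, and item field names are placeholders that may differ between Perceval versions; this is not a complete GrimoireLab pipeline.

```python
# Minimal sketch (not a full GrimoireLab pipeline): fetch commit metadata
# from a Git repository with Perceval. URL, path, and field names are
# placeholders and may vary between versions.
from perceval.backends.core.git import Git

repo = Git(uri="https://github.com/chaoss/grimoirelab-perceval",
           gitpath="/tmp/perceval.git")

# Every backend yields items wrapped in a common envelope (backend name,
# timestamp, unique identifier) around the raw retrieved data.
for item in repo.fetch():
    commit = item["data"]
    print(commit.get("commit"), commit.get("Author"), commit.get("message", "")[:60])
```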

2019 ◽  
Vol 20 (1) ◽  
pp. 106
Author(s):  
Yuva Naelana ◽  
S. Bekti Istiyanto

The Community-Based Total Sanitation Program (STBM) is a program launched by the Ministry of Health of the Republic of Indonesia. One of the pillars of the STBM, Open Defecation Free (ODF), remains an outstanding task for local governments. In contrast to other districts, in Tegal Regency the implementation of this program was regulated directly in the Regent's Regulation on the Regional Program for Community Empowerment (PDPM). The purpose of this study is to explore further how the PDPM has been implemented in the effort to make Tegal an Open Defecation Free district in 2019. The method used in this study is descriptive qualitative. The authors use two data sources, primary and secondary, through in-depth interviews with three informants and through documentation. The results show that so far the PDPM Jambanisasi (latrine construction) program has been considered successful in building public awareness of the importance of healthy sanitation. ODF was implemented through the three main components of STBM and triggering techniques to meet three expectations: right target, quality, and benefits. The PDPM Jambanisasi has succeeded in empowering communities in the health and economic fields through a community of sanitation entrepreneurs.


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Xiaobing Sun ◽  
Bin Li ◽  
Yucong Duan ◽  
Wei Shi ◽  
Xiangyue Liu

There are a large number of open source projects in software repositories for developers to reuse. During software development and maintenance, developers can leverage good interfaces from these open source projects and quickly establish the framework of a new project by reusing those interfaces. However, to reuse them, developers need to read many code files and learn which interfaces can be reused. To help developers better take advantage of the interfaces available in software repositories, we previously proposed an approach to automatically recommend interfaces by mining existing open source projects in software repositories. We mainly used the LDA (Latent Dirichlet Allocation) topic model to construct a Feature-Interface Graph for each software project and recommended interfaces based on that graph. In this paper, we improve our previous approach by clustering the recommended interfaces on the Feature-Interface Graph, which yields more accurate interface recommendations for developers. We evaluate the effectiveness of the improved approach, and the results show that it recommends more accurate interfaces for reuse, and does so more efficiently, than our previous work.
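The abstract above rests on topic modelling plus clustering; the sketch below is an illustrative analogue only, not the authors' implementation. It fits LDA over a few hypothetical interface descriptions and then clusters interfaces by their topic distributions, mirroring the idea of clustering recommendations on a Feature-Interface Graph. All corpus content and names are invented.

```python
# Illustrative sketch: topic-model interface descriptions with LDA, then
# cluster interfaces by their topic distributions. Data and names are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

interface_docs = {
    "HttpClient":  "send http request response headers timeout",
    "JsonParser":  "parse serialize json object field value",
    "FileStorage": "read write file stream path buffer",
}

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(interface_docs.values())

# Each interface gets a distribution over latent "feature" topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_dist = lda.fit_transform(X)

# Group interfaces with similar topic profiles; a query feature could then be
# matched against cluster centroids to recommend candidate interfaces.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topic_dist)
for name, cluster in zip(interface_docs, clusters):
    print(name, "-> cluster", cluster)
```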


Author(s):  
Sabrina T. Wong ◽  
Julia M. Langton ◽  
Alan Katz ◽  
Martin Fortin ◽  
Marshall Godwin ◽  
...  

Aim: To describe the process by which the 12 community-based primary health care (CBPHC) research teams worked together and fostered cross-jurisdictional collaboration, including collection of common indicators with the goal of using the same measures and data sources.
Background: A pan-Canadian mechanism for common measurement of the impact of primary care innovations across Canada is lacking. The Canadian Institutes for Health Research and its partners funded 12 teams to conduct research and collaborate on the development of a set of commonly collected indicators.
Methods: A working group representing the 12 teams was established. It undertook an iterative process to consider existing primary care indicators identified from the literature and by stakeholders. Indicators were agreed upon with the intention of addressing three objectives across the 12 teams: (1) describing the impact of improving access to CBPHC; (2) examining the impact of alternative models of chronic disease prevention and management in CBPHC; and (3) describing the structures and context that influence the implementation, delivery, cost, and potential for scale-up of CBPHC innovations.
Findings: Nineteen common indicators within the core dimensions of primary care were identified: access, comprehensiveness, coordination, effectiveness, and equity. We also agreed to collect data on health care costs and utilization within each team. Data sources include surveys, health administrative data, interviews, focus groups, and case studies. Collaboration across these teams sets the foundation for a unique opportunity for new knowledge generation, over and above any knowledge developed by any one team. Keys to success are each team's willingness to engage and commitment to working across teams, funding to support this collaboration, and distributed leadership across the working group. Reaching consensus on the collection of common indicators is challenging but achievable.


2020 ◽  
Author(s):  
James A. Fellows Yates ◽  
Aida Andrades Valtueña ◽  
Ashild J. Vågene ◽  
Becky Cribdon ◽  
Irina M. Velsko ◽  
...  

Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has risen dramatically in recent years, and tracking this data for reuse is particularly important for large-scale ecological and evolutionary studies of individual microbial taxa, microbial communities, and metagenomic assemblages. AncientMetagenomeDir (archived at https://doi.org/10.5281/zenodo.3980833) is a collection of indices of published genetic data deriving from ancient microbial samples that provides basic, standardised metadata and accession numbers to allow rapid data retrieval from online repositories. These collections are community-curated and span multiple sub-disciplines in order to ensure adequate breadth and consensus in metadata definitions, as well as longevity of the database. Internal guidelines and automated checks to facilitate compatibility with established sequence-read archives and term ontologies ensure consistency and interoperability for future meta-analyses. This collection will also assist in standardising metadata reporting for future ancient metagenomic studies.
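As a hedged example of how such standardised metadata and accession numbers could support rapid retrieval, the sketch below loads a community-curated sample table with pandas and extracts archive accessions for a single publication. The file name, column names, and DOI are assumptions made for illustration; consult the released tables for the actual schema.

```python
# Hedged sketch: filter an AncientMetagenomeDir-style sample table and collect
# accession numbers for reuse. File name, columns, and DOI are placeholders.
import pandas as pd

samples = pd.read_csv("ancientmetagenome-hostassociated_samples.tsv", sep="\t")

# Keep samples from one publication and gather their archive accessions
# (e.g. ENA/SRA codes) for downstream retrieval.
subset = samples[samples["publication_doi"] == "10.1000/example.doi"]
print(subset["archive_accession"].dropna().unique())
```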


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements were added, while with SDValidate, 13,000 erroneous RDF statements were removed from the knowledge base.
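To make the voting idea behind SDType concrete, here is a deliberately simplified, illustrative re-implementation (not the authors' code): every non-type property observed on a resource votes for candidate types according to the type distribution seen for that property elsewhere in the data set. The toy triples are invented, and the real SDType additionally weights properties by how discriminative they are.

```python
# Simplified SDType-style type inference by property-based voting.
from collections import Counter, defaultdict

# (subject, property, object) triples; "type" statements define known types.
triples = [
    ("Berlin", "locatedIn", "Germany"),
    ("Berlin", "population", "3600000"),
    ("Paris",  "locatedIn", "France"),
    ("Paris",  "population", "2100000"),
    ("Paris",  "type", "City"),
    ("France", "type", "Country"),
    ("Germany", "type", "Country"),
]

types = {s: o for s, p, o in triples if p == "type"}

# For each property, count the types of subjects that use it.
type_dist = defaultdict(Counter)
for s, p, o in triples:
    if p != "type" and s in types:
        type_dist[p][types[s]] += 1

def infer_type(resource):
    """Let each property of the resource vote with its observed type distribution."""
    votes = Counter()
    for s, p, o in triples:
        if s == resource and p != "type":
            dist = type_dist[p]
            total = sum(dist.values())
            for t, n in dist.items():
                votes[t] += n / total
    return votes.most_common(1)[0] if votes else None

# "Berlin" has no explicit type; its properties suggest "City".
print(infer_type("Berlin"))
```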


2021 ◽  
Author(s):  
Benjamin Moreno-Torres ◽  
Christoph Völker ◽  
Sabine Kruschwitz

Non-destructive testing (NDT) data in civil engineering is regularly used for scientific analysis. However, there is no uniform representation of the data yet, so analysing distributed data sets across different test objects is too difficult in most cases.

To overcome this, we present an approach for integrated management of distributed data sets based on Semantic Web technologies. The cornerstone of this approach is an ontology, a semantic knowledge representation of our domain. This NDT-CE ontology is later populated with the data sources. Using the properties and the relationships between concepts that the ontology contains, we make these data sets meaningful also for machines. Furthermore, the ontology can be used as a central interface for database access. Non-domain data sources can be integrated by linking them with the NDT ontology, making them directly available for generic use in terms of digitization. Based on an extensive literature review, we outline the possibilities that result for NDT in civil engineering, such as computer-aided sorting and analysis of measurement data, and the recognition and explanation of correlations.

A common knowledge representation and data access allows the scientific exploitation of existing data sources with data-based methods (such as image recognition, measurement uncertainty calculations, factor analysis, or material characterization) and simplifies bidirectional knowledge and data transfer between engineers and NDT specialists.
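As a hedged illustration of using an ontology as a central interface for data access, the sketch below builds a tiny RDF graph with rdflib and queries it with SPARQL. The namespace, class, and property names are invented for this example and are not taken from the actual NDT-CE ontology.

```python
# Hedged sketch: represent one NDT measurement as RDF and query it generically.
# All names under the example namespace are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF, XSD

NDT = Namespace("https://example.org/ndt-ce#")
g = Graph()
g.bind("ndt", NDT)

# Describe one ultrasonic measurement taken on a test object.
g.add((NDT.m1, RDF.type, NDT.UltrasonicMeasurement))
g.add((NDT.m1, NDT.onTestObject, NDT.bridgeDeck42))
g.add((NDT.m1, NDT.pulseVelocity, Literal(4230.0, datatype=XSD.double)))

# A generic SPARQL query works against any data source mapped to the ontology.
query = """
PREFIX ndt: <https://example.org/ndt-ce#>
SELECT ?m ?v WHERE {
  ?m a ndt:UltrasonicMeasurement ;
     ndt:pulseVelocity ?v .
}
"""
for row in g.query(query):
    print(row.m, float(row.v))
```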

