Boa: a link between worlds

2016 ◽  
Author(s):  
Stephen Romansky ◽  
Sadegh Charmchi ◽  
Abram Hindle

The business models of software/platform as a service have contributed to developers' dependence on the Internet. Developers can rapidly point each other and consumers to the newest software changes with the power of the hyperlink. But developers are not limited to referencing software changes to one another through the web. Other shared hypermedia might include links to Stack Overflow, Twitter, and issue trackers. This work explores the software traceability of Uniform Resource Locators (URLs) which software developers leave in commit messages and software repositories. URLs are easily extracted from commit messages and source code. Therefore, it would be useful to researchers if URLs provided additional insight into project development. To assess traceability, manual topic labelling is evaluated against automated topic labelling on URL data sets. This work also shows differences between URL data collected from commit messages and URL data collected from source code. In addition, this work explores outlying software projects with many URLs, in case these projects do not provide meaningful software relationship information. Results from manual topic labelling show promise under evaluation, while automated topic labelling did not yield precise topics. Further investigation of manual and automated topic analysis would be useful.
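The claim that URLs are easily extracted from commit messages can be illustrated with a minimal Python sketch. It is not the authors' extraction pipeline: the regular expression is a deliberately simplified assumption (http/https only), and the helper names `extract_urls` and `count_domains` are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified pattern (an assumption, not the paper's): an http(s) URL runs
# until whitespace or a common closing delimiter such as ), ], >, or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return the URLs found in text, without trailing sentence punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def count_domains(messages):
    """Count how often each host name is linked across a list of messages."""
    counts = Counter()
    for message in messages:
        for url in extract_urls(message):
            counts[urlparse(url).netloc.lower()] += 1
    return counts


if __name__ == "__main__":
    commits = [
        "Fix crash on empty input, see https://stackoverflow.com/q/123.",
        "Closes (https://example.org/issues/7)",
    ]
    print(count_domains(commits))
```

Grouping by host name is only one possible first step before any topic labelling; the study's own labelling procedure is not reproduced here.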


2019 ◽  
Author(s):  
Robert W Reid ◽  
Jacob W Ferrier ◽  
Jeremy J Jay

Summary: Databio is capable of providing fast and accurate annotation of gene-oriented data sets, coupled with an integrated identifier conversion service to empower downstream data mining and computational analysis. Databio is enabled by fast real-time data structures applied to over 137 million unique identifiers, and uses automated heuristics to permit accurate data provenance without highly specialized knowledge and bioinformatics training. Availability and Implementation: Freely available on the web at https://datab.io/. Source code and binaries are freely available for download at https://github.com/joiningdata/databio/, implemented in Go and supported on Linux, Windows, and macOS.


2021 ◽  
Vol 36 (2) ◽  
pp. 160-166
Author(s):  
Alfaroq O.M. Mohammed ◽  
Ziad A. Abdelnabi ◽  
Abdalmunam Abdalla

The development of software projects consists of several stages, such as analysis and design. It also requires a set of skills that the software developer can use to work on the project, such as specifying the requirements and writing code. Developers usually search for source code on the internet for remix and reuse in software production. This paper aims to investigate the influence and effect of code retrieved from the web on programmers’ views, decisions, and skills. A questionnaire instrument was designed and distributed to programmers for their feedback. As a result, we were able to address some points and achieve a better understanding of the interaction between programmers and the code from the web, especially the code from programming forums such as Stack Overflow.


Computers ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 47
Author(s):  
Fariha Iffath ◽  
A. S. M. Kayes ◽  
Md. Tahsin Rahman ◽  
Jannatul Ferdows ◽  
Mohammad Shamsul Arefin ◽  
...  

A programming contest generally involves the host presenting a set of logical and mathematical problems to the contestants. The contestants are required to write computer programs that are capable of solving these problems. An online judge system is used to automate the judging of the programs submitted by the users. Online judges are systems designed for the reliable evaluation of submitted source code. Traditional online judging platforms are not ideally suited to programming labs, as they do not support partial scoring or efficient detection of plagiarized code. Given this, we present an online judging framework that is capable of automatically scoring code by efficiently detecting plagiarized content and the level of accuracy of the code. Our system detects plagiarism by computing fingerprints of programs and comparing the fingerprints instead of the whole files. We used winnowing to select fingerprints among the k-gram hash values of a source file, which were generated by the Rabin–Karp algorithm. The proposed system is compared with existing online judging platforms to show its superiority in terms of time efficiency, correctness, and feature availability. In addition, we evaluated our system on large data sets and compared its run time with that of MOSS, a widely used plagiarism detection technique.
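The fingerprinting step described in this abstract can be sketched in Python as follows. This is an illustration of the general technique (a Rabin–Karp style rolling hash over k-grams, followed by winnowing), not the authors' implementation: the function names, the hash base and modulus, and the rule of keeping the rightmost minimum in each window are assumptions made here.

```python
def kgram_hashes(text, k, base=257, mod=(1 << 61) - 1):
    """Hash every k-gram of text with a Rabin-Karp style rolling hash.

    base and mod are illustrative choices, not values from the paper.
    """
    if k <= 0 or len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # drop the oldest character
        h = (h * base + ord(text[i])) % mod      # append the newest character
        hashes.append(h)
    return hashes


def winnow(hashes, w):
    """Select fingerprints as (hash, position) pairs.

    Each window of w consecutive hashes contributes its minimum value,
    taking the rightmost occurrence on ties; a position is recorded once.
    """
    fingerprints = []
    last_position = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = w - 1 - window[::-1].index(smallest)  # rightmost minimum
        position = start + offset
        if position != last_position:
            fingerprints.append((smallest, position))
            last_position = position
    return fingerprints


if __name__ == "__main__":
    source = "int main(){return 0;}"
    print(winnow(kgram_hashes(source, k=5), w=4))
```

Two submissions can then be compared through their fingerprint sets rather than their full text; how the framework turns that comparison into a score is not stated in the abstract and is left out of the sketch.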


Author(s):  
Paulo Meirelles ◽  
Carlos Santos Jr. ◽  
Joao Miranda ◽  
Fabio Kon ◽  
Antonio Terceiro ◽  
...  

Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements have been added, while using SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.


Author(s):  
Sebastian Baltes

Analyzing and understanding software developers’ work habits and resulting needs is an essential prerequisite to improve software development practice. In our research, we utilize different qualitative and quantitative research methods to empirically investigate three underexplored aspects of software development: First, we analyze how software developers use sketches and diagrams in their daily work and derive requirements for better tool support. Then, we explore to what degree developers copy code from the popular online platform Stack Overflow without adhering to license requirements and motivate why this behavior may lead to legal issues for affected open source software projects. Finally, we describe a novel theory of software development expertise and identify factors fostering or hindering the formation of such expertise. Besides, we report on methodological implications of our research and present the open dataset SOTorrent, which supports researchers in analyzing the origin, evolution, and usage of content on Stack Overflow. The common goal for all studies we conducted was to better understand software developers’ work practices. Our findings support researchers and practitioners in making data-informed decisions when developing new tools or improving processes related to either the specific work habits we studied or expertise development in general.


2020 ◽  
Author(s):  
Willian N. Oizumi ◽  
Alessandro F. Garcia

Design problems affect most software projects and make their maintenance expensive and impeditive. Thus, the identification of potential design problems in the source code – which is very often the only available and up-to-date artifact in a project – becomes essential in long-living software systems. This identification task is challenging, as the reification of design problems in the source code tends to be scattered through several code elements. However, state-of-the-art techniques do not provide enough information to effectively help developers in this task. In this work, we address this challenge by proposing a new technique to support developers in revealing design problems. This technique synthesizes information about potential design problems, which are materialized in the implementation in the form of syntactic and semantic anomaly agglomerations. Our evaluation shows that the proposed synthesis technique helps to reveal more than 1200 design problems across 7 industry-strength systems, with a median precision of 71% and a median recall of 78%. The relevance of our work has been widely recognized by the software engineering community through 2 awards and 7 publications in international and national venues.


Author(s):  
Rosalva E. Gallardo-Valencia ◽  
Phitchayaphong Tantikul ◽  
Susan Elliott Sim