Boa: a link between worlds

10.7287/peerj.preprints.1947 ◽

2016 ◽

Author(s):

Stephen Romansky ◽

Sadegh Charmchi ◽

Abram Hindle

Keyword(s):

Business Models ◽

Source Code ◽

Data Sets ◽

Additional Insight ◽

Software Projects ◽

Software Developers ◽

Topic Analysis ◽

Platform As A Service ◽

Software Changes ◽

The Web

The business models of software as a service and platform as a service have contributed to developers' dependence on the Internet. Developers can rapidly point each other and consumers to the newest software changes with the power of the hyperlink. But developers are not limited to referencing software changes through the web: other shared hypermedia include links to Stack Overflow, Twitter, and issue trackers. This work explores the software traceability of Uniform Resource Locators (URLs) that software developers leave in commit messages and software repositories. URLs are easily extracted from commit messages and source code, so it would be useful to researchers if URLs provided additional insight into project development. To assess traceability, manual topic labelling is evaluated against automated topic labelling on URL data sets. This work also shows differences between URL data collected from commit messages and URL data collected from source code. In addition, this work explores outlying software projects with many URLs, in case these projects do not provide meaningful software relationship information. Results from manual topic labelling show promise under evaluation, while automated topic labelling did not yield precise topics. Further investigation of manual and automated topic analysis would be useful.
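The abstract notes that URLs are easy to extract from commit messages. As a rough illustration only, the sketch below pulls URLs out of a few made-up commit messages and counts the hosts they point to. The regular expression, the function names, and the sample messages are assumptions for this example and are not taken from the paper.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Deliberately simple pattern (an assumption, not the paper's extractor):
# a scheme followed by any run of characters that are not whitespace or brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Return the URLs found in text, with trailing sentence punctuation removed."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def count_domains(messages):
    """Count how often each host appears across a list of commit messages."""
    counts = Counter()
    for message in messages:
        for url in extract_urls(message):
            counts[urlparse(url).netloc.lower()] += 1
    return counts


# Hypothetical commit messages, invented for illustration.
commit_messages = [
    "Fix crash on empty input, see https://stackoverflow.com/questions/12345.",
    "Closes https://github.com/example/project/issues/7 (reported on https://twitter.com/example)",
    "Refactor parser; no links here",
]

domain_counts = count_domains(commit_messages)
for domain, count in sorted(domain_counts.items()):
    print(domain, count)
```

Grouping by host is only one possible first step; the topic labelling described in the abstract would need the linked content or richer features than the host name.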

Automated Gene Data Integration with Databio

10.1101/768077 ◽

2019 ◽

Author(s):

Robert W Reid ◽

Jacob W Ferrier ◽

Jeremy J Jay

Keyword(s):

Data Structures ◽

Computational Analysis ◽

Source Code ◽

Data Provenance ◽

Data Sets ◽

Time Data ◽

Specialized Knowledge ◽

Link Type ◽

Gene Data ◽

The Web

Summary: Databio is capable of providing fast and accurate annotation of gene-oriented data sets, coupled with an integrated identifier conversion service to empower downstream data mining and computational analysis. Databio is enabled by fast real-time data structures applied to over 137 million unique identifiers, and uses automated heuristics to permit accurate data provenance without highly specialized knowledge and bioinformatics training.

Availability and Implementation: Freely available on the web at https://datab.io/. Source code and binaries are freely available for download at https://github.com/joiningdata/databio/, implemented in Go and supported on Linux, Windows, and macOS.

The Influence of Code Retrieval from the Web on Programmer’s Skills, Methodologies, and Coding Behaviors

AL-MUKHTAR JOURNAL OF SCIENCES ◽

10.54172/mjsc.v36i2.66 ◽

2021 ◽

Vol 36 (2) ◽

pp. 160-166

Author(s):

Alfaroq O.M. Mohammed ◽

Ziad A. Abdelnabi ◽

Abdalmunam Abdalla

Keyword(s):

Source Code ◽

The Internet ◽

Software Projects ◽

Analysis And Design ◽

Software Developer ◽

Software Production ◽

The Web

The development of software projects consists of several stages, such as analysis and design. It also requires a set of skills that the software developer can use to work on the project, such as specifying the requirements and writing code. Developers usually search for source code on the internet to remix and reuse in software production. This paper aims to investigate the influence and effect of code retrieved from the web on programmers’ views, decisions, and skills. A questionnaire instrument was designed and distributed to programmers for their feedback. As a result, we were able to address some points and achieve a better understanding of the interaction between programmers and code from the web, especially code from programming forums such as Stack Overflow.

Online Judging Platform Utilizing Dynamic Plagiarism Detection Facilities

Computers ◽

10.3390/computers10040047 ◽

2021 ◽

Vol 10 (4) ◽

pp. 47

Author(s):

Fariha Iffath ◽

A. S. M. Kayes ◽

Md. Tahsin Rahman ◽

Jannatul Ferdows ◽

Mohammad Shamsul Arefin ◽

...

Keyword(s):

Source Code ◽

Large Data ◽

Large Data Sets ◽

Detection Technique ◽

Data Sets ◽

Plagiarism Detection ◽

Source Codes ◽

Efficient Detection ◽

Mathematical Problems ◽

Automatic Scoring

A programming contest generally involves the host presenting a set of logical and mathematical problems to the contestants. The contestants are required to write computer programs that are capable of solving these problems. An online judge system automates the judging of the programs submitted by the users and is designed to evaluate the submitted source code reliably. Traditional online judging platforms are not ideally suited to programming labs, as they do not support partial scoring or efficient detection of plagiarized code. To address this, we present an online judging framework that is capable of automatic scoring of code by efficiently detecting plagiarized content and the level of accuracy of the code. Our system detects plagiarism by computing fingerprints of programs and comparing the fingerprints instead of the whole files. We used winnowing to select fingerprints among the k-gram hash values of a source code, which were generated by the Rabin–Karp algorithm. The proposed system is compared with existing online judging platforms to show its superiority in terms of time efficiency, correctness, and feature availability. In addition, we evaluated our system on large data sets and compared its run time with that of MOSS, a widely used plagiarism detection technique.
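The fingerprinting step described above can be sketched roughly as follows: hash every k-gram with a Rabin-Karp rolling hash, then keep the minimum hash of each sliding window (winnowing). This is a minimal sketch under stated assumptions. The values of k, the window size, the hash base and modulus, the whitespace-stripping normalization, and the Jaccard similarity score are all choices made for this illustration and are not given in the abstract.

```python
def kgram_hashes(text, k, base=257, mod=(1 << 61) - 1):
    """Rabin-Karp rolling hash of every k-gram in text."""
    if len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character that leaves the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # drop the oldest character
        h = (h * base + ord(text[i])) % mod      # append the newest character
        hashes.append(h)
    return hashes


def winnow(hashes, window):
    """Keep the minimum hash of each window; ties go to the rightmost position."""
    selected = set()
    for start in range(len(hashes) - window + 1):
        chunk = hashes[start:start + window]
        smallest = min(chunk)
        offset = window - 1 - chunk[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


def fingerprint(source, k=5, window=4):
    """Fingerprint a program as the set of winnowed k-gram hashes."""
    normalized = "".join(source.lower().split())  # assumed normalization
    hashes = kgram_hashes(normalized, k)
    if len(hashes) < window:
        return set(hashes)  # too short to winnow: keep every hash
    return {h for h, _ in winnow(hashes, window)}


def similarity(source_a, source_b):
    """Jaccard overlap of two fingerprint sets (an assumed scoring rule)."""
    fa, fb = fingerprint(source_a), fingerprint(source_b)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "for i in range(10):\n    total += values[i]\n"
reformatted = "for i in range(10):   total += values[i]"
print(similarity(original, reformatted))
```

Because only one hash per window is kept, two submissions are compared through small fingerprint sets rather than whole files, which is the source of the efficiency the abstract refers to.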

Exploring 3-dimensional oceanographic data sets on the Web using virtual reality modeling language

Oceans '99. MTS/IEEE. Riding the Crest into the 21st Century. Conference and Exhibition. Conference Proceedings (IEEE Cat. No.99CH37008) ◽

10.1109/oceans.1999.800217 ◽

2003 ◽

Cited By ~ 2

Author(s):

C.W. Moore ◽

D.C. McClurg ◽

N.N. Soreide ◽

A.J. Hermann ◽

C.M. Lascara ◽

...

Keyword(s):

Virtual Reality ◽

Modeling Language ◽

Data Sets ◽

3 Dimensional ◽

Virtual Reality Modeling Language ◽

Oceanographic Data ◽

The Web

A Study of the Relationships between Source Code Metrics and Attractiveness in Free Software Projects

2010 Brazilian Symposium on Software Engineering ◽

10.1109/sbes.2010.27 ◽

2010 ◽

Cited By ~ 18

Author(s):

Paulo Meirelles ◽

Carlos Santos Jr. ◽

Joao Miranda ◽

Fabio Kon ◽

Antonio Terceiro ◽

...

Keyword(s):

Source Code ◽

Free Software ◽

Software Projects ◽

Code Metrics ◽

Source Code Metrics

Improving the Quality of Linked Data Using Statistical Distributions

Information Retrieval and Management ◽

10.4018/978-1-5225-5191-1.ch074 ◽

2018 ◽

pp. 1638-1664 ◽

Cited By ~ 1

Author(s):

Heiko Paulheim ◽

Christian Bizer

Keyword(s):

Knowledge Base ◽

Linked Data ◽

Relational Databases ◽

Knowledge Bases ◽

Structured Data ◽

Data Sources ◽

Data Sets ◽

Statistical Distributions ◽

The Web

Linked Data on the Web is created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements have been added, while with SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.
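To make the idea of inferring types from property distributions concrete, here is a deliberately simplified sketch. It is not the published SDType algorithm: it only looks at the properties a resource uses as a subject, gives every property equal weight, and applies a fixed threshold. The function names, the threshold, and the toy triples are all assumptions made for this illustration.

```python
from collections import defaultdict


def learn_type_distributions(triples, types):
    """Estimate P(type | subject uses property) from resources with known types."""
    prop_total = defaultdict(int)
    prop_type = defaultdict(lambda: defaultdict(int))
    for subject, prop, _ in triples:
        if subject not in types:
            continue  # untyped resources contribute no evidence
        prop_total[prop] += 1
        for t in types[subject]:
            prop_type[prop][t] += 1
    return {
        prop: {t: count / prop_total[prop] for t, count in type_counts.items()}
        for prop, type_counts in prop_type.items()
    }


def predict_types(resource, triples, distributions, threshold=0.5):
    """Average the per-property type distributions and keep types above threshold."""
    props = [p for s, p, _ in triples if s == resource and p in distributions]
    if not props:
        return set()
    scores = defaultdict(float)
    for prop in props:
        for t, probability in distributions[prop].items():
            scores[t] += probability / len(props)
    return {t for t, score in scores.items() if score >= threshold}


# Toy data, invented for illustration.
triples = [
    ("Berlin", "population", "3.6M"),
    ("Berlin", "mayor", "X"),
    ("Paris", "population", "2.1M"),
    ("Ada", "birthPlace", "London"),
    ("Ada", "occupation", "Mathematician"),
    ("Lyon", "population", "0.5M"),
    ("Lyon", "mayor", "Y"),
]
types = {"Berlin": {"City"}, "Paris": {"City"}, "Ada": {"Person"}}

distributions = learn_type_distributions(triples, types)
print(predict_types("Lyon", triples, distributions))
```

As the abstract stresses, an approach of this kind uses no external knowledge: the evidence for a missing type comes only from how typed resources in the same data set use the same properties.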

Software Developers’ Work Habits and Expertise: Empirical Studies on Sketching, Code Plagiarism, and Expertise Development

Ernst Denert Award for Software Engineering 2019 ◽

10.1007/978-3-030-58617-1_4 ◽

2020 ◽

pp. 47-60

Author(s):

Sebastian Baltes

Keyword(s):

Software Development ◽

Specific Work ◽

Expertise Development ◽

Software Projects ◽

Software Developers ◽

Quantitative Research Methods ◽

Stack Overflow ◽

Daily Work ◽

Work Habits ◽

Novel Theory

Analyzing and understanding software developers’ work habits and resulting needs is an essential prerequisite to improve software development practice. In our research, we utilize different qualitative and quantitative research methods to empirically investigate three underexplored aspects of software development: First, we analyze how software developers use sketches and diagrams in their daily work and derive requirements for better tool support. Then, we explore to what degree developers copy code from the popular online platform Stack Overflow without adhering to license requirements and motivate why this behavior may lead to legal issues for affected open source software projects. Finally, we describe a novel theory of software development expertise and identify factors fostering or hindering the formation of such expertise. Besides, we report on methodological implications of our research and present the open dataset SOTorrent, which supports researchers in analyzing the origin, evolution, and usage of content on Stack Overflow. The common goal for all studies we conducted was to better understand software developers’ work practices. Our findings support researchers and practitioners in making data-informed decisions when developing new tools or improving processes related to either the specific work habits we studied or expertise development in general.

Synthesis of Code Anomalies: Revealing Design Problems in the Source Code

10.5753/ctd.2016.9131 ◽

2020 ◽

Author(s):

Willian N. Oizumi ◽

Alessandro F. Garcia

Keyword(s):

Software Engineering ◽

Source Code ◽

New Technique ◽

Software Systems ◽

Identification Task ◽

Software Projects ◽

Design Problems ◽

Engineering Community ◽

Synthesis Technique ◽

A New Technique

Design problems affect most software projects and make their maintenance expensive and impeditive. Thus, the identification of potential design problems in the source code, which is very often the only available and up-to-date artifact in a project, becomes essential in long-living software systems. This identification task is challenging because the reification of design problems in the source code tends to be scattered across several code elements. However, state-of-the-art techniques do not provide enough information to effectively help developers in this task. In this work, we address this challenge by proposing a new technique to support developers in revealing design problems. This technique synthesizes information about potential design problems, which are materialized in the implementation in the form of syntactic and semantic anomaly agglomerations. Our evaluation shows that the proposed synthesis technique helps to reveal more than 1200 design problems across 7 industry-strength systems, with a median precision of 71% and a median recall of 78%. The relevance of our work has been widely recognized by the software engineering community through 2 awards and 7 publications in international and national venues.

Searching for reputable source code on the web

Proceedings of the 16th ACM international conference on Supporting group work - GROUP '10 ◽

10.1145/1880071.1880102 ◽

2010 ◽

Cited By ~ 7

Author(s):

Rosalva E. Gallardo-Valencia ◽

Phitchayaphong Tantikul ◽

Susan Elliott Sim

Keyword(s):

Source Code ◽

The Web
