The Influence of Code Retrieval from the Web on Programmer's Skills, Methodologies, and Coding Behaviors

2021 ◽  
Vol 36 (2) ◽  
pp. 160-166
Author(s):  
Alfaroq O.M. Mohammed ◽  
Ziad A. Abdelnabi ◽  
Abdalmunam Abdalla

The development of software projects consists of several stages, such as analysis and design, and requires a set of skills that the software developer applies throughout the project, such as specifying requirements and writing code. Developers commonly search for source code on the internet to remix and reuse in software production. This paper investigates the influence of code retrieved from the web on programmers' views, decisions, and skills. A questionnaire instrument was designed and distributed to programmers for their feedback. As a result, we were able to address several points and gained a better understanding of the interaction between programmers and code from the web, especially code from programming forums such as Stack Overflow.


Author(s):  
Harry H. Cheng ◽  
Dung T. Trang

We have developed a Ch Mechanism Toolkit for analysis and design of mechanisms. It was developed using Ch, an embeddable C/C++ interpreter with extensions. The Ch Mechanism Toolkit allows users to write simple programs for solving complicated planar mechanism problems. As an extension to the toolkit, a Web-based system was created for performing mechanism design and analysis through the internet. This paper will discuss the design and implementation of the Ch Mechanism Toolkit as well as its corresponding web-based system. The web-based mechanism system is especially suitable for distance learning. The web-based system for mechanism design and analysis is available on the Web at http://www.softintegration/webservices/mechanism/.
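The toolkit's Ch code is not shown in the abstract; purely as an illustration of the kind of planar-mechanism position analysis such a toolkit automates, here is a short Python sketch of the standard in-line crank-slider closure equation (the link lengths are made up):

```python
import math

def slider_position(a, b, theta):
    """Slider displacement x for an in-line crank-slider mechanism.

    a: crank length, b: connecting-rod length, theta: crank angle (rad).
    Vector-loop closure gives x = a*cos(theta) + sqrt(b^2 - (a*sin(theta))^2).
    """
    if b <= a:
        raise ValueError("rod must be longer than crank for full rotation")
    return a * math.cos(theta) + math.sqrt(b**2 - (a * math.sin(theta))**2)

# Sweep one crank revolution; the stroke of an in-line slider equals 2*a.
positions = [slider_position(1.0, 3.0, math.radians(d)) for d in range(0, 360, 10)]
print(round(max(positions) - min(positions), 6))  # 2.0
```

A real mechanism toolkit generalizes this to four-bar and multi-loop linkages, velocity, and acceleration analysis; the closed-form position solution above is the simplest instance of the approach.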


2016 ◽  
Author(s):  
Stephen Romansky ◽  
Sadegh Charmchi ◽  
Abram Hindle

The business models of software/platform as a service have contributed to developers' dependence on the Internet. Developers can rapidly point each other and consumers to the newest software changes with the power of the hyperlink. However, developers are not limited to referencing software changes to one another through the web; other shared hypermedia might include links to Stack Overflow, Twitter, and issue trackers. This work explores the software traceability of Uniform Resource Locators (URLs) that software developers leave in commit messages and software repositories. URLs are easily extracted from commit messages and source code, so it would be useful to researchers if URLs provided additional insight into project development. To assess traceability, manual topic labelling is evaluated against automated topic labelling on URL data sets. This work also shows differences between URL data collected from commit messages and URL data collected from source code, and examines outlying software projects with many URLs in case these projects do not provide meaningful software relationship information. Results from manual topic labelling show promise under evaluation, while automated topic labelling did not yield precise topics. Further investigation of manual and automated topic analysis would be useful.
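As a rough illustration of the extraction step described above, here is a minimal Python sketch that pulls URLs out of commit messages. The sample messages and the deliberately simplified pattern are illustrative, not the paper's actual extraction rules:

```python
import re

# Simplified URL matcher: anything starting with http(s):// up to whitespace.
URL_RE = re.compile(r"https?://\S+")

# Hypothetical commit messages for illustration.
commits = [
    "Fix crash on empty input, see https://stackoverflow.com/q/12345",
    "Refactor parser (no external references)",
    "Track regression in https://github.com/example/project/issues/42",
]

urls = [u for msg in commits for u in URL_RE.findall(msg)]
print(urls)
# ['https://stackoverflow.com/q/12345',
#  'https://github.com/example/project/issues/42']
```

A production extractor would also handle trailing punctuation and URLs embedded in source comments, but the core step is this kind of pattern match over commit text.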


2012 ◽  
pp. 564-582
Author(s):  
Gregorio Robles ◽  
Jesús M. González-Barahona ◽  
Daniel Izquierdo-Cortazar ◽  
Israel Herraiz

Thanks to the open nature of libre (free, open source) software projects, researchers have gained access to a rich set of data related to various aspects of software development. Although it is usually publicly available on the Internet, obtaining and analyzing the data in a convenient way is not an easy task, and many considerations have to be taken into account. In this chapter we introduce the most relevant data sources that can be found in libre software projects and that are commonly studied by scholars: source code releases, source code management systems, mailing lists and issue (bug) tracking systems. The chapter also provides some advice on the problems that can be found when retrieving and preparing the data sources for a later analysis, as well as information about the tools and datasets that support these tasks.
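To make the source code management data source concrete, here is a minimal Python sketch that parses `git log` output formatted with `--pretty=format:'%H|%an|%aI'` into records; the sample log lines are invented for illustration:

```python
# Made-up sample of `git log --pretty=format:'%H|%an|%aI'` output.
sample_log = """\
a1b2c3d|Alice Example|2011-04-02T10:15:00+00:00
d4e5f6a|Bob Example|2011-04-03T09:00:00+00:00"""

def parse_log(text):
    """Split pipe-delimited log lines into commit/author/date records."""
    records = []
    for line in text.splitlines():
        commit, author, date = line.split("|", 2)
        records.append({"commit": commit, "author": author, "date": date})
    return records

for rec in parse_log(sample_log):
    print(rec["author"], rec["commit"])
```

In practice the chapter's point holds: real repositories need far more care (merge commits, renamed authors, encoding issues), which is why dedicated mining tools and curated datasets exist.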


2021 ◽  
Vol 33 (3) ◽  
pp. 87-100
Author(s):  
Denis Eyzenakh ◽  
Anton Rameykov ◽  
Igor Nikiforov

Over the past decade, the Internet has become a gigantic and rich source of data, which is used for knowledge extraction through machine learning analysis. To perform data mining on web information, the data must be extracted from its source and placed in analytical storage; this is the ETL process. Different web sources provide different ways to access their data: either an API over the HTTP protocol or parsing of the HTML source code. The article is devoted to an approach for high-performance data extraction from sources that do not provide an API to access the data. Distinctive features of the proposed approach are load balancing, two levels of data storage, and separation of the file-download process from the scraping process. The approach is implemented in a solution with the following technologies: Docker, Kubernetes, Scrapy, Python, MongoDB, Redis Cluster, and CephFS. The results of testing the solution are described in this article as well.
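When a web source exposes no API, the extraction step of the ETL process falls back to parsing the page's HTML. This stdlib-only Python sketch pulls link targets out of a page; the markup is a made-up stand-in for a real downloaded page (in the article's setup, downloading and scraping run as separate processes, and Scrapy handles this at scale):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Illustrative page content; a real pipeline would read this from storage
# written by a separate downloader process.
page = '<html><body><a href="/item/1">One</a> <a href="/item/2">Two</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/item/1', '/item/2']
```

Separating download from parsing, as the article proposes, lets each stage scale independently: downloaders are network-bound, parsers are CPU-bound.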


Author(s):  
John D. Ferguson ◽  
James Miller

It is now widely accepted that software projects utilizing the Web (e-projects) face many of the same problems and risks experienced with more traditional software projects, only to a greater degree. Further, their rapid development cycles combined with a high frequency of software releases and adaptations make many of the traditional tools and techniques for modeling defects unsuitable. This paper proposes a simple model to explain and quantify the interaction between generic defect injection and removal processes in e-projects. The model is based upon long-standing and highly regarded work from the field of quantitative ecological population modeling. This basic modeling approach is then tailored to fit the software production process within an e-project context.
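The paper's ecological model is not reproduced in the abstract; as a generic sketch in the same spirit, the following Python snippet integrates a simple injection/removal balance with an Euler step: defects enter at a constant rate and are removed in proportion to how many are present. The rates are purely illustrative:

```python
def simulate_defects(injection_rate, removal_rate, steps, dt=0.1):
    """Euler-integrate d' = injection_rate - removal_rate * d from d(0) = 0."""
    d = 0.0
    history = []
    for _ in range(steps):
        d += (injection_rate - removal_rate * d) * dt
        history.append(d)
    return history

# With constant rates the defect count settles at injection/removal.
trajectory = simulate_defects(injection_rate=5.0, removal_rate=0.5, steps=500)
print(round(trajectory[-1], 2))  # ~10.0
```

Population-style models add exactly the couplings this toy version omits, such as removal effort that itself varies with release frequency, which is what makes them attractive for the rapid-release e-project setting.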



