Text mining and software engineering: an integrated source code and document analysis approach

IET Software ◽  
2008 ◽  
Vol 2 (1) ◽  
pp. 3 ◽  
Author(s):  
R. Witte ◽  
Q. Li ◽  
Y. Zhang ◽  
J. Rilling


2020 ◽
Author(s):  
Willian N. Oizumi ◽  
Alessandro F. Garcia

Design problems affect most software projects and make their maintenance expensive and difficult. Thus, the identification of potential design problems in the source code – which is very often the only available and up-to-date artifact in a project – becomes essential in long-living software systems. This identification task is challenging, as the reification of design problems in the source code tends to be scattered across several code elements. However, state-of-the-art techniques do not provide enough information to effectively help developers in this task. In this work, we address this challenge by proposing a new technique to support developers in revealing design problems. This technique synthesizes information about potential design problems, which are materialized in the implementation in the form of syntactic and semantic anomaly agglomerations. Our evaluation shows that the proposed synthesis technique helps to reveal more than 1200 design problems across 7 industry-strength systems, with a median precision of 71% and a median recall of 78%. The relevance of our work has been widely recognized by the software engineering community through 2 awards and 7 publications in international and national venues.
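The agglomeration idea can be illustrated with a minimal sketch. The data structures and threshold below are hypothetical and not the authors' implementation; the point is only that smell instances affecting related code elements are grouped, and sufficiently dense groups are flagged as design-problem candidates.

    from collections import defaultdict

    # Hypothetical sketch of syntactic smell agglomeration: smell instances
    # that affect classes in the same module are grouped, and groups above
    # a size threshold are reported as potential design problems.
    def agglomerate(smells, min_size=3):
        """smells: list of (smell_kind, module, class_name) tuples."""
        groups = defaultdict(list)
        for kind, module, cls in smells:
            groups[module].append((kind, cls))
        return {m: g for m, g in groups.items() if len(g) >= min_size}

    smells = [
        ("GodClass", "billing", "Invoice"),
        ("FeatureEnvy", "billing", "InvoiceLine"),
        ("LongMethod", "billing", "TaxRules"),
        ("LongMethod", "ui", "MainWindow"),
    ]
    print(agglomerate(smells))  # only the dense 'billing' group is reported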


Author(s):  
Minh Ngoc Ngo

Due to the need to reengineer and migrate aging software and legacy systems, reverse engineering has started to receive some attention. It has now been established as an area in software engineering concerned with understanding software structure and recovering or extracting design and features from programs, mainly from source code. The inference of design and features from code closely resembles data mining, which extracts and infers information from data. In view of this similarity, reverse engineering from program code can be called program mining. Traditionally, program mining has been based mainly on invariant properties and heuristic rules. Recently, empirical properties have been introduced to augment the existing methods. This article summarizes some of the work in this area.


Author(s):  
Adrián Hernández-López ◽  
Ricardo Colomo-Palacios ◽  
Pedro Soto-Acosta ◽  
Cristina Casado Lumberas

Productivity measurement is constructed from the measurement of three categories of elements: inputs, outputs, and factors. This concept, which originated in the manufacturing industry, has also been a research topic within Software Engineering (SE). In this area, the most commonly used inputs are time and effort, and the most commonly used outputs are source code and functionality. Despite their known limitations, many of the most widely used productivity measures remain in use because of the information they provide for management goals. In order to enable the construction of new productivity measures for SE practitioners, this paper analyzes the existence of inputs other than time and effort, and of outputs other than source code and functionality. Moreover, differences in the usage of inputs and the production of outputs among several SE job positions are analyzed and explained.
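As a minimal illustration of how such measures are constructed from an output and an input (the figures below are invented, not data from the study), a classical ratio such as source lines of code or function points per person-hour can be computed directly:

    # Illustrative output/input productivity ratios; all figures are invented.
    def productivity(output, input_amount):
        return output / input_amount

    loc_per_hour = productivity(output=1200, input_amount=160)  # LOC per person-hour
    fp_per_hour = productivity(output=35, input_amount=160)     # function points per person-hour
    print(f"{loc_per_hour:.1f} LOC/h, {fp_per_hour:.2f} FP/h")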


2020 ◽  
Vol 10 (20) ◽  
pp. 7088
Author(s):  
Luka Pavlič ◽  
Marjan Heričko ◽  
Tina Beranič

In scientific research, evidence is often based on empirical data. Scholars tend to rely on students as participants in experiments in order to validate their theses. Students are an obvious choice for scientific research: they are usually willing to participate and are often themselves pursuing an education in the experiment’s domain. The software engineering domain is no exception. However, readers, authors, and reviewers do sometimes question the validity of experimental data gathered from students in controlled experiments. This is why we address this difficult-to-answer question: are students a proper substitute for experienced professional engineers in a typical software engineering experiment? As we demonstrate in this paper, there is no simple “yes or no” answer. In some aspects, students were not outperformed by professionals, but in others, students would not only give different answers than professionals, their answers would also diverge more. In this paper we show and analyze the results of a controlled experiment in the source code quality domain, comparing student and professional responses. We show that authors have to be careful when employing students in experiments, especially when complex and advanced domains are addressed. However, students may be a proper substitute in cases where only non-advanced aspects are required.
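The divergence described above can be quantified in a simple way. The following sketch uses invented Likert-scale answers, not the experiment's data, and compares the central tendency and spread of the two groups using the sample standard deviation:

    import statistics

    # Invented 1-5 Likert answers for a single question; not the experiment's data.
    students = [2, 5, 1, 4, 3, 5, 2]
    professionals = [4, 4, 3, 4, 4, 3, 4]

    for name, answers in [("students", students), ("professionals", professionals)]:
        print(name, "median:", statistics.median(answers),
              "stdev:", round(statistics.stdev(answers), 2))
    # A larger standard deviation signals the kind of divergence reported for students.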


2018 ◽  
Vol 1 ◽  
pp. 1-5
Author(s):  
Florian Ledermann

Following Aristotle, F. P. Brooks (1987) emphasizes the distinction between “essential difficulties” and “accidental difficulties” as a key challenge in software engineering. From the point of view of cartography, it would be desirable to identify the cartographic essence of a program and subject it to additional scrutiny, while its accidental properties, again from the point of view of cartography, are usually of lesser relevance to cartographic analysis. In this paper, two methods that facilitate extracting the cartographic essence of programs are presented: close reading of their source code, and automated analysis of their runtime behavior. The advantages and shortcomings of both methods are discussed, followed by an outlook on future developments and potential applications.
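The second method, automated analysis of runtime behavior, can be approximated in a few lines. The sketch below is a toy example with hypothetical draw_* routine names; it uses Python's sys.settrace to record which drawing routines a program actually invokes, separating them from accidental computation:

    import sys

    # Toy runtime-analysis sketch: record calls to (hypothetical) drawing
    # routines to approximate the cartographic essence of a program.
    calls = []

    def tracer(frame, event, arg):
        if event == "call" and frame.f_code.co_name.startswith("draw_"):
            calls.append(frame.f_code.co_name)
        return tracer

    def draw_coastline(): pass
    def draw_labels(): pass
    def compute_projection(): pass  # accidental detail; not recorded

    def render_map():
        compute_projection()
        draw_coastline()
        draw_labels()

    sys.settrace(tracer)
    render_map()
    sys.settrace(None)
    print(calls)  # ['draw_coastline', 'draw_labels']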


Author(s):  
Bello Muriana ◽  
Ogba Paul Onuh

Measures of software complexity are an essential part of software engineering. Complexity metrics can be used to forecast key information regarding the testability, reliability, and manageability of software systems from a study of the source code. This paper presents the results of three distinct software complexity metrics applied to two searching algorithms (linear and binary search). The goal is to compare the complexity of linear and binary search algorithms implemented in Python, Java, and C++, measuring the sample algorithms using the lines-of-code, McCabe, and Halstead metrics. The findings indicate that the Halstead program difficulty has its minimal value for both linear and binary search when implemented in Python. Analysis of Variance (ANOVA) was adopted to determine whether there are any statistically significant differences between the search algorithms when implemented in the three programming languages. It revealed that the three languages do not vary considerably for either linear or binary search, which implies that any of the three languages is suitable for coding linear and binary search algorithms.
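Two of the three metrics are easy to reproduce in miniature. The sketch below is illustrative rather than the paper's tooling: it counts non-blank source lines of a binary search and applies the Halstead difficulty formula D = (n1 / 2) * (N2 / n2), where n1 is the number of distinct operators, n2 the number of distinct operands, and N2 the total operand occurrences. The counts are entered by hand for illustration, not extracted by a parser.

    import inspect

    def binary_search(arr, target):
        lo, hi = 0, len(arr) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if arr[mid] == target:
                return mid
            if arr[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    # Lines of code: the simplest of the three metrics.
    loc = len([l for l in inspect.getsource(binary_search).splitlines() if l.strip()])

    # Halstead difficulty D = (n1 / 2) * (N2 / n2); the counts below are
    # hand-entered for illustration rather than extracted by a parser.
    n1, n2, N2 = 12, 7, 25  # distinct operators, distinct operands, total operands
    difficulty = (n1 / 2) * (N2 / n2)
    print(f"LOC = {loc}, Halstead difficulty = {difficulty:.1f}")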

