Program Mining Augmented with Empirical Properties

Author(s):  
Minh Ngoc Ngo

Due to the need to reengineer and migrate aging software and legacy systems, reverse engineering has begun to receive attention. It is now established as an area of software engineering concerned with understanding software structure and with recovering or extracting design and features from programs, mainly from source code. The inference of design and features from code closely resembles data mining, which extracts and infers information from data. In view of this similarity, reverse engineering from program code can be called program mining. Traditionally, the latter has been based mainly on invariant properties and heuristic rules. Recently, empirical properties have been introduced to augment the existing methods. This article summarizes some of the work in this area.

Author(s):  
Hee Beng Kuan Tan ◽  
Yuan Zhao

Although the use of statistically probable properties is very common in medicine, it is not in software engineering. The use of such properties may open a new avenue for the automated recovery of designs from source code. In fact, the recovery of designs can also be called program mining, which in turn can be viewed as an extension of data mining to program source code.


2006 ◽  
Vol 35 (3) ◽  
Author(s):  
Bronius Paradauskas ◽  
Aurimas Laurikaitis

This article discusses the process of extracting enterprise knowledge from the relational databases and source code of legacy information systems. Problems of legacy systems and the main solutions to them are briefly described. The use of data reverse engineering and program understanding techniques to automatically infer as much as possible of the schema and semantics of a legacy information system is analyzed. An eight-step data reverse engineering algorithm for knowledge extraction from legacy systems is provided, and a hypothetical example of knowledge extraction from a legacy information system is presented.
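As a rough illustration of the first steps such a data reverse engineering algorithm performs, the sketch below queries a live relational database's catalog for declared foreign keys and groups them by table. It is a minimal sketch only: the PostgreSQL backend, the psycopg2 driver and the connection string are assumptions, and real legacy databases usually also require heuristics for implicit, undeclared foreign keys.

```python
# Minimal sketch: recover table/foreign-key structure from a live database
# catalog, one early step of data reverse engineering. Assumes PostgreSQL and
# the psycopg2 driver; implicit (undeclared) foreign keys, common in legacy
# systems, are not handled here.
import psycopg2

def recover_schema(dsn: str) -> dict:
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    # Declared foreign keys from the standard information_schema views.
    cur.execute("""
        SELECT tc.table_name, kcu.column_name,
               ccu.table_name AS referenced_table, ccu.column_name AS referenced_column
        FROM information_schema.table_constraints tc
        JOIN information_schema.key_column_usage kcu
          ON tc.constraint_name = kcu.constraint_name
        JOIN information_schema.constraint_column_usage ccu
          ON tc.constraint_name = ccu.constraint_name
        WHERE tc.constraint_type = 'FOREIGN KEY'
    """)
    schema = {}
    for table, column, ref_table, ref_column in cur.fetchall():
        schema.setdefault(table, []).append((column, ref_table, ref_column))
    conn.close()
    return schema

if __name__ == "__main__":
    # Hypothetical connection string for a legacy database.
    for table, refs in recover_schema("dbname=legacy user=analyst").items():
        print(table, refs)
```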


Author(s):  
Steve McRobb ◽  
Richard Millham ◽  
Jianjun Pu ◽  
Hongji Yang

This chapter reports on an experimental approach that uses WSL as an intermediate language for the visualisation of COBOL legacy systems in UML. Key UML techniques that can be used for visualisation are identified. Many cases were studied, and one is presented in detail. The report concludes by demonstrating how this approach can be used to build a software tool that automates the visualisation task. Furthermore, understanding a system is of critical importance to a developer, who must be able to understand the business processes modeled by the system along with the system's functionality, structure, events, and interactions with external entities. Such an understanding is even more important in reverse engineering: although developers have the advantage of having the source code available, system documentation is often missing or incomplete, and the original users, whose requirements were used to design the system, are often long gone.


Author(s):  
Youcef Baghdadi ◽  
Naoufel Kraiem

Reverse engineering techniques have become very important within the maintenance process, providing several benefits. They retrieve abstract representations that facilitate the comprehension of legacy systems and can also serve as a basis for refactoring them. Business process archaeology has emerged as a set of techniques and tools to recover business processes from source code and to preserve the existing business functions and rules buried in legacy source code. This chapter presents a reverse engineering process and a tool to retrieve services from running databases. These services are then reused to compose business processes following Service-Oriented Architecture, an architectural style that promotes agility.
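The chapter's actual process and tool are not reproduced here; purely as an illustrative sketch of deriving a service from a running database, the snippet below exposes one table as a read-only HTTP data service that a business process could later compose. Flask, sqlite3, the database file and the customers table are all assumptions.

```python
# Purely illustrative sketch of turning part of a running database into a
# service: one table is exposed as a read-only HTTP data service that a
# business process could later compose. Flask, sqlite3 and the 'customers'
# table are assumptions, not the chapter's actual tool.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = "legacy.db"  # hypothetical legacy database file

@app.route("/services/customers/<int:customer_id>")
def get_customer(customer_id: int):
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT * FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    conn.close()
    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(dict(row))

if __name__ == "__main__":
    app.run(port=8080)
```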


Author(s):  
HARALD C. GALL ◽  
RENÉ R. KLÖSCH ◽  
ROLAND T. MITTERMEIR

Integrating application domain knowledge into reverse engineering is an important step toward overcoming the shortcomings of conventional reverse engineering approaches that are based exclusively on information derivable from source code. In this paper, we present the basic concepts of a program transformation process from a conventional to an object-oriented architecture that incorporates extraneous higher-level knowledge. We discuss to what degree this knowledge may stem from general domain knowledge and to what extent it needs to be introduced as application-dependent knowledge by a human expert. These issues are examined in the context of the architectural transformation of legacy systems to an object-oriented architecture.


2021 ◽  
Vol 2066 (1) ◽  
pp. 012013
Author(s):  
Xiaobin Hong

Abstract With the rapid development of informatization, computer database software systems have entered many fields of society, bringing explosive growth of industry data. Faced with massive amounts of data, computers with limited storage capacity have to abandon some outdated data, and the related data mining technologies have gradually matured. The purpose of this article is to discuss the application of data mining technology in software engineering. The article analyzes correlations between a large number of bug-fixing source code updates in the version control system SVN and defect reports in the defect tracking system Bugzilla during software engineering project development, and uses data mining techniques to classify bug reports into defect changes and potential defect changes. Starting from large-scale software engineering projects, data mining technology is applied to the huge software engineering knowledge base; software development and maintenance in particular are explained, along with the more challenging problems ahead. The paper uses data mining technology to study the dependencies among the source code files of each module of the software system, helping software developers quickly understand the software architecture through the interrelationships between modules and providing suggested modification paths. Experimental comparison using the F-measure shows that the FL-M-GSpan algorithm outperforms the TS-M-GSpan algorithm; FL-M-GSpan consistently achieves an accuracy close to 95%, while TS-M-GSpan consistently achieves a better recall rate.
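The FL-M-GSpan and TS-M-GSpan graph mining algorithms are not reproduced here; as a much simplified illustration of mining module dependencies from version-control history, the sketch below counts how often pairs of files change in the same commit and reports frequently co-changed pairs. The commit data and the support threshold are assumptions invented for illustration.

```python
# Simplified illustration of dependency mining from version-control history:
# files that are frequently committed together are reported as implicitly
# coupled. The toy commit history and the support threshold are assumptions;
# the article's FL-M-GSpan / TS-M-GSpan graph mining algorithms are far more
# elaborate than this co-change count.
from collections import Counter
from itertools import combinations

def co_change_pairs(commits: list[list[str]], min_support: int = 2) -> list[tuple]:
    """commits: a list of commits, each a list of changed file paths."""
    counts = Counter()
    for files in commits:
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return [(a, b, n) for (a, b), n in counts.most_common() if n >= min_support]

if __name__ == "__main__":
    # Hypothetical commit history extracted from, e.g., `svn log -v`.
    history = [
        ["billing/invoice.c", "billing/tax.c"],
        ["billing/invoice.c", "billing/tax.c", "ui/report.c"],
        ["billing/invoice.c", "billing/tax.c"],
        ["ui/report.c", "ui/menu.c"],
    ]
    for a, b, n in co_change_pairs(history):
        print(f"{a} <-> {b}: changed together {n} times")
```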


2020 ◽  
Vol 5 (2) ◽  
pp. 443-452
Author(s):  
Zhihao Cui ◽  
Chaobing Yan

Abstract The scale and complexity of health information service systems have increased dramatically, and their development activities and management are difficult to control. Traditional methods and simple mathematical statistics struggle with the problems caused by the explosive growth of data and information, which ultimately harms the management of health information service systems. It is therefore particularly important to find valuable information in the source code, design documents and collected software datasets, and to use it to guide the development and maintenance of software engineering. Accordingly, some experts and scholars want to use mature data mining technologies to study the large amount of data generated in software engineering projects (commonly referred to as the software knowledge base) and to further explore the potentially valuable information hidden behind the software data. This article first gives a brief overview of data mining technology and computer software technology, uses a decision-tree graph mining algorithm to mine the function adjustment graphs of the classes defined in the software system, and then adds source code annotations to the relevant calling relationships. Data mining technology and computer software technology are deeply integrated, and the decision tree algorithm is used to mine the computer software knowledge base. Potential defect changes are listed as key maintenance targets: historical versions of defective source code change files are found dynamically and corrected in time, avoiding increased maintenance costs in the future.
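The article's graph-mining pipeline is not detailed here; as a minimal sketch of the decision-tree step, the snippet below trains scikit-learn's DecisionTreeClassifier on a few change metrics (lines changed, prior defects, number of callers) to flag potentially defective changes for maintenance. The feature set and the toy training data are invented for illustration.

```python
# Minimal sketch of using a decision tree to flag potentially defective
# source code changes, in the spirit of the abstract's defect prediction step.
# The features (lines changed, prior defects, fan-in) and the toy training
# data are invented for illustration; scikit-learn is assumed.
from sklearn.tree import DecisionTreeClassifier

# Each row: [lines_changed, prior_defects_in_file, number_of_callers]
X_train = [
    [5, 0, 2],
    [120, 3, 15],
    [40, 1, 4],
    [300, 5, 30],
    [10, 0, 1],
]
y_train = [0, 1, 0, 1, 0]  # 1 = change later linked to a defect

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Flag incoming changes predicted as defect-prone for priority maintenance.
candidates = [[80, 2, 9], [6, 0, 1]]
for change, label in zip(candidates, clf.predict(candidates)):
    status = "potential defect, prioritise maintenance" if label else "low risk"
    print(change, "->", status)
```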


Author(s):  
Liliana María Favre

Reverse engineering is the process of analyzing available software artifacts such as requirements, design, architectures, code or byte code, with the objective of extracting information and providing high-level views of the underlying system. A common idea in reverse engineering is to exploit the source code as the most reliable description both of the behavior of a software system and of the organization and its business rules. However, reverse engineering encompasses a variety of tasks related to comprehending and modifying software, such as re-documenting programs and relational databases, recovering architectures, recovering alternative design views, recovering design patterns, building traceability between code and designs, modernizing interfaces, or extracting the source code or high-level abstractions from byte code when the source code is not available. Reverse engineering is strongly associated with the modernization of legacy systems that were developed many years ago with technology that is now obsolete. These systems include software, hardware, business processes and organizational strategies and policies. Many of them remain in use after more than 20 years; they may be written for technology that is expensive to maintain and that may not be aligned with current organizational policies. Legacy systems embody key knowledge acquired over the life of an organization. Changes are motivated by multiple reasons, for instance changes in the way we do business and create value. Important business rules are embedded in the software and may not be documented elsewhere. The way in which the legacy system operates is not explicit (Brodie and Stonebraker, 1995) (Sommerville, 2004).


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for a cellphone to receive spam messages. The great number of received messages makes it difficult for a human to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate several data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes and Decision Tree, for automatic spam detection. Our experimental results show that the Support Vector Machine is the best of the three evaluated algorithms, achieving 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
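A rough sketch of the comparison described above follows: the same TF-IDF features are fed to a Support Vector Machine, Multinomial Naïve Bayes and a Decision Tree, and their accuracies are compared. The toy messages, the TF-IDF preprocessing and scikit-learn are assumptions; the paper's SMS corpus and experimental setup are not reproduced.

```python
# Rough sketch of the comparison in the abstract: the same TF-IDF features
# are fed to an SVM, Multinomial Naive Bayes and a Decision Tree, and their
# test accuracies are printed. The toy messages below are invented; the
# paper's SMS corpus and preprocessing are not reproduced.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

messages = [
    "WIN a free prize now, click here",
    "Congratulations, you have been selected for a cash reward",
    "Are we still meeting for lunch today?",
    "Please send me the report before Friday",
    "Free entry to the contest, text WIN to enter",
    "Can you call me when you get home?",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = not spam

X = TfidfVectorizer().fit_transform(messages)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.33, random_state=42, stratify=labels
)

for name, model in [
    ("Support Vector Machine", LinearSVC()),
    ("Multinomial Naive Bayes", MultinomialNB()),
    ("Decision Tree", DecisionTreeClassifier(random_state=0)),
]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```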

