Data mining source code to facilitate program comprehension: experiments on clustering data retrieved from C++ programs

Author(s):  
Y. Kanellopoulos ◽  
C. Tjortjis
2016 ◽  
Vol 5 (1) ◽  
pp. 39-60 ◽  
Author(s):  
Shouki A. Ebad ◽  
Danish Manzoor

An important indicator of source code quality is compliance with naming conventions. It is believed that such practices improve program comprehension, which directly affects maintainability and reusability. In this paper, the authors conduct an experiment to determine how well Java and C# programs follow a set of well-publicized naming practices. The experiment evaluated 120 arbitrarily selected open-source Java and C# classes from different programmers with respect to four naming conventions. The results indicate that Java and C# programs do not always follow naming conventions. However, Java developers are more attentive than C# developers in following naming practices. A disturbing trend was found in variable and constant naming conventions, which were violated in most C# subjects. Moreover, there is a positive correlation between the number of violations found in a C# class and its size, but a negative correlation in the case of Java classes. The findings are expected to contribute to the existing knowledge of the use of coding standards and source code quality. The paper also discusses the threats to the validity of the study and suggests open issues for future research.


2020 ◽  
Vol 3 (3) ◽  
pp. 187-201
Author(s):  
Sufajar Butsianto ◽  
Nindi Tya Mayangwulan

Car use in Indonesia rises every year, and automotive companies compete to increase their sales. This study aims to group sales data into clusters using the data mining K-Means clustering algorithm. The sales data are grouped by similarity, so that records with the same characteristics fall into the same cluster. The attributes used are brand and sales. K-Means clustering produced three clusters: Cluster 0 with 235 members (26%), categorized as Selling Well (Laris); Cluster 1 with 604 members (67%), categorized as Selling Poorly (Kurang Laris); and Cluster 2 with 61 members (7%), categorized as Best Selling (Paling Laris). Validation of the clustering gave a Davies-Bouldin Index (DBI) of 0.341.


2018 ◽  
Vol 3 (1) ◽  
pp. 001
Author(s):  
Zulhendra Zulhendra ◽  
Gunadi Widi Nurcahyo ◽  
Julius Santony

This study applies a data mining technique, K-Means clustering. Data mining can be used to analyze sufficiently large data sets; here the aim is to enable Indocomputer to understand and classify its service data based on customer complaints, using the Weka software. The K-Means clustering algorithm is used to predict or classify complaints about hardware damage at Indocomputer Payakumbuh. The analysis also identifies which laptop brands are most often brought in for service at Indocomputer Payakumbuh, which can serve as a recommendation to consumers choosing a laptop.


2020 ◽  
Vol 25 (1) ◽  
pp. 76-88
Author(s):  
Suhandio Handoko ◽  
Fauziah Fauziah ◽  
Endah Tri Esti Handayani

The telecommunications industry is growing rapidly because telecommunications has become a basic need, so many companies now operate in the sector. The number of competitors pushes developers to find strategies or patterns that can increase product sales and marketing; one such strategy is to make use of transaction data. Data packages are one product in telecommunications. Clustering is currently still done manually, which takes time and demands careful calculation and a high degree of accuracy. In this study, a web-based application was built to simplify clustering of the data, so that the results can serve as a reference when planning promotions of Telkomsel products in different regions. The method used is clustering with the K-Means algorithm, which groups a set of data into a given number of clusters. Here the sales data are grouped into three clusters: low, medium, and high sales. When tested on Telkomsel data-package sales transactions, the application's K-Means clustering showed 100% agreement with the manual clustering.
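The grouping of sales figures into low, medium, and high clusters can be sketched with a minimal one-dimensional K-Means (Lloyd's algorithm). This is only an illustration under stated assumptions, not the application described in the abstract: the function name `kmeans_1d`, the deterministic initialization, and the sales figures are all hypothetical.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    # Assumed deterministic start: spread initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, labels


# Hypothetical sales figures, not taken from the study.
sales = [12, 15, 11, 48, 52, 50, 95, 102, 99]
centroids, labels = kmeans_1d(sales)

# Centroids start in ascending order, so index 0 is the lowest-sales cluster.
names = ["low", "medium", "high"]
categories = [names[lab] for lab in labels]
```

Because the initial centroids are taken from the sorted data, the cluster indices stay in ascending order of sales here, which is what lets the sketch map them directly to the three category names.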


2021 ◽  
Vol 8 (1) ◽  
pp. 83
Author(s):  
Bagus Muhammad Islami ◽  
Cepy Sukmayadi ◽  
Tesa Nur Padilah

Abstract: Health problems in society, especially in developing countries such as Indonesia, are influenced by two factors: physical and non-physical aspects. Based on data obtained from karawangkab.bps.go.id, the data are divided into 3 clusters: fewest, medium, and most. The algorithm used is K-Means clustering, implemented using Microsoft Excel and RapidMiner Studio. Processing the health facility data for Karawang produces 3 clusters: cluster 1, with few health facilities, contains 23 sub-districts; cluster 2, with a medium number of health facilities, contains 5 sub-districts; and cluster 3, with the most health facilities, contains 2 sub-districts. The K-Means algorithm achieves a Davies-Bouldin Index value of 0.109.   Keywords: clustering, data mining, health facilities, K-Means.
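For readers unfamiliar with the Davies-Bouldin Index reported above, the following sketch computes it from scratch using the standard definition: for each cluster, take the worst ratio of combined within-cluster scatter to centroid separation against any other cluster, then average over clusters. Lower values indicate more compact, better separated clusters. The function name and sample points are hypothetical and unrelated to the Karawang data; `math.dist` requires Python 3.8 or later.

```python
import math


def davies_bouldin(clusters):
    """clusters: list of clusters, each a non-empty list of equal-length tuples."""
    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = [tuple(sum(col) / len(c) for col in zip(*c)) for c in clusters]
    # Scatter: average distance from each point to its cluster centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two hypothetical layouts: the same cluster shapes, far apart and close together.
far_apart = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(3, 0), (3, 2)]]

dbi_far = davies_bouldin(far_apart)
dbi_close = davies_bouldin(close_together)
```

In this toy case the well-separated layout scores lower than the crowded one, which matches the usual reading that a small index value signals a good clustering.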


Author(s):  
Slawomir T. Wierzchon

Standard clustering algorithms make fixed assumptions about data structure. For instance, the k-means algorithm is suited to spherical, linearly separable data clouds. When the data come from a multidimensional normal distribution, the so-called EM algorithm can be applied. In practice, however, the structure underlying a given set of observations is too complex to fit a single assumption. We can split these assumptions into manageable hypotheses, each justifying the use of a particular clustering algorithm. We must then aggregate the partial results into a meaningful description of our data. Consensus clustering performs this task. In this article we clarify the idea of consensus clustering and present conceptual frames for such a compound analysis. Next, the basic approaches to implementing a consensus procedure are given. Finally, some new directions in this field are mentioned.
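One common way to aggregate partial clusterings, sketched here as an assumption rather than as the procedure the article itself presents, is a co-association matrix: count how often each pair of items lands in the same cluster across the input partitions, then group items that co-occur more often than a threshold. The function names, the threshold of 0.5, and the three example partitions are hypothetical.

```python
def co_association(partitions):
    """Fraction of partitions in which items i and j share a cluster."""
    n = len(partitions[0])
    return [
        [sum(p[i] == p[j] for p in partitions) / len(partitions) for j in range(n)]
        for i in range(n)
    ]


def consensus(partitions, threshold=0.5):
    """Group items linked by co-association above the threshold."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Flood-fill the connected component containing `start`.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical partitions of six items. The second uses swapped label
# names; the third disagrees with the others about item 2.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
consensus_labels = consensus(partitions)
```

The co-association matrix depends only on whether two items share a label, so the swapped label names in the second partition do not matter, and the single disagreement about item 2 is outvoted.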


Author(s):  
Minh Ngoc Ngo

Due to the need to reengineer and migrate aging software and legacy systems, reverse engineering has started to receive attention. It is now established as an area of software engineering concerned with understanding software structure and recovering or extracting design and features from programs, mainly from source code. Inferring design and features from code closely resembles data mining, which extracts and infers information from data. In view of this similarity, reverse engineering from program code can be called program mining. Traditionally, the latter has been based mainly on invariant properties and heuristic rules. Recently, empirical properties have been introduced to augment the existing methods. This article summarizes some of the work in this area.


Author(s):  
Ioannis N. Kouris

Software development has various stages that can be conceptually grouped into two phases, namely development and production (Figure 1). The development phase includes requirements engineering, architecting, design, implementation and testing. The production phase, on the other hand, includes the actual deployment of the end product and its maintenance. Software maintenance is the last and most difficult stage in the software lifecycle (Sommerville, 2001), as well as the most costly one. According to Zelkowitz, Shaw and Gannon (1979), the production phase accounts for 67% of the costs of the whole process, whereas according to Van Vliet (2000) the actual cost of software maintenance has been estimated at more than half of the total software development cost. The development phase is critical to making software maintenance efficient and simple. The earlier stages should take into account not only the functional requirements but also the later maintenance task. For example, the design stage should plan the structure in a way that can be easily altered. Similarly, the implementation stage should create code that can be easily read, understood, and changed, and should also keep the code length to a minimum. According to Van Vliet (2000), the final length of the source code is the determining factor in the total cost of maintenance, since the less code is written, the easier maintenance becomes. According to Erdil et al. (2003), four major problems can slow down the whole maintenance process: unstructured code; maintenance programmers with insufficient knowledge of the system; documentation that is absent, out of date, or at best insufficient; and the bad image of software maintenance. The success of the maintenance phase thus relies on these problems being fixed earlier in the life cycle.
In real life, however, when programmers perform a maintenance task on a program, such as fixing bugs, making modifications, or creating software updates, they usually work under time and commercial pressure and with cost reduction in mind, which ultimately results in a problematic system of ever-increasing complexity. As a consequence, maintainers spend from 50% up to almost 90% of their time trying to comprehend the program (Erdös and Sneed, 1998; Von Mayrhauser and Vans, 1994; Pigoski, 1996). Providing maintainers with tools and techniques for comprehending programs has therefore attracted considerable financial and research interest, given the spread of computers and software into all aspects of life. In this work we briefly present some of the most important techniques proposed in the field thus far, focusing primarily on data mining techniques in general and on association rules in particular. We also give some possible solutions to problems faced by these methods.
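To make the association-rule idea concrete, the sketch below computes the two standard measures of a rule A → B, support and confidence, over a small set of transactions. Treating each transaction as the set of functions called by one module is an assumption made purely for illustration, and the function and item names are hypothetical; the work summarized above does not specify this encoding.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical data: each set lists the functions called by one module.
modules = [
    {"open_file", "read_config", "log"},
    {"open_file", "read_config"},
    {"open_file", "log"},
    {"parse", "log"},
]

rule_support = support(modules, {"open_file", "read_config"})
rule_confidence = confidence(modules, {"open_file"}, {"read_config"})
```

Here the rule open_file → read_config holds in half of the modules overall and in two of the three modules that call open_file; a maintainer could read a high-confidence rule of this kind as a hint that the two functions belong to the same concern.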

