Uncovering source code reuse in large-scale academic environments

2014, Vol 23 (3), pp. 383-390
Author(s): Enrique Flores, Alberto Barrón-Cedeño, Lidia Moreno, Paolo Rosso

2021
Author(s): Aleksandar Kovačević, Jelena Slivka, Dragan Vidaković, Katarina-Glorija Grujić, Nikola Luburić, ...

Code smells are structures in code that often have a negative impact on its quality. Detecting code smells manually is challenging, and researchers have proposed many automatic code smell detectors. Most studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluation on small-scale case studies and inconsistent experimental settings. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Researchers have therefore recently started experimenting with machine learning (ML) based code smell detection.

This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detecting the God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT).

We perform our experiments on the large-scale, manually labeled MLCQ dataset. We treat detection as a binary classification problem: we classify code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as the performance measure. In our experiments, the ML classifier trained on CuBERT source code embeddings achieved the best performance for both God Class detection (F1-measure of 0.53) and Long Method detection (F1-measure of 0.75). With the help of a domain expert, we perform an error analysis to discuss the advantages of the CuBERT approach.

To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. A secondary contribution is the systematic evaluation of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.
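The evaluation above scores detectors by the F1-measure of the minority (smell) class rather than by overall accuracy. A minimal Python sketch of that metric, assuming labels are encoded as 1 for smelly and 0 for non-smelly; the function name `minority_class_f1` and the toy labels are hypothetical illustrations, not taken from the paper or from the MLCQ dataset.

```python
from typing import Sequence


def minority_class_f1(y_true: Sequence[int], y_pred: Sequence[int], smell_label: int = 1) -> float:
    """F1 score computed only for the smell (positive, minority) class.

    Hypothetical helper for illustration; assumes 1 = smelly, 0 = non-smelly.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives and false negatives for the smell class only
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == smell_label and p == smell_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != smell_label and p == smell_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == smell_label and p != smell_label)
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    # Hypothetical toy labels (1 = smelly, 0 = non-smelly), not MLCQ data
    y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]
    print(round(minority_class_f1(y_true, y_pred), 3))  # 0.571 (precision 2/3, recall 1/2)
```

Because this F1 ignores true negatives, a detector that labels every toy sample as non-smelly scores 0.0 here even though it would reach 60% accuracy, which is why a minority-class metric suits imbalanced smell data.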



2016, Vol 28 (8), pp. 642-664
Author(s): Ameer Armaly, Collin McMillan




Author(s): Sangeeta Lal, Neetu Sardana, Ashish Sureka

Log statements in source code provide important information to software developers because they support various software development activities such as debugging, anomaly detection, and remote issue resolution. Most previous studies on logging analysis and prediction derive their insights and results from only a few code constructs. In this chapter, the authors perform an in-depth, focused, large-scale analysis of logging code constructs at two levels: the file level and the catch-block level. They answer several research questions through statistical and content analysis, which reveals differentiating properties between logged and non-logged code constructs. Based on these findings, the authors propose a machine-learning-based model for predicting logging in catch blocks and find the model to be effective at this task.



2020, Vol 36 (13), pp. 4097-4098
Author(s): Anna Breit, Simon Ott, Asan Agibetov, Matthias Samwald

Summary: Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms. Furthermore, we present preliminary baseline evaluation results.
Availability and implementation: Source code and data are openly available at https://github.com/OpenBioLink/OpenBioLink.
Supplementary information: Supplementary data are available at Bioinformatics online.



Author(s): Havva Cetiner Altiparmak, Busra Tokgoz, Okkes Emin Balcicek, Aslihan Ozkaya, Ahmet Arslan


2019, Vol 35 (21), pp. 4462-4464
Author(s): Jordan H Creed, Garrick Aden-Buie, Alvaro N Monteiro, Travis A Gerke

Summary: Complementary advances in genomic technology and public data resources have created opportunities for researchers to conduct multifaceted examinations of the genome on a large scale. To meet the need for integrative genome-wide exploration, we present epiTAD. This web-based tool enables researchers to compare genomic 3D organization and annotations across multiple databases interactively to facilitate in silico discovery.
Availability and implementation: epiTAD can be accessed at https://apps.gerkelab.com/epiTAD/, where we have also made the source code and a Docker containerized version of the application publicly available.


