source code
Recently Published Documents





Velin Kralev ◽  
Radoslava Kraleva ◽  
Viktor Ankov ◽  
Dimitar Chakalov

<span lang="EN-US">This research focuses on the k-center problem and its applications. Different methods for solving this problem are analyzed. The implementations of an exact algorithm and of an approximate algorithm are presented. The source code and the computation complexity of these algorithms are presented and analyzed. The multitasking mode of the operating system is taken into account considering the execution time of the algorithms. The results show that the approximate algorithm finds solutions that are not worse than two times optimal. In some case these solutions are very close to the optimal solutions, but this is true only for graphs with a smaller number of nodes. As the number of nodes in the graph increases (respectively the number of edges increases), the approximate solutions deviate from the optimal ones, but remain acceptable. These results give reason to conclude that for graphs with a small number of nodes the approximate algorithm finds comparable solutions with those founds by the exact algorithm.</span>

2022 ◽  
Vol 29 (2) ◽  
pp. 1-33
April Yi Wang ◽  
Dakuo Wang ◽  
Jaimie Drozdal ◽  
Michael Muller ◽  
Soya Park ◽  

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants’ satisfaction with their computational notebook.

2022 ◽  
Vol 31 (2) ◽  
pp. 1-23
Jevgenija Pantiuchina ◽  
Bin Lin ◽  
Fiorella Zampetti ◽  
Massimiliano Di Penta ◽  
Michele Lanza ◽  

Refactoring operations are behavior-preserving changes aimed at improving source code quality. While refactoring is largely considered a good practice, refactoring proposals in pull requests are often rejected after the code review. Understanding the reasons behind the rejection of refactoring contributions can shed light on how such contributions can be improved, essentially benefiting software quality. This article reports a study in which we manually coded rejection reasons inferred from 330 refactoring-related pull requests from 207 open-source Java projects. We surveyed 267 developers to assess their perceived prevalence of these identified rejection reasons, further complementing the reasons. Our study resulted in a comprehensive taxonomy consisting of 26 refactoring-related rejection reasons and 21 process-related rejection reasons. The taxonomy, accompanied with representative examples and highlighted implications, provides developers with valuable insights on how to ponder and polish their refactoring contributions, and indicates a number of directions researchers can pursue toward better refactoring recommenders.

2022 ◽  
Vol 31 (2) ◽  
pp. 1-34
Patrick Keller ◽  
Abdoul Kader Kaboré ◽  
Laura Plein ◽  
Jacques Klein ◽  
Yves Le Traon ◽  

Recent successes in training word embeddings for Natural Language Processing ( NLP ) tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the WySiWiM  ( ‘ ‘What You See Is What It Means ” ) approach where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on the task of vulnerable code prediction in source code and on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem), and code classification (a multi-classification problem). We show with experiments on the BigCloneBench (Java), Open Judge (C) that although simple, our WySiWiM  approach performs as effectively as state-of-the-art approaches such as ASTNN or TBCNN. We also showed with data from NVD and SARD that WySiWiM  representation can be used to learn a vulnerable code detector with reasonable performance (accuracy ∼90%). We further explore the influence of different steps in our approach, such as the choice of visual representations or the classification algorithm, to eventually discuss the promises and limitations of this research direction.

2022 ◽  
Vol 31 (1) ◽  
pp. 1-46
Chao Liu ◽  
Cuiyun Gao ◽  
Xin Xia ◽  
David Lo ◽  
John Grundy ◽  

Context: Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Objective: Although many DL studies have reported substantial advantages over other state-of-the-art models on effectiveness, they often ignore two factors: (1) reproducibility —whether the reported experimental results can be obtained by other researchers using authors’ artifacts (i.e., source code and datasets) with the same experimental setup; and (2) replicability —whether the reported experimental result can be obtained by other researchers using their re-implemented artifacts with a different experimental setup. We observed that DL studies commonly overlook these two factors and declare them as minor threats or leave them for future work. This is mainly due to high model complexity with many manually set parameters and the time-consuming optimization process, unlike classical supervised machine learning (ML) methods (e.g., random forest). This study aims to investigate the urgency and importance of reproducibility and replicability for DL studies on SE tasks. Method: In this study, we conducted a literature review on 147 DL studies recently published in 20 SE venues and 20 AI (Artificial Intelligence) venues to investigate these issues. We also re-ran four representative DL models in SE to investigate important factors that may strongly affect the reproducibility and replicability of a study. Results: Our statistics show the urgency of investigating these two factors in SE, where only 10.2% of the studies investigate any research question to show that their models can address at least one issue of replicability and/or reproducibility. More than 62.6% of the studies do not even share high-quality source code or complete data to support the reproducibility of their complex models. Meanwhile, our experimental results show the importance of reproducibility and replicability, where the reported performance of a DL model could not be reproduced for an unstable optimization process. Replicability could be substantially compromised if the model training is not convergent, or if performance is sensitive to the size of vocabulary and testing data. Conclusion: It is urgent for the SE community to provide a long-lasting link to a high-quality reproduction package, enhance DL-based solution stability and convergence, and avoid performance sensitivity on different sampled data.

2022 ◽  
Vol 31 (1) ◽  
pp. 1-74
Owain Parry ◽  
Gregory M. Kapfhammer ◽  
Michael Hilton ◽  
Phil McMinn

Tests that fail inconsistently, without changes to the code under test, are described as flaky . Flaky tests do not give a clear indication of the presence of software bugs and thus limit the reliability of the test suites that contain them. A recent survey of software developers found that 59% claimed to deal with flaky tests on a monthly, weekly, or daily basis. As well as being detrimental to developers, flaky tests have also been shown to limit the applicability of useful techniques in software testing research. In general, one can think of flaky tests as being a threat to the validity of any methodology that assumes the outcome of a test only depends on the source code it covers. In this article, we systematically survey the body of literature relevant to flaky test research, amounting to 76 papers. We split our analysis into four parts: addressing the causes of flaky tests, their costs and consequences, detection strategies, and approaches for their mitigation and repair. Our findings and their implications have consequences for how the software-testing community deals with test flakiness, pertinent to practitioners and of interest to those wanting to familiarize themselves with the research area.

2022 ◽  
Vol 12 ◽  
Chu-Qiao Gao ◽  
Yuan-Ke Zhou ◽  
Xiao-Hong Xin ◽  
Hui Min ◽  
Pu-Feng Du

Drug repositioning provides a promising and efficient strategy to discover potential associations between drugs and diseases. Many systematic computational drug-repositioning methods have been introduced, which are based on various similarities of drugs and diseases. In this work, we proposed a new computational model, DDA-SKF (drug–disease associations prediction using similarity kernels fusion), which can predict novel drug indications by utilizing similarity kernel fusion (SKF) and Laplacian regularized least squares (LapRLS) algorithms. DDA-SKF integrated multiple similarities of drugs and diseases. The prediction performances of DDA-SKF are better, or at least comparable, to all state-of-the-art methods. The DDA-SKF can work without sufficient similarity information between drug indications. This allows us to predict new purpose for orphan drugs. The source code and benchmarking datasets are deposited in a GitHub repository (

Yongzheng Zhang ◽  
Huilong Ren

AbstractIn this paper, we present an open-source code for the first-order and higher-order nonlocal operator method (NOM) including a detailed description of the implementation. The NOM is based on so-called support, dual-support, nonlocal operators, and an operate energy functional ensuring stability. The nonlocal operator is a generalization of the conventional differential operators. Combined with the method of weighed residuals and variational principles, NOM establishes the residual and tangent stiffness matrix of operate energy functional through some simple matrix without the need of shape functions as in other classical computational methods such as FEM. NOM only requires the definition of the energy drastically simplifying its implementation. The implementation in this paper is focused on linear elastic solids for sake of conciseness through the NOM can handle more complex nonlinear problems. The NOM can be very flexible and efficient to solve partial differential equations (PDEs), it’s also quite easy for readers to use the NOM and extend it to solve other complicated physical phenomena described by one or a set of PDEs. Finally, we present some classical benchmark problems including the classical cantilever beam and plate-with-a-hole problem, and we also make an extension of this method to solve complicated problems including phase-field fracture modeling and gradient elasticity material.

2022 ◽  
Vol 5 (4) ◽  
pp. e202101124
Elena Rensen ◽  
Stefano Pietropaoli ◽  
Florian Mueller ◽  
Christian Weber ◽  
Sylvie Souquere ◽  

The current COVID-19 pandemic is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The positive-sense single-stranded RNA virus contains a single linear RNA segment that serves as a template for transcription and replication, leading to the synthesis of positive and negative-stranded viral RNA (vRNA) in infected cells. Tools to visualize vRNA directly in infected cells are critical to analyze the viral replication cycle, screen for therapeutic molecules, or study infections in human tissue. Here, we report the design, validation, and initial application of FISH probes to visualize positive or negative RNA of SARS-CoV-2 (CoronaFISH). We demonstrate sensitive visualization of vRNA in African green monkey and several human cell lines, in patient samples and human tissue. We further demonstrate the adaptation of CoronaFISH probes to electron microscopy. We provide all required oligonucleotide sequences, source code to design the probes, and a detailed protocol. We hope that CoronaFISH will complement existing techniques for research on SARS-CoV-2 biology and COVID-19 pathophysiology, drug screening, and diagnostics.

2022 ◽  
Vol 2022 ◽  
pp. 1-13
Yue Wu ◽  
Junxiang Li ◽  
Jiru Zhou ◽  
Shichang Luo ◽  
Liwei Song

Because of its unique decentralization, encryption, reliability, and tamper-proof, the block chain system makes smart contracts break through the shackles of the lack of trusted environment, and its application field keeps expanding. We read the source code and official documents of Bitcoin, Ethereum, and Hyperledger to explore the operation principle and implementation mode of smart contract. By analyzing the evolution process of smart contracts in blockchain and the sequence of its function expansion, according to the multirole business process of supply chain, we design a semipublic smart contract chain model based on Ethereum and Hyperledger in order to provide useful inspiration and help for the future research of smart contracts in blockchain applied in supply chain.

Sign in / Sign up

Export Citation Format

Share Document