PREDICTING SOFTWARE CHANGE IN AN OPEN SOURCE SOFTWARE USING MACHINE LEARNING ALGORITHMS

Author(s):  
RUCHIKA MALHOTRA ◽  
ANKITA JAIN BANSAL

Due to various reasons, such as ever-increasing customer demands, changes in the environment, or the detection of a bug, changes are incorporated into software. This results in multiple versions, i.e., the evolving nature of software. Identifying the parts of a software system that are more prone to change than others is an important activity. Identifying change-prone classes will help developers take focused and timely preventive actions on classes with similar characteristics in future releases. In this paper, we have studied the relationship between various object-oriented (OO) metrics and change proneness. We collected a set of OO metrics and change data for each class that appeared in two versions of an open source dataset, 'Java TreeView', i.e., version 1.1.6 and version 1.0.3. Besides this, we have also developed various models that can be used to identify change-prone classes using machine learning and statistical techniques, and then compared their performance. The results are analyzed using the Area Under the Curve (AUC) obtained from Receiver Operating Characteristic (ROC) analysis. The results show that the models developed using both machine learning and statistical methods demonstrate good performance in predicting change-prone classes. Based on the results, it is reasonable to claim that quality models have a significant relationship with OO metrics and hence can be used by researchers for early prediction of change-prone classes.
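The AUC used to compare these models has a convenient rank interpretation: it is the probability that a randomly chosen change-prone class receives a higher predicted score than a randomly chosen class that did not change. A minimal sketch of that computation, assuming each model outputs one score per class and labels are coded 1 (changed between versions) and 0 (unchanged); the function name `auc_score` and the example data are hypothetical and not taken from the paper.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical example: 1 = class changed between versions, 0 = unchanged.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.1]
print(auc_score(labels, scores))
```

In the example, 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9. The pairwise loop costs O(P*N); for large class sets, sorting the scores once yields the same value in O(n log n).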

2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 2581-2581 ◽  
Author(s):  
Paul Johannet ◽  
Nicolas Coudray ◽  
George Jour ◽  
Douglas MacArthur Donnelly ◽  
Shirin Bajaj ◽  
...  

Background: There is growing interest in optimizing patient selection for treatment with immune checkpoint inhibitors (ICIs). We postulate that phenotypic features present in metastatic melanoma tissue reflect the biology of tumor cells, immune cells, and stromal tissue, and hence can provide predictive information about tumor behavior. Here, we test the hypothesis that machine learning algorithms can be trained to predict the likelihood of response and/or toxicity to ICIs.
Methods: We examined 124 stage III/IV melanoma patients who received anti-CTLA-4 (n = 81), anti-PD-1 (n = 25), or combination (n = 18) therapy as first-line treatment. The tissue analyzed was resected before treatment with ICIs. In total, 340 H&E slides were digitized and annotated for three regions of interest: tumor, lymphocytes, and stroma. The slides were then partitioned into training (n = 285), validation (n = 26), and test (n = 29) sets. Slides were tiled (299x299 pixels) at 20X magnification. We trained a deep convolutional neural network (DCNN) to automatically segment the images into each of the three regions and then to decompose the images into their component features, detecting non-obvious patterns with objectivity and reproducibility. We then trained the DCNN for two classification tasks: 1) complete/partial response versus progression of disease (POD), and 2) severe versus no immune-related adverse events (irAEs). Predictive accuracy was estimated by the area under the curve (AUC) of the receiver operating characteristic (ROC).
Results: The DCNN identified tumor within LN with AUC 0.987 and within ST with AUC 0.943. Prediction of POD based on ST-only data consistently performed better than prediction based on LN-only data (AUC 0.84 vs. 0.61, respectively). The DCNN had an average AUC of 0.69 when analyzing only tumor regions from both the LN and ST data sets, and an AUC of 0.68 when analyzing tumor and lymphocyte regions. Severe irAEs were predicted with limited accuracy (AUC 0.53).
Conclusions: Our results support the potential application of machine learning on pre-treatment histologic slides to predict response to ICIs. They also reveal its limited value in predicting toxicity. We are currently investigating whether the predictive capability of the algorithm can be further improved by incorporating additional immunologic biomarkers.
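The tiling step (299x299-pixel tiles at 20X) amounts to laying a regular grid over each digitized slide. A minimal sketch, assuming non-overlapping tiles and discarding partial tiles at the right and bottom edges; the abstract does not say how edges, overlap, or background tiles were handled, and the helper names `tile_origins` and `crop` are hypothetical. The slide is assumed to be already read at the 20X level as a 2D pixel grid.

```python
from typing import Iterator, List, Sequence, Tuple

TILE = 299  # tile edge length in pixels, as stated in the abstract


def tile_origins(width: int, height: int, tile: int = TILE) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of non-overlapping tile x tile windows
    that fit entirely inside a width x height image (partial edge tiles skipped)."""
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield (x, y)


def crop(pixels: Sequence[Sequence[int]], x: int, y: int, tile: int = TILE) -> List[List[int]]:
    """Return the tile x tile sub-grid whose top-left corner is (x, y)."""
    return [list(row[x:x + tile]) for row in pixels[y:y + tile]]


# A hypothetical 1000 x 700 pixel region yields a 3 x 2 grid of tiles.
print(len(list(tile_origins(1000, 700))))  # 6
```

Skipping partial edge tiles keeps every tile at the network's fixed input size; padding the slide border is an equally reasonable alternative.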


Software maintainability is a vital quality attribute according to ISO standards. It has been a concern for decades, and even today it is a top priority. At present, the majority of software applications, particularly open source software, are developed using object-oriented methodologies. In the past, researchers used statistical techniques on metric data extracted from software to evaluate maintainability. More recently, machine learning models and algorithms have been used in the majority of research works to predict maintainability. In this research, we performed an empirical case study on the open source software jfreechart by applying machine learning algorithms. The objective was to study the relationships between certain metrics and maintainability.


2021 ◽  
Vol 15 ◽  
Author(s):  
Alexandre Routier ◽  
Ninon Burgos ◽  
Mauricio Díaz ◽  
Michael Bacci ◽  
Simona Bottani ◽  
...  

We present Clinica (www.clinica.run), an open-source software platform designed to make clinical neuroscience studies easier and more reproducible. Clinica aims to help researchers (i) spend less time on data management and processing, (ii) perform reproducible evaluations of their methods, and (iii) easily share data and results within their institution and with external collaborators. The core of Clinica is a set of automatic pipelines for processing and analysis of multimodal neuroimaging data (currently, T1-weighted MRI, diffusion MRI, and PET data), as well as tools for statistics, machine learning, and deep learning. It relies on the brain imaging data structure (BIDS) for the organization of raw neuroimaging datasets and on established tools written by the community to build its pipelines. It also provides converters of public neuroimaging datasets to BIDS (currently ADNI, AIBL, OASIS, and NIFD). Processed data include image-valued scalar fields (e.g., tissue probability maps), meshes, surface-based scalar fields (e.g., cortical thickness maps), or scalar outputs (e.g., regional averages). These data follow the ClinicA Processed Structure (CAPS) format, which shares the same philosophy as BIDS. Consistent organization of raw and processed neuroimaging files facilitates the execution of single pipelines and of sequences of pipelines, as well as the integration of processed data into statistics or machine learning frameworks. The target audience of Clinica comprises neuroscientists and clinicians conducting clinical neuroscience studies involving multimodal imaging, as well as researchers developing advanced machine learning algorithms applied to neuroimaging data.
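To illustrate what BIDS organization means in practice, here is a minimal sketch that builds the relative path of a T1-weighted image from subject and optional session labels, following the BIDS sub-<label>/[ses-<label>/]anat/ naming pattern. The helper `bids_t1w_path` is hypothetical and not part of Clinica's API, and it handles only the subject and session entities (BIDS defines further optional entities, such as acquisition or run).

```python
from pathlib import PurePosixPath
from typing import Optional


def bids_t1w_path(subject: str, session: Optional[str] = None,
                  ext: str = ".nii.gz") -> PurePosixPath:
    """Relative BIDS path of a T1-weighted anatomical image."""
    for label in (subject, session):
        if label is not None and not label.isalnum():
            raise ValueError(f"BIDS labels must be alphanumeric: {label!r}")
    dirs = [f"sub-{subject}"]
    entities = [f"sub-{subject}"]
    if session is not None:
        dirs.append(f"ses-{session}")
        entities.append(f"ses-{session}")
    # Entities are joined with underscores and followed by the T1w suffix.
    filename = "_".join(entities) + "_T1w" + ext
    return PurePosixPath(*dirs, "anat", filename)


print(bids_t1w_path("01", "01"))  # sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
```

Because every file name encodes its own subject and session, pipelines can locate inputs by pattern rather than by dataset-specific conventions. That predictability is what lets a single pipeline run unchanged across converted datasets.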


2020 ◽  
Author(s):  
Adriana Tomic ◽  
Ivan Tomic ◽  
Levi Waldron ◽  
Ludwig Geistlinger ◽  
Max Kuhn ◽  
...  

Data analysis and knowledge discovery have become more and more important in biology and medicine with the increasing complexity of biological datasets, but the sophisticated programming skills and in-depth understanding of algorithms they require pose barriers that prevent most biologists and clinicians from performing such research. We have developed SIMON, a modular open-source software tool, to facilitate the application of 180+ state-of-the-art machine learning algorithms to high-dimensional biomedical data. With an easy-to-use graphical user interface, standardized pipelines, and an automated approach for machine learning and other statistical analysis methods, SIMON helps to identify optimal algorithms and provides a resource that empowers both non-technical and technical researchers to identify crucial patterns in biomedical data.


2021 ◽  
Vol 94 (1126) ◽  
pp. 20210221
Author(s):  
Bino Abel Varghese ◽  
Heeseop Shin ◽  
Bhushan Desai ◽  
Ali Gholamrezanezhad ◽  
Xiaomeng Lei ◽  
...  

Objectives: For optimal utilization of healthcare resources, there is a critical need for early identification of COVID-19 patients at risk of poor prognosis, defined as the need for intensive care and mechanical ventilation. We tested the feasibility of using chest X-ray (CXR)-based radiomics metrics to develop machine learning algorithms for predicting which patients will have poor outcomes.
Methods: In this Institutional Review Board (IRB)-approved, Health Insurance Portability and Accountability Act (HIPAA)-compliant retrospective study, we evaluated CXRs performed around the time of admission in 167 COVID-19 patients. Of the 167 patients, 68 (40.72%) required intensive care during their stay, 45 (26.95%) required intubation, and 25 (14.97%) died. Lung opacities were manually segmented using ITK-SNAP (open-source software). CaPTk (open-source software) was used to perform 2D radiomics analysis.
Results: Of all the algorithms considered, the AdaBoost classifier performed best, with AUC = 0.72 for predicting the need for intubation, AUC = 0.71 for predicting death, and AUC = 0.61 for predicting the need for admission to the intensive care unit (ICU). AdaBoost performed similarly to ElasticNet in predicting the need for ICU admission. Analysis of the key radiomic metrics that drive model prediction and performance showed the importance of first-order texture metrics compared with other radiomics panel metrics. Using a Venn-diagram analysis, we identified two first-order texture metrics and one second-order texture metric that consistently played an important role in driving model performance across all three outcome predictions.
Conclusions: Considering the quantitative nature and reliability of radiomic metrics, they can be used prospectively as prognostic markers to individualize treatment plans for COVID-19 patients and also to assist with healthcare resource management.
Advances in knowledge: We report on the performance of CXR-based imaging metrics, extracted at admission from RT-PCR-positive COVID-19 patients, in developing machine learning algorithms for predicting the need for ICU admission, the need for intubation, and mortality.
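First-order radiomic metrics are statistics of the intensity distribution inside the segmented region, ignoring spatial arrangement (second-order texture metrics, by contrast, look at how intensities co-occur between neighboring pixels). A minimal sketch of a few such statistics over a manually segmented lung opacity, assuming the CXR and its mask are given as equally sized 2D grids (1 = inside the opacity) and assuming a fixed bin width of 25 intensity units for the entropy histogram. The function `first_order_features` is hypothetical and is not CaPTk's implementation; toolkits differ in conventions such as binning and normalization.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def first_order_features(image: Sequence[Sequence[float]],
                         mask: Sequence[Sequence[int]],
                         bin_width: float = 25.0) -> Dict[str, float]:
    """Histogram-based statistics of the intensities selected by a binary mask."""
    values: List[float] = [v for img_row, mask_row in zip(image, mask)
                           for v, m in zip(img_row, mask_row) if m]
    if not values:
        raise ValueError("mask selects no pixels")
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(var)
    skew = (sum((v - mean) ** 3 for v in values) / n) / std ** 3 if std > 0 else 0.0
    # Discretize intensities into fixed-width bins, then take Shannon entropy in bits.
    counts = Counter(math.floor(v / bin_width) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": var, "skewness": skew, "entropy": entropy}


image = [[10, 20, 30], [40, 50, 60]]
mask = [[1, 1, 0], [0, 1, 1]]  # hypothetical segmentation selecting 10, 20, 50, 60
print(first_order_features(image, mask))
```

Because these metrics depend only on the pixel values inside the mask, they are sensitive to how consistently the opacities are segmented, which is one reason segmentation protocols matter in radiomics studies.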


Author(s):  
Noëmi Rebecca Meier ◽  
Thomas M. Sutter ◽  
Marc Jacobsen ◽  
Tom H. M. Ottenhoff ◽  
Julia E. Vogt ◽  
...  

Rationale: Tuberculosis diagnosis in children remains challenging. Microbiological confirmation of tuberculosis disease is often lacking, and standard immunodiagnostic tests for tuberculosis infection, including the tuberculin skin test and the interferon-γ release assay, have limited sensitivity. Recent research suggests that including novel Mycobacterium tuberculosis antigens has the potential to improve standard immunodiagnostic tests for tuberculosis.
Objective: To identify optimal antigen–cytokine combinations using novel Mycobacterium tuberculosis antigens and cytokine read-outs, selected by machine learning algorithms, to improve immunodiagnostic assays for tuberculosis.
Methods: A total of 80 children undergoing investigation for tuberculosis were included (15 with confirmed tuberculosis disease, five with unconfirmed tuberculosis disease, 28 with tuberculosis infection, and 32 with unlikely tuberculosis). Whole blood was stimulated with 10 novel Mycobacterium tuberculosis antigens and a fusion protein of early secretory antigenic target (ESAT)-6 and culture filtrate protein (CFP) 10. Cytokines were measured using xMAP multiplex assays. Machine learning algorithms defined a discriminative classifier, whose performance was measured using the area under the receiver operating characteristic curve.
Measurements and main results: We found that the following four antigen–cytokine pairs had a higher weight in the discriminative classifier than the standard ESAT-6/CFP-10-induced interferon-γ: Rv2346/47c- and Rv3614/15c-induced interferon-gamma inducible protein-10; Rv2031c-induced granulocyte-macrophage colony-stimulating factor; and ESAT-6/CFP-10-induced tumor necrosis factor-α. A combination of the 10 best antigen–cytokine pairs resulted in an area under the curve of 0.92 ± 0.04.
Conclusion: We used machine learning algorithms as a key tool to evaluate large immunological datasets. This identified several antigen–cytokine pairs with the potential to improve immunodiagnostic tests for tuberculosis in children.


2021 ◽  
Vol 11 (12) ◽  
pp. 5690
Author(s):  
Mamdouh Alenezi

The evolution of software is necessary for the success of software systems. Studying and understanding software evolution is an active topic of study in software engineering. One of the primary concepts of software evolution is that the internal quality of a software system declines as it evolves. In this paper, the evolution of the internal quality of object-oriented open-source software systems is examined using a software metrics approach. More specifically, we analyze how software systems evolve over versions with respect to size, and the relationship between size and different internal quality metrics. The results and observations of this research include: (i) there is a significant difference between different systems concerning the LOC variable; (ii) there is a significant correlation between all pairwise comparisons of internal quality metrics; and (iii) the effect of complexity and inheritance on LOC was positive and significant, while the effect of coupling and cohesion was not significant.
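A pairwise comparison of internal quality metrics across versions can be sketched as a correlation coefficient computed for every pair of metric series. The abstract does not say which coefficient or significance test was used; this sketch assumes Pearson's r, omits the significance test, and uses hypothetical metric names (LOC, WMC, CBO) and values measured on four versions of one system.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(metrics: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every unordered pair of metric names (sorted order)."""
    return {(a, b): pearson(metrics[a], metrics[b])
            for a, b in combinations(sorted(metrics), 2)}


# Hypothetical per-version values for one system.
metrics = {"LOC": [1000, 1400, 2100, 2500],
           "WMC": [150, 190, 300, 320],
           "CBO": [40, 38, 45, 44]}
for (a, b), r in pairwise_correlations(metrics).items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

With only a handful of versions per system, a rank-based coefficient such as Spearman's is a common, more outlier-robust alternative; it can be obtained by applying `pearson` to the ranks of each series.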


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Junyi Li ◽  
Huinian Li ◽  
Xiao Ye ◽  
Li Zhang ◽  
Qingzhe Xu ◽  
...  

Background: The prediction of long non-coding RNAs (lncRNAs) has attracted great attention from researchers, as more and more evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the identification of lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of lncRNA sequence resources.
Results: We developed an lncRNA prediction method that integrates information-entropy-based features with machine learning algorithms. We calculate generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features together with other features, such as the open reading frame, we apply support vector machine, XGBoost, and random forest algorithms to distinguish human lncRNAs. We compared our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve, of up to 99.7905%.
Conclusions: We developed an accurate and efficient method with novel information entropy features to analyze and classify lncRNAs. Our method can also be extended to research on other functional elements in DNA sequences.
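The open reading frame (ORF) is a classic feature for separating coding transcripts from lncRNAs, since lncRNAs tend to lack long ORFs. A minimal sketch of one common ORF feature, the length of the longest ATG-initiated ORF on the forward strand, assuming the standard stop codons TAA, TAG, and TGA and ignoring ORFs that never reach a stop codon. This is not the paper's feature set, and it does not reproduce the generalized topological entropy features; the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides (stop codon included) of the longest ATG...stop
    open reading frame in any of the three forward reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA input
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


print(longest_orf_length("CCATGTTTTGACC"))  # 9 (ATG TTT TGA in frame 2)
```

Keeping the earliest ATG of each open frame, and ignoring inner ATGs, guarantees the longest ORF per stop codon; tools differ on details such as minimum ORF length and whether the stop codon is counted.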


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Martin Saveski ◽  
Edmond Awad ◽  
Iyad Rahwan ◽  
Manuel Cebrian

As groups increasingly take over tasks from individual experts, it is ever more important to understand the determinants of group success. In this paper, we study the patterns of group success in Escape The Room, a physical adventure game in which a group is tasked with escaping a maze by collectively solving a series of puzzles. We investigate (1) the characteristics of successful groups, and (2) how accurately humans and machines can spot them from a group photo. The relationship between these two questions rests on the hypothesis that the characteristics of successful groups are encoded by features that can be spotted in their photo. We analyze >43K group photos (one photo per group) taken after groups completed the game, from which all explicit performance-signaling information has been removed. First, we find that groups that are larger, older, and more gender-diverse but less age-diverse are significantly more likely to escape. Second, we compare humans and off-the-shelf machine learning algorithms at predicting whether a group escaped based on the completion photo. We find that individual guesses by humans achieve 58.3% accuracy, better than random but worse than machines, which achieve 71.6% accuracy. When humans are trained by observing only four labeled photos, their accuracy increases to 64%. However, training humans on more labeled examples (eight or twelve) leads to a slight but statistically non-significant improvement in accuracy (67.4%). Humans in the best training condition perform on par with two of the five machine learning algorithms we evaluated, but worse than the other three. Our work illustrates the potential and the limitations of machine learning systems in evaluating group performance and identifying success factors based on sparse visual cues.

