Improving Credibility of Machine Learner Models in Software Engineering

2009 ◽  
pp. 2865-2882
Author(s):  
Gary D. Boetticher

Given a choice, software project managers frequently prefer traditional methods of making decisions rather than relying on empirical software engineering (empirical/machine-learning-based) models. One reason for this choice is the perceived lack of credibility associated with these models. To promote better empirical software engineering, a series of experiments is conducted on various NASA datasets to demonstrate the importance of assessing how easy or difficult a modeling situation is. Each dataset is divided into three groups: a training set and “nice” and “nasty” neighbor test sets. Using a nearest-neighbor approach, “nice neighbors” align most closely with training instances of the same class, while “nasty neighbors” align with training instances of the opposite class. The “nice” and “nasty” experiments average 94% and 20% accuracy, respectively. Another set of experiments shows that a ten-fold cross-validation is not sufficient for characterizing a dataset. Finally, a set of metric equations is proposed for improving the credibility assessment of empirical/machine learning models.
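To make the split concrete, here is a minimal sketch of the “nice/nasty” partition using scikit-learn's nearest-neighbor search; the toy features, labels, and single-neighbor rule are illustrative assumptions, not the chapter's exact protocol.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nice_nasty_split(X_train, y_train, X_test, y_test):
    """Partition test instances by the class of their nearest training instance."""
    nn = NearestNeighbors(n_neighbors=1).fit(X_train)
    _, idx = nn.kneighbors(X_test)          # index of the closest training instance
    neighbor_labels = y_train[idx.ravel()]  # class of that nearest neighbor
    nice = neighbor_labels == y_test        # neighbor shares the test instance's class
    return (X_test[nice], y_test[nice]), (X_test[~nice], y_test[~nice])

# Toy defect-style data: two features, binary class labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 2)), rng.integers(0, 2, size=100)
X_test, y_test = rng.normal(size=(20, 2)), rng.integers(0, 2, size=20)
(nice_X, nice_y), (nasty_X, nasty_y) = nice_nasty_split(X_train, y_train, X_test, y_test)
print(f"{len(nice_y)} nice, {len(nasty_y)} nasty test instances")
```

A model evaluated only on the “nice” partition will look far more accurate than one evaluated on the “nasty” partition, which is the easy/difficult distinction the experiments quantify.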


RENOTE ◽  
2019 ◽  
Vol 17 (3) ◽  
pp. 273-284
Author(s):  
Maria Lydia Fioravanti ◽  
Antonio Cesar Amaru Maximiano ◽  
Ellen Francine Barbosa

Although software project management (SPM) is one of the most relevant topics in software engineering and should be addressed in computing programs, the SPM skills of recent graduates are not yet satisfactory. In this context, besides knowing that skill deficiencies exist, we also need specific information on how to adjust and improve education on the corresponding topics. In this paper, we attempt to identify which knowledge deficiencies in SPM can persist after a student graduates from a computing degree program. We surveyed practitioners who graduated and worked as software project managers to gather the knowledge deficiencies from the industry perspective. In general, the results indicate that a number of professionals seek postgraduate programs to fill the deficiencies of their undergraduate programs.


Author(s):  
Adrián Casado-Rivas ◽  
Manuel Muñoz Archidona

In Software Engineering, personality traits have helped to better understand the human factor. In this chapter, the authors give an overview of important personality trait theories that have influenced Software Engineering and have been widely adopted. The theories considered are the Myers-Briggs Type Indicator, the Big Five Personality Traits, and Belbin Roles. The influence of personality traits has provided remarkable benefits to Software Engineering, especially in the formation of teams. For software project managers, it is useful to know which soft skills correlate with a specific team role, and to analyze how personality traits have contributed to high-performing, cohesive software engineering teams. The study of software engineers’ personality traits also helps to motivate team members. Creating teams of compatible individuals, each working on tasks that suit them, and keeping the team motivated improves performance and productivity and reduces project costs.


Author(s):  
Yves Wautelet ◽  
Christophe Schinckus ◽  
Manuel Kolp

This article presents an epistemological reading of knowledge evolution in software engineering (SE), both within a software project and within SE theoretical frameworks, principally modeling languages and software development life cycles (SDLC). The article envisages SE as an artificial science and notably points to iterative development as a more adequate framework for enterprise applications. Iterative development has become popular in SE because it allows a more efficient knowledge acquisition process, especially in user-intensive applications, through continuous organizational modeling and requirements acquisition, early implementation and testing, and modularity. SE is by nature a human activity: analysts, designers, developers, and other project managers confront their visions of the software system they are building with users’ requirements. The study of software projects’ actors and stakeholders through Simon’s bounded rationality points to the use of an iterative development life cycle; the latter, indeed, allows their rationality to be better apprehended. Popper’s knowledge growth principle could at first seem suited to analyzing knowledge evolution in the SE field. However, this epistemology is better adapted to purely hard sciences such as physics than to SE, which is also rooted in human activities and thereby in the social sciences. Consequently, we nuance this vision using Lakatosian epistemology, notably applying his falsification principle as a criticism of SE as an evolving science. Finally, the authors point to adaptive rationality for a reading of SE theorists’ and researchers’ rationality.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jacob Schreiber ◽  
Ritambhara Singh ◽  
Jeffrey Bilmes ◽  
William Stafford Noble

Machine learning models that predict genomic activity are most useful when they make accurate predictions across cell types. Here, we show that when the training and test sets contain the same genomic loci, the resulting model may falsely appear to perform well by effectively memorizing the average activity associated with each locus across the training cell types. We demonstrate this phenomenon in the context of predicting gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data becomes available, future projects will increasingly risk suffering from this issue.
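The pitfall can be reproduced in miniature by comparing a random cross-validation split with a locus-disjoint one. The sketch below is an assumed illustration, not the authors' code: the synthetic data gives every locus a fixed mean activity and identical features across cell types, so a model scores well whenever the same loci appear in both training and test folds, and poorly when they do not.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_loci, n_cell_types = 200, 10
loci = np.repeat(np.arange(n_loci), n_cell_types)   # group id for every example
locus_mean = rng.normal(size=n_loci)                # fixed average activity per locus
locus_features = rng.normal(size=(n_loci, 20))      # arbitrary per-locus features
X = locus_features[loci]                            # same features in every cell type
y = locus_mean[loci] + 0.3 * rng.normal(size=len(loci))

model = RandomForestRegressor(n_estimators=50, random_state=0)
same_loci = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0))
disjoint = cross_val_score(model, X, y, cv=GroupKFold(5), groups=loci)
print(f"random folds (loci shared): R^2 = {same_loci.mean():.2f}")  # inflated by memorization
print(f"locus-disjoint folds:       R^2 = {disjoint.mean():.2f}")   # honest estimate, near zero
```

Grouping the folds by locus (here via GroupKFold) is one way to diagnose the problem: if the grouped score collapses relative to the random one, the model is memorizing per-locus averages rather than learning cross-cell-type signal.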


Author(s):  
Mahendra Awale ◽  
Jean-Louis Reymond

Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4), and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at ppb2.gdb.tools.
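The NN + NB combination can be sketched roughly as follows: shortlist the most similar ligands by fingerprint similarity, then fit a Naïve Bayes model on that shortlist only. The random binary fingerprints and single-label targets below are stand-ins for ECfp4 fingerprints and ChEMBL annotations (which are multi-label in PPB2), so this is a schematic of the idea rather than the tool's implementation.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

def tanimoto(query, fps):
    """Tanimoto similarity of one binary fingerprint against many."""
    inter = (fps & query).sum(axis=1)
    union = (fps | query).sum(axis=1)
    return inter / np.maximum(union, 1)

def nn_nb_predict(query_fp, fps, target_labels, k=200):
    """Predict per-target probabilities from the query's k nearest ligands."""
    sim = tanimoto(query_fp, fps)
    nearest = np.argsort(sim)[-k:]                 # k most similar ligands
    nb = BernoulliNB().fit(fps[nearest], target_labels[nearest])
    return dict(zip(nb.classes_, nb.predict_proba(query_fp[None, :])[0]))

# Toy stand-ins for fingerprints and target annotations.
rng = np.random.default_rng(0)
fps = rng.integers(0, 2, size=(1000, 512)).astype(bool)
targets = rng.integers(0, 5, size=1000)            # five hypothetical target ids
print(nn_nb_predict(fps[0], fps, targets))
```

Restricting the Bayes model to the query's neighborhood is what couples the two methods: the NN search supplies a locally relevant training set, and NB turns it into calibrated per-target scores.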


2019 ◽  
Vol 6 (2) ◽  
pp. 226-235
Author(s):  
Muhammad Rangga Aziz Nasution ◽  
Mardhiya Hayaty

One branch of computer science, machine learning, has become a trend in recent years. Machine learning works by using data and algorithms to build a model from the patterns in a dataset. It also studies how the resulting model can predict output based on the existing patterns. There are two kinds of machine learning methods that can be used for sentiment analysis: supervised learning and unsupervised learning. This study compares two classification algorithms that belong to supervised learning, K-Nearest Neighbor and Support Vector Machine, by building a model with each algorithm on sentiment texts. The comparison is carried out to determine which algorithm is better in terms of accuracy and processing time. The accuracy results show that Support Vector Machine is superior, with 89.70% accuracy without K-Fold Cross Validation and 88.76% with K-Fold Cross Validation. In terms of processing time, K-Nearest Neighbor is superior, with a processing time of 0.0160 s without K-Fold Cross Validation and 0.1505 s with K-Fold Cross Validation.
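A minimal scikit-learn sketch of this comparison is shown below; the random feature matrix stands in for the paper's vectorized sentiment texts, which are not reproduced here, so the printed numbers will not match the reported ones.

```python
import time
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))        # stand-in for vectorized sentiment texts
y = rng.integers(0, 2, size=500)      # stand-in sentiment labels

for name, model in [("KNN", KNeighborsClassifier()), ("SVM", SVC())]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    start = time.perf_counter()
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)        # single hold-out split
    elapsed = time.perf_counter() - start                # fit + predict time
    cv_acc = cross_val_score(model, X, y, cv=10).mean()  # 10-fold cross-validation
    print(f"{name}: holdout={acc:.3f}  cv={cv_acc:.3f}  time={elapsed:.4f}s")
```

Reporting both the single-split and cross-validated accuracy alongside wall-clock time mirrors the two axes of the study's comparison.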


Author(s):  
Vikram Sundar ◽  
Lucy Colwell

The structured nature of chemical data means that machine learning models trained to predict protein-ligand binding risk overfitting the data, impairing their ability to generalise and make accurate predictions for novel candidate ligands. To address this limitation, data debiasing algorithms systematically partition the data to reduce bias. When models are trained using debiased data splits, the reward for simply memorising the training data is reduced, suggesting that the ability of the model to make accurate predictions for novel candidate ligands will improve. To test this hypothesis, we use distance-based data splits to measure how well a model can generalise. We first confirm that models perform better for randomly split held-out sets than for distant held-out sets. We then debias the data and find, surprisingly, that debiasing typically reduces the ability of models to make accurate predictions for distant held-out test sets. These results suggest that debiasing reduces the information available to a model, impairing its ability to generalise.
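A distance-based split of the kind used here can be sketched as follows; the Euclidean distance on random features is an assumption for illustration (chemical work would typically use Tanimoto distance on fingerprints), not the authors' exact procedure.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def distant_holdout(X_train, X_candidates, threshold):
    """Mask selecting candidates far from every training point."""
    d = pairwise_distances(X_candidates, X_train)  # candidate-by-train distance matrix
    return d.min(axis=1) > threshold               # distance to the nearest training point

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 16))               # stand-in ligand features
X_cand = rng.normal(size=(100, 16))
mask = distant_holdout(X_train, X_cand, threshold=4.5)
print(mask.sum(), "of", len(mask), "candidates form the 'distant' held-out set")
```

Evaluating on the distant set rather than a random split is what exposes the gap between memorisation and genuine generalisation that the study measures.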

