Performance tuning for machine learning-based software development effort prediction models

Author(s):  
EGEMEN ERTUĞRUL ◽  
ZAKİR BAYTAR ◽  
ÇAĞATAY ÇATAL ◽  
ÖMER CAN MURATLI

Software development effort estimation is a critical activity in the project management process. In this study, machine learning algorithms were investigated in conjunction with feature transformation, feature selection, and parameter tuning techniques to estimate development effort accurately, and a new model was proposed as part of an expert system. We selected the most general-purpose algorithms and applied a parameter optimization technique (GridSearch), feature transformation techniques (binning and one-hot encoding), and a feature selection algorithm (principal component analysis). All models were trained on the ISBSG datasets and implemented using the scikit-learn package in Python. The proposed model uses a multilayer perceptron as its underlying algorithm, bins continuous features and one-hot encodes categorical data into numerical values, performs feature selection based on principal component analysis, and tunes parameters with the GridSearch algorithm. We demonstrate that our effort prediction model mostly outperforms the other existing models in terms of prediction accuracy, as measured by the mean absolute residual.
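The steps named in this abstract (binning, one-hot encoding, principal component analysis, a multilayer perceptron, and grid search) can be chained in scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the synthetic data, the column layout, the bin count, the parameter grid, and the helper names `make_synthetic_projects` and `build_search` are all hypothetical, and mean absolute error is used as the scoring stand-in for the mean absolute residual.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder


def make_synthetic_projects(n_projects, seed=0):
    """Hypothetical stand-in for ISBSG data (assumption, not real project data)."""
    rng = np.random.default_rng(seed)
    size = rng.uniform(10, 1000, n_projects)    # continuous, e.g. functional size
    team = rng.uniform(1, 20, n_projects)       # continuous, e.g. team size
    language = rng.integers(0, 3, n_projects)   # categorical code 0..2
    effort = 0.05 * size + 2.0 * team + 5.0 * language + rng.normal(0, 1, n_projects)
    return np.column_stack([size, team, language]), effort


def build_search(continuous_cols, categorical_cols):
    """Binning + one-hot encoding -> PCA -> MLP, tuned by grid search."""
    transform = ColumnTransformer(
        [
            # Bin each continuous column, then one-hot encode the bin index.
            ("bin", KBinsDiscretizer(n_bins=4, encode="onehot-dense",
                                     strategy="uniform"), continuous_cols),
            # One-hot encode the categorical columns.
            ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0.0,  # always return a dense array, which PCA expects
    )
    pipeline = Pipeline([
        ("transform", transform),
        ("pca", PCA()),
        ("mlp", MLPRegressor(max_iter=500, random_state=0)),
    ])
    # Illustrative grid only; the abstract does not list the tuned parameters.
    param_grid = {
        "pca__n_components": [3, 5],
        "mlp__hidden_layer_sizes": [(8,), (16,)],
    }
    return GridSearchCV(pipeline, param_grid,
                        scoring="neg_mean_absolute_error", cv=3)


if __name__ == "__main__":
    X, y = make_synthetic_projects(120)
    search = build_search(continuous_cols=[0, 1], categorical_cols=[2])
    search.fit(X, y)
    print(search.best_params_, -search.best_score_)
```

Putting the transformations inside the pipeline means they are refit on each training fold during the search, so the cross-validated score is not contaminated by the held-out fold.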

Processes ◽  
2019 ◽  
Vol 7 (12) ◽  
pp. 928 ◽  
Author(s):  
Miguel De-la-Torre ◽  
Omar Zatarain ◽  
Himer Avila-George ◽  
Mirna Muñoz ◽  
Jimy Oblitas ◽  
...  

This paper explores five multivariate techniques for information fusion in sorting Cape gooseberry fruits by visual ripeness: principal component analysis, linear discriminant analysis, independent component analysis, eigenvector centrality feature selection, and multi-cluster feature selection. These techniques are applied to the concatenated channels of the red, green, and blue (RGB); hue, saturation, value (HSV); and lightness, red/green value, and blue/yellow value (L*a*b) color spaces (9 features in total). Machine learning techniques have been reported for sorting Cape gooseberry fruits by ripeness. Classifiers such as neural networks, support vector machines, and nearest neighbors discriminate between fruit samples using different color spaces. Although the color spaces are equivalent up to a transformation, some classifiers perform better because of differences in the pixel distribution of the samples. Experimental results show that selecting and combining color channels allows classifiers to reach similar levels of accuracy; however, combination methods still incur higher computational complexity. The highest accuracy was obtained using the seven-dimensional principal component analysis feature space.


2020 ◽  
Vol 7 (3) ◽  
pp. 565
Author(s):  
Krisan Aprian Widagdo ◽  
Kusworo Adi ◽  
Rahmat Gernowo

Examining Pap smear images is an important step in the early diagnosis of cervical dysplasia, but it requires substantial resources. Machine learning can address this problem; however, its accuracy depends on the features used. Only relevant and discriminative features yield accurate classification results. This work combines Fisher Score (FScore) feature selection with Principal Component Analysis (PCA). First, FScore selects relevant features by ranking them. PCA then transforms the candidate features into a new, uncorrelated dataset. A backpropagation artificial neural network is used to evaluate the performance of the FScore and PCA combination. The model is evaluated with 5-fold cross-validation. The combination is also compared with a model that uses the original features and a model that uses only the FScore-selected features. Experimental results show that the combination of FScore and PCA produces the best performance (accuracy 0.964±0.006, sensitivity 0.990±0.005, and specificity 0.889±0.009). In terms of computation time, the combination is relatively fast. This work shows that applying feature selection and feature extraction can improve classification performance in a relatively short time.
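The Fisher Score ranking step can be illustrated with a short, dependency-free sketch. It assumes one common definition of the score (between-class scatter of a feature divided by its within-class scatter); the abstract does not state which variant the authors used, and the function names `fisher_scores` and `top_k_features` are hypothetical.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X (a list of rows)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0  # class-size-weighted spread of class means
        within = 0.0   # class-size-weighted variance inside each class
        for rows in by_class.values():
            values = [row[j] for row in rows]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

In a workflow like the one described, the columns returned by `top_k_features` would then be passed to PCA and the classifier.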


Author(s):  
Hyeuk Kim

Unsupervised learning divides data into several groups, such that observations in the same group have similar characteristics and observations in different groups have different characteristics. In this paper, we group data by partitioning around medoids, which has some advantages over k-means clustering. We apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and plot the result using two components as axes. Through this procedure, we interpret the meaning of the clusters graphically. The combination of partitioning around medoids and principal component analysis can be applied to other data, and the approach makes the characteristics of each group easy to identify.
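Partitioning around medoids can be sketched in a few lines of plain Python. This is a simplified illustration, not the procedure used in the paper: it starts from the first k points instead of the usual BUILD initialization, uses Euclidean distance, and applies greedy swaps until none lowers the total cost. The function names are hypothetical.

```python
import math


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, max_iter=100):
    """Return (medoid indices, cluster label per point)."""
    medoids = list(range(k))  # naive initialization (assumption)
    best = total_cost(points, medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                # Try replacing medoid i with a non-medoid point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
        if not improved:
            break
    labels = [
        min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]
    return medoids, labels
```

Unlike k-means centroids, each medoid is an actual observation, which is one reason the method is less sensitive to outliers and lets a cluster be summarized by a representative member.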


2020 ◽  
Author(s):  
Jiawei Peng ◽  
Yu Xie ◽  
Deping Hu ◽  
Zhenggang Lan

The system-plus-bath model is an important tool for understanding nonadiabatic dynamics in large molecular systems. Understanding the collective motion of a huge number of bath modes is essential to revealing their key roles in the overall dynamics. We apply principal component analysis (PCA) to investigate the bath motion, based on the massive data generated from MM-SQC (the symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic dynamics simulations of excited-state energy transfer in the Frenkel-exciton model. PCA clearly shows that two types of bath modes, those that display strong vibronic couplings and those whose frequencies are close to the electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with physical insight. The conclusion is obtained purely from the PCA of the trajectory data, without substantial reliance on predefined physical knowledge. The results show that PCA, one of the simplest unsupervised machine learning methods, is a powerful tool for analyzing complicated nonadiabatic dynamics in the condensed phase involving many degrees of freedom.


Author(s):  
CUAUHTÉMOC LÓPEZ-MARTÍN ◽  
ALAIN ABRAN

Expert-based effort prediction in software projects can be taught, beginning with the practices learned in an academic environment in courses designed to encourage them. However, the length of such courses is a major concern for both industry and academia. Industry has to work without its employees while they are taking such a course, and academic institutions find it hard to fit the course into an already tight schedule. In this research, the set of Personal Software Process (PSP) practices is reordered and the practices are distributed among fewer assignments, in an attempt to address these concerns. This study involved 148 practitioners taking graduate courses who developed 1,036 software course assignments. The hypothesis on which it is based is the following: When the activities in the original PSP set are reordered into fewer assignments, the result is expert-based effort prediction that is statistically significantly better.

