An approach for fault prediction in SOA-based systems using machine learning techniques

2019 ◽  
Vol 53 (4) ◽  
pp. 397-421 ◽  
Author(s):  
Guru Prasad Bhandari ◽  
Ratneshwer Gupta ◽  
Satyanshu Kumar Upadhyay

Purpose: Software fault prediction is an important concept that can be applied at an early stage of the software life cycle. Effective prediction of faults may improve the reliability and testability of software systems. As service-oriented architecture (SOA)-based systems become more complex, the interactions between participating services increase. The component services may generate enormous reports and fault information. Although considerable research has focused on developing fault-proneness prediction models for service-oriented systems (SOS) using machine learning (ML) techniques, there has been little work on assessing how effective source code metrics are for fault prediction. The paper aims to discuss this issue.

Design/methodology/approach: The authors propose a fault prediction framework to investigate fault prediction in SOS using web service metrics. The effectiveness of the model has been explored by applying six ML techniques, namely Naïve Bayes, Artificial Neural Networks (ANN), Adaptive Boosting (AdaBoost), decision tree, Random Forests and Support Vector Machine (SVM), along with five feature selection techniques to extract the essential metrics. Accuracy, precision, recall, F-measure and area under the receiver operating characteristic curve (AUC) values are used as performance measures.

Findings: The experimental results show that the proposed system can automatically and effectively classify the fault-proneness of web services, labeling each service as faulty or non-faulty as a binary-valued output.

Research limitations/implications: One possible threat to internal validity in the study is the unknown effect of undiscovered faults. Specifically, the authors injected possible faults into the classes using the Java C3.0 tool, and only fixed faults were injected. However, considering the Java C3.0 community of development, testing and use, the authors can generalize that undiscovered faults should be few and have little impact on the results presented in this study, and that the results may be limited to the investigated complexity metrics and the ML techniques used.

Originality/value: In the literature, only a few studies directly concentrate on metrics-based fault-proneness prediction of SOS using ML techniques; most contributions concern fault prediction for general systems rather than SOS. A majority of them consider reliability, changeability and maintainability using logging/history-based approaches and mathematical modeling rather than metrics-based fault prediction in SOS. The authors extend these contributions by applying supervised ML techniques to web service metrics and measuring their capability through fault injection.
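The evaluation loop described above (feature selection followed by several classifiers, scored on accuracy and F-measure) can be sketched in scikit-learn. This is a minimal illustration on synthetic data; the dataset, the choice of SelectKBest as the feature selector, and the value k=10 are placeholders, not the authors' actual data or feature selection techniques.

```python
# Compare several classifiers behind one feature-selection step,
# standing in for the paper's six-classifier comparison.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

classifiers = {
    "NaiveBayes": GaussianNB(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM": SVC(random_state=0),
}
scores = {}
for name, clf in classifiers.items():
    # SelectKBest stands in for the paper's five feature-selection techniques.
    model = make_pipeline(SelectKBest(f_classif, k=10), clf)
    model.fit(X_tr, y_tr)
    scores[name] = f1_score(y_te, model.predict(X_te))
```

Each entry of `scores` is the held-out F-measure of one classifier; an ANN (e.g. `MLPClassifier`) would slot into the same dictionary.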

2020 ◽  
Vol 10 (24) ◽  
pp. 9151
Author(s):  
Yun-Chia Liang ◽  
Yona Maimury ◽  
Angela Hsiang-Ling Chen ◽  
Josue Rodolfo Cuevas Juarez

Air, an essential natural resource, has been compromised in quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments on datasets from three different regions showed that the stacking ensemble, AdaBoost, and random forest achieved the best prediction performance; among these, the stacking ensemble delivers consistently superior results on R2 and RMSE, while AdaBoost provides the best results for MAE.
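A stacking ensemble of the kind compared above can be sketched with scikit-learn's StackingRegressor, which blends base learners through a meta-model. The data here is synthetic, not the Taiwan EPA dataset, and the Ridge meta-learner is an illustrative choice.

```python
# Stacking ensemble for a regression target such as an AQI value:
# base learners' out-of-fold predictions feed a Ridge meta-model.
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[("ada", AdaBoostRegressor(random_state=0)),
                ("rf", RandomForestRegressor(random_state=0)),
                ("svr", SVR())],
    final_estimator=Ridge(),  # learns how much to trust each base learner
)
stack.fit(X_tr, y_tr)
r2 = r2_score(y_te, stack.predict(X_te))
```

The meta-model is what lets stacking outperform any single base learner when their errors are complementary, which matches the R2/RMSE advantage reported above.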


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lei Li ◽  
Desheng Wu

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed corporates, addressing supervision that is currently neither effective nor precise.

Design/methodology/approach: The overall research framework for forecasting infractions (ISRs) includes data collection and cleaning, feature engineering, data splitting, application of the prediction approaches and model performance evaluation. Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks and Long Short-Term Memory networks (LSTMs) are selected as ISR prediction models.

Findings: The results show that models incorporating prior infractions significantly outperform those without them, especially on large sample sets. The results also indicate that, when judging whether a company has infractions, attention should be paid to novel artificial intelligence methods, the company's previous infractions, and large datasets.

Originality/value: The findings could be used to identify listed corporates' ISRs to a certain degree. Overall, the results elucidate the value of prior infractions of securities regulations (ISRs). This shows the importance of including more data sources when constructing distress models, rather than only building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.


2021 ◽  
Vol 297 ◽  
pp. 01073
Author(s):  
Sabyasachi Pramanik ◽  
K. Martin Sagayam ◽  
Om Prakash Jena

Cancer has been described as a heterogeneous illness with several distinct subtypes that may occur simultaneously. As a result, early detection and prognosis of cancer types have become essential in cancer research, since they may help improve the clinical management of cancer patients. The significance of classifying cancer patients into high- or low-risk categories has prompted numerous research groups from the bioscience and genomics fields to investigate the use of machine learning (ML) algorithms in cancer diagnosis and treatment. These methods have accordingly been used to model the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. They include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines, which have already been extensively used in cancer research for the creation of predictive models, also lead to accurate decision-making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. This paper presents an overview of current machine learning approaches used in modeling cancer development. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards machine learning methods in biomedical research, we review the most recent papers that have used these approaches to predict cancer risk or patient outcomes.


2020 ◽  
Vol 7 (2) ◽  
pp. 631-647
Author(s):  
Emrana Kabir Hashi ◽  
Md. Shahid Uz Zaman

Machine learning techniques are widely used in the healthcare sector to predict fatal diseases. The objective of this research was to develop and compare the performance of a traditional system with a proposed system that predicts heart disease using Logistic Regression, K-nearest neighbor, Support Vector Machine, Decision Tree, and Random Forest classification models. The proposed system tunes the hyperparameters of the five classification algorithms using a grid search approach. Since the performance of the heart disease prediction system is the major research issue, hyperparameter tuning can be used to enhance the performance of the prediction models. The traditional and proposed systems were evaluated and compared in terms of accuracy, precision, recall, and F1 score. While the traditional system achieved accuracies between 81.97% and 90.16%, the proposed hyperparameter-tuned model achieved improved accuracies between 85.25% and 91.80%. These evaluations demonstrate that the proposed approach is capable of achieving more accurate results than the traditional approach in predicting heart disease, with feasible performance.
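The grid-search tuning step described above can be sketched with scikit-learn's GridSearchCV for one of the five classifiers; the parameter grid below is an illustrative guess, not the paper's actual search space, and the data is synthetic rather than a heart disease dataset.

```python
# Cross-validated grid search over random forest hyperparameters,
# then evaluation of the best-found model on a held-out split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=13, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

grid = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid={"n_estimators": [50, 100],   # hypothetical grid values
                "max_depth": [3, None]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_tr, y_tr)          # exhaustively tries all grid combinations
tuned_acc = grid.score(X_te, y_te)  # accuracy of the refit best model
```

The same pattern, with a classifier-specific `param_grid`, would be repeated for the other four algorithms, mirroring the traditional-versus-tuned comparison in the study.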


Author(s):  
Wasiur Rhmann ◽  
Gufran Ahmad Ansari

Software engineering repositories have attracted researchers seeking to mine useful information about different quality attributes of software. These repositories have helped software professionals allocate resources efficiently across the software development life cycle. Software fault prediction is a quality assurance activity in which software faults are predicted before actual software testing. As exhaustive software testing is impossible, software fault prediction models can help with the proper allocation of testing resources. Various machine learning techniques have been applied to create software fault prediction models. In this study, ensemble models are used for software fault prediction. Change metrics-based data are collected for an open-source Android project from its Git repository, and code metrics-based data are obtained from the PROMISE data repository; the datasets kc1, kc2, cm1, and pc1 are used for experimental purposes. Results showed that ensemble models performed better than machine learning and hybrid search-based algorithms. Bagging ensembles were found to be more effective at predicting faults than soft and hard voting.
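The three ensemble styles compared above differ in how they combine learners: bagging trains many copies of one model on bootstrap samples, while soft and hard voting combine heterogeneous models by averaging probabilities or taking a majority vote. A minimal sketch on synthetic data (standing in for the kc1/kc2/cm1/pc1 metric datasets) is:

```python
# Bagging versus soft and hard voting in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=21, random_state=0)

# Bagging: many trees, each fit on a bootstrap resample of the data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                            random_state=0)

# Voting: heterogeneous base learners combined at prediction time.
base = [("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0))]
soft = VotingClassifier(base, voting="soft")  # averages class probabilities
hard = VotingClassifier(base, voting="hard")  # majority vote on labels

results = {name: cross_val_score(m, X, y, cv=5).mean()
           for name, m in [("bagging", bagging), ("soft", soft),
                           ("hard", hard)]}
```

On real fault datasets the ranking depends on the data; the study found bagging ahead of both voting schemes.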


Author(s):  
Nabil Mohamed Eldakhly ◽  
Magdy Aboul-Ela ◽  
Areeg Abdalla

The particulate matter air pollutant of diameter less than 10 micrometers (PM10), a category of pollutants including solid and liquid particles, can be a health hazard for several reasons: it can harm lung tissue and the throat, aggravate asthma and increase respiratory illness. Accurate prediction models of PM10 concentrations are essential for proper management, control, and public warning strategies. Machine learning techniques can develop methods or tools to discover unseen patterns in given data in order to solve a particular task or problem. Chance theory offers advanced concepts for treating cases where both randomness and fuzziness play simultaneous roles. The main objective is to study a modification of a single machine learning algorithm, the support vector machine (SVM), that applies a chance weight of the target variable, based on chance theory, to each corresponding dataset point, so as to outperform ensemble machine learning algorithms. The results of this study show that the SVM algorithm, when modified and combined with the right theory/technique, especially chance theory, outperforms other modern ensemble learning algorithms.
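The chance-theoretic weights themselves are specific to the paper, but the mechanism of attaching a per-point weight to SVM training can be sketched with scikit-learn's `sample_weight` argument. The weights below are arbitrary placeholders, not values derived from chance theory.

```python
# Per-sample weighting of an SVM: heavily weighted points exert more
# influence on the fitted margin.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Placeholder weights in (0, 1]; the paper would instead derive each
# point's weight from the chance distribution of the target variable.
rng = np.random.default_rng(0)
weights = rng.uniform(0.1, 1.0, size=len(y))

clf = SVC(kernel="rbf")
clf.fit(X, y, sample_weight=weights)
train_acc = clf.score(X, y)
```

Substituting chance-theory weights for the random ones is the only change needed to reproduce the modification's structure, though not its results.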


Author(s):  
Raed Shatnawi

BACKGROUND: Fault data is vital to predicting fault-proneness in large systems. Predicting faulty classes helps in allocating appropriate testing resources for future releases. However, current fault data face challenges such as unlabeled instances and data imbalance, which degrade the performance of prediction models. Data imbalance arises because the majority of classes are labeled as not faulty whereas only a minority are labeled as faulty. AIM: The research proposes to improve fault prediction using software metrics in combination with threshold values. Statistical techniques are proposed to improve the quality of the datasets and therefore the quality of the fault prediction. METHOD: Threshold values of object-oriented metrics are used to label classes as faulty to improve the fault prediction models. The resulting datasets are used to build prediction models using five machine learning techniques. The use of threshold values is validated on ten large object-oriented systems. RESULTS: Models are built for the datasets with and without the use of thresholds. Combining thresholds with machine learning significantly improved the fault prediction models for all five classifiers. CONCLUSION: Threshold values can be used to label software classes as fault-prone and to improve machine learners in predicting fault-prone classes.
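The threshold-labeling idea can be sketched as follows: classes whose metric values cross chosen cutoffs are labeled fault-prone, and the relabeled dataset then trains an ordinary classifier. The metric names and threshold values below are illustrative placeholders, not the validated thresholds from the study.

```python
# Label classes via metric thresholds, then train a classifier on the labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Columns stand in for object-oriented metrics such as WMC, CBO, RFC.
metrics = rng.integers(0, 60, size=(200, 3))
thresholds = np.array([20, 9, 40])  # hypothetical per-metric cutoffs

# A class is labeled fault-prone (1) if any metric exceeds its threshold.
labels = (metrics > thresholds).any(axis=1).astype(int)

clf = RandomForestClassifier(random_state=0).fit(metrics, labels)
train_acc = clf.score(metrics, labels)
```

In the study, five such classifiers are trained and compared against models built on the original (unthresholded) labels, on ten large object-oriented systems.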


2020 ◽  
Author(s):  
Mauricio Alberto Ortega-Ruíz ◽  
Cefa Karabağ ◽  
Victor García Garduño ◽  
Constantino Carlos Reyes-Aldasoro

This paper describes a method for estimating residual tumour cellularity (TC) after neoadjuvant treatment (NAT) of advanced breast cancer. TC is normally determined manually by a radiologist's visual inspection, so automated computation would reduce workload and increase precision and accuracy. TC is estimated as the ratio of tumour area to total image area after NAT. The proposed method computes TC using machine learning techniques trained on morphological parameters of segmented nuclei in order to classify regions of the image as tumour or normal. The data is provided by the 2019 SPIE Breast challenge, which was organized to develop automated TC computation algorithms. Three algorithms were implemented: Support Vector Machines, Nearest K-means and Adaptive Boosting (AdaBoost) decision trees. Performance was compared and evaluated on accuracy, and the best result was obtained with Support Vector Machines. Results obtained by the implemented methods were submitted during the ongoing challenge, with a maximum prediction probability of success of 0.76.
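The pipeline described above reduces to two steps: classify each image region as tumour or normal from morphological nucleus features, then compute TC as the fraction of area classified as tumour. This sketch uses synthetic feature vectors, not the SPIE challenge images, and the feature names are hypothetical.

```python
# Region classification followed by TC as the tumour-area fraction.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Each row: morphological features of one region's nuclei (e.g. mean
# nucleus area, eccentricity, solidity); label 1 = tumour region.
features = rng.normal(size=(300, 3))
labels = (features.sum(axis=1) > 0).astype(int)  # synthetic ground truth

clf = SVC().fit(features, labels)

# For a new image split into equal-area regions, TC is simply the
# fraction of regions predicted as tumour.
new_regions = rng.normal(size=(50, 3))
tc = clf.predict(new_regions).mean()
```

With equal-area regions, the predicted-tumour fraction equals the tumour-area-to-total-area ratio the paper defines; unequal regions would need an area-weighted mean instead.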


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sanjay Sehgal ◽  
Ritesh Kumar Mishra ◽  
Florent Deisting ◽  
Rupali Vashisht

Purpose: The main aim of the study is to identify critical microeconomic determinants of financial distress and to design a parsimonious distress prediction model for an emerging economy like India. In doing so, the authors also compare the forecasting accuracy of alternative distress prediction techniques.

Design/methodology/approach: The authors use two alternative accounting-information-based definitions of financial distress to construct a measure of financial distress. They then use the binomial logit model and two popular machine learning-based models, namely the artificial neural network and the support vector machine, to compare the distress prediction accuracy of these alternative techniques for the Indian corporate sector.

Findings: The empirical results suggest that five financial ratios, namely return on capital employed, cash flows to total liabilities, asset turnover ratio, fixed assets to total assets and debt to equity ratio, together with a measure of firm size (log total assets), play a highly significant role in distress prediction. The findings suggest that the machine learning-based models, the support vector machine (SVM) and artificial neural network (ANN), are superior in prediction accuracy to the simple binomial logit model. Results also suggest that one-year-ahead forecasts are better than two-year-ahead forecasts.

Practical implications: The findings have important practical implications for creditors, policymakers, regulators and other stakeholders. First, rather than monitoring and collecting information on a long list of predictor variables, only the six most important variables need be monitored to track the transition of a healthy firm into financial distress. Second, the six-factor model can be used to devise a sound early warning system for corporate financial distress. Third, machine learning-based distress prediction models offer superior prediction accuracy over the time series models commonly used in the literature for distress prediction involving a binary dependent variable.

Originality/value: This study is one of the first comprehensive attempts to investigate and design a parsimonious distress prediction model for the emerging Indian economy, which is currently facing high levels of corporate financial distress. Unlike previous studies, the authors use two different accounting-information-based measures of financial distress in order to identify an effective way of measuring it. Some of the determinants of financial distress identified here differ from those in the popular distress prediction models in the literature. The distress prediction model can also be useful for distress prediction in other emerging markets.
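The model comparison described above (a binomial logit baseline against SVM and ANN on a small set of financial ratios) can be sketched in scikit-learn. The six synthetic features below merely mimic the dimensionality of the paper's five ratios plus log total assets; they are not the actual data.

```python
# Logit baseline versus SVM and a small neural network on
# six-feature synthetic "financial ratio" data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logit": LogisticRegression(max_iter=1000),
    # SVM and ANN are scale-sensitive, so standardize ratios first.
    "svm": make_pipeline(StandardScaler(), SVC()),
    "ann": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(16,),
                                       max_iter=1000, random_state=0)),
}
accuracy = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
            for name, m in models.items()}
```

Comparing the three accuracies replicates the structure of the study's horse race; on the authors' data, SVM and ANN came out ahead of the logit model.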


Author(s):  
Cemil Kuzey ◽  
Ali Uyar ◽  
Dursun Delen

Purpose: The paper aims to identify and critically analyze the factors influencing cost system functionality (CSF) using several machine learning techniques, including decision trees, support vector machines and logistic regression.

Design/methodology/approach: The study used a self-administered survey to collect the necessary data from companies conducting business in Turkey. Several prediction models are developed and tested, and a series of sensitivity analyses is performed on the developed prediction models to rank the importance of the factors/variables.

Findings: Certain factors/variables influence CSF much more than others. The findings suggest that the utilization of management accounting practices requires a functional cost system, which in turn is supported by a comprehensive cost data management process (i.e. acquisition, storage and utilization).

Research limitations/implications: The underlying data were collected via a questionnaire survey and are therefore subjective, reflecting the perceptions of the respondents rather than, ideally, the objective practices of the firms. Second, the authors measured CSF on a binary "Yes"/"No" basis, which does not allow respondents to give intermediate answers and might have limited their choices. Third, the Likert scales adopted in measuring the other constructs might also limit the respondents' answers.

Practical implications: Information technology plays a very important role in the success of CSF practices. That is, successful implementation of a functional cost system relies heavily on a fully integrated information infrastructure capable of constantly feeding CSF with accurate, relevant and timely data.

Originality/value: In addition to providing evidence on the factors underlying CSF across a broad range of industries, this study also illustrates the viability of machine learning methods as a research framework for critically analyzing domain-specific data.

