Empirical Assessment of Machine Learning Techniques for Software Requirements Risk Prediction

Rashid Naseem; Zain Shaukat; Muhammad Irfan; Muhammad Arif Shah; Arshad Ahmad; Fazal Muhammad; Adam Glowacz; Larisa Dunai; Jose Antonino-Daviu; Adel Sulaiman

doi:10.3390/electronics10020168

Empirical Assessment of Machine Learning Techniques for Software Requirements Risk Prediction

Electronics ◽

10.3390/electronics10020168 ◽

2021 ◽

Vol 10 (2) ◽

pp. 168

Author(s):

Rashid Naseem ◽

Zain Shaukat ◽

Muhammad Irfan ◽

Muhammad Arif Shah ◽

Arshad Ahmad ◽

...

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Naive Bayes ◽

Absolute Error ◽

Naïve Bayes ◽

Error Rates ◽

Machine Learning Techniques ◽

Decision Table ◽

Software Project ◽

Learning Techniques

Software risk prediction is the most sensitive and crucial activity of the Software Development Life Cycle (SDLC). It may lead to the success or failure of a project. Risks should be predicted early to make a software project successful. A model is proposed for the prediction of software requirement risks using a requirement risk dataset and machine learning techniques. In addition, a comparison is made between multiple classifiers, namely K-Nearest Neighbour (KNN), Average One Dependency Estimator (A1DE), Naïve Bayes (NB), Composite Hypercube on Iterated Random Projection (CHIRP), Decision Table (DT), Decision Table/Naïve Bayes Hybrid Classifier (DTNB), Credal Decision Trees (CDT), Cost-Sensitive Decision Forest (CS-Forest), J48 Decision Tree (J48), and Random Forest (RF), to identify the best-suited technique for the model according to the nature of the dataset. These techniques are evaluated using various evaluation metrics, including Correctly Classified Instances (CCI), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), Root Relative Squared Error (RRSE), precision, recall, F-measure, Matthew’s Correlation Coefficient (MCC), Receiver Operating Characteristic Area (ROC area), Precision-Recall Curves area (PRC area), and accuracy. The overall outcome of this study shows that, in terms of reducing error rates, CDT outperforms the other techniques, achieving 0.013 for MAE, 0.089 for RMSE, 4.498% for RAE, and 23.741% for RRSE. However, in terms of increasing accuracy, DT, DTNB, and CDT achieve better results.
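
The four error measures named in this abstract can be illustrated with a short sketch. The Python below is not the authors' code: it uses the generic textbook definitions, with the baseline taken as a predictor that always outputs the mean of the actual values. The function name `error_metrics` and the sample numbers are made up for illustration, and machine learning toolkits may compute these quantities over class-probability estimates, so their values can differ.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE (%) and RRSE (%) for paired numeric values.

    Assumes the actual values are not all identical; otherwise the
    baseline error is zero and RAE/RRSE are undefined.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_err = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Baseline: a trivial predictor that always outputs the mean of the actual values.
    base_abs = sum(abs(a - mean_actual) for a in actual)
    base_sq = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RAE": 100.0 * sum(abs_err) / base_abs,
        "RRSE": 100.0 * math.sqrt(sum(sq_err) / base_sq),
    }


if __name__ == "__main__":
    # Hypothetical 0/1 targets and predicted scores, for illustration only.
    metrics = error_metrics([1, 0, 0, 1], [0.9, 0.1, 0.2, 0.8])
    for name, value in metrics.items():
        print(name, round(value, 3))
```

RAE and RRSE are relative measures: a value below 100% means the model's error is smaller than that of the mean-only baseline.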

Sentiment Analysis using various Machine Learning and Deep Learning Techniques

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.308 ◽

2021 ◽

pp. 385-394

Author(s):

V Umarani ◽

A Julian ◽

J Deepa

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Process ◽

Learning Techniques

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyze text patterns faster. Supervised machine learning is the most widely used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods on a standard data set. The experimental results demonstrate the performance of the classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation. They also help in appreciating the novelty of the deep learning techniques and give the user an overview for choosing the right technique for their application.
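
Of the evaluation procedures listed in this abstract, k-fold cross-validation is the easiest to show in a few lines. The sketch below only illustrates how sample indices can be partitioned into k folds; the function name `k_fold_indices` is hypothetical, the folds are contiguous and unshuffled for simplicity, and nothing here reflects the authors' actual setup.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Each sample appears in exactly one test fold. When n_samples is not
    divisible by k, the first folds receive one extra sample each.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    # Ten hypothetical samples split into three folds.
    for train_idx, test_idx in k_fold_indices(10, 3):
        print("train:", train_idx, "test:", test_idx)
```

A classifier would be trained on each `train_idx` subset and scored on the matching `test_idx` subset, with the k scores averaged. In practice the data are usually shuffled first, often with stratification by class.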

Sentiment Analysis of Tweets on the COVID-19 Pandemic Using Machine Learning Techniques

Handbook of Research on Innovations and Applications of AI, IoT, and Cognitive Technologies - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-6870-5.ch021 ◽

2021 ◽

pp. 310-320

Author(s):

Jothikumar R. ◽

Vijay Anand R. ◽

Visu P. ◽

Kumar R. ◽

Susi S. ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Respiratory Tract ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Respiratory Tract Diseases ◽

Thought Processes ◽

Learning Techniques

Sentiment analysis refers to extracting sentiments from natural language and identifying the attitude toward a particular topic. The novel coronavirus, a harmful disease, is spreading rapidly and causes respiratory tract diseases that range from mild to severe. Because it spreads quickly and has no known cure, it has brought about a sense of stress and anxiety. In this chapter, a machine learning based approach is used to identify the sentiments of tweets related to COVID and the resulting lockdown. The chapter examines tweets associated with the hashtags for coronavirus and lockdown. The tweets were labeled positive, negative, or neutral, and a set of classifiers was used to evaluate accuracy and performance. The classifiers fall under four models: decision tree, regression, support vector machine, and naïve Bayes.

Peningkatan Performa Pendeteksian Anomali Menggunakan Ensemble Learning dan Feature Selection [Improving Anomaly Detection Performance Using Ensemble Learning and Feature Selection]

Creative Information Technology Journal ◽

10.24076/citec.2020v7i1.238 ◽

2021 ◽

Vol 7 (1) ◽

pp. 1

Author(s):

Ripto Sudiyarno ◽

Arief Setyanto ◽

Emha Taufiq Luthfi

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Ensemble Learning ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Detection Systems ◽

Learning Techniques ◽

Performance Results

Intrusion detection systems (IDS) are known as a prominent and leading technique for finding malicious activities on computer networks. Unlike conventional firewalls, an IDS identifies attacks intelligently using analytic approaches such as data mining and machine learning techniques. In the last few decades, ensemble learning has greatly advanced research in machine learning and pattern classification, and it has shown improved performance compared with a single classifier. This study attempts to increase the accuracy of an anomaly detection system. Classification is first performed with a single classifier to obtain an accuracy value, which is then compared with the results of ensemble learning and feature selection. The use of ensemble learning aims to improve on the accuracy of the single classifier. The results are obtained from the confusion matrix and are tested by comparing the values of the two methods. The study obtained an accuracy of 77.4% for the single classifier (naïve Bayes) and 96.8% for ensemble learning. Keywords: ensemble learning, NSL-KDD, naïve Bayes, anomaly, feature selection

Network Malware Detection using Soft Computing and Machine Learning Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1654.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 879-885

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Anomaly Detection ◽

Soft Computing ◽

Naive Bayes ◽

Malware Detection ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Network Anomaly Detection

In today’s world, the rapid increase in information makes addressing security issues more important. Malware detection is an important area of research for the effective and secure functioning of computer networks. Research efforts are required to protect systems from various security attacks. In this paper, we analyze the usefulness of Soft Computing and Machine Learning techniques for network malware detection. Hamamoto et al. [1] used a combination of a Genetic Algorithm and Fuzzy logic to implement network anomaly detection. The research work proposed in this paper extends the concepts discussed in [1]. The proposed work explores the use of various Machine Learning algorithms, such as K-Nearest Neighbor, Naïve Bayes, and Decision Tree, for network anomaly detection. The experiments are conducted on the CIDDS (Coburg Intrusion Detection Data Set) dataset [14]. The Decision Tree approach gave better results than the KNN and Naïve Bayes techniques, achieving 99% accuracy with a precision of 1 and a recall of 1.
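
Accuracy, precision, and recall, as reported in this abstract, all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas, together with F1 and Matthews correlation coefficient; the function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the paper's experiments.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common scores from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts: 8 attacks caught, 2 missed, 2 false alarms, 8 clean flows.
    print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
```

Precision and recall ignore true negatives, so on imbalanced traffic data they can look very different from accuracy; MCC uses all four cells and is often reported alongside them for that reason.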

Breast Cancer Prediction Using Classification Techniques of Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2022.39743 ◽

2022 ◽

Vol 10 (1) ◽

pp. 51-57

Author(s):

Angela More

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Decision Tree Classifier ◽

Learning Techniques ◽

Tree Classifier

Abstract: Data analytics plays a vital role in diagnosis and treatment in the health care sector. To enable practitioner decision-making, huge volumes of data should be processed with machine learning techniques to produce tools for prediction and classification. One million cases of breast cancer are reported per year. We propose a prediction model specifically designed for the prediction of breast cancer using the machine learning algorithms Decision Tree classifier, Naïve Bayes, SVM, and K-Nearest Neighbour. The model predicts the type of tumour, which can be benign (noncancerous) or malignant (cancerous). The model uses supervised learning, a machine learning concept in which dependent and independent columns are provided to the machine. It uses a classification technique to predict the type of tumour. Keywords: Cancer, Machine learning, Prediction, Data Visualization, SVM, Naïve Bayes, Classification.

Doc2Vec & Naïve Bayes: Learners’ Cognitive Presence Assessment through Asynchronous Online Discussion TQ Transcripts

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v14i08.9964 ◽

2019 ◽

Vol 14 (08) ◽

pp. 70 ◽

Cited By ~ 3

Author(s):

Hind Hayati ◽

Abdessamad Chanaa ◽

Mohammed Khalidi Idrissi ◽

Samir Bennani

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Online Discussions ◽

Cognitive Presence ◽

Machine Learning Techniques ◽

Face To Face ◽

Learning Techniques ◽

Bayes Algorithm ◽

Context Features

Due to the lack of face-to-face interaction in online learning environments, this article aims to give tutors the opportunity to understand and analyze learners’ cognitive behavior. To this end, we propose an automatic system to assess learners’ cognitive presence based on their social interactions within synchronous online discussions. The system combines Natural Language Preprocessing, the Doc2Vec document embedding method, and machine learning techniques. We first apply transformations and preprocessing to the given transcripts, then apply the Doc2Vec method to represent each message as a vector that is concatenated with LIWC and context features. These vectors are the input to the Naïve Bayes algorithm, a machine learning method that classifies transcripts according to cognitive presence categories.

Innovative Artificial Intelligence Approach for Hearing-Loss Symptoms Identification Model Using Machine Learning Techniques

Sustainability ◽

10.3390/su13105406 ◽

2021 ◽

Vol 13 (10) ◽

pp. 5406

Author(s):

Mohd Khanapi Abd Ghani ◽

Nasir G. Noma ◽

Mazin Abed Mohammed ◽

Karrar Hameed Abdulkareem ◽

Begonya Garcia-Zapirain ◽

...

Keyword(s):

Machine Learning ◽

Hearing Loss ◽

Error Rate ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Machine Learning Techniques ◽

Feature Transformation ◽

Learning Techniques ◽

Multivariate Bernoulli

Physicians depend on their insight and experience and on a fundamentally indicative or symptomatic approach to decide on the possible ailment of a patient. However, numerous phases of problem identification and longer strategies can prompt a longer time for consulting and can subsequently cause other patients that require attention to wait for longer. This can bring about pressure and tension concerning those patients. In this study, we focus on developing a decision-support system for diagnosing the symptoms as a result of hearing loss. The model is implemented by utilizing machine learning techniques. The Frequent Pattern Growth (FP-Growth) algorithm is used as a feature transformation method and the multivariate Bernoulli naïve Bayes classification model as the classifier. To find the correlation that exists between the hearing thresholds and symptoms of hearing loss, the FP-Growth and association rule algorithms were first used to experiment with small sample and large sample datasets. The result of these two experiments showed the existence of this relationship, and that the performance of the hybrid of the FP-Growth and naïve Bayes algorithms in identifying hearing-loss symptoms was found to be efficient, with a very small error rate. The average accuracy rate and average error rate for the multivariate Bernoulli model with FP-Growth feature transformation, using five training sets, are 98.25% and 1.73%, respectively.
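
The classifier named in this abstract, multivariate Bernoulli naïve Bayes, models each binary feature as present or absent and scores a class by multiplying the per-feature probabilities. The sketch below is a minimal from-scratch version with Laplace smoothing, written only to illustrate the idea: the class name `BernoulliNB`, the smoothing value, and the toy data are assumptions, and the FP-Growth feature transformation used in the paper is not reproduced here.

```python
import math
from collections import defaultdict


class BernoulliNB:
    """Minimal multivariate Bernoulli naive Bayes for 0/1 feature vectors."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed default)

    def fit(self, X, y):
        n_features = len(X[0])
        ones = defaultdict(lambda: [0] * n_features)  # per-class count of 1s
        totals = defaultdict(int)                     # per-class sample count
        for row, label in zip(X, y):
            totals[label] += 1
            for j, value in enumerate(row):
                ones[label][j] += value
        n = len(y)
        self.log_prior = {c: math.log(t / n) for c, t in totals.items()}
        # Smoothed estimate of P(feature_j = 1 | class c).
        self.prob = {
            c: [(ones[c][j] + self.alpha) / (totals[c] + 2 * self.alpha)
                for j in range(n_features)]
            for c in totals
        }
        return self

    def predict_one(self, row):
        best_label, best_score = None, -math.inf
        for c, prior in self.log_prior.items():
            score = prior
            for value, p in zip(row, self.prob[c]):
                # Absent features also contribute, through (1 - p).
                score += math.log(p if value else 1.0 - p)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


if __name__ == "__main__":
    # Hypothetical binary indicators and labels, for illustration only.
    X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
    y = ["a", "a", "b", "b"]
    model = BernoulliNB().fit(X, y)
    print(model.predict_one([1, 0, 0]), model.predict_one([0, 0, 1]))
```

Unlike the multinomial variant, this model explicitly penalizes a class when a feature it usually exhibits is absent, which suits presence/absence data such as symptom indicators.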

Evaluating Machine Learning Methods for Predicting Diabetes among Female Patients in Bangladesh

Information ◽

10.3390/info11080374 ◽

2020 ◽

Vol 11 (8) ◽

pp. 374

Author(s):

Badiuzzaman Pranto ◽

Sk. Maliha Mehnaz ◽

Esha Bintee Mahid ◽

Imran Mahmud Sadman ◽

Ahsanur Rahman ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Machine Learning Methods ◽

Learning Techniques

Machine Learning has a significant impact on different aspects of science and technology, including medical research and the life sciences. Diabetes Mellitus, more commonly known as diabetes, is a chronic disease that involves abnormally high levels of glucose sugar in blood cells and the usage of insulin in the human body. This article focuses on analyzing diabetes patients as well as detecting diabetes using different Machine Learning techniques, in order to build a model with few dependencies based on the PIMA dataset. The model has been tested on an unseen portion of PIMA and also on a dataset collected from Kurmitola General Hospital, Dhaka, Bangladesh. The research is conducted to demonstrate the performance of several classifiers trained on a particular country’s diabetes dataset and tested on patients from a different country. We evaluated decision tree, K-nearest neighbor, random forest, and Naïve Bayes classifiers, and the results show that both random forest and Naïve Bayes performed well on both datasets.

DEVELOPING AN EARLY PREDICTIVE SYSTEM FOR IDENTIFYING GENETIC BIOMARKERS ASSOCIATED TO ALZHEIMER’S DISEASE USING MACHINE LEARNING TECHNIQUES

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237219500406 ◽

2019 ◽

Vol 31 (05) ◽

pp. 1950040 ◽

Cited By ~ 1

Author(s):

Marwa Mostafa Abd El Hamid ◽

Mai S. Mabrouk ◽

Yasser M. K. Omar

Keyword(s):

Machine Learning ◽

Alzheimer’s Disease ◽

Classification Accuracy ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Whole Genome ◽

Learning Techniques

Alzheimer’s disease (AD) is an irreversible, progressive disorder that attacks the nerve cells of the brain. It is the most widely recognized kind of dementia among older adults. Apolipoprotein E (APOE) is one of the most common genetic risk factors for AD, and its significant association with AD has been observed in various genome-wide association studies (GWAS). Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation among individuals, and they are related to many common diseases such as AD. SNPs are recognized as significant biomarkers for this disease; they help in understanding and detecting the disease in its early stages. Detecting SNP biomarkers associated with the disease with high classification accuracy leads to early prediction and diagnosis. Machine learning techniques are utilized to discover new biomarkers of the disease. The sequential minimal optimization (SMO) algorithm with different kernels, Naive Bayes (NB), tree augmented Naive Bayes (TAN), and the K2 learning algorithm have been applied to all genetic data of the Alzheimer’s disease neuroimaging initiative phase 1 (ADNI-1) and whole genome sequencing (WGS) datasets. The highest classification accuracy was achieved using 500 SNPs based on the [Formula: see text]-value threshold ([Formula: see text]-value [Formula: see text]). In the whole genome approach on ADNI-1, the NB and K2 learning algorithms scored an overall accuracy of 98% and 98.40%, respectively. In the whole genome approach on WGS, the NB and K2 learning algorithms scored an overall accuracy of 99.63% and 99.75%, respectively.

Investigating Tree Family Machine Learning Techniques for a Predictive System to Unveil Software Defects

Complexity ◽

10.1155/2020/6688075 ◽

2020 ◽

Vol 2020 ◽

pp. 1-21

Author(s):

Rashid Naseem ◽

Bilal Khan ◽

Arshad Ahmad ◽

Ahmad Almogren ◽

Saima Jabeen ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Software Development ◽

Absolute Error ◽

Error Rates ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defects ◽

Squared Error ◽

Learning Techniques

Software defect prediction in the initial period of the software development life cycle remains a critical and important task. Defect prediction and correction help assure the quality of software systems and have remained an integral topic of study in previous years. Quickly predicting defective modules during software development can help the development team use existing resources efficiently and effectively to deliver high-quality software products within a short timeline. To date, several researchers have developed defect prediction models using statistical and machine learning techniques, which are effective approaches for pinpointing defective modules. Tree family machine learning techniques are considered to be among the best and most commonly used supervised learning methods. In this study, different tree family machine learning techniques are employed for software defect prediction using ten benchmark datasets. These techniques include Credal Decision Tree (CDT), Cost-Sensitive Decision Forest (CS-Forest), Decision Stump (DS), Forest by Penalizing Attributes (Forest-PA), Hoeffding Tree (HT), Decision Tree (J48), Logistic Model Tree (LMT), Random Forest (RF), Random Tree (RT), and REP-Tree (REP-T). The performance of each technique is evaluated using different measures, i.e., mean absolute error (MAE), relative absolute error (RAE), root mean squared error (RMSE), root relative squared error (RRSE), specificity, precision, recall, F-measure (FM), G-measure (GM), Matthew’s correlation coefficient (MCC), and accuracy. The overall outcomes of this paper favor the RF technique, which produced the best results in terms of both reducing error rates and increasing accuracy on five datasets, i.e., AR3, PC1, PC2, PC3, and PC4. The average accuracy achieved by RF is 90.2238%. The comprehensive outcomes of this study can be used as a reference point for other researchers. Any claim of improved prediction through a new model, technique, or framework can be benchmarked and verified against them.
