Multi-label classification approach for Quranic verses labeling

Machine learning involves training systems to make decisions without being explicitly programmed. An important machine learning task is classification, in which machines are trained to make predictions from predefined labels. Classification is broadly categorized into three groups: single-label (SL), multi-class, and multi-label (ML) classification. This research work presents an application of a multi-label classification (MLC) technique to automating the labeling of Quranic verses. MLC has been gaining attention in recent years owing to the increasing number of works based on real-world classification problems involving multi-label data. In traditional classification problems, patterns are associated with a single label from a set of disjoint labels. In MLC, however, an instance of data is associated with a set of labels. In this paper, three standard MLC methods, binary relevance (BR), classifier chain (CC), and label powerset (LP), are implemented with four baseline classifiers: support vector machine (SVM), naïve Bayes (NB), k-nearest neighbors (k-NN), and J48. The research methodology adopts the multi-label problem transformation (PT) approach. The results are validated using six conventional performance metrics: Hamming loss, accuracy, one-error, micro-F1, macro-F1, and average precision. The classifiers achieved accuracy above 70%. Overall, SVM achieved the best results with the CC and LP algorithms.
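
As a rough illustration of the problem transformation idea named in this abstract, the sketch below shows how one multi-label target matrix can be recast for binary relevance, classifier chain, and label powerset, and how Hamming loss is computed. The toy data, the label names, and all function names are hypothetical and are not taken from the paper; no classifier is trained here.

```python
# Toy multi-label data (hypothetical): one feature per instance, and a row of
# 0/1 flags saying which of three illustrative labels apply to it.
LABELS = ["faith", "law", "history"]
X = [[0.2], [0.9], [0.1], [0.5]]
Y = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]

def binary_relevance_targets(Y):
    """Binary relevance: one independent binary target column per label."""
    return {LABELS[j]: [row[j] for row in Y] for j in range(len(LABELS))}

def classifier_chain_inputs(X, Y, j):
    """Classifier chain: the j-th classifier sees the features plus labels 0..j-1."""
    return [x + y[:j] for x, y in zip(X, Y)]

def label_powerset_targets(Y):
    """Label powerset: each distinct label combination becomes one class id."""
    class_ids = {}
    targets = []
    for row in Y:
        key = tuple(row)
        class_ids.setdefault(key, len(class_ids))
        targets.append(class_ids[key])
    return targets, class_ids

def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(t != p for rt, rp in zip(Y_true, Y_pred) for t, p in zip(rt, rp))
    return wrong / (len(Y_true) * len(Y_true[0]))

br_targets = binary_relevance_targets(Y)
cc_inputs = classifier_chain_inputs(X, Y, 2)
lp_targets, lp_classes = label_powerset_targets(Y)

# A made-up prediction with two wrong label slots out of twelve.
Y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
loss = hamming_loss(Y, Y_pred)
```

Each transformed target could then be handed to any single-label base learner, such as the SVM, NB, k-NN, or J48 classifiers the abstract lists.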

Feature-Based Opinion Mining and Managed Machine Learning with Sentiment Classification Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4555.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3992-3998

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Intensive ◽

Learning Tasks ◽

Feature Based

Sentiment analysis is the study of individuals' opinions and feedback about an entity, which can be an item, service, movie, person, or event. The opinions are mostly expressed as remarks or reviews. With social networks, forums, and websites, these reviews have become a significant factor in a client's decision whether or not to buy something. These days, vast scalable computing environments provide sophisticated ways of carrying out various data-intensive natural language processing (NLP) and machine learning tasks to examine these reviews. One such example is text classification, an effective method for predicting clients' sentiment. In this paper, we focus our sentiment analysis work on a movie review database. We examine the sentiment expressions to classify the polarity of the movie reviews on a scale of 0 (highly disliked) to 4 (highly preferred), perform feature extraction and ranking, and use these features to train our multilabel classifier to assign each movie review to its correct rating. This paper incorporates sentiment analysis using feature-based opinion mining and managed machine learning. The principal focus is to determine the polarity of reviews using nouns, verbs, and adjectives as opinion words. In addition, a comparative study of different classification approaches has been performed to determine the most appropriate classifier for our problem domain. In our study, we used six machine learning algorithms: Naïve Bayes, Logistic Regression, SVM (Support Vector Machine), RF (Random Forest), KNN (K-Nearest Neighbors), and SoftMax Regression.

An Empirical Approach for Extreme Behavior Identification through Tweets Using Machine Learning

Applied Sciences ◽

10.3390/app9183723 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3723

Author(s):

Sharif ◽

Mumtaz ◽

Shafiq ◽

Riaz ◽

Ali ◽

...

Keyword(s):

Machine Learning ◽

Research Work ◽

Principal Component ◽

Research Area ◽

Ensemble Classification ◽

High Dimensional ◽

Support Vector ◽

K Nearest Neighbors ◽

N Gram ◽

Low Dimensional

The rise of social media has led to an increasing online cyber-war via hateful and violent comments or speeches, and even slick videos, that promote extremism and radicalization. Detecting cyber-extreme content on microblogging sites, specifically Twitter, is a challenging and evolving research area because the content is short, noisy, context-dependent, and dynamic. The related tweets were crawled using query words and then carefully labelled into two classes: Extreme (with two sub-classes: pro-Afghanistan government and pro-Taliban) and Neutral. An Exploratory Data Analysis (EDA) using Principal Component Analysis (PCA) was performed on the tweet data (with Term Frequency-Inverse Document Frequency (TF-IDF) features) to reduce the high-dimensional data space to a low-dimensional (usually 2-D or 3-D) space. PCA-based visualization showed better cluster separation between the two classes (extreme and neutral), whereas cluster separation within the sub-classes of the extreme class was not clear. The paper also discusses the pros and cons of applying PCA as an EDA in the context of textual data, which is usually represented by a high-dimensional feature set. Furthermore, classification algorithms such as naïve Bayes, K Nearest Neighbors (KNN), random forest, Support Vector Machine (SVM), and ensemble classification methods (with bagging and boosting) were applied with PCA-based reduced features and with the complete set of features (TF-IDF features extracted from n-gram terms in the tweets). The analysis showed that an SVM achieved an average accuracy of 84% compared with the other classification models. It is pertinent to mention that this is a novel reported research work on Twitter content analysis using machine learning methods in the context of the Afghanistan war zone.
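
To make the "TF-IDF features extracted from n-gram terms" step more concrete, here is a minimal sketch that builds unigram and bigram terms and weights them with one common TF-IDF variant (relative term frequency times the natural log of N over document frequency). The variant, the whitespace tokenizer, the example sentences, and the function names are all assumptions for illustration; the abstract does not state which formulation or tooling the authors used, and the PCA step is not shown.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, max_n=2):
    """Map each document to {term: tf * idf} over 1..max_n word n-grams."""
    term_lists = []
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenizer (assumption)
        terms = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
        term_lists.append(terms)

    # Document frequency: in how many documents each term appears.
    df = Counter(t for terms in term_lists for t in set(terms))
    n_docs = len(docs)

    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append(
            {t: (c / len(terms)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors

# Hypothetical short texts standing in for tweets.
docs = ["peace talks resume", "peace talks fail", "markets open higher"]
vectors = tfidf(docs)
```

Terms shared by several documents, such as "peace talks" here, receive a lower weight than terms unique to one document, which is the property that makes the resulting vectors useful as classifier input.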

Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

Molecules ◽

10.3390/molecules24152811 ◽

2019 ◽

Vol 24 (15) ◽

pp. 2811 ◽

Cited By ~ 4

Author(s):

Rácz ◽

Bajusz ◽

Héberger

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Learning Algorithm ◽

Classification Problems ◽

Machine Learning Classification ◽

Learning Tasks ◽

Sum Of Ranking Differences ◽

Multi Level

Machine learning classification algorithms are widely used for the prediction and classification of different properties of molecules, such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important because testing on living animals has both ethical and cost drawbacks. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison using different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities), and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and of 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.
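
The sensitivity of performance metrics to dataset composition can be illustrated with a small worked example. The sketch below applies a hypothetical classifier with fixed behaviour (60% sensitivity, 90% specificity) to a balanced and an imbalanced 2-class set; the confusion-matrix counts and the function name are invented for illustration and do not come from the study.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Compute common 2-class metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # Matthews correlation coefficient; defined as 0 if any margin is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Same per-class behaviour, different class ratios.
balanced = binary_metrics(tp=60, fp=10, fn=40, tn=90)     # 100 positive, 100 negative
imbalanced = binary_metrics(tp=6, fp=19, fn=4, tn=171)    # 10 positive, 190 negative

for name in balanced:
    print(f"{name:18s} balanced={balanced[name]:.3f} imbalanced={imbalanced[name]:.3f}")
```

In this toy case, accuracy rises and precision falls on the imbalanced set even though the classifier's per-class behaviour is unchanged, while balanced accuracy stays the same; this is one way metrics can give conflicting pictures of the same model.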

Classification of Soils into Hydrologic Groups Using Machine Learning

Data ◽

10.3390/data5010002 ◽

2019 ◽

Vol 5 (1) ◽

pp. 2 ◽

Cited By ~ 4

Author(s):

Shiny Abraham ◽

Chau Huynh ◽

Huy Vu

Keyword(s):

Machine Learning ◽

Water Conservation ◽

Large Scale ◽

Performance Metrics ◽

Gaussian Kernel ◽

Support Vector ◽

K Nearest Neighbors ◽

Group B ◽

Soil Groups

Hydrologic soil groups play an important role in the determination of surface runoff, which, in turn, is crucial for soil and water conservation efforts. Traditionally, placement of soil into appropriate hydrologic groups is based on the judgement of soil scientists, primarily relying on their interpretation of guidelines published by regional or national agencies. As a result, large-scale mapping of hydrologic soil groups results in widespread inconsistencies and inaccuracies. This paper presents an application of machine learning for classification of soil into hydrologic groups. Based on features such as percentages of sand, silt and clay, and the value of saturated hydraulic conductivity, machine learning models were trained to classify soil into four hydrologic groups. The results of the classification obtained using algorithms such as k-Nearest Neighbors, Support Vector Machine with Gaussian Kernel, Decision Trees, Classification Bagged Ensembles and TreeBagger (Random Forest) were compared to those obtained using estimation based on soil texture. The performance of these models was compared and evaluated using per-class metrics and micro- and macro-averages. Overall, performance metrics related to kNN, Decision Tree and TreeBagger exceeded those for SVM-Gaussian Kernel and Classification Bagged Ensemble. Among the four hydrologic groups, it was noticed that group B had the highest rate of false positives.
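
For readers unfamiliar with the micro- and macro-averages mentioned in this abstract, the following sketch computes both for precision and recall from per-class counts. The four group names follow the abstract, but the counts are hypothetical (with group B deliberately given the most false positives, echoing the reported finding), and the function names are illustrative only.

```python
# Hypothetical per-class (true positive, false positive, false negative) counts
# for a 4-class problem; these are not results from the paper.
counts = {
    "A": (40, 5, 10),
    "B": (30, 20, 5),
    "C": (20, 5, 10),
    "D": (10, 0, 5),
}

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def macro_average(counts):
    """Macro: compute the metric per class, then take the unweighted mean."""
    precisions = [safe_div(tp, tp + fp) for tp, fp, fn in counts.values()]
    recalls = [safe_div(tp, tp + fn) for tp, fp, fn in counts.values()]
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)

def micro_average(counts):
    """Micro: pool the counts over all classes, then compute the metric once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return safe_div(tp, tp + fp), safe_div(tp, tp + fn)

macro_precision, macro_recall = macro_average(counts)
micro_precision, micro_recall = micro_average(counts)
```

Macro-averaging weights every class equally, so a small class such as D influences the score as much as a large one, whereas micro-averaging is dominated by the classes with the most samples.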

Modelling on Car-Sharing Serial Prediction Based on Machine Learning and Deep Learning

Complexity ◽

10.1155/2022/8843000 ◽

2022 ◽

Vol 2022 ◽

pp. 1-20

Author(s):

Nihad Brahimi ◽

Huaping Zhang ◽

Lin Dai ◽

Jianzi Zhang

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Research Work ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Car Sharing ◽

K Nearest Neighbors ◽

Extreme Gradient Boosting ◽

On The Road

The car-sharing system is a popular rental model for cars in shared use. It has become particularly attractive due to its flexibility; that is, the car can be rented and returned anywhere within one of the authorized parking slots. The main objective of this research work is to predict car usage in parking stations and to investigate the factors that help to improve the prediction. Thus, new strategies can be designed to put more cars on the road and leave fewer in the parking stations. To achieve that, various machine learning models, namely vector autoregression (VAR), support vector regression (SVR), eXtreme gradient boosting (XGBoost), and k-nearest neighbors (kNN), and deep learning models, specifically long short-term memory (LSTM), gated recurrent unit (GRU), convolutional neural network (CNN), CNN-LSTM, and multilayer perceptron (MLP), were applied to different kinds of features. These features include past usage levels, Chongqing's environmental conditions, and temporal information. After comparing the results using different metrics, we found that CNN-LSTM outperformed the other methods in predicting future car usage. Meanwhile, the model using all the feature categories yields more precise predictions than any of the models using one feature category at a time.

Classification of Student Performance Dataset using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1114.1292s219 ◽

2019 ◽

Vol 9 (2S2) ◽

pp. 752-757

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Student Performance ◽

Performance Metrics ◽

Naive Bayes ◽

Research Work ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector

The scope of this research work is to identify an efficient machine learning algorithm for predicting the behavior of a student from the student performance dataset. We applied Support Vector Machine, K-Nearest Neighbor, Decision Tree, and Naïve Bayes algorithms to predict the grade of a student and compared their prediction results in terms of various performance metrics. Students who visited many resources for reference, took part in academic discussions and interactions in the classroom, were absent for the fewest days, and were cared for by their parents showed great improvement in the final grade. Among the machine learning techniques we used, SVM showed higher accuracy in terms of four important attributes. The accuracy of SVM after tuning is 0.80. KNN and Decision Tree achieve accuracies of 0.64 and 0.65 respectively, whereas Naïve Bayes achieves 0.77.

Validation of Machine Learning Models for Health Insurance Risks Assessment

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1670.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 4247-4256

Keyword(s):

Health Insurance ◽

Performance Metrics ◽

Research Work ◽

Support Vector ◽

K Nearest Neighbors ◽

Universal Healthcare ◽

Parameter Configuration ◽

Explained Variance ◽

Risks Assessment ◽

Almost All

The success of a universal healthcare policy is impossible without the use of insurance instruments. The healthcare and insurance industries are on the verge of integrating seamlessly with the help of sensors and algorithms. This research work focuses on validating an algorithm that can help to model and classify health insurance risk data. Six algorithms, including Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM), were evaluated, and objective validation of these algorithms has been demonstrated. To maintain the replicability of the study, the data and code are available in a public repository. From the study, it is clear that the KNN algorithm is best suited as a risk classifier. This is evident from the values of R², error metrics, completeness score, explained variance, normalized mutual score, v-measure score, precision, recall, F1 score, and accuracy metrics. Secondly, the algorithms have been validated using the 10-fold method with five types of performance metrics. In almost all cases, the KNN algorithm was found to perform consistently and to be the most suitable numerically. This can be attributed to the standard deviation of the performance metrics remaining tight during evaluation. From all the validation tests, it can be claimed that, on the current dataset, the KNN algorithm with the Accuracy, Homogeneity Score, Explained Variance, and Normalized Mutual Score hyper-parameter configuration is the best performer.
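
The validation idea described here, scoring a classifier on each of ten folds and then checking how tightly the scores cluster, can be sketched as follows. This is a minimal illustration that assumes contiguous, unshuffled folds, a one-feature toy dataset, and a hand-written 1-nearest-neighbour rule; the data, the fold scheme, and the function names are hypothetical and are not the study's code.

```python
import statistics

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x (1-NN)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Toy 1-D dataset (hypothetical): the class flips at x = 10.
xs = list(range(20))
ys = [0 if x < 10 else 1 for x in xs]

fold_scores = []
for train, test in k_fold_indices(len(xs), k=10):
    train_x = [xs[i] for i in train]
    train_y = [ys[i] for i in train]
    correct = sum(predict_1nn(train_x, train_y, xs[i]) == ys[i] for i in test)
    fold_scores.append(correct / len(test))

mean_accuracy = statistics.mean(fold_scores)
std_accuracy = statistics.stdev(fold_scores)  # spread across folds
```

A small standard deviation across folds suggests the classifier's performance is stable with respect to which samples are held out, which is the sense in which the abstract calls KNN consistent.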

An Initial Machine Learning-Based Victim’s Scream Detection Analysis for Burning Sites

Applied Sciences ◽

10.3390/app11188425 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8425

Author(s):

Fairuz Samiha Saeed ◽

Abdullah Al Bashit ◽

Vishu Viswanathan ◽

Damian Valles

Keyword(s):

Machine Learning ◽

Embedded System ◽

Performance Metrics ◽

Short Term Memory ◽

Research Work ◽

Automated System ◽

Support Vector ◽

Detection Analysis ◽

Time Critical ◽

In Fire

Fire incidents are responsible for severe damage and thousands of deaths every year all over the world. Extreme temperatures, low visibility, toxic gases, and unknown locations of victims create difficulties and delays in rescue operations, escalating the risk of injury or death. Detecting victims trapped inside burning sites is time-critical for facilitating rescue operations. This research work presents an audio-based automated system for victim detection in fire emergencies, investigating two machine learning (ML) methods: support vector machines (SVM) and long short-term memory (LSTM). The performance of these two ML techniques has been evaluated using a variety of performance metrics. Our analyses show that both ML methods provide superior scream detection performance, with SVM slightly outperforming LSTM. Because of its lower complexity, SVM is a better candidate for real-time implementation in our autonomous embedded system vehicle (AESV).

A Comparative Study of Different Machine Learning Algorithms for Disease Prediction

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0177 ◽

2017 ◽

Vol 7 (7) ◽

pp. 172

Author(s):

Anantvir Singh Romana

Keyword(s):

Machine Learning ◽

Subsequent Treatment ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Disease Prediction ◽

Classification Problems ◽

Learning Techniques ◽

Neural Network Classifiers ◽

Diagnostic Detection

Accurate diagnostic detection of disease in a patient is critical and may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems because of their accurate prediction performance. Different techniques may provide different accuracies, and it is therefore imperative to use the most suitable method, which provides the best results. This research seeks to provide a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.

Use of Machine Learning to Investigate the Quantitative Checklist for Autism in Toddlers (Q-CHAT) towards Early Autism Screening

Diagnostics ◽

10.3390/diagnostics11030574 ◽

2021 ◽

Vol 11 (3) ◽

pp. 574

Author(s):

Gennaro Tartarisco ◽

Giovanni Cicceri ◽

Davide Di Pietro ◽

Elisa Leonardi ◽

Stefania Aiello ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Behavioral Science ◽

Autistic Traits ◽

Classification Performance ◽

Recursive Feature Elimination ◽

Diagnostic Tools ◽

Support Vector ◽

K Nearest Neighbors ◽

Autism Screening

In the past two decades, several screening instruments were developed to detect toddlers who may be autistic both in clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from those without. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to investigate the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items in common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT, and supports the application of ML to create shorter and faster versions of the instrument, maintaining high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.
