A comparative study of classifier techniques for lift index data analysis

2018 ◽  
Vol 25 (2) ◽  
pp. 632-641 ◽  
Author(s):  
Mohammad Asjad ◽  
Azazullah Alam ◽  
Faisal Hasan

Purpose A classifier technique is an important tool for organizing data or information systematically according to certain criteria, so that accurate statistical information is available for decision making. Classifiers play a vital role in various applications, such as business organizations, e-commerce, health care, and scientific and engineering applications. The purpose of this paper is to examine the performance of different classification techniques in lift index (LI) data classification. Design/methodology/approach The analysis consists of two stages. First, random data for lifting tasks are generated through computer programming and then put into the National Institute for Occupational Safety and Health (NIOSH) equation for LI estimation. Based on the evaluated index, each task is classified into one of two groups, i.e. high-risk and low-risk tasks. The classified tasks are then used to analyze the performance of different tools: artificial neural network (ANN), discriminant analysis (DA) and support vector machines (SVMs). Findings The work demonstrates the accuracy and computational ability of ANN, DA and SVM for data classification problems in general and LI data in particular. The results suggest that SVM may outperform ANN and DA. Research limitations/implications The research is limited to a particular kind of data; it may be extended by selecting different controllable parameters and model specifications. The study can also be applied to realistic manual loading problems. It is expected to help researchers, designers and practicing engineers by making them aware of the performance of classification techniques in this area. 
Originality/value The objective of this research work is to assess and compare the relative performance of some well-known classification techniques, namely DA, ANN and SVM; the results suggest that data characteristics considerably affect the classification performance of the methods.
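The first stage of the analysis (random lifting tasks, LI estimation, two-class labelling) can be sketched with the metric form of the revised NIOSH lifting equation, RWL = LC × HM × VM × DM × AM × FM × CM with LC = 23 kg, and LI = load / RWL. This is a minimal sketch, not the authors' program: the sampling ranges, the illustrative FM/CM table values, and the cut-off LI > 1 for "high-risk" are all assumptions.

```python
import random

# Revised NIOSH lifting equation, metric form: RWL = LC*HM*VM*DM*AM*FM*CM.
LOAD_CONSTANT_KG = 23.0

def recommended_weight_limit(h_cm, v_cm, d_cm, a_deg, fm, cm):
    """Recommended weight limit (kg) for one lifting task.

    fm (frequency) and cm (coupling) normally come from the NIOSH lookup
    tables; here they are simply passed in as numbers.
    """
    hm = min(1.0, 25.0 / h_cm)            # horizontal multiplier
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)   # vertical multiplier
    dm = min(1.0, 0.82 + 4.5 / d_cm)      # vertical travel distance multiplier
    am = 1.0 - 0.0032 * a_deg             # asymmetry multiplier
    return LOAD_CONSTANT_KG * hm * vm * dm * am * fm * cm

def lifting_index(load_kg, rwl_kg):
    """LI = actual load / recommended weight limit."""
    return load_kg / rwl_kg

def label_task(li, threshold=1.0):
    # Assumption: LI above 1.0 is labelled high-risk; the paper's cut-off may differ.
    return "high-risk" if li > threshold else "low-risk"

def random_task(rng):
    # Hypothetical sampling ranges inside the equation's valid limits.
    return {
        "load_kg": rng.uniform(5.0, 30.0),
        "h_cm": rng.uniform(25.0, 63.0),
        "v_cm": rng.uniform(0.0, 175.0),
        "d_cm": rng.uniform(25.0, 175.0),
        "a_deg": rng.uniform(0.0, 135.0),
        "fm": rng.choice([1.0, 0.94, 0.84]),  # illustrative table values
        "cm": rng.choice([1.0, 0.95, 0.90]),
    }

def generate_dataset(n, seed=0):
    """Return n (task, label) pairs ready to feed into ANN, DA or SVM."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = random_task(rng)
        rwl = recommended_weight_limit(t["h_cm"], t["v_cm"], t["d_cm"],
                                       t["a_deg"], t["fm"], t["cm"])
        rows.append((t, label_task(lifting_index(t["load_kg"], rwl))))
    return rows
```

Under ideal conditions every multiplier is 1 and RWL equals the 23 kg load constant, so an 11.5 kg lift has LI = 0.5.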

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shilpa Sharma ◽  
Punam Rattan ◽  
Anurag Sharma ◽  
Mohammad Shabaz

Purpose This paper aims to introduce an unsupervised algorithm for voice activity detection based on maximum margin clustering of data, i.e. a support vector machine. The K-means clustering algorithm was later applied to speech activity detection; however, that application did not permit reliable identification of voice, which is critical for speech recognition. Design/methodology/approach The authors present a voice activity detector based on the output of a K-means algorithm that permits sliding-window detection of voice and noise. However, it first needs an initial detection pause. The system initialized by the algorithm is intended for health-care infrastructure and provides a platform for health-care professionals to detect the clear voice of patients. Findings Experiments on many noise conditions from the NOISEX-92 database reveal that the average non-speech hit rate and the average hit rate across signal-to-noise ratios are higher than those of modern voice activity detectors. Originality/value The research work is original.
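The sliding-window idea of clustering frames into "noise" and "voice" can be illustrated with a much simpler stand-in than the paper's detector: per-frame log energy clustered by 1-D K-means with k = 2, labelling the higher-energy cluster as voice. This is a sketch under assumptions; the energy feature, frame length and hop size are not taken from the paper, and no maximum margin clustering is involved.

```python
import math

def frame_energies(samples, frame_len, hop):
    """Log energy of each sliding-window frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        e = sum(x * x for x in frame) / frame_len
        energies.append(math.log10(e + 1e-12))  # small floor avoids log(0)
    return energies

def two_means_1d(values, iters=20):
    """Plain 1-D K-means with k=2; returns (low_centroid, high_centroid)."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2  # in 1-D, nearest-centroid assignment = midpoint split
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi

def detect_voice(samples, frame_len=160, hop=80):
    """Return one True (voice) / False (noise) flag per frame."""
    energies = frame_energies(samples, frame_len, hop)
    noise_c, voice_c = two_means_1d(energies)
    threshold = (noise_c + voice_c) / 2
    return [e > threshold for e in energies]
```

With 8 kHz audio, the assumed 160-sample frames and 80-sample hop correspond to 20 ms windows with 50% overlap.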


Author(s):  
Sophi Shilpa Gururajapathy ◽  
Hazlie Mokhlis ◽  
Hazlee Azil Illias

Purpose The purpose of this paper is to identify faults in distribution systems, which are unavoidable because of adverse weather conditions and unexpected accidents. Quick fault location is therefore vital for a continuous power supply. However, most fault location methods depend on a stored database for locating the fault, and this database is created by simulation, which is time consuming. Therefore, this work proposes a comprehensive fault location method that detects the faulty section and the fault distance from a one-ended bus using limited simulated data. Design/methodology/approach The work uses voltage sag data measured at a primary substation. A support vector machine estimates the data that are not simulated. The possible faulty section is determined using a matching approach, and the fault distance using mathematical analysis. Findings This work proposes a ranking analysis for multiple possible faulty sections, and the fault distance is calculated using a Euclidean distance approach. Practical implications The research work uses a Malaysian distribution system, as it represents a practical distribution system with multiple branches and limited measurement at the primary substation. The method requires only metering devices to identify the fault, which is cost effective. In addition, the distribution system is simulated using real-time PSCAD, so the capability of the proposed method can be fully tested. Originality/value The paper presents a new method for fault analysis. It reduces simulation time and the storage space of the database. The work identifies the faulty section, ranks the candidate faulty sections, and identifies the fault distance using a mathematical approach.
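The matching and ranking step can be pictured as comparing the measured voltage sag vector with stored (simulated or SVM-estimated) sag vectors and ranking sections by Euclidean distance. This is a minimal sketch: the database layout `{section: [(distance_km, sag_vector), ...]}` and all numbers are hypothetical, and it simply returns the nearest stored fault distance rather than the paper's mathematical distance calculation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sag vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_sections(measured_sag, database):
    """Rank candidate faulty sections by their best-matching stored sag vector.

    database: {section_id: [(fault_distance_km, sag_vector), ...]}
    Returns [(section_id, closest_distance_km, match_error), ...], best first.
    """
    ranking = []
    for section, entries in database.items():
        dist_km, sag = min(entries, key=lambda e: euclidean(measured_sag, e[1]))
        ranking.append((euclidean(measured_sag, sag), section, dist_km))
    ranking.sort()  # smallest match error first
    return [(section, dist_km, err) for err, section, dist_km in ranking]

# Hypothetical per-unit sag magnitudes at the primary substation.
db = {
    "S1": [(0.5, [0.80, 0.95, 0.97]), (1.0, [0.85, 0.96, 0.98])],
    "S2": [(0.5, [0.60, 0.90, 0.92]), (1.0, [0.70, 0.92, 0.94])],
}
print(rank_sections([0.84, 0.96, 0.98], db))
```

Returning the full ordered list, rather than a single winner, mirrors the idea of ranking multiple possible faulty sections.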


Author(s):  
Abdullahi Adeleke ◽  
Noor Azah Samsudin ◽  
Mohd Hisyam Abdul Rahim ◽  
Shamsul Kamal Ahmad Khalid ◽  
Riswan Efendi

Machine learning involves training systems to make decisions without being explicitly programmed. Classification is an important machine learning task, in which machines are trained to make predictions from predefined labels. Classification is broadly categorized into three distinct groups: single-label (SL), multi-class, and multi-label (ML) classification. This research work presents an application of a multi-label classification (MLC) technique to automate the labeling of Quranic verses. MLC has been gaining attention in recent years because of the increasing number of real-world classification problems involving multi-label data. In traditional classification problems, patterns are associated with a single label from a set of disjoint labels. In MLC, however, an instance is associated with a set of labels. In this paper, three standard MLC methods, namely binary relevance (BR), classifier chain (CC), and label powerset (LP), are implemented with four baseline classifiers: support vector machine (SVM), naïve Bayes (NB), k-nearest neighbors (k-NN), and J48. The research methodology adopts the multi-label problem transformation (PT) approach. The results are validated using six conventional performance metrics: Hamming loss, accuracy, one-error, micro-F1, macro-F1, and average precision. The classifiers achieved accuracy above 70%, and overall, SVM achieved the best results with the CC and LP algorithms.
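The three problem-transformation methods and the Hamming loss metric can be illustrated on toy label sets. This is a sketch of the general techniques only: the label names and features are hypothetical, the chain uses true earlier labels as extra features at training time (a common convention, assumed here), and no base classifier is trained.

```python
def binary_relevance(Y, labels):
    """BR: one independent binary target vector per label."""
    return {lab: [1 if lab in y else 0 for y in Y] for lab in labels}

def classifier_chain_inputs(X, Y, labels):
    """CC: link j sees the original features plus labels 0..j-1."""
    links = []
    for j, lab in enumerate(labels):
        prev = labels[:j]
        Xj = [list(x) + [1 if p in y else 0 for p in prev] for x, y in zip(X, Y)]
        links.append((lab, Xj, [1 if lab in y else 0 for y in Y]))
    return links

def label_powerset(Y):
    """LP: map each distinct label set to one multi-class target."""
    classes, targets = {}, []
    for y in Y:
        key = frozenset(y)
        if key not in classes:
            classes[key] = len(classes)
        targets.append(classes[key])
    return targets, classes

def hamming_loss(Y_true, Y_pred, labels):
    """Fraction of instance-label pairs predicted wrongly."""
    errors = sum(len(set(t) ^ set(p)) for t, p in zip(Y_true, Y_pred))
    return errors / (len(Y_true) * len(labels))

labels = ["a", "b", "c"]  # hypothetical verse labels
Y = [{"a"}, {"a", "b"}, {"c"}, {"a", "b"}]
print(binary_relevance(Y, labels)["a"])  # one binary problem for label "a"
print(label_powerset(Y)[0])              # one multi-class problem overall
```

BR ignores label correlations, CC passes earlier labels forward, and LP treats each observed label combination as its own class.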


Author(s):  
Umar Sidiq ◽  
Syed Mutahar Aaqib ◽  
Rafi Ahmad Khan

Classification is one of the most important supervised learning data mining techniques and is used to classify data into predefined classes. It is mainly used in the healthcare sector for decision making, diagnosis systems and providing better treatment to patients. In this work, the dataset was obtained from a recognized laboratory in Kashmir. The research work was carried out with ANACONDA3-5.2.0, an open-source platform, under Windows 10. An experimental study was carried out using classification techniques such as k-nearest neighbors, support vector machine, decision tree and naïve Bayes. The decision tree obtained the highest accuracy, 98.89%, among the classification techniques.
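Of the compared techniques, k-nearest neighbors is the simplest to show end to end, together with the accuracy measure used to rank the methods. This is an illustrative sketch: the toy data, the Euclidean distance, and k = 3 are assumptions, since the laboratory dataset is not available here.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical two-feature records with diagnostic labels.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["neg", "neg", "neg", "pos", "pos", "pos"]
test_X, test_y = [[0.5, 0.5], [5.5, 5.5]], ["neg", "pos"]
preds = [knn_predict(train_X, train_y, x) for x in test_X]
print(accuracy(test_y, preds))
```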


Author(s):  
A. K. Shakya ◽  
A. Ramola ◽  
A. Kandwal ◽  
R. Prakash

Abstract. The Advanced Land Observing Satellite (ALOS) was developed by the Japan Aerospace Exploration Agency (JAXA) and launched in 2006 for Earth observation and exploration. ALOS carried the PRISM, AVNIR-2 and PALSAR sensors for this purpose. PALSAR is an L-band synthetic aperture radar (SAR) designed to work in all weather conditions with a resolution of 10 meters. In this research work, we investigated the accuracy obtained from various supervised classification techniques by classifying ALOS PALSAR data of the Roorkee region of Uttarakhand, India. The training ROIs (regions of interest) were created manually with the assistance of ArcGIS Earth, and for testing we used the Global Positioning System (GPS) coordinates of the region. The supervised classification techniques included in this comparison are Parallelepiped classification (PC), Minimum distance classification (MDC), Mahalanobis distance classification (MaDC), Maximum likelihood classification (MLC), Spectral angle mapper (SAM), Spectral information divergence (SID) and Support vector machine (SVM). An accuracy assessment was then performed using the post-classification confusion matrix, and the corresponding kappa coefficient was obtained. We conclude that MDC is the best in terms of overall accuracy, with 82.3634%, and MLC in terms of kappa value, with 0.7591. Finally, a peculiar relationship between classification accuracy and kappa coefficient is developed.
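Overall accuracy and the kappa coefficient are both computed from the post-classification confusion matrix: overall accuracy is p_o = trace / total, and kappa is (p_o − p_e) / (1 − p_e), where p_e is the agreement expected by chance from the row and column totals. The sketch below uses made-up matrices, not the paper's results; the second one shows how a high accuracy can coexist with a kappa of zero, one reason the two measures need not rank classifiers the same way.

```python
def overall_accuracy(cm):
    """Correctly classified samples (diagonal) over all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = reference)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_tot = [sum(cm[i]) for i in range(n)]
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(n)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

cm_a = [[45, 5], [10, 40]]  # hypothetical two-class result
cm_b = [[90, 0], [10, 0]]   # everything assigned to the majority class
print(overall_accuracy(cm_a), kappa(cm_a))
print(overall_accuracy(cm_b), kappa(cm_b))
```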


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sai Prasanthi Kasimsetti ◽  
Asdaque Hussain

Purpose This research work proposes a Spurious Transmission–based Enhanced Packet Reordering Method (ST-EPRM). The need for packet reordering is avoided by applying random linear network coding at the physical layer of the wireless network, which operates on the basis of sequence numbers. Spurious retransmissions over the wireless network are addressed by introducing a monitoring concept that reduces their number, because more than three duplicate acknowledgements (DUPACKs) might be needed to trigger fast retransmit. The monitoring node acts as a centralized node, so the variation between buffer length and the number of packets being sent can be predicted. This information helps to differentiate spurious retransmission from packet loss. Design/methodology/approach Based on this detection, the method decides whether to retransmit or to avoid retransmission. The monitoring node is selected using an improved cuckoo search algorithm, and a modified support vector machine algorithm is used to detect spurious transmission from these variations.
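The problem ST-EPRM targets can be reproduced in a few lines: a receiver's cumulative ACKs turn a merely delayed packet into duplicate ACKs, and a sender using the standard threshold of three duplicates (RFC 5681) fast-retransmits it even though it was never lost. This sketch illustrates only that standard TCP behaviour; it is not the ST-EPRM method, and the arrival orders are hypothetical.

```python
DUPTHRESH = 3  # standard TCP fast-retransmit threshold (RFC 5681)

def cumulative_acks(arrival_order):
    """Receiver side: after each arrival, ACK the next expected sequence number."""
    received, next_expected, acks = set(), 0, []
    for seq in arrival_order:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks

def fast_retransmits(acks, dupthresh=DUPTHRESH):
    """Sender side: sequence numbers retransmitted on the dupthresh-th duplicate ACK."""
    retransmitted, last_ack, dups = [], None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dupthresh:
                retransmitted.append(ack)
        else:
            last_ack, dups = ack, 0
    return retransmitted

# Packet 1 is delayed (reordered), not lost, yet it is retransmitted spuriously.
print(fast_retransmits(cumulative_acks([0, 2, 3, 4, 1, 5])))
```

Raising `dupthresh` to 4 suppresses the spurious retransmission in this example, which is why a fixed threshold of three DUPACKs can be too aggressive under reordering.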


2019 ◽  
Vol 37 (6) ◽  
pp. 1040-1058 ◽  
Author(s):  
Shuo Xu ◽  
Xin An

Purpose Image classification is becoming a supporting technology in several image-processing tasks. Because images contain rich semantic information, it is very common for an image to have several labels or tags. This paper aims to develop a novel multi-label classification approach with superior performance. Design/methodology/approach Many multi-label classification problems share two main characteristics: label correlations and label imbalance. However, most current methods either model label relationships or deal only with the imbalance problem using traditional single-label methods. In this paper, the multi-label classification problem is regarded as an unbalanced multi-task learning problem. The multi-task least-squares support vector machine (MTLS-SVM) is generalized for this problem and renamed the multi-label LS-SVM (ML2S-SVM). Findings Experimental results on the emotions, scene, yeast and bibtex data sets indicate that ML2S-SVM is competitive with state-of-the-art methods in terms of Hamming loss and instance-based F1 score. The parameter values strongly influence the performance of ML2S-SVM, so users need to identify suitable parameters in advance. Originality/value On the basis of MTLS-SVM, a novel multi-label classification approach, ML2S-SVM, is put forward. This method not only overcomes the imbalance problem but also explicitly models arbitrary-order correlations among labels by allowing multiple labels to share a subspace. In addition, the approach has a wide range of applications; it is not limited to image classification.
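ML2S-SVM builds on least-squares SVMs, in which training reduces to one linear system instead of a quadratic program. The sketch below shows only the standard single-task LS-SVM (function-estimation form applied to ±1 targets), solving [[0, 1ᵀ], [1, K + I/γ]] [b, α] = [0, y]. It is not the multi-task or multi-label extension, and the RBF kernel, γ and σ values and the toy data are assumptions. The abstract notes that such parameters strongly affect performance.

```python
import math

def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * sigma ** 2))

def solve(A, rhs):
    """Gaussian elimination with partial pivoting (dense, small systems)."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM linear system; returns (bias b, coefficients alpha)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term I/gamma on the kernel diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(X, b, alpha, x, sigma=1.0):
    """Sign of b + sum_i alpha_i K(x_i, x)."""
    score = b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X))
    return 1 if score >= 0 else -1

X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]  # hypothetical data
y = [-1, -1, 1, 1]
b, alpha = lssvm_fit(X, y)
print([lssvm_predict(X, b, alpha, x) for x in X])
```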


2020 ◽  
Vol 15 (4) ◽  
pp. 309-317 ◽
Author(s):  
Nashreen Sultana ◽  
Nonita Sharma ◽  
Krishna Pal Sharma ◽  
Shobhit Verma

Background: Ensemble building is a popular method for improving model accuracy in classification as well as regression problems. Objective: In this research work, we propose a sequential ensemble model to predict the number of incidences of communicable diseases such as influenza, hand, foot and mouth disease (HFMD), and diarrhea, and compare it with other applied prediction models. Methods: Weekly data for the three diseases, namely influenza, HFMD, and diarrhea, were collected from the official Hong Kong government website for the years 2010 to 2018. The data were preprocessed by log transformation and z-score transformation. The proposed sequential ensemble model is applied to the processed dataset to predict future occurrences. Results: The results of the proposed ensemble model are compared against standard support vector regression (SVR) using error metrics such as root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). For all three disease datasets, the proposed ensemble model gives better results than the standard SVR model. Conclusion: The main objective of this research work is to minimize the prediction error, and the proposed sequential ensemble model has shown a significant improvement in terms of prediction errors.
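The preprocessing and the three error metrics are standard and can be written out directly. In this sketch, using `log1p` (to tolerate weeks with zero cases) and the population standard deviation for the z-score are assumptions, and the counts are hypothetical.

```python
import math

def log_zscore(series):
    """Log transform (log1p, so zero counts are allowed), then z-score."""
    logged = [math.log1p(v) for v in series]
    mean = sum(logged) / len(logged)
    std = math.sqrt(sum((v - mean) ** 2 for v in logged) / len(logged))
    return [(v - mean) / std for v in logged]

def rmse(actual, pred):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

weekly_cases = [100, 200, 400]  # hypothetical observed counts
forecast = [110, 190, 400]      # hypothetical model output
print(rmse(weekly_cases, forecast), mae(weekly_cases, forecast), mape(weekly_cases, forecast))
```

RMSE penalizes large misses more heavily than MAE, while MAPE is scale-free, which makes it useful for comparing the three diseases with very different case counts.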


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Hari Hara Krishna Kumar Viswanathan ◽  
Punniyamoorthy Murugesan ◽  
Sundar Rengasamy ◽  
Lavanya Vilvanathan

Purpose The purpose of this study is to compare the classification learning ability of the authors' algorithm based on boosted support vector machine (B-SVM) against other classification techniques in predicting the credit ratings of banks. The key feature of this study is the use of a dataset that is imbalanced (in the response variable, i.e. the rating) and has a small number of observations (number of banks). Design/methodology/approach In general, datasets in the banking sector are small and imbalanced. In this study, 23 Scheduled Commercial Banks (SCBs) in India were chosen, and their corresponding corporate ratings were collated from the Indian subsidiary of a reputed global rating agency. The top management of the rating agency provided 12 quantitative input variables that are considered essential for rating a bank within India. To overcome the challenge of a small, imbalanced dataset, this study uses an algorithm called “Modified Boosted Support Vector Machines” (MBSVMs), proposed by Punniyamoorthy Murugesan and Sundar Rengasamy. The study also compares the classification ability of this algorithm against other classification techniques, such as multi-class SVM, back propagation neural networks, multi-class linear discriminant analysis (LDA) and k-nearest neighbors (k-NN) classification, on the basis of the geometric mean (GM). Findings The performance of each algorithm was compared on one metric, the geometric mean (GMean or GM), which combines the class-wise sensitivities by taking the root of their product. The findings show that the proposed MBSVM technique outperforms the other techniques. Research limitations/implications This study provides an algorithm to predict the ratings of banks where the dataset is small and imbalanced. 
One limitation of this study is that subjective factors have not been included in the model; the sole focus is on the results generated by the models, which are driven by quantitative parameters. Future studies may include subjective parameters (proxied by relevant and quantifiable variables). Practical implications Various stakeholders, such as investors, regulators and central banks, can predict the credit ratings of banks themselves by inputting appropriate data into the model. Originality/value In the process of rating banks, an imbalanced dataset can reduce the performance of soft-computing techniques. To overcome this, the authors have developed a novel classification approach based on “MBSVMs”, which can be used as a yardstick for such imbalanced datasets. For this purpose, 12 features considered essential by credit rating agencies have been identified through primary research.
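The geometric mean metric takes the k-th root of the product of the k per-class sensitivities (recalls), so a classifier that ignores a minority rating class scores zero even when its plain accuracy is high. This is a minimal sketch with hypothetical rating labels, not the MBSVM algorithm.

```python
import math

def class_sensitivities(y_true, y_pred):
    """Recall of each class: correct predictions / true members of the class."""
    sens = {}
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        sens[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return sens

def g_mean(y_true, y_pred):
    """k-th root of the product of the k class-wise sensitivities."""
    s = list(class_sensitivities(y_true, y_pred).values())
    return math.prod(s) ** (1 / len(s))

ratings = ["A"] * 8 + ["B"] * 2          # imbalanced hypothetical ratings
always_a = ["A"] * 10                    # 80% accurate, but never predicts "B"
better = ["A"] * 8 + ["B", "A"]
print(g_mean(ratings, always_a), g_mean(ratings, better))
```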


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical: it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems because of their accurate prediction performance. Different techniques may achieve different accuracies, so it is important to use the method best suited to the problem. This research provides a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree and neural network classifiers on breast cancer and diabetes datasets.
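A comparative analysis like this one is usually run as k-fold cross-validation, evaluating every classifier on the same held-out folds. The harness below is a sketch under assumptions: it uses contiguous unshuffled folds, a nearest-centroid classifier and a majority-class baseline as stand-ins for the paper's classifiers, and hypothetical toy data.

```python
import math
from collections import Counter, defaultdict

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for simplicity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def majority_class(train_X, train_y, x):
    """Baseline: always predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]

def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean feature vector is closest to x."""
    groups = defaultdict(list)
    for xi, yi in zip(train_X, train_y):
        groups[yi].append(xi)
    centroids = {c: [sum(col) / len(pts) for col in zip(*pts)]
                 for c, pts in groups.items()}
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))

def cross_val_accuracy(classify, X, y, k=5):
    """Pooled accuracy of classify(train_X, train_y, x) over k held-out folds."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        tr_X = [X[i] for i in range(len(X)) if i not in held_out]
        tr_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(classify(tr_X, tr_y, X[i]) == y[i] for i in fold)
    return correct / len(X)

X = [[0, 0], [5, 5], [0, 1], [5, 6], [1, 0], [6, 5], [1, 1], [6, 6], [0.5, 0.5], [5.5, 5.5]]
y = ["neg", "pos"] * 5
for clf in (nearest_centroid, majority_class):
    print(clf.__name__, cross_val_accuracy(clf, X, y, k=5))
```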

