Ensemble Classification Approach for Sarcasm Detection

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Jyoti Godara ◽  
Isha Batra ◽  
Rajni Aron ◽  
Mohammad Shabaz

Cognitive science is a technology that focuses on analyzing the human brain through the application of DM. Databases are used to gather and store large volumes of data, and authenticated information is extracted from them using measures. This research work addresses the detection of sarcasm in text data. It introduces a scheme to detect sarcasm based on the PCA algorithm, the K-means algorithm, and ensemble classification. Four ensemble classifiers are designed for detecting sarcasm. The first ensemble classifier (SKD) combines SVM, KNN, and decision tree. The second (SLD) combines SVM, logistic regression, and decision tree. The third (MLD) combines MLP, logistic regression, and decision tree, and the last one (SLM) combines MLP, logistic regression, and SVM. The proposed model is implemented in Python and tested on five datasets of different sizes. The performance of the models is evaluated with regard to various metrics.
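The abstract names the members of each ensemble but not the rule used to combine them. The following is a minimal sketch that assumes hard majority voting; the three base "classifiers" are hypothetical keyword rules standing in for trained SVM, KNN, and decision tree models, not the authors' implementation.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(classifiers, sample):
    """Ask every base classifier for a label and combine by majority vote."""
    return majority_vote([clf(sample) for clf in classifiers])

# Hypothetical stand-ins for three trained base models (assumption:
# real models would be fitted on labeled text features instead).
def svm_like(text):
    return "sarcastic" if "yeah right" in text.lower() else "literal"

def knn_like(text):
    return "sarcastic" if text.endswith("!") else "literal"

def tree_like(text):
    return "sarcastic" if "great" in text.lower() else "literal"

skd = [svm_like, knn_like, tree_like]  # mirrors the SKD grouping by name only
label = ensemble_predict(skd, "Oh great, another Monday!")
```

With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of base classifiers.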

Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: To develop an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to replace the missing values in the dataset, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset. The training dataset (70%) is given as input to four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) to train them, and the test dataset (30%) is used to evaluate the output of the machine learning model using the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The outputs of the classifiers are evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared with the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model will act as an optimal decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans using heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
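Two of the preprocessing steps in the Methods section, mean imputation and the 70/30 partition, can be sketched as follows. This is an illustration under stated assumptions: the toy rows and their two columns are hypothetical, missing values are represented as `None`, and the PCA and classifier stages are omitted.

```python
import random

def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records with two numeric risk factors (made-up values).
data = [[120.0, 5.1], [None, 6.0], [140.0, None], [130.0, 5.5], [150.0, 6.2],
        [110.0, 4.9], [125.0, 5.8], [None, 5.0], [135.0, 6.1], [145.0, 5.4]]
filled = mean_impute(data)
train, test = train_test_split(filled)
```

Here the column means are computed over the full dataset before splitting, as the abstract's ordering suggests; computing them on the training part only is a common alternative that avoids leaking test information.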


The modern era demands an ever larger supply of good-quality power. Power utilities and power exchange organizations face a significant challenge in recognizing the type of Power Quality Disturbance (PQD). Our research illustrates a technique for PQD classification using wavelet signal decomposition and ensemble classification. A normal wave without disturbance and waves with single-type and hybrid-type PQD events were generated in MATLAB from mathematical models that follow the definitions and parameters outlined by the IEEE 1159 and IEC 61000 standards. The Discrete Wavelet Transform (DWT) is applied to decompose the generated PQD signals and obtain a representation in the time and frequency domains. Our database consists of 14000 generated signals of a normal wave and the PQDs, divided into 80% for the training set and 20% for the test set for each PQD. An ensemble method for multiclass classification was chosen as the classifier of the feature vector for the PQD. Comparisons were also made with alternative types of classifiers and different mother wavelet filter functions to observe and analyze the difference in performance. The results showed that the combination of DWT and the ensemble classifier delivers an optimal solution for recognizing the class of PQD, with a precision of 100% for both the training and test sets.
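To make the decomposition step concrete, here is a minimal sketch of one level of a discrete wavelet transform. It assumes the Haar wavelet, which the abstract does not name, and uses coefficient energies as a hypothetical feature vector; the sampling rate, fundamental frequency, and sag depth are illustrative values only.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)

# Illustrative signals: a 50 Hz sine sampled at 3200 Hz, and the same wave
# with a voltage sag (amplitude halved) between samples 200 and 399.
fs, f0, n = 3200, 50, 640
clean = [math.sin(2 * math.pi * f0 * t / fs) for t in range(n)]
sag = [(0.5 if 200 <= t < 400 else 1.0) * x for t, x in enumerate(clean)]

clean_a, clean_d = haar_dwt(clean)
sag_a, sag_d = haar_dwt(sag)
clean_features = [energy(clean_a), energy(clean_d)]
sag_features = [energy(sag_a), energy(sag_d)]
```

Because this transform is orthonormal, the energies of the approximation and detail coefficients add up to the energy of the input signal, so a sag shows up as a drop in the feature values.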


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jiangbo Zou ◽  
Xiaokang Fu ◽  
Lingling Guo ◽  
Chunhua Ju ◽  
Jingjing Chen

Ensemble classifiers improve classification accuracy by incorporating the decisions made by their component classifiers. There are two basic steps in creating an ensemble classifier: generating the base classifiers, and combining them to achieve maximum overall accuracy. A major problem in creating ensemble classifiers is ensuring both the classification accuracy and the diversity of the component classifiers. In this paper, we propose an ensemble classifier generating algorithm that improves the accuracy of ensemble classification and maximizes the diversity of its component classifiers. In this algorithm, information entropy is introduced to measure the diversity of component classifiers, and a cyclic iterative optimization selection tactic is applied to select component classifiers from the base classifiers, with the number of component classifiers dynamically adjusted to minimize system cost. We demonstrate that our method has an obviously lower memory cost and higher classification accuracy than existing classifier methods.
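One common way to turn information entropy into a diversity score is to compute, for each sample, the Shannon entropy of the labels voted by the component classifiers and then average over samples. The sketch below shows that reading; the paper's exact definition is not given in the abstract, so the function names and the averaging step are assumptions.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Shannon entropy (in bits) of the label distribution among the votes."""
    total = len(votes)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(votes).values())

def ensemble_diversity(prediction_matrix):
    """Mean per-sample vote entropy; rows are classifiers, columns are samples."""
    per_sample = [vote_entropy(column) for column in zip(*prediction_matrix)]
    return sum(per_sample) / len(per_sample)

# Two classifiers that always agree versus two that always disagree.
agreeing = [[0, 1, 1], [0, 1, 1]]
disagreeing = [[0, 1], [1, 0]]
low = ensemble_diversity(agreeing)
high = ensemble_diversity(disagreeing)
```

A score of zero means the component classifiers are redundant on the given samples, while higher values indicate that they make different decisions.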


In this research paper, various ensemble classifiers are used to predict occupancy status from samples of light, temperature, humidity, CO2, and humidity-ratio sensor data. Occupancy detection saves energy, making way for smart buildings in smart cities, and supports decisions on heating, ventilation, cooling, and lighting. To achieve 'white box' output and facilitate explanatory interpretation, decision trees were employed. Several weak-learner decision trees were combined to form a RUSBoosted Tree ensemble classifier. The results show that the RUSBoosted Tree ensemble gives the highest accuracy rate, 99%.
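RUSBoost pairs boosting with random undersampling of the larger classes before each weak learner is trained. The sketch below illustrates only the undersampling step; the boosting loop and the decision trees are omitted, and the class labels and sample values are hypothetical.

```python
import random
from collections import Counter, defaultdict

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class has as many as the smallest class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    smallest = min(len(members) for members in by_class.values())
    rng = random.Random(seed)
    out_x, out_y = [], []
    for y, members in by_class.items():
        for x in rng.sample(members, smallest):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical imbalanced data: eight "empty" readings, two "occupied".
xs = list(range(10))
ys = ["empty"] * 8 + ["occupied"] * 2
balanced_x, balanced_y = random_undersample(xs, ys)
class_counts = Counter(balanced_y)
```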


2018 ◽  
Vol 20 (3) ◽  
pp. 321-357 ◽  
Author(s):  
Kalyan Nagaraj ◽  
Biplab Bhattacharjee ◽  
Amulyashree Sridhar ◽  
Sharvani GS

Purpose: Phishing is one of the major threats affecting businesses worldwide. Organizations and customers face the hazards arising from phishing attacks because of anonymous access to vulnerable details. Such attacks often result in substantial financial losses. Thus, there is a need for effective intrusion detection techniques to identify and possibly nullify the effects of phishing. Classifying phishing and non-phishing web content is a critical task in information security protocols, and foolproof mechanisms have yet to be implemented in practice. The purpose of the current study is to present an ensemble machine learning model for classifying phishing websites. Design/methodology/approach: A publicly available data set comprising 10,068 instances of phishing and legitimate websites was used to build the classifier model. Feature extraction was performed using a group of methods, and the relevant extracted features were used to build the model. A twofold ensemble learner was developed by feeding the results of a random forest (RF) classifier into a feedforward neural network (NN). Performance of the ensemble classifier was validated using k-fold cross-validation. The twofold ensemble learner was implemented as a user-friendly, interactive decision support system for classifying websites as phishing or legitimate. Findings: Experimental simulations were performed to assess and compare the performance of the ensemble classifiers. The statistical tests estimated that the RF_NN model gave superior performance, with an accuracy of 93.41 per cent and a minimal mean squared error of 0.000026. Research limitations/implications: The research data set used in this study is publicly available and easy to analyze. Comparative analysis with other recent real-time data sets must be performed to ensure that the model generalizes against various security breaches.
Different variants of phishing threats must be detected, rather than focusing only on phishing website detection. Originality/value: To the authors' knowledge, the twofold ensemble model has not been applied to the classification of phishing websites in any previous study.
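The validation step relies on k-fold cross-validation, in which every sample is used for testing exactly once. The sketch below generates the index splits only; the value of k is not stated in the abstract, so the `k=3` used here is an arbitrary illustration, and shuffling and model training are left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
test_sizes = [len(test_idx) for _, test_idx in folds]
```

In practice the data would be shuffled before splitting, and the model would be retrained on each training part and scored on the matching test part, with the scores averaged over folds.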


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Nasrin Ostvar ◽  
Amir Masoud Eftekhari Moghadam

In recent years, ensemble classification methods have been widely investigated in both industry and the literature in the field of machine learning and artificial intelligence. The main advantage of this approach is that it benefits from a set of classifiers, rather than a single classifier, with the aim of improving prediction performance such as accuracy. Selecting the base classifiers and the method for combining them are the most challenging issues in ensemble classifiers. In this paper, we propose a heterogeneous dynamic ensemble classifier (HDEC) which uses multiple classification algorithms. The main advantage of using heterogeneous algorithms is increased diversity among the base classifiers, which is key to a successful ensemble system. In this method, we first train many classifiers on the original data. They are then separated based on their strength in recognizing either positive or negative instances, measured by the true positive rate and true negative rate, respectively. In the next step, the classifiers are categorized into two groups according to their efficiency on these measures. Finally, the outputs of the two groups are compared with each other to generate the final prediction. To evaluate the proposed approach, we applied it to 12 datasets from the UCI and LIBSVM repositories and calculated two popular prediction performance metrics, accuracy and geometric mean. The experimental results show the superiority of the proposed approach over other state-of-the-art methods.
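The measures named in the abstract can be computed directly from true and predicted labels. The sketch below derives the true positive rate, the true negative rate, and their geometric mean; the final grouping rule (assign a classifier to whichever rate is higher) is an assumed simplification, not the paper's stated procedure, and the label vectors are made up.

```python
import math

def rates(y_true, y_pred, positive=1):
    """Return (true positive rate, true negative rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

def g_mean(tpr, tnr):
    """Geometric mean of the two rates; low if either class is handled poorly."""
    return math.sqrt(tpr * tnr)

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
tpr, tnr = rates(y_true, y_pred)
score = g_mean(tpr, tnr)
# Assumed grouping rule: label the classifier by its stronger side.
group = "positive-expert" if tpr >= tnr else "negative-expert"
```

Unlike plain accuracy, the geometric mean drops sharply when a classifier neglects one class, which makes it a useful companion metric on imbalanced datasets.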


2012 ◽  
Vol 546-547 ◽  
pp. 576-581
Author(s):  
Feng Qian ◽  
Lin Wen Xu

In a highly competitive market, meeting consumers' needs is a critical factor for product success, so acceptability evaluation and prediction are important in product development. This study developed an intelligent model to evaluate and predict consumer acceptability. The model first used IG as a ranking method to rank the features by importance. It then employed Bayesian Network (BN) and Radial Basis Function (RBF) networks and their ensembles to build a prediction model. To demonstrate the applicability of the proposed model, we adopted a real case, MP3 evaluation, to show that the consumer acceptability problem can be easily evaluated and predicted using the proposed model. The results show that ensemble classifiers are more accurate than a single classifier. This ensemble model helps manufacturers not only in evaluating the importance of product features but also in predicting consumer acceptability.
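Assuming IG stands for information gain (the abstract does not expand the abbreviation), feature ranking can be sketched as follows: a feature scores higher the more it reduces the entropy of the class label. The feature names and values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical survey: does the consumer accept the product?
accepted = ["yes", "yes", "no", "no"]
features = {
    "storage": ["big", "big", "small", "small"],  # fully predicts acceptance
    "color": ["red", "blue", "red", "blue"],      # carries no information
}
gains = {name: information_gain(values, accepted)
         for name, values in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
```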


Credit card fraud is on the rise and is getting smarter over time. Usually, fraudulent transactions are conducted by stealing the credit card. When the cardholder does not notice the loss of the card, the credit card company can face a huge loss. In existing work, researchers have used voting-based methods to identify credit card fraud. The problem with voting-based methods is that they are more complex and more time-consuming. In this research work, a hybrid approach based on KNN and Naive Bayes is proposed for the detection of credit card fraud. KNN is used as the base classifier and returns a predicted result. The predicted result is provided as input to the Naive Bayes classifier, which generates the final result. The proposed model will be compared with existing techniques, and the results will be analyzed in terms of recall, precision, accuracy, and execution time.
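The two-stage cascade can be sketched under a strong simplifying assumption: the Naive Bayes stage sees only the KNN prediction as a single categorical feature. The abstract does not say whether the original features are passed along as well, and the transaction features (amount, hour) and their values are hypothetical.

```python
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def fit_naive_bayes(features, labels):
    """Categorical Naive Bayes over one feature, with Laplace smoothing."""
    class_counts = Counter(labels)
    joint = Counter(zip(features, labels))
    n_values = len(set(features))

    def predict(feature):
        def score(c):
            prior = class_counts[c] / len(labels)
            likelihood = (joint[(feature, c)] + 1) / (class_counts[c] + n_values)
            return prior * likelihood
        return max(class_counts, key=score)

    return predict

# Hypothetical transactions: (amount, hour of day).
train_x = [(10, 12), (12, 13), (11, 11), (500, 3), (520, 2), (480, 4)]
train_y = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]

# Stage 1: KNN predictions on the training data become the NB input feature.
knn_out = [knn_predict(train_x, train_y, x) for x in train_x]
nb = fit_naive_bayes(knn_out, train_y)

def classify(x):
    """Stage 1 (KNN) feeds stage 2 (Naive Bayes), which gives the final label."""
    return nb(knn_predict(train_x, train_y, x))

result = classify((495, 3))
```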


2021 ◽  
Vol 12 (11) ◽  
pp. 1916-1924
Author(s):  
Tamanna Siddiqui et al.

Sarcasm is defined as a cutting, often ironic remark intended to express ridicule or dislike. Sarcasm detection is the task of correctly labeling text as 'sarcasm' or 'non-sarcasm'. It is a challenging task owing to the absence of facial expressions and intonation in text. Social media and micro-blogging websites are extensively explored for extracting opinions about a target, because a huge amount of text data is published openly on social media such as Twitter. Such large, openly available text data can be used for a variety of research. Here we classify sarcasm using textual data extracted from a Twitter dataset. The dataset was downloaded from Kaggle and includes 1984 tweets collected from Twitter; the data are already labeled. In this paper, we use these data to train classifiers with different algorithms and assess the ability of machine learning models to distinguish sarcasm from non-sarcasm. The process starts with text pre-processing and feature extraction (TF-IDF) and then applies different classification algorithms: Decision Tree, Multinomial Naïve Bayes, Support Vector Machine, and Logistic Regression. After tuning the models to obtain the best results with TF-IDF, we achieve 0.94% with Multinomial NB, 0.93% with the Decision Tree classifier, 0.97% with Logistic Regression, and 0.42% with the Support Vector Machine (SVM). All of the resulting models improved except the SVM model, which has the lowest accuracy. The evaluation shows good accuracy in identifying people's sarcastic remarks.
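The TF-IDF feature extraction step weights a term by how often it occurs in a document and how rare it is across the collection. The sketch below uses the plain textbook form, term frequency times log(N / document frequency), with whitespace tokenization; the abstract does not say which variant or tokenizer was used, and libraries often apply smoothed versions. The example tweets are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

tweets = ["oh great another monday",
          "great weather today",
          "another meeting today"]
vectors = tf_idf(tweets)
```

In the first tweet, "oh" appears in no other document and therefore receives a higher weight than "great", which is shared with the second tweet.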


Author(s):  
Savita Sangam ◽  
Subhash Shinde

It has become common practice for business organizations and individuals to use social media to share opinions about products or services. Consumers are also ready to share their views on certain products or commodities. Thus, a huge amount of unstructured social media data is generated day by day. Gradually, a heap of text data will form in many areas such as automated business, education, health care, and show business. Opinion mining, also referred to as sentiment analysis or sentiment classification, deals with mining review text and classifying the opinions or sentiments of that text as positive or negative. In this paper we propose an ensemble classifier model consisting of a Support Vector Machine and an Artificial Neural Network. It combines the knowledge from two feature sets for sentiment classification. The proposed model shows acceptable performance in terms of accuracy when compared with the baseline model.

