Optimized deep belief network and entropy-based hybrid bounding model for incremental text categorization

2020 ◽  
Vol 16 (3) ◽  
pp. 347-368
Author(s):  
V. Srilakshmi ◽  
K. Anuradha ◽  
C. Shoba Bindu

Purpose This paper aims to model a technique that categorizes texts from large document collections. Advances in internet technologies have increased document accessibility, so the number of documents available online has become enormous. These text documents comprise research articles, journal papers, newspapers, technical reports and blogs. Such large collections are valuable for real-time applications and are used in several retrieval methods. Text classification plays a vital role in information retrieval technologies and is an active field for processing large-scale applications. The aim of text classification is to assign large documents to different categories on the basis of their contents. Numerous text-related tasks, such as user profiling, sentiment analysis and spam identification, are treated as supervised learning problems and are addressed with a text classifier. Design/methodology/approach First, the input documents are pre-processed using stop word removal and stemming so that the input is suitable for feature extraction. In the feature extraction process, features are extracted using the vector space model (VSM), and feature selection then picks the highly relevant features for text categorization. Once the features are selected, text categorization is performed using the deep belief network (DBN). The DBN is trained using the proposed grasshopper crow optimization algorithm (GCOA), which integrates the grasshopper optimization algorithm (GOA) and the crow search algorithm (CSA). Moreover, the hybrid weight bounding model is devised using the proposed GCOA and range degree. Thus, the proposed GCOA + DBN is used for classifying the text documents. 
Findings The performance of the proposed technique is evaluated using accuracy, precision and recall, and is compared with existing techniques such as naive Bayes, k-nearest neighbors, support vector machine, deep convolutional neural network (DCNN) and Stochastic Gradient-CAViaR + DCNN. The proposed GCOA + DBN shows improved performance, with values of 0.959, 0.959 and 0.96 for precision, recall and accuracy, respectively. Originality/value This paper proposes a technique that categorizes texts from massive documents. The findings show that the proposed GCOA-based DBN effectively classifies the text documents.
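
The pre-processing and VSM steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the toy suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's) and all function names are assumptions made for illustration, and the feature selection and GCOA + DBN stages are not shown.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to"}

def crude_stem(word):
    # Toy suffix stripping; a stand-in for a real stemming algorithm.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # Lowercase, keep alphabetic tokens, drop stop words, then stem.
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def vsm_vectors(docs):
    # Represent each document as a term-count vector over a shared vocabulary.
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

docs = ["The network is training", "Networks trained in the cloud"]
vocab, vectors = vsm_vectors(docs)
print(vocab)
print(vectors)
```

Each row of `vectors` is one document in the vector space; in the described pipeline such vectors would then go through feature selection before classification.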

2020 ◽  
Vol 54 (4) ◽  
pp. 529-549
Author(s):  
Arshey M. ◽  
Angel Viji K. S.

Purpose Phishing is a serious cybersecurity problem that is widely carried out through media such as e-mail and Short Messaging Service (SMS) to collect the personal information of individuals. The rapid growth of unsolicited and unwanted information needs to be addressed, which creates the need for effective anti-phishing methods. Design/methodology/approach The primary intention of this research is to design and develop an approach for preventing phishing by proposing an optimization algorithm. The proposed approach involves four steps for dealing with phishing e-mails, namely preprocessing, feature extraction, feature selection and classification. Initially, the input data set is subjected to preprocessing, which removes stop words and applies stemming, and the preprocessed output is given to the feature extraction process. By extracting keyword frequency from the preprocessed data, the important words are selected as the features. Then, the feature selection process is carried out using the Bhattacharyya distance such that only the significant features that can aid the classification are selected. Using the selected features, the classification is done using the deep belief network (DBN), which is trained using the proposed fractional-earthworm optimization algorithm (EWA). The proposed fractional-EWA is designed by integrating EWA and fractional calculus to determine the weights in the DBN optimally. Findings The accuracy of the methods naive Bayes (NB), DBN, neural network (NN), EWA-DBN and fractional EWA-DBN is 0.5333, 0.5455, 0.5556, 0.5714 and 0.8571, respectively. The sensitivity of NB, DBN, NN, EWA-DBN and fractional EWA-DBN is 0.4558, 0.5631, 0.7035, 0.7045 and 0.8182, respectively. Likewise, the specificity of NB, DBN, NN, EWA-DBN and fractional EWA-DBN is 0.5052, 0.5631, 0.7028, 0.7040 and 0.8800, respectively. 
It is clear from the comparative table that the proposed method achieved the highest accuracy, sensitivity and specificity among the compared methods. Originality/value E-mail phishing detection is performed in this paper using optimization-based deep learning networks. E-mails include a number of unwanted messages that need to be detected in order to avoid storage issues. The importance of the method is that the inclusion of historical data in the detection process enhances the accuracy of detection.
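
As a hedged illustration of feature selection with the Bhattacharyya distance, the sketch below scores each keyword by the distance between its frequency distributions in two classes and keeps keywords whose distance exceeds a threshold. The abstract does not say how the distance is applied, so the histogram binning, the threshold, the sample counts and every name here are assumptions, not the authors' procedure.

```python
import math

def normalize(counts):
    # Turn raw bin counts into a discrete probability distribution.
    total = sum(counts)
    return [c / total for c in counts]

def bhattacharyya_distance(p, q):
    # D_B = -ln( sum_i sqrt(p_i * q_i) ) for discrete distributions p and q.
    bc = min(sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)), 1.0)
    return -math.log(bc) if bc > 0 else math.inf

def select_features(hist_a, hist_b, threshold):
    # Keep features whose class-conditional distributions differ enough.
    selected = []
    for name in hist_a:
        d = bhattacharyya_distance(normalize(hist_a[name]), normalize(hist_b[name]))
        if d > threshold:
            selected.append(name)
    return selected

# Hypothetical counts of e-mails with keyword frequency 0, 1, and 2 or more.
phishing = {"free": [1, 3, 6], "the": [5, 3, 2]}
legitimate = {"free": [7, 2, 1], "the": [5, 3, 2]}

selected = select_features(phishing, legitimate, threshold=0.1)
print(selected)
```

A keyword distributed identically in both classes has distance zero and is discarded, while a keyword whose frequency profile differs between classes is kept.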


2020 ◽  
Vol 54 (5) ◽  
pp. 585-601
Author(s):  
N. Venkata Sailaja ◽  
L. Padmasree ◽  
N. Mangathayaru

Purpose Text mining has been used for various knowledge discovery-based applications, and thus a lot of research has been devoted to it. A current trend in text mining research is the adoption of incremental learning, as it is economical when dealing with large volumes of information. Design/methodology/approach The primary intention of this research is to design and develop a technique for incremental text categorization using an optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps: pre-processing, feature extraction, feature selection and classification. Initially, the data is pre-processed using stop word removal and stemming. Then, feature extraction is done by extracting semantic word-based features and Term Frequency and Inverse Document Frequency (TF-IDF). From the extracted features, the important features are selected using the Bhattacharyya distance measure and are supplied as input to the proposed classifier. The proposed classifier performs incremental learning using SVNN, wherein the weights are bounded within a limit using rough set theory. Moreover, the Moth Search (MS) algorithm is used for the optimal selection of weights in SVNN. Thus, the proposed classifier, named Rough set MS-SVNN, performs text categorization for the incremental data given as input. Findings For the experimentation, the 20 Newsgroups dataset and the Reuters dataset are used. Simulation results indicate that the proposed Rough set-based MS-SVNN achieved 0.7743, 0.7774 and 0.7745 for precision, recall and F-measure, respectively. Originality/value In this paper, an online incremental learner is developed for text categorization. The text categorization is done by developing the Rough set MS-SVNN classifier, which classifies the incoming texts based on the boundary condition evaluated by rough set theory and the optimal weights from the MS. 
The proposed online text categorization scheme has four basic steps: pre-processing, feature extraction, feature selection and classification. Pre-processing is carried out to identify the unique words in the dataset, and features such as semantic word-based features and TF-IDF are obtained from the keyword set. Feature selection is done by setting a minimum Bhattacharyya distance measure, and the selected features are provided to the proposed Rough set MS-SVNN for classification.
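
The TF-IDF features mentioned in this abstract can be sketched as follows. This uses one common textbook variant (term frequency normalized by document length, and idf = ln(N / df)); the abstract does not state which variant the authors use, so the formula choice, the sample documents and the function name are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists, assumed already pre-processed.
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

docs = [
    ["stock", "market", "news"],
    ["sports", "news"],
    ["market", "rally", "news"],
]
weights = tf_idf(docs)
print(weights[0])
```

A term that occurs in every document ("news" here) gets weight zero under this variant, while rarer terms receive larger weights.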


2021 ◽  
Vol 12 (1) ◽  
pp. 47
Author(s):  
Dexin Gao ◽  
Xihao Lin

In view of the complex fault mechanism of direct current (DC) charging points for electric vehicles (EVs) and the poor performance of traditional fault diagnosis methods, a new fault diagnosis method for EV DC charging points based on the deep belief network (DBN) is proposed, which exploits the strengths of the DBN in feature extraction and in processing nonlinear data. The method uses actual measurement data from the charging points to perform unsupervised feature extraction and parameter fine-tuning of the network, and builds a deep network model for accurate fault diagnosis of the charging points. The effectiveness of the method is examined by comparing it with the backpropagation neural network, radial basis function neural network, support vector machine and convolutional neural network in terms of accuracy and model convergence time. The experimental results show that the proposed method has a higher fault diagnosis accuracy than these methods.


Kybernetes ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Pandiaraj A. ◽  
Sundar C. ◽  
Pavalarajan S.

Purpose Recent developments in sentiment analysis have resulted in significant growth in the volume of studies, especially on more subjective text types such as product or movie reviews. The key difference between these texts and news articles is that their target is defined and unique across the text. Hence, sentiment analysis of reviews of newspaper articles involves three subtasks: correctly spotting the target, separating the good and bad content in the reviews of that target, and evaluating the different opinions in a detailed manner. Having defined these tasks, this paper aims to implement a new sentiment analysis model for reviews of newspaper articles. Design/methodology/approach Here, tweets about various newspaper articles are taken, and the sentiment analysis process consists of pre-processing, semantic word extraction, feature extraction and classification. Initially, the pre-processing phase is performed, in which steps such as stop word removal, stemming and blank space removal are carried out to produce keywords that express positive, negative or neutral sentiment. Further, semantic (similar) words are extracted from the available dictionary by matching the keywords. Next, feature extraction is done for the extracted keywords and semantic words using holoentropy to obtain information statistics, which yields the maximum related information. Here, two categories of holoentropy features are extracted: joint holoentropy and cross holoentropy. The extracted features of all keywords are finally passed to a hybrid classifier, which merges the beneficial concepts of the neural network (NN) and the deep belief network (DBN). To improve the performance of sentiment classification, a modified rider optimization algorithm (ROA), called the new steering updated ROA (NSU-ROA), is introduced into the NN and DBN for weight update. 
Hence, the average of both improved classifiers provides the classified sentiment (positive, negative or neutral) from the reviews of newspaper articles. Findings Three data sets were considered for experimentation. The results show that, on data set 1, the developed NSU-ROA + DBN + NN attained high accuracy, which was 2.6% superior to particle swarm optimization, 3% superior to FireFly, 3.8% superior to grey wolf optimization, 5.5% superior to the whale optimization algorithm and 3.2% superior to ROA-based DBN + NN. The classification analysis shows that, on data set 2, the accuracy of the proposed NSU-ROA + DBN + NN was 3.4% higher than DBN + NN, 25% higher than DBN, 28.5% higher than NN and 32.3% higher than support vector machine. Thus, the effective performance of the proposed NSU-ROA + DBN + NN on sentiment analysis of newspaper articles has been demonstrated. Originality/value This paper adopts a recent optimization algorithm, the NSU-ROA, to effectively recognize the sentiments of newspaper articles with NN and DBN. This is the first work that uses NSU-ROA-based optimization for accurate identification of sentiments from newspaper articles.
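
The final fusion step, in which the outputs of the two classifiers are averaged, might look like the sketch below. It assumes each classifier emits one score per class in a fixed label order; the scores, the label order and the function names are illustrative assumptions, and the NSU-ROA weight update itself is not shown.

```python
LABELS = ("positive", "negative", "neutral")

def average_scores(nn_scores, dbn_scores):
    # Element-wise mean of the two classifiers' per-class scores.
    return [(a + b) / 2 for a, b in zip(nn_scores, dbn_scores)]

def classify(nn_scores, dbn_scores):
    # Pick the label with the highest averaged score.
    avg = average_scores(nn_scores, dbn_scores)
    best = max(range(len(avg)), key=avg.__getitem__)
    return LABELS[best]

# Hypothetical per-class scores from the two classifiers for one tweet.
nn_out = [0.5, 0.3, 0.2]
dbn_out = [0.2, 0.6, 0.2]
label = classify(nn_out, dbn_out)
print(label)
```

Averaging lets a confident prediction from one classifier outweigh a weaker, conflicting prediction from the other.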


Actuators ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 56
Author(s):  
Yongye Wu ◽  
Zhanlong Zhang ◽  
Rui Xiao ◽  
Peiyu Jiang ◽  
Zijian Dong ◽  
...  

The converter transformer is a special power transformer that connects the converter bridge to the AC system in the HVDC transmission system. Due to the special structure of the converter transformer, it is necessary to test its operation state during its manufacture and processing to ensure the safety of its future connection to the grid. Numerous studies have shown that vibration signals in transformers can reflect their operating state. Therefore, in order to achieve an effective identification of the operation state of the converter transformer, this paper proposes a method for identifying the operation state of the converter transformer based on vibration detection technology and a deep belief network optimization algorithm. This paper firstly describes the background, principle and application of vibration detection technology, using vibration measurement systems with piezoelectric acceleration sensors, piezoelectric actuators and data acquisition instruments to collect vibration signals at different measurement points on the converter transformer in states of no-load and on-load. By analyzing the time-frequency characteristics of the vibration signals, fast Fourier transform (FFT), wavelet packet decomposition (WPD) and time domain indexes (TDI) are combined into a fused feature extraction method to extract the eigenvalues of the vibration signals, so that the fused eigenvectors of the signals can be constructed. Considering the excellent performance of deep learning in classification, the deep belief network is used to classify the signals’ eigenvectors. To effectively improve the network classification efficiency, the sparrow search algorithm was introduced to build a mathematical model based on the behavioral characteristics of sparrow populations and combine the model with a deep belief network, so as to achieve adaptive parameter optimization of the network and accurate classification of the signals’ eigenvectors. 
The proposed method is applied to a 500 kV converter transformer for experimental verification. The experimental results show that the fused feature extraction method fully extracts the features of the vibration signal, and that the deep belief network optimization algorithm has higher classification accuracy and better operational efficiency, effectively achieving accurate identification of the operation state of the converter transformer. In addition, the method achieves a precise response to the detection results of the vibration sensors, contributing to future improvements in converter transformer manufacturing technology.
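
A simplified sketch of fused feature extraction is given below. It concatenates a frequency spectrum with a few time-domain indexes into one feature vector. Several things are assumptions: a naive discrete Fourier transform stands in for the FFT (it yields the same spectrum, only more slowly), wavelet packet decomposition is omitted, and the chosen indexes (RMS, peak, crest factor), the synthetic test signal and all names are illustrative rather than taken from the paper.

```python
import cmath
import math

def dft_magnitudes(signal):
    # Magnitude spectrum for bins 0 .. N/2 - 1, scaled by 1/N.
    n = len(signal)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        mags.append(abs(acc) / n)
    return mags

def time_domain_indexes(signal):
    # Three common time-domain indexes of a vibration signal.
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    peak = max(abs(x) for x in signal)
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms}

def fused_features(signal):
    # Concatenate spectral and time-domain features into one vector.
    tdi = time_domain_indexes(signal)
    return dft_magnitudes(signal) + [tdi["rms"], tdi["peak"], tdi["crest_factor"]]

# Synthetic stand-in for a vibration signal: a sine completing 4 cycles in 64 samples.
N = 64
signal = [math.sin(2 * math.pi * 4 * i / N) for i in range(N)]
spectrum = dft_magnitudes(signal)
dominant_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
features = fused_features(signal)
print(dominant_bin, len(features))
```

The resulting vector is the kind of fused eigenvector that would be passed to the classifier.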


2012 ◽  
Vol 532-533 ◽  
pp. 1191-1195 ◽  
Author(s):  
Zhen Yan Liu ◽  
Wei Ping Wang ◽  
Yong Wang

This paper introduces the design of a text categorization system based on Support Vector Machine (SVM). It analyzes the high-dimensional nature of text data and explains why SVM is suitable for text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training and text classification. The core of the system is classifier training, but text representation directly influences the accuracy of the classifier and the performance of the system. The text feature vector space can be built with different kinds of feature selection and feature extraction methods. No research indicates which method is best, so many feature selection and feature extraction methods are implemented in this system. For a specific classification task, every feature selection method and every feature extraction method is tested, and the best-performing set of methods is then adopted.


2021 ◽  
Vol 12 (3) ◽  
pp. 185-207
Author(s):  
Anjali A. Shejul ◽  
Kinage K. S. ◽  
Eswara Reddy B.

Age estimation has received great attention in fields such as intelligent surveillance, face recognition and biometrics. In contrast to other facial variations, aging variation presents several unique characteristics, which make age estimation very challenging. The overall process of age estimation is performed in three steps. In the first step, the input image is pre-processed using the Viola-Jones algorithm to detect the face region. In the second step, feature extraction is done based on three features: local transform directional pattern (LTDP), active appearance model (AAM) and the new feature, deep appearance model (Deep AM). After feature extraction, classification is carried out on the extracted features using a deep belief network (DBN), where the DBN classifier is trained optimally using the proposed learning algorithm, named the crow-sine cosine algorithm (CS).


Sensor Review ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Rabeb Faleh ◽  
Sami Gomri ◽  
Khalifa Aguir ◽  
Abdennaceur Kachouri

Purpose The purpose of this paper is to improve the classification of pollutants using WO3 gas sensors. To evaluate the discrimination capacity, experiments were carried out using three gases (ozone, ethanol and acetone) and a mixture of ozone and ethanol, via four WO3 sensors. Design/methodology/approach To improve the classification accuracy and enhance selectivity, combined features configured through principal component analysis were used. First, to evaluate the discrimination capacity, experiments were performed using three gases (ozone, ethanol and acetone) and a mixture of ozone and ethanol, via four WO3 sensors. To this end, three features, namely the derivative, the integral and the time corresponding to the peak derivative, were extracted from each transient sensor response of the four WO3 gas sensors used. These extracted parameters were then used in a combined array. Findings The results show that the proposed feature extraction method could extract robust information. The Extreme Learning Machine (ELM) was used to identify the studied gases. In addition, ELM was compared with the Support Vector Machine (SVM). The experimental results demonstrate the superiority of the combined features method in this E-nose application, as this method achieves the highest classification rate of 90% using the ELM and 93.03% using the SVM with a radial basis function kernel (SVM-RBF). Originality/value Combined features have been configured from the transient response to improve the classification accuracy. The achieved results show that the proposed feature extraction method could extract robust information. The ELM and SVM were used to identify the studied gases.
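
The three transient-response features named in this abstract can be sketched numerically as follows. The interpretation is an assumption: the "derivative" feature is taken as the peak finite-difference slope, the integral is computed with the trapezoidal rule, and the time of the peak derivative is taken as the midpoint of the steepest interval. The sample response values and all names are hypothetical.

```python
def transient_features(times, response):
    # Finite-difference slope of the response on each sampling interval.
    slopes = [
        (response[i + 1] - response[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    peak_idx = max(range(len(slopes)), key=slopes.__getitem__)
    # Trapezoidal-rule area under the response curve.
    integral = sum(
        (response[i] + response[i + 1]) / 2 * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )
    return {
        "peak_derivative": slopes[peak_idx],
        "integral": integral,
        # Midpoint of the interval on which the slope is largest.
        "time_of_peak_derivative": (times[peak_idx] + times[peak_idx + 1]) / 2,
    }

# Hypothetical transient response of one sensor (arbitrary units).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
response = [0.0, 1.0, 4.0, 6.0, 7.0]
feats = transient_features(times, response)
print(feats)
```

Repeating this for each of the four sensors would give twelve values per measurement, which could then be combined into a single feature array.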


Author(s):  
Sarmad Mahar ◽  
Sahar Zafar ◽  
Kamran Nishat

Headnotes are precise explanations and summaries of the legal points in an issued judgment. Law journals hire experienced lawyers to write these headnotes, which help the reader quickly determine the issue discussed in the case. Headnotes comprise two parts. The first part contains the topic discussed in the judgment, and the second part contains a summary of that judgment. In this thesis, we design, develop and evaluate headnote prediction using machine learning, without human involvement. We divide this task into a two-step process. In the first step, we predict the law points used in the judgment using text classification algorithms. The second step generates a summary of the judgment using text summarization techniques. To achieve this, we created a databank by extracting data from different law sources in Pakistan, and we labelled training data generated from Pakistani law websites. We tested different feature extraction methods on judiciary data to improve our system. Using these feature extraction methods, we developed a dictionary of terminology for ease of reference and utility. Our approach achieves 65% accuracy using Linear Support Vector Classification with tri-grams and without a stemmer. Using active learning, our system can continuously improve its accuracy as users of the system provide more labelled examples.
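
The tri-gram features without stemming mentioned above can be illustrated with a short sketch. It assumes word-level tri-grams (the abstract does not say whether they are word or character tri-grams), and the sample sentence and function name are hypothetical. The linear support vector classifier that would consume these counts is not shown.

```python
from collections import Counter

def word_trigrams(text):
    # Lowercase and split on whitespace; no stemming is applied.
    tokens = text.lower().split()
    return [tuple(tokens[i : i + 3]) for i in range(len(tokens) - 2)]

sentence = "The court dismissed the appeal"
trigrams = word_trigrams(sentence)
counts = Counter(trigrams)
print(trigrams)
```

Counting such tri-grams per judgment produces a sparse feature vector that a linear classifier can be trained on.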

