Application of Data Denoising and Classification Algorithm Based on RPCA and Multigroup Random Walk Random Forest in Engineering

Data classification algorithms are widely used in engineering, but data measured in practice often contain noise of different types and degrees, such as the vibration noise caused by water flow when measuring the natural frequencies of aqueducts or other hydraulic structures, and this noise degrades classification accuracy. In reality, such noise is often disorganized and stochastic, and some existing algorithms perform poorly in the face of this non-Gaussian noise. Classification algorithms that remain accurate under such noise are therefore needed. To address this issue, this paper proposes a hybrid algorithm that combines robust principal component analysis (RPCA) with a multigroup random walk random forest (MRWRF). RPCA can effectively remove part of the non-Gaussian noise, while MRWRF can select a better number of decision trees (DTs), which improves the robustness and classification performance of random forest (RF); together, RPCA and MRWRF can effectively classify data with non-Gaussian noise. Compared with other existing algorithms, the hybrid algorithm shows strong robustness and better classification performance and can thus provide a new approach to data classification problems in engineering.
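
The denoising step can be illustrated with a minimal sketch of RPCA, assuming the common principal component pursuit formulation, in which a data matrix M is split into a low-rank part L (the clean signal) and a sparse part S (the outlier-like noise). The abstract does not say which RPCA solver is used, so the augmented-Lagrangian iteration, the default parameters `lam` and `mu`, and all function names below are assumptions for illustration; the MRWRF classifier is not reproduced here.

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: move every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding: soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(M, max_iter=1000, tol=1e-7):
    """Split M into a low-rank matrix L plus a sparse matrix S."""
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2))        # assumed default weight on the sparse term
    mu = n1 * n2 / (4.0 * np.abs(M).sum())  # assumed default penalty parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                    # Lagrange multipliers
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S

# Synthetic check: a rank-2 matrix corrupted by sparse, non-Gaussian spikes.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
spikes = np.zeros((40, 40))
mask = rng.random((40, 40)) < 0.05
spikes[mask] = rng.uniform(-10, 10, size=mask.sum())
L, S = rpca(low_rank + spikes)
rel_err = np.linalg.norm(L - low_rank) / np.linalg.norm(low_rank)
```

In a pipeline of the kind the abstract describes, the recovered matrix `L` would presumably be passed to the classifier in place of the raw measurements.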

Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

Energies ◽ 10.3390/en14071809 ◽ 2021 ◽ Vol 14 (7) ◽ pp. 1809

Author(s): Mohammed El Amine Senoussaoui ◽ Mostefa Brahami ◽ Issouf Fofana

Keyword(s): Machine Learning ◽ Random Forest ◽ Oil Quality ◽ Principal Component ◽ Condition Assessment ◽ Classification Performance ◽ Transformer Oil ◽ Classification Model ◽ Insulation Degradation ◽ Transformer Oils

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: the J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: oils that can be maintained in service, oils that should be reconditioned or filtered, oils that should be reclaimed, and oils that must be discarded. Of the two algorithms, random forest exhibited better performance and high accuracy with only a small amount of data. Good performance was achieved not only through the application of the proposed algorithm but also through the data preprocessing approach. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were then transformed again by principal component analysis and passed through the CFsSubset filter once more. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
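
The correlation-based feature selection step can be sketched as follows, assuming the standard CFS merit score k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a k-feature subset. Pearson correlation and the greedy forward search are assumptions made for this illustration; the authors' tool may use a different correlation measure and search strategy, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y):
    """Greedy forward search: keep adding the feature that most raises the merit."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected

# Synthetic data: the label depends on features 0 and 1; feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
selected = cfs_forward(X, y)
```

The merit rewards subsets whose features correlate with the class but not with each other, which is why an uninformative feature lowers the score and is left out.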

A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data

PLoS ONE ◽ 10.1371/journal.pone.0258326 ◽ 2021 ◽ Vol 16 (10) ◽ pp. e0258326

Author(s): Wen Bo Liu ◽ Sheng Nan Liang ◽ Xi Wen Qin

Keyword(s): Gene Expression ◽ Machine Learning ◽ Random Forest ◽ Dimension Reduction ◽ Principal Component ◽ Classification Performance ◽ Kernel Functions ◽ Reduction Algorithm ◽ Expression Data ◽ Weighted Kernel

Gene expression data are characterized by high dimensionality and a small sample size and contain a large number of redundant genes unrelated to a disease. Applying machine learning directly to classify this type of data not only incurs a great time cost but also sometimes fails to improve classification performance. To counter this problem, this paper proposes a dimension-reduction algorithm based on weighted kernel principal component analysis (WKPCA), which constructs kernel function weights according to kernel matrix eigenvalues and combines multiple kernel functions to reduce the feature dimensions. To further improve the dimension-reduction efficiency of WKPCA, t-class kernel functions are constructed, and the corresponding theoretical proofs are given. Moreover, the cumulative optimal performance rate is constructed to measure the overall performance of WKPCA combined with machine learning algorithms. Naive Bayes, K-nearest neighbour, random forest, iterative random forest and support vector machine classifiers are used to analyse 6 real gene expression datasets. Compared with the all-variable model, linear principal component dimension reduction and single kernel function dimension reduction, the results show that WKPCA dimension reduction can effectively improve the classification performance of the 5 machine learning methods mentioned above.
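
The general idea of combining several kernels before kernel PCA can be sketched as below. The abstract does not state how the weights are derived from the eigenvalues, so the rule used here (each kernel's weight is proportional to the share of its spectrum captured by its leading eigenvalues) is purely a hypothetical stand-in, as are the trace normalization, the choice of kernels, and every function name. The t-class kernels of the paper are not reproduced.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def poly_kernel(X, degree):
    """Polynomial kernel matrix (x_i . x_j + 1)^degree."""
    return (X @ X.T + 1.0) ** degree

def center(K):
    """Center a kernel matrix in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def weighted_kernel_pca(X, kernels, n_components):
    # Center each kernel and scale it to unit trace so the kernels are comparable.
    Ks = []
    for kernel in kernels:
        K = center(kernel(X))
        Ks.append(K / np.trace(K))
    # Hypothetical weighting rule: share of the spectrum held by the top eigenvalues.
    scores = []
    for K in Ks:
        ev = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # ascending order
        scores.append(ev[-n_components:].sum() / ev.sum())
    w = np.array(scores) / np.sum(scores)
    K_combined = sum(wi * Ki for wi, Ki in zip(w, Ks))
    vals, vecs = np.linalg.eigh(K_combined)
    vals = vals[::-1][:n_components]
    vecs = vecs[:, ::-1][:, :n_components]
    # Projection of the training points onto each component: sqrt(eigenvalue) * eigenvector.
    return vecs * np.sqrt(np.clip(vals, 0.0, None)), w

# High-dimensional, small-sample toy data (30 samples, 100 features).
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 100))
kernels = [lambda A: rbf_kernel(A, 1.0 / 100),
           lambda A: poly_kernel(A, 2),
           lambda A: A @ A.T]
Z, w = weighted_kernel_pca(X, kernels, n_components=5)
```

The reduced matrix `Z` has one row per sample and could be passed to any of the classifiers named in the abstract.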

HYBRID NSGA-II OPTIMIZATION FOR IMPROVING THE THREE-TERM BP NETWORK FOR MULTICLASS CLASSIFICATION PROBLEMS

Journal of Information and Communication Technology ◽ 10.32890/jict2015.14.0.8154 ◽ 2015

Author(s): Ashraf Osman Ibrahim ◽ Siti Mariyam Shamsuddin ◽ Sultan Noman Qasem

Keyword(s): Genetic Algorithm ◽ Evolutionary Algorithm ◽ Hybrid Algorithm ◽ Search Algorithm ◽ Classification Performance ◽ Classification Problems ◽ NSGA-II ◽ BP Network ◽ Multiobjective Evolutionary Algorithm

Recently, hybrid algorithms have received considerable attention from a number of researchers. This paper presents a hybrid of the multiobjective evolutionary algorithm to gain a better accuracy of the final solutions. The aim of using the hybrid algorithm is to improve the multiobjective evolutionary algorithm performance in terms of the enhancement of all the individuals in the population and increase the quality of the Pareto optimal solutions. The multiobjective evolutionary algorithm used in this study is a nondominated sorting genetic algorithm-II (NSGA-II) together with its hybrid, the backpropagation algorithm (BP), which is used as a local search algorithm to optimize the accuracy and complexity of the three-term backpropagation (TBP) network. The outcome positively demonstrates that the hybrid algorithm is able to improve the classification performance with a smaller number of hidden nodes and is effective in multiclass classification problems. Furthermore, the results indicate that the proposed hybrid method is a potentially useful classifier for enhancing the classification process ability when compared with the multiobjective genetic algorithm based on the TBP network (MOGATBP) and certain other methods found in the literature.
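
The nondominated sorting at the core of NSGA-II can be illustrated with a short sketch that ranks candidate networks by two objectives to be minimized: classification error and number of hidden nodes. This is a simple repeated-filtering version for clarity, not the bookkeeping-based fast sort used in NSGA-II implementations, and the candidate values are made up for the example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(points):
    """Return a list of fronts (lists of indices); front 0 is the Pareto front."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical candidates: (classification error, number of hidden nodes).
candidates = [(0.10, 8), (0.12, 5), (0.10, 6), (0.20, 3), (0.25, 4), (0.12, 9)]
fronts = nondominated_fronts(candidates)
print(fronts)  # [[1, 2, 3], [0, 4], [5]]
```

Candidates 1, 2, and 3 form the Pareto front because each trades accuracy against network size in a way no other candidate beats on both objectives at once.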

A New Method for Solving Supervised Data Classification Problems

Abstract and Applied Analysis ◽ 10.1155/2014/318478 ◽ 2014 ◽ Vol 2014 ◽ pp. 1-9 ◽ Cited By ~ 1

Author(s): Parvaneh Shabanzadeh ◽ Rubiyah Yusof

Keyword(s): Mining Industry ◽ Data Classification ◽ Classification Performance ◽ Classification Model ◽ Selection Algorithm ◽ Classification Problems ◽ Nonsmooth Nonconvex Optimization ◽ Derivative Free ◽ Real World Datasets ◽ New Feature

Supervised data classification is one of the techniques used to extract nontrivial information from data. Classification is widely used in various fields, including data mining, industry, medicine, science, and law. This paper considers a new algorithm for supervised data classification problems associated with cluster analysis. The mathematical formulation of this algorithm is based on nonsmooth, nonconvex optimization, and a new algorithm is used to solve the resulting optimization problem. The new algorithm uses a derivative-free technique that is robust and efficient. To improve classification performance and the efficiency of generating the classification model, a new feature selection algorithm based on convex programming techniques is suggested. The proposed methods are tested on real-world datasets, and the results of numerical experiments demonstrate the effectiveness of the proposed algorithms.

Comparison with Classification Algorithms in Data Mining of a Fuel Automation System's Sales Data

Global Economics Review ◽ 10.31703/ger.2020(v-i).20 ◽ 2020 ◽ Vol V (I) ◽ pp. 245-254

Author(s): Ilhan Tarimer ◽ Buse Cennet Karadag

Keyword(s): Data Mining ◽ Logistic Regression ◽ Random Forest ◽ Naive Bayes ◽ Data Classification ◽ Classification Algorithms ◽ Sales Data ◽ Station Data ◽ Classification Procedures ◽ Accuracy Rates

This article deals with Otobil and pump sales estimates at fuel stations. The fuel station data used in the study consist of 2384 data points in total. Based on these data, classification procedures were performed on fuel station sales data using classification algorithms. In the study, the J48, Random Forest, KStar, Logistic Regression, IBk and Naive Bayes classification algorithms are run in a software package to compare the sales estimates. The results show that the J48 algorithm generally achieves higher accuracy rates than the others. It is understood that these sales estimates should encourage fuel station owners and association bodies to become more profitable.

K-Means Cluster Based Undersampling Ensemble for Imbalanced Data Classification

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.c5188.029320 ◽ 2020 ◽ Vol 9 (3) ◽ pp. 2074-2079

Keyword(s): Imbalanced Data ◽ Data Classification ◽ Classification Problem ◽ Classification Algorithms ◽ Classification Problems ◽ Imbalanced Classification ◽ Imbalanced Data Classification ◽ Traditional Classification ◽ Boosting Method ◽ Ensemble Algorithms

Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. Imbalanced data classification problems arise in many application areas, such as rare medical diagnosis, risk management, and fault detection. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster-based undersampling with a boosting method. The experimental results show that the proposed algorithm outperforms the other sampling ensemble algorithms of previous studies.
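
One way cluster-based undersampling can work is sketched below, under the assumption that the majority class is clustered into as many groups as there are minority samples and each cluster is replaced by its centroid. The abstract does not say how representatives are chosen or how the boosting stage consumes them, so this choice, the plain Lloyd-style k-means, and all names are illustrative assumptions; the boosting step is not shown.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd-style k-means; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def cluster_undersample(X, y, majority=0, minority=1):
    """Replace the majority class with one centroid per minority sample."""
    X_min = X[y == minority]
    X_maj = X[y == majority]
    reps = kmeans(X_maj, k=len(X_min))
    X_bal = np.vstack([reps, X_min])
    y_bal = np.concatenate([np.full(len(reps), majority),
                            np.full(len(X_min), minority)])
    return X_bal, y_bal

# Toy imbalanced data: 200 majority points and 20 minority points.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((200, 2)),
               rng.standard_normal((20, 2)) + 3.0])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(20, dtype=int)])
X_bal, y_bal = cluster_undersample(X, y)
```

The balanced set keeps every minority sample while summarizing the majority class by its cluster structure rather than by random deletion.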

Classification Performance Using Principal Component Analysis and Different Value of the Ratio R

International Journal of Computers Communications & Control ◽ 10.15837/ijccc.2011.2.2180 ◽ 2011 ◽ Vol 6 (2) ◽ pp. 317 ◽ Cited By ~ 2

Author(s): Jasmina Novakovic ◽ Sinisa Rankov

Keyword(s): Principal Component Analysis ◽ Feature Extraction ◽ Principal Component ◽ Component Analysis ◽ Classification Performance ◽ Classification Algorithms ◽ Data Sets ◽ Best Value ◽ Inherent Feature ◽ Different Types

A comparison between several classification algorithms with feature extraction on real datasets is presented. Principal Component Analysis (PCA) has been used for feature extraction with different values of the ratio R, evaluated and compared using four different types of classifiers on two real benchmark data sets. The accuracy of the classifiers is influenced by the choice of the ratio R. There is no single best value of R: for different datasets and different classifiers, the accuracy curves as a function of the number of features used may differ significantly. In our cases, feature extraction is especially effective for classification algorithms that do not have any inherent feature selection or feature extraction built in, such as the nearest neighbour methods or some types of neural networks.
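
The abstract does not define the ratio R. The sketch below assumes it denotes the proportion of total variance that the retained principal components must cover, which is a common way to parameterize PCA; under that assumption, a larger R keeps more components. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def pca_by_ratio(X, R):
    """Project X onto the fewest principal components covering a fraction R of the variance."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to component variances.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    cumulative = np.cumsum(s**2 / np.sum(s**2))
    # First position where the cumulative share reaches R, capped at the number of components.
    k = min(int(np.searchsorted(cumulative, R)) + 1, len(s))
    return Xc @ Vt[:k].T, k

# Three features with standard deviations of roughly 10, 1 and 0.1.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) * np.array([10.0, 1.0, 0.1])
Z_low, k_low = pca_by_ratio(X, 0.90)
Z_high, k_high = pca_by_ratio(X, 0.995)
```

Sweeping R and re-training each classifier on the projected data would give accuracy curves of the kind the abstract compares.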
