Application of Data Denoising and Classification Algorithm Based on RPCA and Multigroup Random Walk Random Forest in Engineering

2019 ◽  
Vol 2019 ◽  
pp. 1-15
Author(s):  
Renchao Wang ◽  
Yanlei Wang ◽  
Yuming Ma

Data classification algorithms are widely used in engineering, but data measured in practice often contain noise of different types and degrees, such as the vibration noise caused by water flow when measuring the natural frequencies of aqueducts or other hydraulic structures, and this noise reduces classification accuracy. In reality, such noise is often disorganized and stochastic, and some existing algorithms perform poorly in the face of this non-Gaussian noise. Classification algorithms that are robust to it are therefore needed. To address this issue, this paper proposes a hybrid algorithm that combines robust principal component analysis (RPCA) with a multigroup random walk random forest (MRWRF). RPCA can effectively remove part of the non-Gaussian noise, while MRWRF can select a better number of decision trees (DTs), which improves the robustness and classification performance of the random forest (RF). Together, RPCA and MRWRF can effectively classify data contaminated with non-Gaussian noise. Compared with other existing algorithms, the hybrid algorithm shows strong robustness and better classification performance, and can thus provide a new approach to data classification problems in engineering.
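The denoising stage described above can be illustrated with a minimal sketch of RPCA by principal component pursuit, which splits a data matrix into a low-rank part and a sparse part. This is not the authors' implementation: the function names, the synthetic test matrix, and the step-size and weight heuristics are assumptions made for illustration, and the MRWRF classifier stage is omitted because the abstract does not specify it.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(m, max_iter=1000, tol=1e-7):
    """Split m into low-rank `low` and sparse `sp` with m ~= low + sp.

    Assumed solver: augmented Lagrangian iterations with a fixed penalty mu.
    """
    n1, n2 = m.shape
    lam = 1.0 / np.sqrt(max(n1, n2))           # common weight on the sparse term
    mu = n1 * n2 / (4.0 * np.abs(m).sum())     # common penalty heuristic
    sp = np.zeros_like(m)
    y = np.zeros_like(m)                       # Lagrange multiplier
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        # Low-rank update: shrink the singular values
        u, sig, vt = np.linalg.svd(m - sp + y / mu, full_matrices=False)
        low = (u * soft_threshold(sig, 1.0 / mu)) @ vt
        # Sparse update: shrink the entries
        sp = soft_threshold(m - low + y / mu, lam / mu)
        resid = m - low - sp
        y = y + mu * resid
        if np.linalg.norm(resid) <= tol * norm_m:
            break
    return low, sp

# Synthetic example (hypothetical data): rank-2 signal plus sparse spikes
rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 80))
spikes = np.zeros((100, 80))
mask = rng.random((100, 80)) < 0.05
spikes[mask] = rng.uniform(-10, 10, mask.sum())
m = clean + spikes

low_hat, sp_hat = rpca(m)
rel_err = np.linalg.norm(low_hat - clean) / np.linalg.norm(clean)
print(f"relative error of recovered low-rank part: {rel_err:.2e}")
```

In a pipeline like the one the abstract describes, the recovered low-rank matrix would be passed to the classifier in place of the raw measurements.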

Energies ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 1809
Author(s):  
Mohammed El Amine Senoussaoui ◽  
Mostefa Brahami ◽  
Issouf Fofana

Machine learning is widely used as a panacea in many engineering applications, including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: the J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: oils that can be maintained in service, oils that should be reconditioned or filtered, oils that should be reclaimed, and oils that must be discarded. Of the two algorithms, random forest performed better and achieved high accuracy with only a small amount of data. Good performance was achieved not only through the application of the proposed algorithm but also through the data preprocessing approach. Before being fed to the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were then transformed again by principal component analysis and passed through the CFsSubset filter once more. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258326
Author(s):  
Wen Bo Liu ◽  
Sheng Nan Liang ◽  
Xi Wen Qin

Gene expression data are characterized by high dimensionality and a small sample size and contain a large number of redundant genes unrelated to a disease. The direct application of machine learning to classify this type of data not only incurs a great time cost but also sometimes fails to improve classification performance. To counter this problem, this paper proposes a dimension-reduction algorithm based on weighted kernel principal component analysis (WKPCA), which constructs kernel function weights according to kernel matrix eigenvalues and combines multiple kernel functions to reduce the feature dimensions. To further improve the dimension-reduction efficiency of WKPCA, t-class kernel functions are constructed, and corresponding theoretical proofs are given. Moreover, the cumulative optimal performance rate is constructed to measure the overall performance of WKPCA combined with machine learning algorithms. Naive Bayes, K-nearest neighbour, random forest, iterative random forest, and support vector machine classifiers are used to analyse 6 real gene expression datasets. Compared with the all-variable model, linear principal component dimension reduction, and single kernel function dimension reduction, the results show that the classification performance of the 5 machine learning methods mentioned above can be improved effectively by WKPCA dimension reduction.


Author(s):  
Ashraf Osman Ibrahim ◽  
Siti Mariyam Shamsuddin ◽  
Sultan Noman Qasem

Recently, hybrid algorithms have received considerable attention from a number of researchers. This paper presents a hybrid of the multiobjective evolutionary algorithm to gain a better accuracy of the final solutions. The aim of using the hybrid algorithm is to improve the multiobjective evolutionary algorithm performance in terms of the enhancement of all the individuals in the population and increase the quality of the Pareto optimal solutions. The multiobjective evolutionary algorithm used in this study is a nondominated sorting genetic algorithm-II (NSGA-II) together with its hybrid, the backpropagation algorithm (BP), which is used as a local search algorithm to optimize the accuracy and complexity of the three-term backpropagation (TBP) network. The outcome positively demonstrates that the hybrid algorithm is able to improve the classification performance with a smaller number of hidden nodes and is effective in multiclass classification problems. Furthermore, the results indicate that the proposed hybrid method is a potentially useful classifier for enhancing the classification process ability when compared with the multiobjective genetic algorithm based on the TBP network (MOGATBP) and certain other methods found in the literature.
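The trade-off between accuracy and network complexity mentioned above rests on nondominated sorting, which can be sketched as follows. This is a simple front-peeling version rather than the bookkeeping-optimized procedure used in NSGA-II, and it leaves out crowding distance and the BP local search. The candidate networks, given as (classification error, number of hidden nodes) pairs with both objectives minimized, are hypothetical values chosen for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(points):
    """Return lists of indices: front 0 is the Pareto front, front 1 the next, and so on."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if nothing left dominates it
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

# Hypothetical candidates: (classification error, hidden nodes)
candidates = [(0.10, 8), (0.12, 4), (0.10, 6), (0.20, 3), (0.15, 5), (0.12, 7)]
fronts = nondominated_fronts(candidates)
print(fronts)  # [[1, 2, 3], [0, 4, 5]]
```

Here candidate 2 dominates candidate 0 because it reaches the same error with fewer hidden nodes, so candidate 0 falls to the second front.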


2014 ◽  
Vol 2014 ◽  
pp. 1-9
Author(s):  
Parvaneh Shabanzadeh ◽  
Rubiyah Yusof

Supervised data classification is one of the techniques used to extract nontrivial information from data. Classification is a widely used technique in various fields, including data mining, industry, medicine, science, and law. This paper considers a new algorithm for supervised data classification problems associated with cluster analysis. The mathematical formulations for this algorithm are based on nonsmooth, nonconvex optimization. A new algorithm for solving this optimization problem is utilized. The new algorithm uses a derivative-free technique that is robust and efficient. To improve classification performance and efficiency in generating the classification model, a new feature selection algorithm based on convex programming techniques is suggested. The proposed methods are tested on real-world datasets. Results of numerical experiments are presented that demonstrate the effectiveness of the proposed algorithms.


2020 ◽  
Vol V (I) ◽  
pp. 245-254
Author(s):  
Ilhan Tarimer ◽  
Buse Cennet Karadag

This article deals with estimating Otobil and pump sales at fuel stations. The fuel station data used in the study consist of 2384 data points in total. Classification procedures were performed on these sales data using classification algorithms. The J48, Random Forest, KStar, Logistic Regression, IBk, and Naive Bayes algorithms were run in a software package to compare their sales estimates. The results show that the J48 algorithm generally achieves higher accuracy than the others. These sales estimates are expected to encourage fuel station owners and association bodies to become more profitable.


Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. Imbalanced data classification problems arise in many application areas, such as rare medical diagnosis, risk management, and fault detection. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster-based undersampling with a boosting method. The experimental results show that the proposed algorithm outperforms the sampling ensemble algorithms of previous studies.
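The undersampling step can be sketched in plain Python as follows. The abstract does not say how samples are chosen from each cluster, so this sketch assumes one rule: cluster the majority class into as many clusters as there are minority samples and keep the majority point nearest each centroid. The function names and the synthetic data are hypothetical, and the boosting stage is omitted.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def undersample_majority(majority, n_keep, seed=0):
    """Assumed rule: keep the majority point nearest each of n_keep centroids."""
    centroids, clusters = kmeans(majority, n_keep, seed=seed)
    return [
        min(cl, key=lambda p: sq_dist(p, c))
        for c, cl in zip(centroids, clusters) if cl
    ]

# Hypothetical imbalanced data: 200 majority points, 20 minority points
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(20)]

kept = undersample_majority(majority, n_keep=len(minority))
balanced = [(p, 0) for p in kept] + [(p, 1) for p in minority]
print(len(kept), "majority points kept for", len(minority), "minority points")
```

Choosing representatives per cluster, rather than sampling uniformly at random, keeps the retained majority points spread across the regions the majority class occupies.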


Author(s):  
Jasmina Novakovic ◽  
Sinisa Rankov

A comparison of several classification algorithms with feature extraction on real datasets is presented. Principal Component Analysis (PCA) was used for feature extraction with different values of the ratio R, and the results were evaluated and compared using four different types of classifiers on two real benchmark datasets. The accuracy of the classifiers is influenced by the choice of R. There is no single best value of R: for different datasets and different classifiers, the accuracy curves as a function of the number of features used may differ significantly. In our cases, feature extraction is especially effective for classification algorithms that do not have any inherent feature selection or feature extraction built in, such as nearest neighbour methods or some types of neural networks.


2012 ◽  
Vol 71 (17) ◽  
pp. 1541-1555
Author(s):  
V. A. Baranov ◽  
S. V. Baranov ◽  
A. V. Nozdrachev ◽  
A. A. Rogov
