A High-Dimensional and Small-Sample Submersible Fault Detection Method Based on Feature Selection and Data Augmentation

Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 204
Author(s):  
Penghui Zhao ◽  
Qinghe Zheng ◽  
Zhongjun Ding ◽  
Yi Zhang ◽  
Hongjun Wang ◽  
...  

The fault detection of manned submersibles plays a very important role in protecting the safety of submersible equipment and personnel. However, diving sensor data are scarce and high-dimensional, so this paper proposes a submersible fault detection method composed of three modules: a feature selection module based on hierarchical clustering and an Autoencoder (AE), a data augmentation module based on improved Deep Convolutional Generative Adversarial Networks (DCGAN), and a fault detection module using a Convolutional Neural Network (CNN) with the LeNet-5 structure. First, feature selection retains the features that correlate strongly with failure events. Second, the data augmentation model generates sufficient data for training the CNN model, combining rough data generation with data refinement. Finally, a LeNet-5-based fault detection framework is trained and fine-tuned on synthetic data and tested on real data. Experimental results on sensor data from a submersible hydraulic system demonstrate that the proposed method successfully detects fault samples. Its detection accuracy reaches 97%, significantly outperforming classic detection algorithms.
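The clustering-based feature selection stage described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: a greedy correlation grouping stands in for hierarchical clustering and the AE scoring, and the function names (`pearson`, `select_features`) are assumptions.

```python
# Toy sketch: group highly correlated sensor features, then keep one
# representative per group -- the member most correlated with the fault label.

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def select_features(X, y, corr_threshold=0.9):
    """X: list of feature columns; y: 0/1 failure labels.
    Returns one representative feature index per correlation group."""
    relevance = [abs(pearson(col, y)) for col in X]
    assigned, clusters = set(), []
    for i in range(len(X)):
        if i in assigned:
            continue
        cluster, _ = [i], assigned.add(i)
        for j in range(i + 1, len(X)):
            if j not in assigned and abs(pearson(X[i], X[j])) >= corr_threshold:
                cluster.append(j)
                assigned.add(j)
        clusters.append(cluster)
    # keep the member of each cluster most correlated with the fault label
    return [max(c, key=lambda k: relevance[k]) for c in clusters]
```

Because feature 1 below is a scaled copy of feature 0, the two collapse into one cluster and only one survives alongside the uncorrelated feature 2.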

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jing Zhang ◽  
Guang Lu ◽  
Jiaquan Li ◽  
Chuanwen Li

Mining useful knowledge from high-dimensional data is a hot research topic. Efficient and effective sample classification and feature selection are challenging tasks due to the high dimensionality and small sample size of microarray data. Feature selection is necessary when constructing the model to reduce time and space consumption. Therefore, a feature selection model based on prior knowledge and rough sets is proposed. Pathway knowledge is used to select feature subsets, and a rough set based on intersection neighborhood is then used to select the important features within each subset, since this approach avoids redundant features and handles numerical features directly. To improve diversity among base classifiers and the efficiency of classification, only a subset of the base classifiers is retained. Classifiers are grouped into several clusters by k-means clustering using the proposed combined distance of Kappa-based diversity and accuracy. The base classifier with the best classification performance in each cluster is selected to build the final ensemble model. Experimental results on three Arabidopsis thaliana stress response datasets showed that the proposed method achieved better classification performance than existing ensemble models.
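The ensemble-pruning step above can be illustrated with a small sketch. Cohen's kappa between two classifiers' predictions measures their agreement (i.e., lack of diversity); blending it with an accuracy gap gives a distance for clustering, after which the most accurate member of each cluster is kept. The weighting `alpha` and the exact combination below are assumptions, not the paper's formula.

```python
# Sketch of a Kappa-plus-accuracy distance for clustering base classifiers,
# and a pruning step that keeps the most accurate classifier per cluster.

def cohen_kappa(p, q):
    """Cohen's kappa between two prediction lists (chance-corrected agreement)."""
    n = len(p)
    agree = sum(a == b for a, b in zip(p, q)) / n
    labels = set(p) | set(q)
    expected = sum((p.count(l) / n) * (q.count(l) / n) for l in labels)
    return (agree - expected) / (1 - expected) if expected != 1 else 1.0

def combined_distance(pred_i, pred_j, acc_i, acc_j, alpha=0.5):
    """Similar classifiers (high kappa, close accuracy) end up close together."""
    k = cohen_kappa(pred_i, pred_j)
    return alpha * (1 - k) + (1 - alpha) * abs(acc_i - acc_j)

def prune(clusters, accuracies):
    """Keep the index of the most accurate classifier in each cluster."""
    return [max(c, key=lambda i: accuracies[i]) for c in clusters]
```

Two identical prediction vectors yield kappa 1 (distance 0 at equal accuracy), while perfectly opposed ones yield kappa -1, pushing them into different clusters.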


2021 ◽  
pp. 444-454
Author(s):  
Liu Weiwei ◽  
Lei Shuya ◽  
Zheng Xiaokun ◽  
Li Han ◽  
Wang Xinyu ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6673
Author(s):  
Lichuan Zou ◽  
Hong Zhang ◽  
Chao Wang ◽  
Fan Wu ◽  
Feng Gu

In high-resolution Synthetic Aperture Radar (SAR) ship detection, the number of available SAR samples strongly affects the performance of deep-learning-based algorithms. In this paper, to meet the requirements of high-resolution ship detection with small samples, a high-resolution SAR ship detection method is proposed that combines an improved sample generation network, Multiscale Wasserstein Auxiliary Classifier Generative Adversarial Networks (MW-ACGAN), with the Yolo v3 network. Firstly, the multi-scale Wasserstein distance and a gradient penalty loss are used to improve the original Auxiliary Classifier Generative Adversarial Networks (ACGAN), so that the improved network can stably generate high-resolution SAR ship images. Secondly, a multi-scale loss term and corresponding multi-scale image output layers are added to the network, enabling the generation of multi-scale SAR ship images. Then, the original ship dataset and the generated data are combined into a composite dataset to train the Yolo v3 target detection network, addressing the low detection accuracy caused by small sample sets. Experimental results on Gaofen-3 (GF-3) 3 m SAR data show that the MW-ACGAN network can generate multi-scale, multi-class ship slices, and that ResNet18 assigns higher confidence to its outputs than to those of the original ACGAN, with an average score of 0.91. The detection results show that a Yolo v3 model trained on the composite dataset reaches 94% accuracy, far better than one trained only on the original SAR dataset. These results show that our method makes the best use of the original dataset and improves the accuracy of ship detection.
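The "multi-scale Wasserstein" idea above can be conveyed with a deliberately simplified sketch. The real MW-ACGAN critic operates on images via learned features; here we only illustrate summing a Wasserstein-1 term over progressively downsampled versions of 1-D data. All function names and the pair-averaging downsampler are illustrative assumptions.

```python
# Toy multi-scale Wasserstein-1 distance between equal-size empirical samples.

def wasserstein_1d(a, b):
    """W1 between two equal-size 1-D samples: mean gap between sorted values."""
    a, b = sorted(a), sorted(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def downsample(x, factor=2):
    """Average consecutive pairs -- a 1-D stand-in for image downscaling."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

def multiscale_w1(a, b, scales=3):
    """Sum the W1 distance over several resolutions of the data."""
    total = 0.0
    for _ in range(scales):
        total += wasserstein_1d(a, b)
        a, b = downsample(a), downsample(b)
    return total
```

Two constant samples separated by 1.0 keep that gap at every scale, so three scales accumulate a distance of 3.0; identical samples score 0 at all scales.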


2019 ◽  
Vol 17 (4) ◽  
pp. 340-359 ◽  
Author(s):  
A N M Bazlur Rashid ◽  
Tonmoy Choudhury

The term “big data” characterizes the massive amounts of data generated by advanced technologies across domains, commonly described by the 4Vs - volume, velocity, variety, and veracity - indicating data that can only be processed via computationally intensive analysis, the speed of its creation, its different types, and its accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) but a small number of samples, and are used to measure various real-time business situations in financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytic tools to mine them for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: finding a minimal subset of relevant features that maximizes classification accuracy while reducing computation. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic and divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
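The cooperative co-evolution decomposition can be sketched in miniature. This is a toy, not any of the surveyed algorithms: the feature index set is split into subcomponents, each subcomponent proposes candidate bit-masks, and a candidate is evaluated after being merged into a shared "context vector" holding the current best choices of the other subcomponents. The fitness function and the 0.1 size penalty are illustrative assumptions.

```python
import random

def decompose(n_features, n_groups):
    """Round-robin split of feature indices into subcomponents."""
    idx = list(range(n_features))
    return [idx[i::n_groups] for i in range(n_groups)]

def evaluate(mask, relevance, penalty=0.1):
    """Toy fitness: total relevance of kept features minus a size penalty."""
    return sum(r for m, r in zip(mask, relevance) if m) - penalty * sum(mask)

def cooperative_step(groups, context, relevance, rng):
    """One co-evolution round: each subcomponent tries random candidates,
    evaluated jointly with the other subcomponents' current best bits."""
    for g in groups:
        best = list(context)
        for _ in range(20):  # a few random candidates per subcomponent
            cand = list(context)
            for i in g:
                cand[i] = rng.randint(0, 1)
            if evaluate(cand, relevance) > evaluate(best, relevance):
                best = cand
        context = best
    return context
```

Because a candidate only replaces the context when it strictly improves joint fitness, each round is monotone non-decreasing, which is the property that lets the subcomponents be optimized (and parallelized, e.g. via MapReduce) independently.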


2021 ◽  
Author(s):  
Merim Dzaferagic ◽  
Nicola Marchetti ◽  
Irene Macaluso

This paper addresses the issue of reliability in the Industrial Internet of Things (IIoT) when sensor measurements are missing due to network or hardware problems. We propose to support the fault detection and classification modules, the two critical components of an IIoT monitoring system, with a generative model. The latter is responsible for imputing missing sensor measurements so that the monitoring system's performance is robust to missing data. In particular, we adopt Generative Adversarial Networks (GANs) to generate missing sensor measurements, and we propose to fine-tune the training of the GAN based on the impact that the generated data have on the fault detection and classification modules. We conduct a thorough evaluation of the proposed approach using the extended Tennessee Eastman Process dataset. Results show that the GAN-imputed data mitigate the impact on fault detection and classification even when measurements are persistently missing from sensors that are critical for the correct functioning of the monitoring system.
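The feedback loop described above can be sketched abstractly: the imputation model is scored not only on reconstruction but also on how much the downstream fault classifier degrades when fed imputed data, and the training objective mixes both terms. The mean-filler, the `weight` blend, and all names below are illustrative assumptions, not the paper's GAN implementation.

```python
# Sketch: impute missing readings, then score imputation quality jointly
# with its impact on the downstream fault classifier.

def impute(window, fill_fn):
    """Replace missing (None) sensor readings using fill_fn(index, window)."""
    return [fill_fn(i, window) if v is None else v for i, v in enumerate(window)]

def mean_fill(i, window):
    """Baseline filler: mean of the observed readings in the window."""
    known = [v for v in window if v is not None]
    return sum(known) / len(known)

def combined_loss(recon_err, clf_err_imputed, clf_err_clean, weight=0.5):
    """Reconstruction error plus a penalty for hurting the fault classifier."""
    degradation = max(0.0, clf_err_imputed - clf_err_clean)
    return recon_err + weight * degradation
```

If the classifier's error on imputed data is no worse than on clean data, the penalty term vanishes and only reconstruction error remains; a generator would be fine-tuned to drive both terms down.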


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Xuguang Liu

Aiming at the anomaly detection problem in sensor data, traditional algorithms usually focus only on the continuity of single-source data and ignore the spatiotemporal correlation between multisource data, which reduces detection accuracy to a certain extent. Besides, due to the rapid growth of sensor data, centralized cloud computing platforms cannot meet the real-time detection needs of large-scale abnormal data. To solve this problem, a real-time detection method for abnormal IoT sensor data based on edge computing is proposed. Firstly, sensor data are represented as time series, and the K-nearest neighbor (KNN) algorithm is used to detect outliers and isolated groups in the data stream. Secondly, an improved DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is proposed that considers the spatiotemporal correlation between multisource data: its parameters can be set according to the sample characteristics within each window, overcoming the slow convergence caused by global parameters and large samples, and making full use of data correlation to complete anomaly detection. Moreover, this paper proposes a distributed anomaly detection model for sensor data based on edge computing, which processes data on computing resources as close to the data source as possible, improving the overall efficiency of data processing. Finally, simulation results show that the proposed method has higher computational efficiency and detection accuracy than traditional methods and is feasible in practice.
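The first, KNN-based stage can be illustrated with a minimal sketch, assuming a simple distance-based outlier score (details such as `k` and the threshold are illustrative, not the paper's settings): each reading in a window is scored by its mean distance to its k nearest neighbours, and high-scoring readings are flagged before the density-based (DBSCAN-style) stage runs.

```python
# Toy KNN outlier scoring for a window of 1-D sensor readings.

def knn_scores(values, k=2):
    """Mean distance from each reading to its k nearest neighbours."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def knn_outliers(values, k=2, threshold=3.0):
    """Indices of readings whose KNN score exceeds the threshold."""
    return [i for i, s in enumerate(knn_scores(values, k)) if s > threshold]
```

In a window of readings near 1.0, a reading of 10.0 sits far from all neighbours and is the only index flagged.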

