Classification of Complex Machine Data to be Used for Structural Health Monitoring Purposes

Author(s):  
S. Schiffer ◽  
D. Söffker

In this contribution, a recently developed modeling and classification approach for streamed measurement data from industrial processes is applied. The approach, briefly reviewed here, can be used for fault classification and diagnostics. It is based on fuzzy-like modeling using statistical features extracted from training data. The trained model is then used to assign unknown data sets to one of a given number of data classes, each related to a state. Besides the brief introduction to the approach, experimental data are used to demonstrate it on a complex example: distinguishing different wear states of machine components during operation.
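
As a rough illustration of classification from statistical features of training data, the following is a minimal sketch and not the authors' method. It assumes Gaussian-shaped membership functions built from the per-class mean and standard deviation of each feature; the function names, the membership shape, the averaging rule, and the sample values are all hypothetical.

```python
import math
from statistics import mean, stdev


def train(samples_by_class):
    """Store (mean, standard deviation) of every feature for each class."""
    model = {}
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))  # one tuple per feature
        model[label] = [(mean(col), stdev(col)) for col in columns]
    return model


def membership(x, mu, sigma):
    """Gaussian-shaped membership in (0, 1]; the shape is an assumption."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def classify(model, row):
    """Return the class with the highest mean membership over all features."""
    scores = {}
    for label, stats in model.items():
        values = [membership(x, mu, sigma) for x, (mu, sigma) in zip(row, stats)]
        scores[label] = mean(values)
    return max(scores, key=scores.get)


# Hypothetical two-feature measurements for two wear states.
training = {
    "new": [(1.0, 0.20), (1.2, 0.30), (0.9, 0.25)],
    "worn": [(3.0, 1.10), (3.3, 0.90), (2.8, 1.00)],
}
model = train(training)
```

The sketch requires every feature to vary within each class, since a zero standard deviation would cause a division by zero in `membership`.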

Author(s):  
Hyunjae Kim ◽  
Jong Moon Ha ◽  
Jungho Park ◽  
Sunuwe Kim ◽  
Keunsu Kim ◽  
...  

In the 2015 PHM Data Challenge Competition, the goal of the competition problem was to diagnose failure of industrial plant systems using incomplete data. The available data consisted of sensor measurements, control reference signals, and fault logs. A detailed description of the plant system of interest was not revealed, and partial fault logs were eliminated from the dataset. This paper presents a fault log recovery method using a machine-learning-based fault classification approach for failure diagnosis. For optimal performance, it was critical to be able to utilize a set of incomplete data and to select relevant features. First, physical interpretation of the given data was performed to select proper features for a fault classifier. Second, Fisher discriminant analysis (FDA) was employed to minimize the effect of outliers in the incomplete data sets. Finally, the type of the missing fault logs and the duration of the corresponding faults were recovered. The proposed approach, based on the use of an incomplete-data-trained FDA classifier, led to the second-highest score in the 2015 PHM Data Challenge Competition.
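
To make the role of Fisher discriminant analysis concrete, here is a minimal two-class, two-feature sketch. It is not the classifier from the paper: the feature values are invented, the helper names are hypothetical, and the decision threshold at the midpoint of the projected class means is an assumed convention.

```python
def mean_vec(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def scatter(rows, m):
    """2x2 scatter matrix of the points around their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_fda(class0, class1):
    """Fisher direction w = Sw^-1 (m1 - m0) and a midpoint threshold."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    proj0 = w[0] * m0[0] + w[1] * m0[1]
    proj1 = w[0] * m1[0] + w[1] * m1[1]
    return w, 0.5 * (proj0 + proj1)


def predict(w, threshold, x):
    """Class 1 if the projection of x onto w exceeds the threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


# Invented feature vectors for "no fault" (0) and "fault" (1).
no_fault = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
fault = [(4.0, 5.0), (4.5, 4.8), (5.0, 5.5), (4.2, 5.2)]
w, threshold = fit_fda(no_fault, fault)
```

Because the within-class scatter matrix is positive definite for this data, class 1 always projects above class 0 along `w`, which is why a single threshold suffices.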


2018 ◽  
Vol 25 (3) ◽  
pp. 655-670 ◽  
Author(s):  
Tsung-Wei Ke ◽  
Aaron S. Brewster ◽  
Stella X. Yu ◽  
Daniela Ushizima ◽  
Chao Yang ◽  
...  

A new tool is introduced for screening macromolecular X-ray crystallography diffraction images produced at an X-ray free-electron laser light source. Based on a data-driven deep learning approach, the proposed tool executes a convolutional neural network to detect Bragg spots. The automatic image processing algorithms described can enable the classification of large data sets acquired under realistic conditions, which include noisy data and experimental artifacts. Outcomes are compared for different data regimes, including samples from multiple instruments and differing amounts of training data for neural network optimization.


Author(s):  
Y. SARATH KUMAR ◽  
ESWAR KODALI ◽  
P. HARINI

In this paper, we propose a lexical-pattern-based approach to extract aliases of a given name. We use a set of names and their aliases as training data to extract lexical patterns that describe the numerous ways in which information related to the aliases of a name is presented on the web. An individual is typically referred to by numerous name aliases on the web. Accurate identification of the aliases of a given personal name is useful in various web-related tasks such as information retrieval, sentiment analysis, personal name disambiguation, and relation extraction. Given a personal name, the proposed method first extracts a set of candidate aliases. Second, it ranks the extracted candidates according to the likelihood of each candidate being a correct alias of the given name. We evaluate the proposed method on three data sets: an English personal names data set, an English place names data set, and a Japanese personal names data set. The proposed method outperforms numerous baselines and previously proposed name alias extraction methods, achieving a statistically significant mean reciprocal rank (MRR) of 0.67.
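
For readers unfamiliar with the metric, mean reciprocal rank averages, over all query names, the reciprocal of the rank at which the first correct alias appears. The sketch below uses the standard definition; the names, candidate lists, and the function name are placeholders and do not come from the paper's data sets.

```python
def mean_reciprocal_rank(ranked_candidates, correct_aliases):
    """Average of 1/rank of the first correct alias; 0 if none is found."""
    total = 0.0
    for name, candidates in ranked_candidates.items():
        reciprocal = 0.0
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in correct_aliases[name]:
                reciprocal = 1.0 / rank
                break
        total += reciprocal
    return total / len(ranked_candidates)


# Placeholder queries: the first hits at rank 1, the second at rank 2.
ranked = {
    "name_a": ["alias_a1", "noise_1", "noise_2"],
    "name_b": ["noise_3", "alias_b1", "noise_4"],
}
gold = {"name_a": {"alias_a1"}, "name_b": {"alias_b1"}}
mrr = mean_reciprocal_rank(ranked, gold)  # (1/1 + 1/2) / 2 = 0.75
```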


2018 ◽  
Vol 10 (10) ◽  
pp. 1564 ◽  
Author(s):  
Patrick Bradley ◽  
Sina Keller ◽  
Martin Weinmann

In this paper, we investigate the potential of unsupervised feature selection techniques for classification tasks, where only sparse training data are available. This is motivated by the fact that unsupervised feature selection techniques combine the advantages of standard dimensionality reduction techniques (which only rely on the given feature vectors and not on the corresponding labels) and supervised feature selection techniques (which retain a subset of the original set of features). Thus, feature selection becomes independent of the given classification task and, consequently, a subset of generally versatile features is retained. We present different techniques relying on the topology of the given sparse training data. Thereby, the topology is described with an ultrametricity index. For the latter, we take into account the Murtagh Ultrametricity Index (MUI) which is defined on the basis of triangles within the given data and the Topological Ultrametricity Index (TUI) which is defined on the basis of a specific graph structure. In a case study addressing the classification of high-dimensional hyperspectral data based on sparse training data, we demonstrate the performance of the proposed unsupervised feature selection techniques in comparison to standard dimensionality reduction and supervised feature selection techniques on four commonly used benchmark datasets. The achieved classification results reveal that involving supervised feature selection techniques leads to similar classification results as involving unsupervised feature selection techniques, while the latter perform feature selection independently from the given classification task and thus deliver generally versatile features.


Author(s):  
Pengfei Zhang ◽  
Minzhou Dong ◽  
Junhong Duan

Training a convolutional neural network to high classification accuracy often requires a large amount of labeled data, but labeled data is sometimes not easily obtained. This paper proposes a solution based on integrated GMM clustering and label delivery for classifying images with few labeled samples: tags are assigned to unlabeled data through certain rules, converting unlabeled data into labeled data for training the model. Experiments are performed on handwritten digit recognition data sets. The results show that, when few labeled samples are available, the proposed algorithm greatly improves classification accuracy compared with using only the labeled samples, which validates its effectiveness.
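
The abstract does not state the rules used to assign tags, so the following is only one plausible sketch: each unlabeled sample receives the majority label of the labeled samples in its cluster. The cluster assignments are assumed to have been computed already (for example by a GMM), and the function name and sample values are hypothetical.

```python
from collections import Counter


def propagate_labels(cluster_ids, labels):
    """Fill in None labels with the majority known label of the same cluster.

    Samples in clusters without any labeled member stay None.
    """
    votes = {}
    for cluster, label in zip(cluster_ids, labels):
        if label is not None:
            votes.setdefault(cluster, Counter())[label] += 1
    majority = {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
    return [
        label if label is not None else majority.get(cluster)
        for cluster, label in zip(cluster_ids, labels)
    ]


# Hypothetical cluster assignments and sparse digit labels.
clusters = [0, 0, 0, 1, 1, 2]
known = [3, None, None, 7, None, None]
filled = propagate_labels(clusters, known)
```

Samples left as `None` would be excluded from training in this sketch, since no labeled neighbor is available to vote for them.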


Geophysics ◽  
2013 ◽  
Vol 78 (1) ◽  
pp. E41-E46 ◽  
Author(s):  
Laurens Beran ◽  
Barry Zelt ◽  
Leonard Pasion ◽  
Stephen Billings ◽  
Kevin Kingdon ◽  
...  

We have developed practical strategies for discriminating between buried unexploded ordnance (UXO) and metallic clutter. These methods are applicable to time-domain electromagnetic data acquired with multistatic, multicomponent sensors designed for UXO classification. Each detected target is characterized by dipole polarizabilities estimated via inversion of the observed sensor data. The polarizabilities are intrinsic target features and so are used to distinguish between UXO and clutter. We tested this processing with four data sets from recent field demonstrations, with each data set characterized by metrics of data and model quality. We then developed techniques for building a representative training data set and determined how the variable quality of estimated features affects overall classification performance. Finally, we devised a technique to optimize classification performance by adapting features during target prioritization.


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Feng Hu ◽  
Xiao Liu ◽  
Jin Dai ◽  
Hong Yu

The classification of imbalanced data is receiving increasing attention. Many significant methods have been proposed and applied in many fields, but more efficient methods are still needed. Although the hypergraph is an efficient tool for knowledge discovery, it may not be powerful enough to handle data in the boundary region. In this paper, the neighborhood hypergraph is presented, combining rough set theory and the hypergraph. A novel classification algorithm for imbalanced data based on the neighborhood hypergraph is then developed, consisting of three steps: initialization of hyperedges, classification of the training data set, and substitution of hyperedges. In 10-fold cross-validation experiments on 18 data sets, the proposed algorithm achieves higher average accuracy than the other algorithms.
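
The 10-fold cross-validation protocol mentioned above can be sketched as follows. This shows only the generic index splitting, not the authors' experimental code; the function name, the shuffling seed, and the sample count are assumptions, and the sketch does not stratify by class, which is often preferred for imbalanced data.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
```

The reported accuracy is then the average of the per-fold test accuracies.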


2020 ◽  
Vol 3 (2) ◽  
Author(s):  
Yoga Religia ◽  
Gatot Tri Pranoto ◽  
Egar Dika Santosa

Most of a bank's wealth normally comes from providing credit loans, so the bank must be able to reduce the risk of non-performing loans. The risk of providing loans can be minimized by studying patterns in existing lending data, and data mining is one technique suited to this problem. Data mining makes it possible to find hidden information in large data sets by way of classification. The Random Forest (RF) algorithm is a classification algorithm that can be used to deal with imbalanced data. The purpose of this study is to discuss the use of the RF algorithm for classification of the South German Credit data. This research is needed because no previous research has applied the RF algorithm specifically to the South German Credit data. Based on the tests performed, the RF algorithm performs best on the South German Credit data with a split of 85% training data and 15% testing data, reaching an accuracy of 78.33%.


With the advent of the digital era, billions of documents are generated every day that need to be managed, processed, and classified. An enormous amount of text data is available on the World Wide Web and other sources. A first step in managing this mammoth volume of data is classifying the available documents into the right categories. Supervised machine learning approaches try to solve the problem of document classification, but working on large data sets of heterogeneous classes is a big challenge. Automatic tagging and classification of text documents is a useful task with many potential applications, such as classifying emails as spam or non-spam, or news articles as political, entertainment, stock market, or sports news. The paper proposes a novel approach for classifying text into known classes using an ensemble of refined Support Vector Machines. The advantage of the proposed technique is that it can considerably reduce the size of the training data by adopting dimensionality reduction as a pre-training step. The proposed technique has been applied to three benchmark data sets, namely the CMU Dataset, the 20 Newsgroups Dataset, and the Classic Dataset. Experimental results show that the proposed approach is more accurate and efficient than other state-of-the-art methods.

