Identification of Outliers in Hyperspectral Raman Image Data by Nearest Neighbor Comparison

2002 ◽  
Vol 56 (11) ◽  
pp. 1458-1461 ◽  
Author(s):  
Caleb J. Behrend ◽  
Catherine P. Tarnowski ◽  
Michael D. Morris

A new algorithm for removal of cosmic spikes from hyperspectral Raman image data sets is presented. Spectra in a 3 × 3 pixel neighborhood are used to identify outlier-contaminated data points in the central pixel of that neighborhood. A preliminary despiking of the neighboring spectra is performed by median filtering. Correlations between the central pixel spectrum and its despiked neighbors are calculated, and the most highly correlated spectrum is used to identify outliers. Spike-contaminated data are replaced using results of polynomial interpolation. Because the neighborhood contains spectra obtained in three different frames, even large multi-pixel spikes are identified. Spatial, spectral, and temporal variation in signal is used to accurately identify outliers without the acquisition of any spectra other than those needed to generate the image itself. Sharp boundaries between regions of high chemical contrast do not interfere with outlier identification.
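The neighborhood-comparison idea above can be sketched in a few lines. This is a simplified reconstruction from the abstract, not the authors' exact procedure: the median filter, the 5-sigma MAD threshold, and the use of linear rather than polynomial interpolation for replacement are all assumptions made here.

```python
import numpy as np

def despike(center, neighbors, threshold=5.0):
    """Flag cosmic spikes in a central-pixel spectrum by comparison with
    its most correlated, median-filtered neighbor spectrum.
    center: 1-D spectrum; neighbors: 2-D array (n_neighbors, n_channels)."""
    # Preliminary despiking of the neighbors with a 3-point median filter
    smoothed = np.array([np.median(
        np.stack([np.roll(s, -1), s, np.roll(s, 1)]), axis=0) for s in neighbors])
    # Pick the neighbor most highly correlated with the central spectrum
    corrs = [np.corrcoef(center, s)[0, 1] for s in smoothed]
    best = smoothed[int(np.argmax(corrs))]
    # Outliers: residuals far beyond the robust (MAD-based) noise scale
    resid = center - best
    mad = np.median(np.abs(resid - np.median(resid))) + 1e-12
    spikes = np.abs(resid) > threshold * 1.4826 * mad
    # Replace spiked channels by interpolating from clean channels
    # (linear interpolation as a stand-in for the paper's polynomial fit)
    clean = np.where(~spikes)[0]
    fixed = center.copy()
    fixed[spikes] = np.interp(np.where(spikes)[0], clean, center[clean])
    return fixed, spikes
```

Because the correlation is taken against an already-despiked neighbor, a spike present in the central spectrum cannot mask itself.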

2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points, so the memory complexity of the analysis is at least O(N²). In this article we present an incremental manifold learning approach for handling large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained from the training data set. A local curvature-variation algorithm is used to sample a subset of data points as landmarks, and a manifold skeleton is then identified from the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
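A minimal sketch of why landmarks break the O(N²) memory barrier: only an N×m similarity block is formed instead of the full N×N matrix. Landmarks are chosen uniformly at random here for simplicity; the paper selects them by a local-curvature-variation criterion, and the Gaussian kernel with a median-distance bandwidth is likewise an assumption.

```python
import numpy as np

def landmark_similarity(X, n_landmarks, rng=None):
    """Build the N x m similarity block (m landmarks) instead of the
    full N x N similarity matrix."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.choice(n, size=n_landmarks, replace=False)
    L = X[idx]
    # Gaussian similarity between every data point and each landmark
    d2 = ((X[:, None, :] - L[None, :, :]) ** 2).sum(-1)
    sigma2 = np.median(d2) + 1e-12
    return np.exp(-d2 / sigma2), idx
```

For N = 10⁶ points and m = 10³ landmarks, this stores 10⁹ instead of 10¹² entries.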


2021 ◽  
Vol 7 (12) ◽  
pp. 254
Author(s):  
Loris Nanni ◽  
Michelangelo Paci ◽  
Sheryl Brahnam ◽  
Alessandra Lumini

Convolutional neural networks (CNNs) have gained prominence in the research literature on image classification over the last decade. One shortcoming of CNNs, however, is their lack of generalizability and tendency to overfit when presented with small training sets. Augmentation directly confronts this problem by generating new data points that provide additional information. In this paper, we investigate the performance of more than ten different sets of data augmentation methods, including two novel approaches proposed here: one based on the discrete wavelet transform and the other on the constant-Q Gabor transform. Pretrained ResNet50 networks are fine-tuned on each augmentation method. Combinations of these networks are evaluated and compared across four benchmark data sets of images representing diverse problems and collected by instruments that capture information at different scales: a virus data set, a bark data set, a portrait data set, and a LIGO glitches data set. Experiments demonstrate the superiority of this approach. The best ensemble proposed in this work achieves state-of-the-art (or comparable) performance across all four data sets. This result shows that varying data augmentation is a feasible way to build an ensemble of classifiers for image classification.
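One hypothetical wavelet-style augmentation, sketched here with a single-level 1-D Haar transform applied along image rows: the detail coefficients are randomly attenuated before reconstruction, perturbing texture while preserving local averages. The paper's actual transform choices and perturbation scheme may differ.

```python
import numpy as np

def haar_dwt_augment(img, scale=0.5, rng=None):
    """Augment an image (rows of even length) by attenuating the detail
    band of a single-level Haar DWT along each row."""
    rng = np.random.default_rng(rng)
    x = img.astype(float)
    a = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)   # approximation band
    d = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)   # detail band
    d *= scale * rng.uniform(0.5, 1.5)           # random attenuation
    out = np.empty_like(x)
    out[:, 0::2] = (a + d) / np.sqrt(2)          # inverse Haar step
    out[:, 1::2] = (a - d) / np.sqrt(2)
    return out
```

Since only the detail band is altered, pairwise pixel sums (the approximation content) are preserved exactly, so the augmented image stays close to the original in low frequencies.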


2009 ◽  
Vol 18 (06) ◽  
pp. 883-904
Author(s):  
YUN LI ◽  
BAO-LIANG LU ◽  
TENG-FEI ZHANG

Principal component analysis (PCA) is a popular linear feature extractor, widely used in signal processing, face recognition, etc. However, the axes of the lower-dimensional space, i.e., the principal components, are new variables that carry no clear physical meaning. We therefore propose unsupervised feature selection algorithms based on eigenvector analysis to identify the original features that are critical for the principal components. The presented algorithms use the k-nearest-neighbor rule to find the predominant row components, and eight new measures are proposed to compute the correlation between row components in the transformation matrix. Experiments are conducted on benchmark data sets and on facial image data sets for gender classification to show their superiority.
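The core idea, selecting original variables from eigenvector loadings, can be sketched as follows. This uses one simple magnitude-based ranking of the rows of the loading matrix; the paper proposes eight measures and a k-nearest-neighbor rule, which are not reproduced here.

```python
import numpy as np

def select_features(X, n_components, n_features):
    """Rank original variables by the absolute weight they carry in the
    leading eigenvectors of the covariance matrix."""
    Xc = X - X.mean(0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    # Columns of `top` are the leading principal components;
    # each row corresponds to one original feature.
    top = vecs[:, np.argsort(vals)[::-1][:n_components]]
    scores = np.abs(top).sum(axis=1)
    return np.argsort(scores)[::-1][:n_features]
```

Unlike PCA itself, the output is a subset of the original, physically meaningful variables rather than linear combinations of them.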


Author(s):  
JIE JI ◽  
QIANGFU ZHAO

This paper proposes a hybrid learning method to speed up the classification procedure of support vector machines (SVMs). In contrast to most algorithms, which try to reduce the number of support vectors in an SVM classifier, we focus on reducing the number of data points that need SVM classification, and on reducing the number of support vectors involved in each SVM classification. The system uses a nearest neighbor classifier (NNC) to triage data points. In the training phase, the NNC selects data near the partial decision boundary and then trains a sub-SVM for each Voronoi pair. For classification, most non-boundary data points are classified by the NNC directly, while the remaining boundary data points are passed to a corresponding local expert SVM. We also propose a data selection method for training reliable expert SVMs. Experimental results on several generated and public machine learning data sets show that the proposed method significantly accelerates the testing speed.
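The routing step can be illustrated with a small sketch. The boundary test used here (disagreement among the k nearest training neighbors) and the majority-vote stand-in for the local expert SVMs are assumptions; the paper trains a sub-SVM per Voronoi pair instead.

```python
import numpy as np

def hybrid_predict(Xtr, ytr, Xte, k=5):
    """Route each test point: if its k nearest training neighbors agree
    unanimously, answer via the fast NNC path; otherwise mark it as a
    boundary point to be deferred to a local expert SVM (stubbed here
    as a majority vote)."""
    preds, needs_svm = [], []
    for x in Xte:
        d = ((Xtr - x) ** 2).sum(1)
        nn = ytr[np.argsort(d)[:k]]
        if (nn == nn[0]).all():          # unanimous: non-boundary point
            preds.append(nn[0]); needs_svm.append(False)
        else:                            # boundary region: expert SVM
            vals, cnt = np.unique(nn, return_counts=True)
            preds.append(vals[np.argmax(cnt)]); needs_svm.append(True)
    return np.array(preds), np.array(needs_svm)
```

The speedup comes from `needs_svm` being sparse: only the small fraction of boundary points pays the SVM's kernel-evaluation cost.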


Author(s):  
FULIN LUO ◽  
JIAMIN LIU ◽  
HONG HUANG ◽  
YUMEI LIU

Locally linear embedding (LLE) depends on the Euclidean distance (ED) to select the k nearest neighbors. However, the ED may not reflect the actual geometric structure of the data, which can lead to the selection of ineffective neighbors. The aim of our work is to make full use of the local spectral angle (LSA) to find proper neighbors for dimensionality reduction (DR) and classification of hyperspectral remote sensing data. First, we propose an improved LLE method, called local spectral angle LLE (LSA-LLE), for DR. It uses the ED of the data to obtain large-scale neighbors, then uses the spectral angle to select the exact neighbors from among them. Furthermore, a local spectral-angle-based nearest neighbor classifier (LSANN) is proposed for classification. Experiments on two hyperspectral image data sets demonstrate the effectiveness of the presented methods.
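The two-stage neighbor search can be sketched directly from the abstract. The candidate-pool size (`scale * k` Euclidean neighbors) is an assumption introduced here for illustration.

```python
import numpy as np

def spectral_angle(a, b):
    """Angle between two spectra in radians; invariant to scaling,
    so illumination differences do not change it."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def lsa_neighbors(X, i, k, scale=3):
    """Stage 1: take a large-scale candidate set by Euclidean distance.
    Stage 2: keep the k candidates with the smallest spectral angle
    to spectrum i."""
    d = ((X - X[i]) ** 2).sum(1)
    cand = np.argsort(d)[1:scale * k + 1]          # ED pre-selection (skip self)
    angles = [spectral_angle(X[i], X[j]) for j in cand]
    return cand[np.argsort(angles)[:k]]
```

A spectrum that is a scaled copy of the query (e.g. the same material under different illumination) has spectral angle zero and is kept, even when its Euclidean distance is large.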


2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Shaodi Ge ◽  
Hongjun Li ◽  
Liuhong Luo

Coclustering approaches that group data points and features simultaneously have recently received extensive attention. In this paper, we propose a constrained dual graph regularized orthogonal nonnegative matrix trifactorization (CDONMTF) algorithm to solve coclustering problems. The new method noticeably improves clustering performance by employing hard constraints to retain the prior label information of samples, establishing two nearest-neighbor graphs to encode the geometric structure of the data manifold and the feature manifold, and imposing biorthogonality constraints as well. In addition, we derive the iterative optimization scheme of CDONMTF and prove its convergence. Clustering experiments on 5 UCI machine learning data sets and 7 image benchmark data sets show that the proposed algorithm outperforms several existing clustering algorithms.
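The unconstrained core of such methods is the trifactorization X ≈ F S Gᵀ, fitted by multiplicative updates. The sketch below omits the label constraints, dual graph regularizers, and biorthogonality terms that define CDONMTF proper; it shows only the base factorization.

```python
import numpy as np

def nmtf(X, k1, k2, iters=300, rng=0, eps=1e-9):
    """Plain nonnegative matrix trifactorization X ~ F S G^T via
    standard multiplicative updates (no constraints or regularizers).
    F clusters rows (samples), G clusters columns (features)."""
    rng = np.random.default_rng(rng)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    for _ in range(iters):
        # Each update multiplies by (positive gradient part)/(negative part)
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G
```

Multiplicative updates keep all factors nonnegative by construction; the full algorithm adds constraint-specific terms to these ratios.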


Author(s):  
Loris Nanni ◽  
Michelangelo Paci ◽  
Sheryl Brahnam ◽  
Alessandra Lumini

Convolutional neural networks (CNNs) have gained prominence in the research literature on image classification over the last decade. One shortcoming of CNNs, however, is their lack of generalizability and tendency to overfit when presented with small training sets. Augmentation directly confronts this problem by generating new data points that provide additional information. In this paper, we investigate the performance of more than ten different sets of data augmentation methods, including two novel approaches proposed here: one based on the discrete wavelet transform and the other on the constant-Q Gabor transform. Pretrained ResNet50 networks are fine-tuned on each augmentation method. Combinations of these networks are evaluated and compared across three benchmark data sets of images representing diverse problems and collected by instruments that capture information at different scales: a virus data set, a bark data set, and a LIGO glitches data set. Experiments demonstrate the superiority of this approach. The best ensemble proposed in this work achieves state-of-the-art performance across all three data sets. This result shows that varying data augmentation is a feasible way to build an ensemble of classifiers for image classification (code available at https://github.com/LorisNanni).


2021 ◽  
Vol 18 (1) ◽  
pp. 172988142199334
Author(s):  
Guangchao Zhang ◽  
Junrong Liu

With consumers' growing demand for diversified automobile styling, a simple, efficient, and intelligent method for styling analysis and modeling is a pressing need in current automotive design. The purpose of this article is to analyze the styling preferences and trends of the current automobile market in a timely manner, which can assist the styling design of new models at automobile manufacturers and strengthen their brand family identity. Intelligent rapid modeling shortens the current styling design cycle, so that rapid product iteration can secure an active position in the automotive market. Focusing on the family analysis of the automobile front face, we created an image database for front-face styling analysis. The database includes two data sets, one with vehicle badges and one without, covering front-face styling images of most models from 22 domestic mainstream brands. We then adopt image classification methods from computer vision to conduct car brand classification training on the database: based on ResNet-8 and other model architectures, the brand classification database is trained and classified with and without vehicle badges. Finally, based on the shape coefficient, a 3D wireframe model and a curved-surface model are obtained. The experimental results show that the 3D curve model can be obtained from a single image taken at any angle, shortening the modeling period by 92%.


Author(s):  
Daniel Overhoff ◽  
Peter Kohlmann ◽  
Alex Frydrychowicz ◽  
Sergios Gatidis ◽  
Christian Loewe ◽  
...  

Purpose The DRG-ÖRG IRP (Deutsche Röntgengesellschaft-Österreichische Röntgengesellschaft international radiomics platform) represents a web-/cloud-based radiomics platform based on a public-private partnership. It offers the possibility of data sharing, annotation, validation, and certification in the field of artificial intelligence, radiomics analysis, and integrated diagnostics. In a first proof-of-concept study, automated myocardial segmentation and automated myocardial late gadolinium enhancement (LGE) detection using radiomic image features were evaluated on myocarditis data sets. Materials and Methods The DRG-ÖRG IRP can be used to create quality-assured, structured image data in combination with clinical data and subsequent integrated data analysis, and is characterized by the following performance criteria: the possibility of using multicentric networked data, automatically calculated quality parameters, processing of annotation tasks, contour recognition using conventional and artificial intelligence methods, and the possibility of targeted integration of algorithms. In a first study, a neural network pre-trained on cardiac CINE data sets was evaluated for segmentation of PSIR data sets. In a second step, radiomic features were applied for segmental detection of LGE in the same data sets, which were provided multicentrically via the IRP. Results First results show the advantages of this platform-based approach: data transparency, reliability, broad involvement of all members, continuous evolution, as well as validation and certification. In the proof-of-concept study, the neural network achieved a Dice coefficient of 0.813 compared with the expert's segmentation of the myocardium.
In the segment-based myocardial LGE detection, the AUC was 0.73, and 0.79 after exclusion of segments with uncertain annotation. The evaluation and provision of the data take place at the IRP, taking into account the FAT (fairness, accountability, transparency) and FAIR (findable, accessible, interoperable, reusable) criteria. Conclusion It could be shown that the DRG-ÖRG IRP can serve as a crystallization point for the generation of further individual and joint projects. The execution of quantitative analyses with artificial intelligence methods is greatly facilitated by the platform approach of the DRG-ÖRG IRP, since pre-trained neural networks can be integrated and scientific groups can be networked. In a first proof-of-concept study on automated segmentation of the myocardium and automated myocardial LGE detection, these advantages were successfully applied. Our study shows that with the DRG-ÖRG IRP, strategic goals can be implemented in an interdisciplinary way, that concrete proof-of-concept examples can be demonstrated, and that a large number of individual and joint projects can be realized in a participatory way involving all groups.
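The Dice coefficient used above to score the segmentation (0.813 against the expert contour) is a standard overlap measure between two binary masks; a minimal sketch:

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient between two binary masks:
    2 * |pred AND truth| / (|pred| + |truth|), in [0, 1]."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0
```

A value of 1.0 means the predicted and expert masks coincide exactly; 0.0 means no overlap at all.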

