Unsupervised Novelty Detection Using Deep Autoencoders with Density Based Clustering

Novelty detection is a classification problem that aims to identify abnormal patterns, which makes it important for applications such as fraud detection, fault diagnosis and disease detection. When no labels indicate which data are normal and which are abnormal, obtaining them requires expensive domain and professional expertise, so an unsupervised novelty detection approach is needed. Moreover, applying novelty detection to high-dimensional data remains a major challenge, and previous research has proposed approaches based on principal component analysis (PCA) and autoencoders to reduce dimensionality. In this paper, we propose deep autoencoders with density-based clustering (DAE-DBC): the approach obtains compressed data and a reconstruction-error threshold from a deep autoencoder model and passes the results to a density-based clustering algorithm. Points that do not belong to any cluster are not considered novelties; each cluster is labeled a novelty group depending on the ratio of its points that exceed the error threshold. We conducted experiments in which individual components were substituted, to show that the components of the proposed method are more effective together. In these experiments, the DAE-DBC approach performed best: its area under the curve (AUC) was 13.5 percent higher than that of state-of-the-art algorithms and of the other variants of the proposed method that we evaluated.
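
The final labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the latent codes (`codes`) and per-point reconstruction errors (`errors`) have already been produced by a trained deep autoencoder (not shown), uses DBSCAN as the density-based clustering algorithm, and sets the error threshold as mean + k * std. The threshold rule and the parameters `k`, `eps`, `min_pts` and `ratio` are assumptions chosen for illustration; all function names are hypothetical.

```python
import math


def error_threshold(errors, k=2.0):
    """Assumed threshold rule: mean + k * std of the reconstruction errors."""
    mean = sum(errors) / len(errors)
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    return mean + k * std


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one cluster label per point, -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:  # core point: keep expanding
                queue.extend(j_seeds)
    return labels


def dae_dbc_labels(codes, errors, eps, min_pts, ratio=0.5, k=2.0):
    """Flag whole clusters as novel when enough members exceed the threshold."""
    threshold = error_threshold(errors, k)
    labels = dbscan(codes, eps, min_pts)
    novel = [False] * len(codes)  # noise points (-1) are never flagged
    for c in set(labels) - {-1}:
        members = [i for i, lab in enumerate(labels) if lab == c]
        exceeding = sum(errors[i] > threshold for i in members)
        if exceeding / len(members) >= ratio:
            for i in members:
                novel[i] = True
    return novel
```

For example, with two tight groups of latent codes where only the second group has large reconstruction errors, the second group is flagged as a novelty group, while an isolated point with a large error is left unflagged, mirroring the rule that ungrouped points are not novelties.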

Two-Dimensional Aortic Size Normalcy: A Novelty Detection Approach

Diagnostics, 2021, Vol 11 (2), pp. 220. DOI: 10.3390/diagnostics11020220

Author(s): Paolo Frasconi, Daniele Baracchi, Betti Giusti, Ada Kura, Gaia Spaziani, ...

Keyword(s): Novelty Detection; Relative Effectiveness; Area Under The Curve; Support Vector; The Novel; Z-Score; Detection Approach; Relevant Variables; Patients At Risk; Aortic Size

Background: To develop a tool for assessing normalcy of the thoracic aorta (TA) by echocardiography, based on either a linear regression model (Z-score) or a machine learning technique, namely a one-class support vector machine (OC-SVM) (Q-score). Methods: TA diameters were measured in 1112 prospectively enrolled healthy subjects aged 5 to 89 years. Taking sex, age and body surface area into account, we developed two calculators, one based on the traditional Z-score and one on the novel Q-score. The calculators were compared in 198 adults with TA > 40 mm and in 466 patients affected by either Marfan syndrome or bicuspid aortic valve (BAV). Results: Q-score attained a higher Area Under the Curve (0.989; 95% CI 0.984–0.993, sensitivity = 97.5%, specificity = 95.4%) than Z-score (0.955; 95% CI 0.942–0.967, sensitivity = 81.3%, specificity = 93.3%; p < 0.0001) in patients with TA > 40 mm. In Marfan and BAV patients, the prevalence of TA dilatation was higher when defined as Z-score > 2 than when defined as Q-score < 4% (73.4% vs. 50.09%, p < 0.00001). Conclusions: Q-score is a novel tool for assessing TA normalcy based on a model that requires fewer assumptions about the distribution of the relevant variables. Notably, diameters do not need to depend linearly on anthropometric measurements. Additionally, Q-score can capture the joint distribution of these variables with all four diameters simultaneously, thus accounting for the overall aortic shape. This approach yields a lower rate of predicted TA abnormalcy in patients at risk of TA aneurysm. Further prognostic studies will be needed to assess the relative effectiveness of Q-score versus Z-score.
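
The Z-score calculator follows the usual regression-residual construction: Z = (observed diameter - predicted diameter) / residual standard deviation, with Z > 2 read as dilatation. The sketch below is a simplified, hypothetical version that uses body surface area as the only predictor and assumes a constant residual spread; the study also accounts for sex and age, and its exact model is not given in the abstract. The OC-SVM-based Q-score is not shown, and `fit_z_calculator` is an illustrative name.

```python
import math


def fit_z_calculator(bsa, diam):
    """Least-squares fit of diam ~ a + b * bsa; returns a Z-score function."""
    n = len(bsa)
    mx = sum(bsa) / n
    my = sum(diam) / n
    sxx = sum((x - mx) ** 2 for x in bsa)
    sxy = sum((x - mx) * (y - my) for x, y in zip(bsa, diam))
    b = sxy / sxx
    a = my - b * mx
    # Residual standard deviation with n - 2 degrees of freedom
    # (assumes the spread does not change with body size).
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(bsa, diam))
    s = math.sqrt(rss / (n - 2))

    def z_score(x, y):
        # How many residual SDs the measured diameter lies above the prediction.
        return (y - (a + b * x)) / s

    return z_score
```

A measured diameter exactly on the fitted line gives Z = 0, and values well above it give Z > 2. Note that this construction bakes in the linearity and distributional assumptions that, according to the abstract, the Q-score avoids.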

An Unsupervised Change Detection Approach for Remote Sensing Image Using Principal Component Analysis and Genetic Algorithm

Lecture Notes in Computer Science - Image and Graphics, 2015, pp. 589-602. DOI: 10.1007/978-3-319-21978-3_52. Cited by ~3

Author(s): Lin Wu, Yunhong Wang, Jiangtao Long, Zhisheng Liu

Keyword(s): Remote Sensing; Genetic Algorithm; Principal Component Analysis; Change Detection; Principal Component; Remote Sensing Image; Component Analysis; Detection Approach

Comparison of Novelty Detection Methods for Detection of Various Rotary Machinery Faults

Sensors, 2021, Vol 21 (10), pp. 3536. DOI: 10.3390/s21103536

Author(s): Jakub Górski, Adam Jabłoński, Mateusz Heesch, Michał Dziendzikowski, Ziemowit Dworakowski

Keyword(s): Novelty Detection; Feature Space; Detection Methods; Data Sets; Similar Distribution; Detection Algorithms; Rotary Machinery; Detection Approach; Input Reconstruction; Quantitative Indicators

Condition monitoring is an indispensable element of operating rotating machinery. In this article, a monitoring system for a parallel gearbox is proposed. A novelty detection approach is used to develop the condition assessment support system, which requires data collected from the healthy structure. The measured signals were processed to extract quantitative indicators sensitive to the types of damage that occur in this kind of structure. The indicator values were used to develop four different novelty detection algorithms. The presented models operate on three principles: feature space distance, probability distribution, and input reconstruction. One of the distance-based models is adaptive, adjusting to new data arriving as a stream. The authors test the developed algorithms on experimental and simulation data with similar distributions, using a training set consisting mainly of samples generated by the simulator. The results presented in the article demonstrate the effectiveness of the trained models on both data sets.
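
Of the three principles listed, the feature-space-distance idea combined with stream adaptation can be sketched as below. This is a hypothetical illustration, not one of the authors' four models: it assumes indicator values arrive as numeric feature vectors, scores a sample by its mean distance to the k nearest healthy reference samples, sets the threshold at an assumed quantile of leave-one-out scores on the healthy training data, and adapts by absorbing samples judged normal into a bounded reference window. The class name and the parameters `k`, `quantile` and `max_size` are illustrative.

```python
import math


class AdaptiveKnnDetector:
    """Hypothetical distance-based novelty detector trained on healthy data only."""

    def __init__(self, healthy, k=3, quantile=0.95, max_size=500):
        self.k = k
        self.max_size = max_size
        self.reference = [tuple(p) for p in healthy]
        # Threshold: an assumed high quantile of leave-one-out kNN scores
        # computed on the healthy training samples.
        scores = sorted(self._score(p, exclude_self=True) for p in self.reference)
        self.threshold = scores[min(len(scores) - 1, int(quantile * len(scores)))]

    def _score(self, x, exclude_self=False):
        # Mean distance from x to its k nearest reference samples.
        d = sorted(math.dist(x, r) for r in self.reference)
        if exclude_self:
            d = d[1:]  # drop the zero distance of a point to itself
        return sum(d[: self.k]) / self.k

    def update(self, x):
        """Score one streamed sample; absorb it into the reference set if normal."""
        novel = self._score(x) > self.threshold
        if not novel:
            self.reference.append(tuple(x))
            if len(self.reference) > self.max_size:
                self.reference.pop(0)  # forget the oldest healthy sample
        return novel
```

Absorbing only samples judged normal keeps the reference set close to the current healthy state, but a slowly developing fault could gradually be learned as normal; the window size trades adaptivity against that risk.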

Prediction of Transcription Factor Binding Sites of SP1 on Human Chromosome1

Applied Sciences, 2021, Vol 11 (11), pp. 5123. DOI: 10.3390/app11115123

Author(s): Maiada M. Mahmoud, Nahla A. Belal, Aliaa Youssif

Keyword(s): Transcription Factor; Binding Sites; Messenger RNA; Area Under The Curve; Noisy Data; Transcription Factor Binding Sites; Classification Problem; Transcription Factor Binding; K Nearest Neighbors; Factor Binding

Transcription factors (TFs) are proteins that control the transcription of a gene from DNA to messenger RNA (mRNA). TFs bind to specific DNA sequences called binding sites. Transcription factor binding sites have not yet been completely identified, and identifying them is a challenge that can be approached computationally as a classification problem in machine learning. In this paper, the prediction of transcription factor binding sites of SP1 on human chromosome 1 is performed using different classification techniques, and a voting-based model is proposed. The highest Area Under the Curve (AUC) achieved is 0.97 using K-Nearest Neighbors (KNN), and 0.95 using the proposed voting technique; however, the voting technique is more effective with noisy data. This study highlights the applicability and significance of voting for the prediction of binding sites, as well as the superior performance of KNN on this type of data.
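
K-Nearest Neighbors and hard majority voting, the two techniques the abstract compares, can be sketched as follows. The sketch assumes each candidate site has already been encoded as a numeric feature vector (the encoding is not described in the abstract), and the base classifiers passed to `vote_classify` are placeholders rather than the paper's ensemble; all function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority among its k nearest training samples (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def vote_classify(classifiers, x):
    """Apply each base classifier (a callable) to x and combine by voting."""
    return majority_vote([clf(x) for clf in classifiers])
```

With `Counter.most_common`, ties are broken in favor of the label encountered first, so using an odd number of voters on a binary task avoids ties altogether.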

Longitudinal Crack Detection Approach Based on Principal Component Analysis and Support Vector Machine for Slab Continuous Casting

steel research international, 2021. DOI: 10.1002/srin.202100168

Author(s): Haiyang Duan, Jingjing Wei, Lin Qi, Xudong Wang, Yu Liu, ...

Keyword(s): Principal Component Analysis; Support Vector Machine; Continuous Casting; Crack Detection; Longitudinal Crack; Principal Component; Component Analysis; Support Vector; Slab Continuous Casting; Detection Approach

Raman Spectral Signatures of Serum-Derived Extracellular Vesicle-Enriched Isolates May Support the Diagnosis of CNS Tumors

Cancers, 2021, Vol 13 (6), pp. 1407. DOI: 10.3390/cancers13061407

Author(s): Matyas Bukva, Gabriella Dobra, Juan Gomez-Perez, Krisztian Koos, Maria Harmati, ...

Keyword(s): Lumbar Disc; Area Under The Curve; Principal Component; Extracellular Vesicle; Classification Performance; CNS Tumors; Clinical Samples; Support Vector; Serum Samples; Raman Spectroscopic

Investigating the molecular composition of small extracellular vesicles (sEVs) for tumor diagnostic purposes is becoming increasingly popular, especially for diseases that are challenging to diagnose, such as central nervous system (CNS) malignancies. Thorough examination of the molecular content of sEVs by Raman spectroscopy is a promising but so far barely explored approach for these tumor types. We attempt to reveal the potential role of serum-derived sEVs in diagnosing CNS tumors through Raman spectroscopic analyses of a substantial number of clinical samples. A total of 138 serum samples were obtained from four patient groups (glioblastoma multiforme, non-small-cell lung cancer brain metastasis, meningioma, and lumbar disc herniation as the control). After isolation, characterization and Raman spectroscopic assessment of the sEVs, a Principal Component Analysis–Support Vector Machine (PCA–SVM) algorithm was applied to the Raman spectra for pairwise classifications. Classification accuracy (CA), sensitivity, specificity and the Area Under the Curve (AUC) derived from Receiver Operating Characteristic (ROC) analysis were used to evaluate classification performance. The compared groups were distinguishable with 82.9–92.5% CA, 80–95% sensitivity and 80–90% specificity. AUC scores in the range of 0.82–0.9 suggest excellent to outstanding classification performance. Our results suggest that Raman spectroscopic analysis of sEV-enriched isolates from serum is a promising method that could be developed further for use in the diagnosis of CNS tumors.
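
The evaluation metrics named above can be computed directly from true labels, hard predictions and classifier scores. The sketch below is a generic illustration, not the authors' PCA–SVM pipeline: it assumes binary labels (1 = positive class) and computes AUC through its equivalence with the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, counting ties as one half. The function names are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Compare every positive/negative pair; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is fine for a few hundred spectra; a rank-based formula gives the same value more efficiently for larger sets.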

Computing Robust Principal Components by A* Search

International Journal of Artificial Intelligence Tools, 2018, Vol 27 (07), pp. 1860013. DOI: 10.1142/s0218213018600138. Cited by ~1

Author(s): Swair Shah, Baokun He, Crystal Maung, Haim Schweitzer

Keyword(s): State Of The Art; Principal Component; Low Rank; A* Algorithm; Running Time; Current State; Dimensionality Reduction Technique; Related Variant; Low Rank Representation; The Cost

Principal Component Analysis (PCA) is a classical dimensionality reduction technique that computes a low-rank representation of the data. Recent studies have shown how to compute this low-rank representation from most of the data, excluding a small number of outliers. We show how to convert this problem into a graph search, and describe an algorithm that solves it optimally by applying a variant of the A* algorithm to search for the outliers. The results obtained by our algorithm are optimal in terms of accuracy and are shown to be more accurate than those of the current state-of-the-art algorithms, which are shown not to be optimal. This comes at the cost of running time: our algorithm is typically slower than the current state of the art. We also describe a related variant of the A* algorithm that runs much faster than the optimal variant and produces a solution that is guaranteed to be near-optimal. Experimentally, this variant is more accurate than the current state of the art and has a comparable running time.
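
The underlying optimization problem can be illustrated with a brute-force search. Under simplifying assumptions (2-D points, a rank-1 representation, and no mean-centering), the sketch below tries every set of `r` outliers and keeps the set whose removal minimizes the rank-1 reconstruction error. That error equals the smallest eigenvalue of the 2x2 Gram matrix of the remaining points, because the squared Frobenius norm equals the sum of both eigenvalues and the best rank-1 approximation keeps only the largest. This is not the authors' A* algorithm: A* searches the same space of outlier sets but prunes it with a heuristic, whereas this exhaustive version grows combinatorially and is only practical for tiny inputs. Function names are hypothetical.

```python
import math
from itertools import combinations


def rank1_error(points):
    """Squared error of the best rank-1 (uncentered) approximation of 2-D points."""
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # Largest eigenvalue of the Gram matrix [[sxx, sxy], [sxy, syy]].
    lam_max = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return sxx + syy - lam_max  # trace minus largest = smallest eigenvalue


def best_outliers(points, r):
    """Exhaustively try every set of r outliers; return (error, outlier indices)."""
    best = None
    for out in combinations(range(len(points)), r):
        kept = [p for i, p in enumerate(points) if i not in out]
        err = rank1_error(kept)
        if best is None or err < best[0]:
            best = (err, out)
    return best
```

For four points on the line y = x plus one point off it, removing that point leaves data that a single direction represents exactly, so the search returns its index with zero error.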

Novelty Detection Approach for Keystroke Dynamics Identity Verification

Intelligent Data Engineering and Automated Learning - Lecture Notes in Computer Science, 2003, pp. 1016-1023. DOI: 10.1007/978-3-540-45080-1_143. Cited by ~23

Author(s): Enzhe Yu, Sungzoon Cho

Keyword(s): Novelty Detection; Keystroke Dynamics; Identity Verification; Detection Approach

An Electronic Component Recognition Algorithm Based on Deep Learning with a Faster SqueezeNet

Mathematical Problems in Engineering, 2020, Vol 2020, pp. 1-11. DOI: 10.1155/2020/2940286

Author(s): Yuanyuan Xu, Genke Yang, Jiliang Luo, Jianan He

Keyword(s): Deep Learning; Characteristic Curve; Area Under The Curve; Principal Component; Recognition Algorithm; Electronic Component; Support Vector; Learning Networks; Component Recognition; And Performance

Electronic component recognition plays an important role in industrial production, electronic manufacturing, and testing. To address the low recognition recall and accuracy of traditional image recognition techniques (such as principal component analysis (PCA) and support vector machines (SVM)), this paper tests multiple deep learning networks and optimizes the SqueezeNet network. It then presents an electronic component recognition algorithm based on the resulting Faster SqueezeNet network. This structure reduces the number of network parameters and the computational complexity without degrading network performance. The results show that the proposed algorithm performs well: for capacitors and inductors, the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) reaches 1.0. When the false positive rate (FPR) is at or below the 10^-6 level, the true positive rate (TPR) is at least 0.99, and the inference time is about 2.67 ms, reaching industrial application level in terms of both time consumption and performance.
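
The claim that the structure reduces network parameters can be made concrete with the fire module of the original SqueezeNet design, which replaces a single 3x3 convolution with a 1x1 "squeeze" layer followed by parallel 1x1 and 3x3 "expand" layers. The abstract does not describe the specific changes made in Faster SqueezeNet, so this sketch counts weights for a standard fire module only, ignores biases, and uses illustrative channel sizes; the function names are hypothetical.

```python
def conv_params(c_in, c_out, kernel):
    """Weights of a kernel x kernel convolution layer (biases ignored)."""
    return c_in * c_out * kernel * kernel


def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Weights of a fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expands."""
    return (conv_params(c_in, squeeze, 1)
            + conv_params(squeeze, expand1x1, 1)
            + conv_params(squeeze, expand3x3, 3))
```

For 128 input channels, a squeeze width of 16 and 64 + 64 expand filters, the fire module needs 2,048 + 1,024 + 9,216 = 12,288 weights, compared with 147,456 for a plain 3x3 convolution producing the same 128 output channels, a 12x reduction.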
