Unsupervised Novelty Detection Using Deep Autoencoders with Density Based Clustering

2018 ◽  
Vol 8 (9) ◽  
pp. 1468 ◽  
Author(s):  
Tsatsral Amarbayasgalan ◽  
Bilguun Jargalsaikhan ◽  
Keun Ryu

Novelty detection is the classification problem of identifying abnormal patterns, which makes it an important task for applications such as fraud detection, fault diagnosis and disease detection. However, when no labels indicate which data are normal and which are abnormal, labeling requires expensive domain and professional knowledge, so an unsupervised novelty detection approach is needed. In addition, novelty detection on high-dimensional data remains a major challenge, and previous research has proposed approaches based on principal component analysis (PCA) and autoencoders to reduce dimensionality. In this paper, we propose deep autoencoders with density-based clustering (DAE-DBC). This approach computes compressed data and an error threshold from a deep autoencoder model and passes the results to density-based clustering. Points that do not belong to any group are not considered novelties; a group of points is defined as a novelty group depending on the ratio of its points that exceed the error threshold. We conducted experiments in which individual components were substituted, to show that the components of the proposed method are more effective together. In these experiments, the DAE-DBC approach performed best: its area under the curve (AUC) was 13.5 percent higher than that of state-of-the-art algorithms and of the other versions of the proposed method that we evaluated.
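
The decision step described above (cluster the compressed data, then call a whole group novel when enough of its members exceed the error threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the autoencoder itself is not shown, so the compressed codes and reconstruction errors are taken as given; the threshold rule (mean plus one standard deviation of the errors), the ratio of 0.5, and the names `dbscan` and `flag_novel_groups` are all hypothetical choices. DBSCAN is used as the density-based clustering step.

```python
import math
from statistics import mean, pstdev


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one label per point, with -1 marking noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # Every point within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(more)
        cluster += 1
    return labels


def flag_novel_groups(codes, errors, eps, min_pts, ratio=0.5):
    """Hypothetical DAE-DBC-style decision step.

    codes:  compressed (bottleneck) vectors from a trained autoencoder
    errors: per-point reconstruction errors from the same autoencoder
    """
    threshold = mean(errors) + pstdev(errors)  # assumed threshold rule
    labels = dbscan(codes, eps, min_pts)
    novel = [False] * len(codes)  # noise points (-1) are never flagged
    for c in {lab for lab in labels if lab != -1}:
        members = [i for i, lab in enumerate(labels) if lab == c]
        share = sum(errors[i] > threshold for i in members) / len(members)
        if share >= ratio:
            for i in members:
                novel[i] = True
    return novel, threshold


# Toy data: a low-error group, a high-error group, and one isolated point.
codes = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.05, 0),
         (5, 5), (5.1, 5), (5, 5.1), (10, 0)]
errors = [0.1] * 6 + [3.0] * 4
novel, threshold = flag_novel_groups(codes, errors, eps=1.0, min_pts=3)
print(novel, round(threshold, 2))
```

In this toy run the group near (5, 5) is flagged because all of its members exceed the threshold, while the isolated point at (10, 0) is left unflagged even though its error is high, mirroring the rule that ungrouped points are not treated as novelties.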

Diagnostics ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 220
Author(s):  
Paolo Frasconi ◽  
Daniele Baracchi ◽  
Betti Giusti ◽  
Ada Kura ◽  
Gaia Spaziani ◽  
...  

Background: To develop a tool for assessing normalcy of the thoracic aorta (TA) by echocardiography, based on either a linear regression model (Z-score) or a machine learning technique, namely a one-class support vector machine (OC-SVM) (Q-score). Methods: TA diameters were measured in 1112 prospectively enrolled healthy subjects aged 5 to 89 years. Considering sex, age and body surface area, we developed two calculators based on the traditional Z-score and the novel Q-score. The calculators were compared in 198 adults with TA > 40 mm and in 466 patients affected by either Marfan syndrome or bicuspid aortic valve (BAV). Results: Q-score attained a better Area Under the Curve (0.989; 95% CI 0.984–0.993, sensitivity = 97.5%, specificity = 95.4%) than Z-score (0.955; 95% CI 0.942–0.967, sensitivity = 81.3%, specificity = 93.3%; p < 0.0001) in patients with TA > 40 mm. The prevalence of TA dilatation in Marfan and BAV patients was higher when dilatation was defined as Z-score > 2 than when it was defined as Q-score < 4% (73.4% vs. 50.09%, p < 0.00001). Conclusions: Q-score is a novel tool for assessing TA normalcy based on a model that requires fewer assumptions about the distribution of the relevant variables. Notably, diameters do not need to depend linearly on anthropometric measurements. Additionally, Q-score can capture the joint distribution of these variables with all four diameters simultaneously, thus accounting for the overall aortic shape. This approach results in a lower rate of predicted TA abnormalcy in patients at risk of TA aneurysm. Further prognostic studies will be needed to assess the relative effectiveness of Q-score versus Z-score.
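
A regression-based Z-score expresses how many residual standard deviations a measured diameter lies above the value predicted for a healthy subject with the same characteristics. The sketch below shows that idea with a single predictor (body surface area) and made-up toy data; it is an assumption-laden simplification, since the study's model also accounts for sex and age, and the Q-score (OC-SVM) side is not shown. The names `fit_line` and `z_score` are hypothetical.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual SD)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sd = math.sqrt(sse / (len(x) - 2))  # n - 2 degrees of freedom
    return a, b, sd


def z_score(diameter, bsa, a, b, sd):
    """Residual SDs by which a measured diameter exceeds the predicted one."""
    return (diameter - (a + b * bsa)) / sd


# Toy "healthy reference" data (made up): BSA in m^2, diameter in mm.
bsa = [1.0, 1.5, 2.0, 2.5]
diam = [20.0, 26.0, 30.0, 36.0]
a, b, sd = fit_line(bsa, diam)
z = z_score(33.0, 2.0, a, b, sd)
print(round(z, 2), "dilated" if z > 2 else "normal")
```

The cut-off Z > 2 matches the dilatation criterion quoted in the abstract; note that this criterion presumes linear dependence and normally distributed residuals, which is exactly the assumption the Q-score relaxes.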


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3536
Author(s):  
Jakub Górski ◽  
Adam Jabłoński ◽  
Mateusz Heesch ◽  
Michał Dziendzikowski ◽  
Ziemowit Dworakowski

Condition monitoring is an indispensable element of operating rotating machinery. In this article, a monitoring system for a parallel gearbox is proposed. The condition assessment support system is built on a novelty detection approach, which requires data collected from the healthy structure. The measured signals were processed to extract quantitative indicators sensitive to the types of damage that occur in this kind of structure. The indicators' values were used to develop four different novelty detection algorithms. The presented models operate on three principles: feature space distance, probability distribution, and input reconstruction. One of the distance-based models is adaptive, adjusting to new data arriving as a stream. The authors test the developed algorithms on experimental and simulation data with similar distributions, using a training set consisting mainly of samples generated by the simulator. The results presented in the article demonstrate the effectiveness of the trained models on both data sets.
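
A feature-space-distance model trained only on healthy data, with an adaptive variant that keeps learning from the stream, might look like the sketch below. Everything here is an assumption for illustration and not one of the paper's four models: the score is a standardized Euclidean distance to the healthy centroid, the threshold `k = 3.0` is arbitrary, the statistics are updated online with Welford's algorithm only when a sample is judged healthy, and the class name `AdaptiveDistanceDetector` is hypothetical.

```python
import math


class AdaptiveDistanceDetector:
    """Hypothetical distance-based novelty detector with online updates."""

    def __init__(self, healthy, k=3.0):
        self.k = k
        self.n = 0
        self.mean = [0.0] * len(healthy[0])
        self.m2 = [0.0] * len(healthy[0])  # running sums of squared deviations
        for x in healthy:
            self._update(x)

    def _update(self, x):
        # Welford's online update of per-feature mean and variance.
        self.n += 1
        for d, v in enumerate(x):
            delta = v - self.mean[d]
            self.mean[d] += delta / self.n
            self.m2[d] += delta * (v - self.mean[d])

    def score(self, x):
        # Root-mean-square z-score across features (standardized distance).
        total = 0.0
        for d, v in enumerate(x):
            var = self.m2[d] / (self.n - 1) if self.n > 1 else 1.0
            total += (v - self.mean[d]) ** 2 / (var or 1e-12)
        return math.sqrt(total / len(x))

    def is_novel(self, x, adapt=True):
        novel = self.score(x) > self.k
        if adapt and not novel:
            self._update(x)  # only healthy-looking samples update the model
        return novel


# Toy indicator vectors (made up), e.g. (RMS, kurtosis) of a healthy gearbox.
healthy = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.8), (1.0, 10.1), (1.05, 9.9)]
det = AdaptiveDistanceDetector(healthy)
print(det.is_novel((1.0, 10.0)), det.is_novel((2.0, 12.0)))
```

Updating only on samples classified as healthy keeps a developing fault from being absorbed into the reference statistics, at the price of slower adaptation if the healthy operating point drifts quickly.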


2021 ◽  
Vol 11 (11) ◽  
pp. 5123
Author(s):  
Maiada M. Mahmoud ◽  
Nahla A. Belal ◽  
Aliaa Youssif

Transcription factors (TFs) are proteins that control the transcription of a gene from DNA to messenger RNA (mRNA). TFs bind to a specific DNA sequence called a binding site. Transcription factor binding sites have not yet been completely identified, and identifying them is a challenge that can be approached computationally, as a classification problem in machine learning. In this paper, the prediction of SP1 transcription factor binding sites on human chromosome 1 is presented using different classification techniques, and a model based on voting is proposed. The highest Area Under the Curve (AUC) achieved is 0.97 using K-Nearest Neighbors (KNN), and 0.95 using the proposed voting technique. However, the proposed voting technique handles noisy data more effectively. This study highlights the applicability and significance of the voting technique for predicting binding sites, and shows that KNN outperforms the other classifiers on this type of data.
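
Hard majority voting combines the labels predicted by several base classifiers and returns the most common one. The sketch below illustrates the idea on toy DNA sequences; the base classifiers (KNN with k = 1 and k = 3 under Hamming distance, plus a simple GC-content rule), the sequences, and all function names are hypothetical stand-ins chosen for brevity, not the classifiers or data used in the paper.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k):
    """KNN on (sequence, label) pairs using Hamming distance."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def gc_rule(query, cutoff=0.5):
    """Toy rule: predict a binding site (1) when GC content exceeds cutoff."""
    return int(sum(ch in "GC" for ch in query) / len(query) > cutoff)


def vote(train, query):
    """Hard majority vote over three base classifiers (odd count: no ties)."""
    preds = [knn_predict(train, query, 1),
             knn_predict(train, query, 3),
             gc_rule(query)]
    return Counter(preds).most_common(1)[0][0]


# Made-up 6-mers: 1 = GC-box-like site, 0 = background.
train = [("GGGCGG", 1), ("GGGCGT", 1), ("GGGAGG", 1),
         ("ATATTA", 0), ("TTATAA", 0), ("AATTTA", 0)]
print(vote(train, "GGGCGA"), vote(train, "ATATTT"))
```

Voting tends to help with noisy labels because a single mislabeled neighbor can flip a 1-NN prediction, while it must also sway the other voters to change the ensemble's answer.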


Cancers ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1407
Author(s):  
Matyas Bukva ◽  
Gabriella Dobra ◽  
Juan Gomez-Perez ◽  
Krisztian Koos ◽  
Maria Harmati ◽  
...  

Investigating the molecular composition of small extracellular vesicles (sEVs) for tumor diagnostic purposes is becoming increasingly popular, especially for diseases that are challenging to diagnose, such as central nervous system (CNS) malignancies. Thorough examination of the molecular content of sEVs by Raman spectroscopy is a promising but hitherto barely explored approach for these tumor types. We attempt to reveal the potential role of serum-derived sEVs in diagnosing CNS tumors through Raman spectroscopic analyses of a relevant number of clinical samples. A total of 138 serum samples were obtained from four patient groups (glioblastoma multiforme, non-small-cell lung cancer brain metastasis, meningioma, and lumbar disc herniation as control). After isolation, characterization and Raman spectroscopic assessment of sEVs, the Principal Component Analysis–Support Vector Machine (PCA–SVM) algorithm was applied to the Raman spectra for pairwise classifications. Classification accuracy (CA), sensitivity, specificity and the Area Under the Curve (AUC) value derived from Receiver Operating Characteristic (ROC) analyses were used to evaluate classification performance. The compared groups were distinguishable with 82.9–92.5% CA, 80–95% sensitivity and 80–90% specificity. AUC scores in the range of 0.82–0.9 suggest excellent to outstanding classification performance. Our results indicate that Raman spectroscopic analysis of sEV-enriched isolates from serum is a promising method that could be developed further for use in the diagnosis of CNS tumors.
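
The evaluation metrics named above can be computed directly from classifier scores. The sketch below uses the rank-based definition of the ROC AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, counting ties as one half) together with sensitivity and specificity at a fixed threshold. The scores, labels and 0.5 threshold are made up, and the function names are hypothetical; this does not reproduce the paper's PCA–SVM pipeline.

```python
def roc_auc(scores, labels):
    """AUC as P(score_pos > score_neg), with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Treat score >= threshold as a positive prediction."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)


# Made-up decision scores for three tumor (1) and three control (0) samples.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels), sensitivity_specificity(scores, labels, 0.5))
```

Unlike sensitivity and specificity, the AUC does not depend on the chosen threshold, which is why it is usually reported alongside them.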


2018 ◽  
Vol 27 (07) ◽  
pp. 1860013 ◽  
Author(s):  
Swair Shah ◽  
Baokun He ◽  
Crystal Maung ◽  
Haim Schweitzer

Principal Component Analysis (PCA) is a classical dimensionality reduction technique that computes a low-rank representation of the data. Recent studies have shown how to compute this low-rank representation from most of the data, excluding a small number of outliers. We show how to convert this problem into a graph search, and describe an algorithm that solves it optimally by applying a variant of the A* algorithm to search for the outliers. The results obtained by our algorithm are optimal in terms of accuracy, and are shown to be more accurate than those of the current state-of-the-art algorithms, which are shown not to be optimal. This comes at a cost in running time: our algorithm is typically slower than the current state of the art. We also describe a related variant of the A* algorithm that runs much faster than the optimal variant and produces a solution that is guaranteed to be near-optimal. This variant is shown experimentally to be more accurate than the current state of the art, with comparable running time.
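
The objective that the graph search optimizes can be made concrete with a brute-force version: try every set of k points to exclude, compute the error of the best low-rank approximation of the remaining points, and keep the best choice. The sketch below does exactly that exhaustive enumeration (it is not A*, which explores the same space while pruning it) under simplifying assumptions: 2-D points, an uncentered rank-1 approximation, and hypothetical function names. The rank-1 residual equals the total squared norm minus the largest eigenvalue of the 2×2 scatter matrix.

```python
import itertools
import math


def rank1_residual(points):
    """Squared error of the best uncentered rank-1 fit to 2-D points."""
    a = sum(x * x for x, _ in points)
    b = sum(x * y for x, y in points)
    c = sum(y * y for _, y in points)
    # Largest eigenvalue of the scatter matrix [[a, b], [b, c]].
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    return a + c - lam


def best_outliers(points, k):
    """Exhaustively choose k points to drop so the rank-1 residual is minimal."""
    best = None
    for out in itertools.combinations(range(len(points)), k):
        kept = [p for i, p in enumerate(points) if i not in out]
        err = rank1_residual(kept)
        if best is None or err < best[0]:
            best = (err, out)
    return best


# Four points on the line y = x plus one obvious outlier (index 4).
points = [(1, 1), (2, 2), (3, 3), (4, 4), (3, -3)]
print(best_outliers(points, 1))
```

The number of candidate subsets grows as C(n, k), which is why an informed search such as A*, able to discard most subsets without evaluating them, matters for realistic data sizes.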


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Yuanyuan Xu ◽  
Genke Yang ◽  
Jiliang Luo ◽  
Jianan He

Electronic component recognition plays an important role in industrial production, electronic manufacturing, and testing. To address the low recognition recall and accuracy of traditional image recognition techniques (such as principal component analysis (PCA) and support vector machines (SVM)), this paper tests multiple deep learning networks and optimizes the SqueezeNet network. The paper then presents an electronic component recognition algorithm based on the resulting Faster SqueezeNet network. This structure reduces the number of network parameters and the computational complexity without degrading the performance of the network. The results show that the proposed algorithm performs well: the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) reaches 1.0 for capacitors and inductors. When the false positive rate (FPR) is at or below 10⁻⁶, the true positive rate (TPR) is at least 0.99, and the inference time is about 2.67 ms, meeting industrial application requirements in terms of both time consumption and performance.
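
The parameter savings that SqueezeNet-style networks rely on can be checked with simple arithmetic. The sketch below counts the weights and biases of a fire module from the original SqueezeNet design (a 1×1 "squeeze" convolution feeding parallel 1×1 and 3×3 "expand" convolutions) and compares it with a single plain 3×3 convolution producing the same 128 output channels. The channel sizes (96 in, squeeze 16, expand 64 + 64) are an illustrative configuration, the function names are hypothetical, and the paper's Faster SqueezeNet modifications are not modeled.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Weights (k*k*c_in*c_out) plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def fire_params(c_in, squeeze, expand1, expand3):
    """SqueezeNet fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expands."""
    return (conv_params(1, c_in, squeeze)
            + conv_params(1, squeeze, expand1)
            + conv_params(3, squeeze, expand3))


fire = fire_params(96, 16, 64, 64)   # 64 + 64 = 128 output channels...
plain = conv_params(3, 96, 128)      # ...same as one plain 3x3 convolution
print(fire, plain, round(plain / fire, 1))
```

Most of the saving comes from the squeeze layer: the 3×3 filters see only 16 input channels instead of 96, and half of the output channels use cheap 1×1 filters.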

