Stacked Convolutional Sparse Auto-Encoders for Representation Learning

2021 ◽  
Vol 15 (2) ◽  
pp. 1-21
Author(s):  
Yi Zhu ◽  
Lei Li ◽  
Xindong Wu

Deep learning seeks to achieve excellent performance for representation learning in image datasets. However, supervised deep learning models such as convolutional neural networks require a large number of labeled image data, which is intractable in applications, while unsupervised deep learning models like stacked denoising auto-encoder cannot employ label information. Meanwhile, the redundancy of image data incurs performance degradation on representation learning for aforementioned models. To address these problems, we propose a semi-supervised deep learning framework called stacked convolutional sparse auto-encoder, which can learn robust and sparse representations from image data with fewer labeled data records. More specifically, the framework is constructed by stacking layers. In each layer, higher layer feature representations are generated by features of lower layers in a convolutional way with kernels learned by a sparse auto-encoder. Meanwhile, to solve the data redundance problem, the algorithm of Reconstruction Independent Component Analysis is designed to train on patches for sphering the input data. The label information is encoded using a Softmax Regression model for semi-supervised learning. With this framework, higher level representations are learned by layers mapping from image data. It can boost the performance of the base subsequent classifiers such as support vector machines. Extensive experiments demonstrate the superior classification performance of our framework compared to several state-of-the-art representation learning methods.

Author(s):  
Yuejun Liu ◽  
Yifei Xu ◽  
Xiangzheng Meng ◽  
Xuguang Wang ◽  
Tianxu Bai

Background: Medical imaging plays an important role in the diagnosis of thyroid diseases. In the field of machine learning, multiple dimensional deep learning algorithms are widely used in image classification and recognition, and have achieved great success. Objective: The method based on multiple dimensional deep learning is employed for the auxiliary diagnosis of thyroid diseases based on SPECT images. The performances of different deep learning models are evaluated and compared. Methods: Thyroid SPECT images are collected with three types, they are hyperthyroidism, normal and hypothyroidism. In the pre-processing, the region of interest of thyroid is segmented and the amount of data sample is expanded. Four CNN models, including CNN, Inception, VGG16 and RNN, are used to evaluate deep learning methods. Results: Deep learning based methods have good classification performance, the accuracy is 92.9%-96.2%, AUC is 97.8%-99.6%. VGG16 model has the best performance, the accuracy is 96.2% and AUC is 99.6%. Especially, the VGG16 model with a changing learning rate works best. Conclusion: The standard CNN, Inception, VGG16, and RNN four deep learning models are efficient for the classification of thyroid diseases with SPECT images. The accuracy of the assisted diagnostic method based on deep learning is higher than that of other methods reported in the literature.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2611
Author(s):  
Andrew Shepley ◽  
Greg Falzon ◽  
Christopher Lawson ◽  
Paul Meek ◽  
Paul Kwan

Image data is one of the primary sources of ecological data used in biodiversity conservation and management worldwide. However, classifying and interpreting large numbers of images is time and resource expensive, particularly in the context of camera trapping. Deep learning models have been used to achieve this task but are often not suited to specific applications due to their inability to generalise to new environments and inconsistent performance. Models need to be developed for specific species cohorts and environments, but the technical skills required to achieve this are a key barrier to the accessibility of this technology to ecologists. Thus, there is a strong need to democratize access to deep learning technologies by providing an easy-to-use software application allowing non-technical users to train custom object detectors. U-Infuse addresses this issue by providing ecologists with the ability to train customised models using publicly available images and/or their own images without specific technical expertise. Auto-annotation and annotation editing functionalities minimize the constraints of manually annotating and pre-processing large numbers of images. U-Infuse is a free and open-source software solution that supports both multiclass and single class training and object detection, allowing ecologists to access deep learning technologies usually only available to computer scientists, on their own device, customised for their application, without sharing intellectual property or sensitive data. It provides ecological practitioners with the ability to (i) easily achieve object detection within a user-friendly GUI, generating a species distribution report, and other useful statistics, (ii) custom train deep learning models using publicly available and custom training data, (iii) achieve supervised auto-annotation of images for further training, with the benefit of editing annotations to ensure quality datasets. Broad adoption of U-Infuse by ecological practitioners will improve ecological image analysis and processing by allowing significantly more image data to be processed with minimal expenditure of time and resources, particularly for camera trap images. Ease of training and use of transfer learning means domain-specific models can be trained rapidly, and frequently updated without the need for computer science expertise, or data sharing, protecting intellectual property and privacy.


2021 ◽  
Vol 11 (9) ◽  
pp. 4233
Author(s):  
Biprodip Pal ◽  
Debashis Gupta ◽  
Md. Rashed-Al-Mahfuz ◽  
Salem A. Alyami ◽  
Mohammad Ali Moni

The COVID-19 pandemic requires the rapid isolation of infected patients. Thus, high-sensitivity radiology images could be a key technique to diagnose patients besides the polymerase chain reaction approach. Deep learning algorithms are proposed in several studies to detect COVID-19 symptoms due to the success in chest radiography image classification, cost efficiency, lack of expert radiologists, and the need for faster processing in the pandemic area. Most of the promising algorithms proposed in different studies are based on pre-trained deep learning models. Such open-source models and lack of variation in the radiology image-capturing environment make the diagnosis system vulnerable to adversarial attacks such as fast gradient sign method (FGSM) attack. This study therefore explored the potential vulnerability of pre-trained convolutional neural network algorithms to the FGSM attack in terms of two frequently used models, VGG16 and Inception-v3. Firstly, we developed two transfer learning models for X-ray and CT image-based COVID-19 classification and analyzed the performance extensively in terms of accuracy, precision, recall, and AUC. Secondly, our study illustrates that misclassification can occur with a very minor perturbation magnitude, such as 0.009 and 0.003 for the FGSM attack in these models for X-ray and CT images, respectively, without any effect on the visual perceptibility of the perturbation. In addition, we demonstrated that successful FGSM attack can decrease the classification performance to 16.67% and 55.56% for X-ray images, as well as 36% and 40% in the case of CT images for VGG16 and Inception-v3, respectively, without any human-recognizable perturbation effects in the adversarial images. Finally, we analyzed that correct class probability of any test image which is supposed to be 1, can drop for both considered models and with increased perturbation; it can drop to 0.24 and 0.17 for the VGG16 model in cases of X-ray and CT images, respectively. Thus, despite the need for data sharing and automated diagnosis, practical deployment of such program requires more robustness.


2019 ◽  
Vol 73 (5) ◽  
pp. 565-573 ◽  
Author(s):  
Yun Zhao ◽  
Mahamed Lamine Guindo ◽  
Xing Xu ◽  
Miao Sun ◽  
Jiyu Peng ◽  
...  

In this study, a method based on laser-induced breakdown spectroscopy (LIBS) was developed to detect soil contaminated with Pb. Different levels of Pb were added to soil samples in which tobacco was planted over a period of two to four weeks. Principal component analysis and deep learning with a deep belief network (DBN) were implemented to classify the LIBS data. The robustness of the method was verified through a comparison with the results of a support vector machine and partial least squares discriminant analysis. A confusion matrix of the different algorithms shows that the DBN achieved satisfactory classification performance on all samples of contaminated soil. In terms of classification, the proposed method performed better on samples contaminated for four weeks than on those contaminated for two weeks. The results show that LIBS can be used with deep learning for the detection of heavy metals in soil.


Sensors ◽  
2019 ◽  
Vol 19 (1) ◽  
pp. 210 ◽  
Author(s):  
Zied Tayeb ◽  
Juri Fedjaev ◽  
Nejla Ghaboosi ◽  
Christoph Richter ◽  
Lukas Everding ◽  
...  

Non-invasive, electroencephalography (EEG)-based brain-computer interfaces (BCIs) on motor imagery movements translate the subject’s motor intention into control signals through classifying the EEG patterns caused by different imagination tasks, e.g., hand movements. This type of BCI has been widely studied and used as an alternative mode of communication and environmental control for disabled patients, such as those suffering from a brainstem stroke or a spinal cord injury (SCI). Notwithstanding the success of traditional machine learning methods in classifying EEG signals, these methods still rely on hand-crafted features. The extraction of such features is a difficult task due to the high non-stationarity of EEG signals, which is a major cause by the stagnating progress in classification performance. Remarkable advances in deep learning methods allow end-to-end learning without any feature engineering, which could benefit BCI motor imagery applications. We developed three deep learning models: (1) A long short-term memory (LSTM); (2) a spectrogram-based convolutional neural network model (CNN); and (3) a recurrent convolutional neural network (RCNN), for decoding motor imagery movements directly from raw EEG signals without (any manual) feature engineering. Results were evaluated on our own publicly available, EEG data collected from 20 subjects and on an existing dataset known as 2b EEG dataset from “BCI Competition IV”. Overall, better classification performance was achieved with deep learning models compared to state-of-the art machine learning techniques, which could chart a route ahead for developing new robust techniques for EEG signal decoding. We underpin this point by demonstrating the successful real-time control of a robotic arm using our CNN based BCI.


2020 ◽  
Vol 5 (2) ◽  
pp. 212
Author(s):  
Hamdi Ahmad Zuhri ◽  
Nur Ulfa Maulidevi

Review ranking is useful to give users a better experience. Review ranking studies commonly use upvote value, which does not represent urgency, and it causes problems in prediction. In contrast, manual labeling as wide as the upvote value range provides a high bias and inconsistency. The proposed solution is to use a classification approach to rank the review where the labels are ordinal urgency class. The experiment involved shallow learning models (Logistic Regression, Naïve Bayesian, Support Vector Machine, and Random Forest), and deep learning models (LSTM and CNN). In constructing a classification model, the problem is broken down into several binary classifications that predict tendencies of urgency depending on the separation of classes. The result shows that deep learning models outperform other models in classification dan ranking evaluation. In addition, the review data used tend to contain vocabulary of certain product domains, so further research is needed on data with more diverse vocabulary.


2021 ◽  
Vol 2021 ◽  
pp. 1-24
Author(s):  
Youness Mourtaji ◽  
Mohammed Bouhorma ◽  
Daniyal Alghazzawi ◽  
Ghadah Aldabbagh ◽  
Abdullah Alghamdi

The phenomenon of phishing has now been a common threat, since many individuals and webpages have been observed to be attacked by phishers. The common purpose of phishing activities is to obtain user’s personal information for illegitimate usage. Considering the growing intensity of the issue, this study is aimed at developing a new hybrid rule-based solution by incorporating six different algorithm models that may efficiently detect and control the phishing issue. The study incorporates 37 features extracted from six different methods including the black listed method, lexical and host method, content method, identity method, identity similarity method, visual similarity method, and behavioral method. Furthermore, comparative analysis was undertaken between different machine learning and deep learning models which includes CART (decision trees), SVM (support vector machines), or KNN ( K -nearest neighbors) and deep learning models such as MLP (multilayer perceptron) and CNN (convolutional neural networks). Findings of the study indicated that the method was effective in analysing the URL stress through different viewpoints, leading towards the validity of the model. However, the highest accuracy level was obtained for deep learning with the given values of 97.945 for the CNN model and 93.216 for the MLP model, respectively. The study therefore concludes that the new hybrid solution must be implemented at a practical level to reduce phishing activities, due to its high efficiency and accuracy.


2021 ◽  
Author(s):  
Nithin G R ◽  
Nitish Kumar M ◽  
Venkateswaran Narasimhan ◽  
Rajanikanth Kakani ◽  
Ujjwal Gupta ◽  
...  

Pansharpening is the task of creating a High-Resolution Multi-Spectral Image (HRMS) by extracting and infusing pixel details from the High-Resolution Panchromatic Image into the Low-Resolution Multi-Spectral (LRMS). With the boom in the amount of satellite image data, researchers have replaced traditional approaches with deep learning models. However, existing deep learning models are not built to capture intricate pixel-level relationships. Motivated by the recent success of self-attention mechanisms in computer vision tasks, we propose Pansformers, a transformer-based self-attention architecture, that computes band-wise attention. A further improvement is proposed in the attention network by introducing a Multi-Patch Attention mechanism, which operates on non-overlapping, local patches of the image. Our model is successful in infusing relevant local details from the Panchromatic image while preserving the spectral integrity of the MS image. We show that our Pansformer model significantly improves the performance metrics and the output image quality on imagery from two satellite distributions IKONOS and LANDSAT-8.


2020 ◽  
Vol 34 (07) ◽  
pp. 11029-11036
Author(s):  
Jiabo Huang ◽  
Qi Dong ◽  
Shaogang Gong ◽  
Xiatian Zhu

Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning with the need for massive labelled training data, limiting dramatically their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples with the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering show the performance superiority of the proposed method over the state-of-the-art unsupervised learning models using six common image recognition benchmarks including MNIST, SVHN, STL10, CIFAR10, CIFAR100 and ImageNet.


Sign in / Sign up

Export Citation Format

Share Document