Improving Classification with CNNs using Wavelet Pooling with Nesterov-Accelerated Adam

10.29007/9c5j ◽  
2019 ◽  
Author(s):  
Allison Rossetto ◽  
Wenjin Zhou

Wavelet pooling methods can improve the classification accuracy of convolutional neural networks (CNNs). Combining wavelet pooling with the Nesterov-accelerated Adam (NAdam) gradient method can further improve CNN accuracy. In this work we have implemented wavelet pooling with NAdam using both a Haar wavelet (WavPool-NH) and a Shannon wavelet (WavPool-NS). The WavPool-NH and WavPool-NS methods are the most accurate of the methods we considered on the MNIST and LIDC-IDRI lung tumor data sets. The WavPool-NH and WavPool-NS implementations reach accuracies of 95.92% and 95.52%, respectively, on the LIDC-IDRI data set, an improvement over the 92.93% accuracy obtained on this data set with the max pooling method. The WavPool methods also avoid the overfitting that is a concern with max pooling. WavPool also performed fairly well on the CIFAR-10 data set, although overfitting was an issue with all the methods we considered. Wavelet pooling, especially when combined with an adaptive gradient method and wavelets chosen specifically for the data, has the potential to outperform current methods.
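The abstract includes no code, but the pooling idea is straightforward to sketch. Below is a minimal PyTorch illustration (not the authors' implementation) of level-1 Haar wavelet pooling that keeps only the approximation (LL) subband, paired with torch.optim.NAdam; the layer sizes and learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HaarWavPool(nn.Module):
    """Level-1 Haar wavelet pooling: keep only the LL (approximation)
    subband, halving spatial resolution like a 2x2 pooling window."""
    def forward(self, x):                      # x: (N, C, H, W), H and W even
        a = x[..., 0::2, 0::2]                 # top-left of each 2x2 block
        b = x[..., 0::2, 1::2]                 # top-right
        c = x[..., 1::2, 0::2]                 # bottom-left
        d = x[..., 1::2, 1::2]                 # bottom-right
        return (a + b + c + d) / 2.0           # orthonormal Haar LL subband

model = nn.Sequential(                         # toy MNIST-sized classifier
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    HaarWavPool(),                             # drop-in for nn.MaxPool2d(2)
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),               # 28x28 input -> 14x14 pooled
)
optimizer = torch.optim.NAdam(model.parameters(), lr=2e-3)
```

Note that the Haar LL subband is simply twice the 2×2 average; a Shannon-wavelet variant such as WavPool-NS would swap in wider, band-limited filters.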


2021 ◽  
Vol 42 (1) ◽  
pp. e90289
Author(s):  
Carlos Eduardo Belman López

Given that it is fundamental to detect positive COVID-19 cases and treat affected patients quickly to mitigate the impact of the virus, X-ray images have been investigated for COVID-19 detection together with deep learning models, which avoid disadvantages of RT-PCR testing such as the scarcity of test kits, their elevated costs, and the long wait for results. The contribution of this paper is to present new models for detecting COVID-19 and other cases of pneumonia from chest X-ray images using convolutional neural networks, providing accurate diagnostics in binary and four-class classification scenarios. Classification accuracy was improved, and overfitting was prevented, by two actions: (1) increasing the data set size while keeping the classification scenarios balanced, and (2) adding regularization techniques and performing hyperparameter optimization. Additionally, the network capacity and size of the models were reduced as much as possible, making the final models well suited to local deployment on devices with limited capacity and without Internet access. The impact of key hyperparameters was tested using modern deep learning packages. The final models obtained classification accuracies of 99.17% and 94.03% for the binary and categorical scenarios, respectively, achieving superior performance compared to other studies in the literature while requiring significantly fewer parameters. The models can also be placed on a digital platform to provide instantaneous diagnostics and to mitigate the shortage of experts and radiologists.
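As a rough sketch of the kind of compact, regularized CNN the paper argues for (the architecture, input channels, and hyperparameters below are assumptions for illustration, not the published models):

```python
import torch
import torch.nn as nn

def build_model(n_classes):
    # Compact CNN: global average pooling keeps the parameter count low and
    # dropout regularizes; L2 weight decay is added on the optimizer below.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Dropout(0.5),
        nn.Linear(32, n_classes),
    )

model = build_model(n_classes=4)                 # the four-class scenario
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             weight_decay=1e-4)  # L2 regularization
```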



2020 ◽  
Vol 12 (11) ◽  
pp. 1794
Author(s):  
Naisen Yang ◽  
Hong Tang

Modern convolutional neural networks (CNNs) are typically trained on pre-set data sets of fixed size. In large-scale applications of satellite imagery, such as global or regional mapping, images are generally collected incrementally in multiple stages; in other words, the size of the training data set grows over the course of a mapping task rather than being fixed beforehand. In this paper, we present a novel algorithm, called GeoBoost, for incremental-learning tasks in semantic segmentation via convolutional neural networks. Specifically, the GeoBoost algorithm is trained end-to-end on newly available data, and it does not decrease the performance of previously trained models. The effectiveness of the GeoBoost algorithm is verified on the large-scale DREAM-B data set. The method avoids retraining on the enlarged data set from scratch and becomes more effective as more data become available.
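The published GeoBoost algorithm is not reproduced here, but a generic boosting-style incremental step conveys the idea: freeze what was already learned, fit a new member only on the newly arrived data, and predict with the ensemble. All names and hyperparameters below are illustrative.

```python
import torch
import torch.nn as nn

def incremental_step(prev_models, new_model, loader, epochs=10):
    """Boosting-style incremental step (a sketch, not the published GeoBoost
    code): earlier members are frozen and only the new member is trained on
    the newly arrived data; the ensemble predicts with summed logits."""
    for m in prev_models:
        m.eval()
        for p in m.parameters():
            p.requires_grad = False            # old performance is preserved
    opt = torch.optim.Adam(new_model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()            # per-pixel loss for segmentation
    for _ in range(epochs):
        for x, y in loader:                    # only the new data is visited
            with torch.no_grad():
                base = sum(m(x) for m in prev_models) if prev_models else 0.0
            logits = base + new_model(x)       # ensemble output in logit space
            opt.zero_grad()
            loss_fn(logits, y).backward()
            opt.step()
    return prev_models + [new_model]
```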



2018 ◽  
Vol 10 (10) ◽  
pp. 1649 ◽  
Author(s):  
Hao Li ◽  
Pedram Ghamisi ◽  
Uwe Soergel ◽  
Xiao Zhu

Recently, convolutional neural networks (CNNs) have been intensively investigated for the classification of remote sensing data, as they extract invariant and abstract features well suited to classification. In this paper, a novel framework is proposed for the fusion of hyperspectral images and LiDAR-derived elevation data based on CNNs and composite kernels. First, extinction profiles are applied to both data sources in order to extract spatial and elevation features from the hyperspectral and LiDAR-derived data, respectively. Second, a three-stream CNN is designed to extract informative spectral, spatial, and elevation features individually from both available sources. The combination of extinction profiles and CNN features enables us to jointly benefit from low-level and high-level features and improve classification performance. To fuse the heterogeneous spectral, spatial, and elevation features extracted by the CNN, instead of a simple stacking strategy, a multi-sensor composite kernels (MCK) scheme is designed. This scheme achieves higher spectral, spatial, and elevation separability of the extracted features and effectively performs multi-sensor data fusion in kernel space. In this context, a support vector machine and an extreme learning machine, with their composite-kernel versions, are employed to produce the final classification result. The proposed framework is evaluated on two widely used data sets with different characteristics: an urban data set captured over Houston, USA, and a rural data set captured over Trento, Italy. The proposed framework yields the highest overall accuracies (OA) of 92.57% and 97.91% for the Houston and Trento data sets, respectively. Experimental results confirm that the proposed fusion framework produces competitive classification accuracy in both urban and rural areas and significantly mitigates salt-and-pepper noise in the classification maps.
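A small scikit-learn sketch of the composite-kernel fusion step (the feature blocks are random stand-ins for the outputs of the three CNN streams; kernel parameters and weights would be tuned on validation data):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def composite_kernel(feats_a, feats_b, gammas, weights):
    """Weighted sum of per-source RBF kernels (spectral, spatial, elevation).
    A convex combination of valid kernels is itself a valid kernel, so an
    SVM can consume it directly as a precomputed Gram matrix."""
    K = np.zeros((feats_a[0].shape[0], feats_b[0].shape[0]))
    for Xa, Xb, g, w in zip(feats_a, feats_b, gammas, weights):
        K += w * rbf_kernel(Xa, Xb, gamma=g)
    return K

# Random stand-ins for the per-source feature blocks, purely for illustration.
rng = np.random.default_rng(0)
train = [rng.normal(size=(100, d)) for d in (64, 64, 32)]
test = [rng.normal(size=(40, d)) for d in (64, 64, 32)]
y_train = rng.integers(0, 3, size=100)

gammas, weights = [0.1, 0.1, 0.1], [0.4, 0.4, 0.2]   # assumed, to be tuned
svm = SVC(kernel="precomputed", C=10.0)
svm.fit(composite_kernel(train, train, gammas, weights), y_train)
y_pred = svm.predict(composite_kernel(test, train, gammas, weights))
```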



2020 ◽  
Vol 10 (1) ◽  
pp. 55-60
Author(s):  
Owais Mujtaba Khanday ◽  
Samad Dadvandipour

Deep neural networks (DNNs) have in the past few years revolutionized computer vision by providing the best results on a large number of problems such as image classification, pattern recognition, and speech recognition. One of the essential deep learning models for image classification is the convolutional neural network. These networks integrate a number of features, or so-called filters, in a multi-layer fashion through convolutional layers. They use convolutional and pooling layers for feature abstraction and have neurons arranged in three dimensions: height, width, and depth. Filters of three different sizes were used: 3×3, 5×5, and 7×7. The accuracy on the training data decreased from 100% to 97.8% as the filter size increased, and the accuracy on the test data set likewise decreased: 98.7% for 3×3, 98.5% for 5×5, and 97.8% for 7×7. The loss per 10 epochs increased drastically, from 3.4% to 27.6% on the training data and from 12.5% to 23.02% on the test data. It is thus clear that filters of smaller dimensions give lower loss than those of larger dimensions. However, using the smaller filter size comes at a cost in computational complexity, which is crucial in the case of larger data sets.
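A minimal PyTorch sketch of this experimental setup: the same two-layer CNN instantiated with 3×3, 5×5, and 7×7 filters, with padding chosen so feature-map sizes stay identical and only the filters change. The channel counts are illustrative assumptions.

```python
import torch.nn as nn

def make_cnn(k):
    """Same two-layer CNN with filter size k (3, 5, or 7); padding k // 2
    keeps feature-map sizes identical across runs."""
    return nn.Sequential(
        nn.Conv2d(1, 32, k, padding=k // 2), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, k, padding=k // 2), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, 10),      # 28x28 input -> 7x7 after two pools
    )

for k in (3, 5, 7):
    n_params = sum(p.numel() for p in make_cnn(k).parameters())
    print(f"{k}x{k} filters: {n_params:,} parameters")
```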



2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a 'connectivity map' that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through 'parametric augmentation', a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150,000 input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
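The reconstruction-of-interpolated-locations step can be sketched as follows; the encode/decode interface below is hypothetical, and the actual connectivity-map representation and model are those described in the paper.

```python
import torch

def interpolate(vae, x_a, x_b, steps=8):
    """Decode evenly spaced points on the segment between two samples'
    latent means (hypothetical interface: encode returns the latent mean
    and log-variance, decode returns a reconstructed connectivity map)."""
    mu_a, _ = vae.encode(x_a)
    mu_b, _ = vae.encode(x_b)
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * mu_a + t * mu_b       # linear path in latent space
        frames.append(vae.decode(z))          # hybrid wireframe geometry
    return frames
```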



Author(s):  
Sebastian Nowak ◽  
Narine Mesropyan ◽  
Anton Faron ◽  
Wolfgang Block ◽  
Martin Reuter ◽  
...  

Abstract
Objectives: To investigate the diagnostic performance of deep transfer learning (DTL) to detect liver cirrhosis from clinical MRI.
Methods: The dataset for this retrospective analysis consisted of 713 (343 female) patients who underwent liver MRI between 2017 and 2019. In total, 553 of these subjects had a confirmed diagnosis of liver cirrhosis, while the remainder had no history of liver disease. T2-weighted MRI slices at the level of the caudate lobe were manually exported for DTL analysis. Data were randomly split into training, validation, and test sets (70%/15%/15%). A ResNet50 convolutional neural network (CNN) pre-trained on the ImageNet archive was used for cirrhosis detection with and without upstream liver segmentation. Classification performance for detection of liver cirrhosis was compared to two radiologists with different levels of experience (4th-year resident, board-certified radiologist). Segmentation was performed using a U-Net architecture built on a pre-trained ResNet34 encoder. Differences in classification accuracy were assessed by the χ²-test.
Results: Dice coefficients for automatic segmentation were above 0.98 for both validation and test data. The classification accuracy of liver cirrhosis on validation (vACC) and test (tACC) data for the DTL pipeline with upstream liver segmentation (vACC = 0.99, tACC = 0.96) was significantly higher compared to the resident (vACC = 0.88, p < 0.01; tACC = 0.91, p = 0.01) and to the board-certified radiologist (vACC = 0.96, p < 0.01; tACC = 0.90, p < 0.01).
Conclusion: This proof-of-principle study demonstrates the potential of DTL for detecting cirrhosis based on standard T2-weighted MRI. The presented method for image-based diagnosis of liver cirrhosis demonstrated expert-level classification accuracy.
Key Points:
• A pipeline consisting of two convolutional neural networks (CNNs) pre-trained on an extensive natural image database (ImageNet archive) enables detection of liver cirrhosis on standard T2-weighted MRI.
• High classification accuracy can be achieved even without altering the pre-trained parameters of the convolutional neural networks.
• Other abdominal structures apart from the liver were relevant for detection when the network was trained on unsegmented images.
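A minimal sketch of the head-only transfer-learning idea reflected in the second key point. This is not the study's full pipeline, which also includes a U-Net segmentation stage; the two-class head and learning rate are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained ResNet50 with a frozen backbone: only a new two-class
# head is trained, so the pre-trained parameters are never altered.
net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in net.parameters():
    p.requires_grad = False
net.fc = nn.Linear(net.fc.in_features, 2)   # cirrhosis vs. no liver disease
optimizer = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
```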



2021 ◽  
Vol 65 (1) ◽  
pp. 11-22
Author(s):  
Mengyao Lu ◽  
Shuwen Jiang ◽  
Cong Wang ◽  
Dong Chen ◽  
Tian’en Chen

Highlights
• A classification model for the front and back sides of tobacco leaves was developed for application in industry.
• A tobacco leaf grading method that combines a CNN with double-branch integration was proposed.
• The A-ResNet network was proposed and compared with other classic CNN networks.
• The grading accuracy for eight different grades was 91.30% and the testing time was 82.180 ms, showing relatively high classification accuracy and efficiency.
Abstract. Flue-cured tobacco leaf grading is a key step in the production and processing of Chinese-style cigarette raw materials, directly affecting cigarette blend and quality stability. At present, manual grading of tobacco leaves is dominant in China, resulting in unsatisfactory grading quality and consuming considerable material and financial resources. In this study, for fast, accurate, and non-destructive tobacco leaf grading, 2,791 flue-cured tobacco leaves of eight different grades from south Anhui Province, China, were chosen as the study sample, and a tobacco leaf grading method that combines convolutional neural networks and double-branch integration was proposed. First, a classification model for the front and back sides of tobacco leaves was trained by transfer learning. Second, two processing methods (equal-scaled resizing and cropping) were used to obtain global images and local patches from the front sides of tobacco leaves. A global image-based tobacco leaf grading model was then developed using the proposed A-ResNet-65 network, and a local patch-based tobacco leaf grading model was developed using the ResNet-34 network. These two networks were compared with classic deep learning networks such as VGGNet, GoogLeNet-V3, and ResNet. Finally, the grading results of the two grading models were integrated to realize tobacco leaf grading. The tobacco leaf classification accuracy of the final model, for eight different grades, was 91.30%, and grading of a single tobacco leaf required 82.180 ms. The proposed method achieved relatively high grading accuracy and efficiency. It provides a method for industrial implementation of tobacco leaf grading and offers a new approach for the quality grading of other agricultural products.
Keywords: Convolutional neural network, Deep learning, Image classification, Transfer learning, Tobacco leaf grading
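A sketch of how the two branches' outputs might be integrated; the weighted-average fusion rule and the tensor shapes assumed here are illustrative, and the paper's exact integration scheme may differ.

```python
import torch

def grade(global_model, local_model, image, patches, w=0.5):
    """Double-branch integration sketch: fuse the class probabilities of the
    global-image branch and the local-patch branch by weighted averaging.
    image: (1, C, H, W) resized leaf; patches: (P, C, h, w) crops of it."""
    p_global = torch.softmax(global_model(image), dim=1)            # (1, 8)
    p_local = torch.softmax(local_model(patches), dim=1).mean(0, keepdim=True)
    fused = w * p_global + (1.0 - w) * p_local   # weighted branch fusion
    return fused.argmax(dim=1)                   # index of the predicted grade
```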



The objective of this research is to provide skin cancer specialists with an early, rapid, and non-invasive diagnostic method for melanoma identification from an image of the lesion, to be applied to a patient's treatment. The method contrasts the convolutional neural network architecture proposed by Laura Kocobinski of the University of Boston against our architecture, which reduces the depth of the convolution filters of the last two convolutional layers to obtain more significant feature maps. The performance of the model was reflected in the accuracy during validation, considering the best result obtained, which was confirmed with the additional data set. With this base architecture, accuracy improved from 0.79 to 0.83 with 30 epochs. Compared to Kocobinski's AlexNet architecture, it was not possible to improve on its accuracy of 0.90; however, the complexity of the network played an important role in the results we obtained, as our architecture balanced and obtained better results without increasing the number of epochs. The application of our research is very helpful for doctors, since it will allow them to quickly identify whether a lesion is melanoma and consequently treat it efficiently.
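A rough PyTorch illustration of the described modification: an AlexNet-style feature stack whose last two convolutions use fewer output channels. All channel counts are assumptions for illustration, not the paper's exact configuration.

```python
import torch.nn as nn

# AlexNet-style feature extractor in which the last two convolutions use
# fewer output channels (384 -> 128): the 'reduced filter depth' variant.
features = nn.Sequential(
    nn.Conv2d(3, 64, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(64, 192, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 128, 3, padding=1), nn.ReLU(),   # reduced depth
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),   # reduced depth
    nn.MaxPool2d(3, 2),
)
```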



2020 ◽  
Vol 34 (04) ◽  
pp. 5620-5627 ◽  
Author(s):  
Murat Sensoy ◽  
Lance Kaplan ◽  
Federico Cerutti ◽  
Maryam Saleki

Deep neural networks are often ignorant about what they do not know and overconfident when they make uninformed predictions. Some recent approaches quantify classification uncertainty directly by training the model to output high uncertainty for data samples close to class boundaries or from outside the training distribution. These approaches use an auxiliary data set during training to represent out-of-distribution samples. However, selecting or creating such an auxiliary data set is non-trivial, especially for high-dimensional data such as images. In this work, we develop a novel neural network model that is able to express both aleatoric and epistemic uncertainty to distinguish decision-boundary and out-of-distribution regions of the feature space. To this end, variational autoencoders and generative adversarial networks are incorporated to automatically generate out-of-distribution exemplars for training. Through extensive analysis, we demonstrate that the proposed approach provides better estimates of uncertainty for in- and out-of-distribution samples and adversarial examples on well-known data sets than state-of-the-art approaches, including recent Bayesian approaches for neural networks and anomaly detection methods.
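The generative side of the method is beyond a short sketch, but the way a network output can carry an explicit uncertainty mass is easy to show. The following is a sketch in the spirit of evidential deep learning, not the paper's full training procedure.

```python
import torch

def dirichlet_output(logits):
    """Evidential head sketch: non-negative evidence parameterizes a
    Dirichlet over class probabilities; the uncertainty mass
    u = K / sum(alpha) is near 1 for out-of-distribution inputs and
    near 0 for confident in-distribution predictions."""
    evidence = torch.relu(logits)              # evidence must be non-negative
    alpha = evidence + 1.0                     # Dirichlet parameters, (N, K)
    strength = alpha.sum(dim=1, keepdim=True)  # total Dirichlet strength
    probs = alpha / strength                   # expected class probabilities
    u = alpha.shape[1] / strength              # epistemic uncertainty mass
    return probs, u
```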



2021 ◽  
pp. 1-17
Author(s):  
Luis Sa-Couto ◽  
Andreas Wichert

Abstract Convolutional neural networks (CNNs) evolved from Fukushima's neocognitron model, which is based on the ideas of Hubel and Wiesel about the early stages of the visual cortex. Unlike other branches of neocognitron-based models, the typical CNN is based on end-to-end supervised learning by backpropagation and removes the focus from built-in invariance mechanisms, using pooling not as a way to tolerate small shifts but as a regularization tool that decreases model complexity. These properties of end-to-end supervision and flexibility of structure allow the typical CNN to become highly tuned to the training data, leading to extremely high accuracies on typical visual pattern recognition data sets. However, in this work, we hypothesize that there is a flip side to this capability: a hidden overfitting. More concretely, a supervised, backpropagation-based CNN will outperform a neocognitron/map transformation cascade (MTCCXC) when trained and tested inside the same data set. Yet if we take both trained models and test them on the same task on another data set (without retraining), the overfitting appears. Other neocognitron descendants, like the What-Where model, go in a different direction. In these models, learning remains unsupervised, but more structure is added to capture invariance to typical changes. Knowing this, we further hypothesize that if we repeat the same experiments with this model, the lack of supervision may make it worse than the typical CNN inside the same data set, but the added structure will make it generalize even better to another one. To put our hypotheses to the test, we choose the simple task of handwritten digit classification and take two well-known data sets for it: MNIST and ETL-1. To make the two data sets as similar as possible, we experiment with several types of preprocessing. Regardless of the type in question, the results align exactly with expectation.
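The evaluation protocol behind this hypothesis is simple to sketch: train on one data set, then measure accuracy both on its own test split and, without any retraining, on the other data set. The model and loader names in the usage comment are hypothetical.

```python
import torch

def evaluate(model, loader):
    """Plain top-1 accuracy over a data loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

# Hypothetical usage: train a CNN once on MNIST, then compare same-set
# accuracy with cross-set accuracy on ETL-1 (no retraining); the gap
# between the two numbers is the 'hidden overfitting' described above.
# acc_same  = evaluate(cnn, mnist_test_loader)
# acc_cross = evaluate(cnn, etl1_test_loader)
```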


