From Auto-encoders to Capsule Networks: A Survey

2021 ◽  
Vol 229 ◽  
pp. 01003
Author(s):  
Omaima El Alaoui-Elfels ◽  
Taoufiq Gadi

Convolutional Neural Networks are a very powerful Deep Learning structure used in image processing, object classification and segmentation. They are very robust in extracting features from data and are widely used in several domains. Nonetheless, they require large amounts of training data, and relations between features are lost in the max-pooling step, which can lead to wrong classifications. Capsule Networks (CapsNets) were introduced to overcome these limitations by extracting features together with their pose, using capsules instead of neurons. This technique shows impressive performance on one-dimensional, two-dimensional and three-dimensional datasets, as well as on sparse datasets. In this paper, we present an initial understanding of CapsNets: their concept, structure and learning algorithm. We trace the progress made by CapsNets from their introduction in 2011 until 2020. We compare different CapsNets architectures to demonstrate their strengths and challenges. Finally, we cite different implementations of Capsule Networks and show their robustness in a variety of domains. This survey provides the state-of-the-art of Capsule Networks and allows other researchers to get a clear view of this new field. Besides, we discuss the open issues and promising directions of future research, which may lead to a new generation of CapsNets.
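The central idea of capsules — outputting a vector whose length encodes the probability that an entity exists while its orientation encodes the entity's pose — can be illustrated with the squashing nonlinearity from the original dynamic-routing formulation. This is a minimal sketch; the function name and array shapes are our own.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule squashing nonlinearity: shrinks short vectors toward zero
    length and long vectors toward unit length, preserving direction, so
    a capsule's output length can be read as a detection probability."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s

# A long input vector keeps its direction; its length approaches but
# never reaches 1.
v = squash(np.array([3.0, 4.0]))  # input length 5
assert np.linalg.norm(v) < 1.0
```

Unlike a max-pooling step, this transformation discards no positional information: the pose carried by the vector's direction survives the nonlinearity.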

2021 ◽  
Vol 229 ◽  
pp. 01048
Author(s):  
Omaima El Alaoui-Elfels ◽  
Taoufiq Gadi

Convolutional Neural Networks are a very powerful Deep Learning algorithm used in image processing, object classification and segmentation. They are very robust in extracting features from data and are widely used in several domains. Nonetheless, they require large amounts of training data, and relations between features are lost in the max-pooling step, which can lead to wrong classifications. Capsule Networks (CapsNets) were introduced to overcome these limitations by extracting features together with their pose, using capsules instead of neurons. This technique shows impressive performance on one-dimensional, two-dimensional and three-dimensional datasets, as well as on sparse datasets. In this paper, we present an initial understanding of CapsNets: their concept, structure and learning algorithm. We trace the progress made by CapsNets from their introduction in 2011 until 2020. We compare different CapsNets series to demonstrate their strengths and challenges. Finally, we cite different implementations of Capsule Networks and show their robustness in a variety of domains. This survey provides the state-of-the-art of Capsule Networks and allows other researchers to get a clear view of this new field. Besides, we discuss the open issues and promising directions of future research, which may lead to a new generation of CapsNets.


Algorithms ◽  
2021 ◽  
Vol 14 (3) ◽  
pp. 99
Author(s):  
Yang Zheng ◽  
Jieyu Zhao ◽  
Yu Chen ◽  
Chen Tang ◽  
Shushi Yu

With the widespread success of deep learning in the two-dimensional field, how to extend deep learning methods to the three-dimensional field has become an active research topic. Among three-dimensional representations, the polygon mesh is a complex data structure that provides an effective approximate representation of a three-dimensional object's shape. Although traditional graphics methods can extract the characteristics of a three-dimensional object, they cannot be applied to more complex objects. Moreover, because mesh data are complex and irregular, it is difficult to apply convolutional neural networks directly to 3D mesh processing. Considering this problem, we propose a deep learning method based on a capsule network to effectively classify mesh data. We first design a polynomial convolution template. Through a sliding operation similar to a two-dimensional image convolution window, we sample directly on the mesh surface and use the window-sampled surface patch as the minimum unit of computation. Because a high-order polynomial can effectively represent a surface, we fit the approximate shape of each patch with a polynomial and use the polynomial parameters as the patch's shape feature, adding the center-point coordinates and normal vector of the patch as its pose feature; together these form the patch's feature vector. At the same time, to avoid the large number of pooling layers introduced by traditional convolutional neural networks, a capsule network is used. To handle input mesh models of nonuniform size, the capsule network's pose-parameter learning is improved by sharing the weights of the pose matrix, which reduces the number of model parameters and further improves training efficiency on 3D mesh models.
The method is compared with a traditional method and two recent methods on the SHREC15 dataset. Compared with MeshNet and MeshCNN, the average recognition accuracy on the original test set improves by 3.4% and 2.1%, respectively, and after feature fusion the average accuracy reaches 93.8%. Experiments also verify that the method achieves competitive recognition results with a short training time. The three-dimensional mesh classification method proposed in this paper combines the advantages of graphics and deep learning methods, and effectively improves the classification of 3D mesh models.
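The per-patch descriptor the abstract describes — polynomial coefficients as the shape feature, centroid and normal as the pose feature — can be sketched with an ordinary least-squares fit. The function name, the choice of a quadratic height field z = f(x, y), and the window size are our assumptions, not details from the paper.

```python
import numpy as np

def surface_feature(points):
    """Hypothetical sketch of a patch descriptor: fit a quadratic
    polynomial z = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2 to the
    points sampled in one sliding window, then concatenate the
    polynomial coefficients (shape feature) with the window centroid
    and the fitted surface's normal at the centroid (pose feature)."""
    pts = np.asarray(points, dtype=float)
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    A = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)  # shape feature
    centroid = pts.mean(axis=0)                     # pose: position
    # Normal of z = f(x, y) at the centroid is (-dz/dx, -dz/dy, 1).
    cx, cy = centroid[0], centroid[1]
    dzdx = coeffs[1] + 2 * coeffs[3] * cx + coeffs[4] * cy
    dzdy = coeffs[2] + coeffs[4] * cx + 2 * coeffs[5] * cy
    n = np.array([-dzdx, -dzdy, 1.0])
    n /= np.linalg.norm(n)                          # pose: orientation
    return np.concatenate([coeffs, centroid, n])    # 12-dim feature
```

For a patch sampled from the plane z = 2x + 3y, the fitted linear coefficients recover 2 and 3 and the quadratic terms vanish, which is the sense in which the polynomial parameters encode the local shape.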


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Kenneth W. Dunn ◽  
Chichen Fu ◽  
David Joon Ho ◽  
Soonam Lee ◽  
Shuo Han ◽  
...  

The scale of biological microscopy has increased dramatically over the past ten years, with the development of new modalities supporting collection of high-resolution fluorescence image volumes spanning hundreds of microns if not millimeters. The size and complexity of these volumes is such that quantitative analysis requires automated methods of image processing to identify and characterize individual cells. For many workflows, this process starts with segmentation of nuclei that, due to their ubiquity, ease-of-labeling and relatively simple structure, make them appealing targets for automated detection of individual cells. However, in the context of large, three-dimensional image volumes, nuclei present many challenges to automated segmentation, such that conventional approaches are seldom effective and/or robust. Techniques based upon deep-learning have shown great promise, but enthusiasm for applying these techniques is tempered by the need to generate training data, an arduous task, particularly in three dimensions. Here we present results of a new technique of nuclear segmentation using neural networks trained on synthetic data. Comparisons with results obtained using commonly-used image processing packages demonstrate that DeepSynth provides the superior results associated with deep-learning techniques without the need for manual annotation.
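The synthetic-training-data idea — generating ground-truth volumes by construction rather than by manual annotation — can be illustrated by painting random ellipsoidal "nuclei" into an empty volume. This is a loose sketch of the concept only; the actual pipeline additionally gives such masks realistic microscope texture, and all names and parameters here are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_nuclei_volume(shape=(32, 32, 32), n=5, radii=(3, 6)):
    """Paint n randomly placed, randomly sized ellipsoids into a zero
    volume. The result is a perfect ground-truth segmentation mask,
    obtained with no manual labeling effort."""
    vol = np.zeros(shape, dtype=np.uint8)
    zz, yy, xx = np.indices(shape)
    for _ in range(n):
        c = [rng.integers(0, s) for s in shape]       # ellipsoid center
        r = rng.uniform(*radii, size=3)               # per-axis radii
        inside = (((zz - c[0]) / r[0]) ** 2
                  + ((yy - c[1]) / r[1]) ** 2
                  + ((xx - c[2]) / r[2]) ** 2) <= 1.0
        vol[inside] = 1
    return vol
```

A network trained on pairs of such masks and their rendered images never needs a human-drawn annotation, which is the labor savings the abstract emphasizes.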


2022 ◽  
Vol 15 ◽  
Author(s):  
Meera Srikrishna ◽  
Rolf A. Heckemann ◽  
Joana B. Pereira ◽  
Giovanni Volpe ◽  
Anna Zettergren ◽  
...  

Brain tissue segmentation plays a crucial role in feature extraction, volumetric quantification, and morphometric analysis of brain scans. For the assessment of brain structure and integrity, CT is a non-invasive, cheaper, faster, and more widely available modality than MRI. However, the clinical application of CT is mostly limited to the visual assessment of brain integrity and exclusion of copathologies. We have previously developed two-dimensional (2D) deep learning-based segmentation networks that successfully classified brain tissue in head CT. Recently, deep learning-based MRI segmentation models have successfully used patch-based three-dimensional (3D) segmentation networks. In this study, we aimed to develop patch-based 3D segmentation networks for CT brain tissue classification. Furthermore, we aimed to compare the performance of 2D- and 3D-based segmentation networks for brain tissue classification in anisotropic CT scans. For this purpose, we developed 2D and 3D U-Net-based deep learning models that were trained and validated on MR-derived segmentations from scans of 744 participants of the Gothenburg H70 Cohort with both CT and T1-weighted MRI scans acquired close in time to each other. Segmentation performance of both 2D and 3D models was evaluated on 234 unseen datasets using measures of distance, spatial similarity, and tissue volume. Single-task, slice-wise 2D U-Nets performed better than multitask patch-based 3D U-Nets in CT brain tissue classification. These findings support the use of 2D U-Nets to segment brain tissue in anisotropic CT. This could increase the application of CT to detect brain abnormalities in clinical settings.
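Of the evaluation measures the study mentions, spatial similarity between a predicted and a reference mask is typically quantified with the Dice similarity coefficient. A minimal sketch (the function name and epsilon handling are our own; the study's exact metric set is not specified beyond the categories named):

```python
import numpy as np

def dice(pred, ref, eps=1e-8):
    """Dice similarity coefficient between two binary masks: twice the
    overlap volume divided by the sum of the two mask volumes. Ranges
    from 0 (disjoint) to 1 (identical)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum() + eps)
```

Because Dice is computed per tissue class, a 2D slice-wise model and a 3D patch-based model can be compared on exactly the same footing by stacking the 2D predictions back into a volume first.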


1998 ◽  
Vol 10 (1-3) ◽  
pp. 100-108 ◽  
Author(s):  
Alicia Colson ◽  
Ross Parry

This article argues that the analysis of a three-dimensional image demands a three-dimensional approach. The authors observe that discussions of images and image processing inveterately conceptualise representation as being flat, static, and finite, and they recognise the need for a fresh acuteness to three-dimensionality as a meaningful – although problematic – element of visual sources. Two dramatically different examples are used to expose the shortcomings of an ingrained two-dimensional approach and to demonstrate how modern (digital) techniques can enable new historical/anthropological perspectives on subjects that have become all too familiar. The examples could not be more different in their temporal and geographical location, their cultural resonance, and their historiography. In both these visual spectacles, however, meaning is polysemic: it depends upon the viewer's spatial relationship to the artifice as well as the viewer's spirito-intellectual position within the community. The authors postulate that the multi-faceted and multi-layered arrangement of meaning in a complex image can be assessed by working beyond the limitations of the two-dimensional methodological paradigm and by using methods and media that accommodate this type of interconnectivity and representation.


2021 ◽  
Vol 11 (15) ◽  
pp. 7016
Author(s):  
Pawel S. Dabrowski ◽  
Cezary Specht ◽  
Mariusz Specht ◽  
Artur Makar

The theory of cartographic projections is a tool for presenting the convex surface of the Earth on a plane. Of the many types of maps, thematic maps perform an important function due to the wide possibilities of adapting their content to current needs. The limitation of classic maps is their two-dimensional nature. In the era of rapidly growing methods of mass acquisition of spatial data, flat images are often not enough to reveal the level of complexity of certain objects. In such cases, it is necessary to use visualization in three-dimensional space. The motivation for the study was to combine cartographic projection methods, spatial transformations, and the possibilities offered by thematic maps to create thematic three-dimensional map imaging (T3DMI). The authors present a practical verification of the adopted methodology by creating a T3DMI visualization of the marina of the National Sailing Centre of the Gdańsk University of Physical Education and Sport (Poland). The profiled characteristics of the object were used to emphasize the key elements of its function. The results confirmed the increase in the interpretative capabilities of the T3DMI method relative to classic two-dimensional maps. Additionally, the study suggests future research directions for the presented solution.


2021 ◽  
Vol 26 (1) ◽  
pp. 200-215
Author(s):  
Muhammad Alam ◽  
Jian-Feng Wang ◽  
Cong Guangpei ◽  
LV Yunrong ◽  
Yuanfang Chen

In recent years, the success of deep learning in natural scene image processing has boosted its application in the analysis of remote sensing images. In this paper, we apply Convolutional Neural Networks (CNN) to the semantic segmentation of remote sensing images. We improve the Encoder-Decoder CNN structures SegNet (with index pooling) and U-Net to make them suitable for multi-target semantic segmentation of remote sensing images. The results show that the two models have their own advantages and disadvantages in segmenting different objects. In addition, we propose an integrated algorithm that combines the two models. Experimental results show that the integrated algorithm can exploit the advantages of both models for multi-target segmentation and achieves better segmentation than either model alone.
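One simple way to integrate two semantic-segmentation models is to average their per-pixel class-probability maps and take the argmax. The abstract does not specify the paper's fusion rule, so the following is purely an illustrative sketch with names of our own choosing.

```python
import numpy as np

def fuse_probabilities(p_segnet, p_unet, w=0.5):
    """Hypothetical fusion of two segmentation models: take a weighted
    average of their per-pixel class-probability maps (each of shape
    (H, W, num_classes), rows summing to 1) and return the per-pixel
    argmax label map of shape (H, W)."""
    fused = w * np.asarray(p_segnet) + (1.0 - w) * np.asarray(p_unet)
    return fused.argmax(axis=-1)
```

The weight w can be tuned per class on a validation set, which is one way an ensemble can favor whichever model segments a given target better.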

