PlaneNet: an efficient local feature extraction network

2021 ◽  
Vol 7 ◽  
pp. e783
Author(s):  
Bin Lin ◽  
Houcheng Su ◽  
Danyang Li ◽  
Ao Feng ◽  
Hongxiang Li ◽  
...  

Due to memory and computing resource limitations, deploying convolutional neural networks on embedded and mobile devices is challenging. Moreover, the redundant use of 1 × 1 convolutions in traditional lightweight networks, such as MobileNetV1, increases computing time. Observing that the 1 × 1 convolution plays a vital role in extracting local features, we introduce a new lightweight network, named PlaneNet, that uses it more effectively. PlaneNet improves accuracy while reducing the number of parameters and multiply-accumulate operations (Madds). Our model is evaluated on classification and semantic segmentation tasks. For classification, the CIFAR-10, Caltech-101, and ImageNet2012 datasets are used; for semantic segmentation, PlaneNet is tested on the VOC2012 dataset. The experimental results demonstrate that PlaneNet (74.48%) obtains higher accuracy than MobileNetV3-Large (73.99%) and GhostNet (72.87%) and achieves state-of-the-art performance with fewer network parameters in both tasks. In addition, compared with existing models, it reaches a practical level for deployment on mobile devices. The code of PlaneNet is available on GitHub: https://github.com/LinB203/planenet.
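
The abstract does not describe the block design in detail; a minimal PyTorch sketch of the kind of lightweight building block it alludes to (a 1 × 1 pointwise convolution for channel mixing paired with a cheap depthwise convolution; the class name and layer sizes are illustrative, not the published PlaneNet block) could look like this:

import torch
import torch.nn as nn

class LightweightBlock(nn.Module):
    # Illustrative block: one 1x1 pointwise conv for channel mixing plus a
    # 3x3 depthwise conv for cheap spatial feature extraction.
    # This sketches the general idea, not the published PlaneNet block.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1,
                                   groups=out_ch, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.depthwise(self.pointwise(x))))

x = torch.randn(1, 32, 56, 56)
print(LightweightBlock(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])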

Author(s):  
Ningyu Zhang ◽  
Xiang Chen ◽  
Xin Xie ◽  
Shumin Deng ◽  
Chuanqi Tan ◽  
...  

Document-level relation extraction aims to extract relations among multiple entity pairs from a document. Previously proposed graph-based or transformer-based models treat entities independently, disregarding global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information, in parallel with the semantic segmentation task in computer vision. Herein, we propose a Document U-shaped Network for document-level relation extraction. Specifically, we leverage an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples. Experimental results show that our approach obtains state-of-the-art performance on three benchmark datasets: DocRED, CDR, and GDA.
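
As a rough illustration of treating entity pairs as an image, the hedged PyTorch sketch below builds a pairwise feature map from entity embeddings and runs a toy U-shaped module over it; the pairwise interaction (elementwise product), channel sizes, and relation-class count are assumptions, not the paper's configuration:

import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    # Toy U-shaped module over the entity-pair "image": one downsampling step,
    # one upsampling step, and a skip connection. Depth and widths are illustrative.
    def __init__(self, in_ch, hid_ch, num_relations):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(in_ch, hid_ch, 3, padding=1), nn.ReLU(),
                                  nn.MaxPool2d(2))
        self.up = nn.ConvTranspose2d(hid_ch, in_ch, 2, stride=2)
        self.head = nn.Conv2d(2 * in_ch, num_relations, 1)

    def forward(self, pair_map):                      # (B, C, N, N)
        skip = pair_map
        x = self.up(self.down(pair_map))              # back to (B, C, N, N)
        return self.head(torch.cat([x, skip], dim=1)) # (B, R, N, N) relation logits

# Entity-pair map from entity embeddings via broadcasting (an assumed interaction).
ents = torch.randn(1, 16, 128)                        # (B, N entities, d)
pair = ents.unsqueeze(2) * ents.unsqueeze(1)          # (B, N, N, d)
logits = TinyUNet(128, 64, num_relations=97)(pair.permute(0, 3, 1, 2))
print(logits.shape)                                   # torch.Size([1, 97, 16, 16])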


Author(s):  
Tao Hu ◽  
Pengwan Yang ◽  
Chiliang Zhang ◽  
Gang Yu ◽  
Yadong Mu ◽  
...  

Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data. The scarcity of annotated data becomes even more challenging in semantic segmentation, since pixel-level annotation for the segmentation task is more labor-intensive to acquire. To tackle this issue, we propose an Attention-based Multi-Context Guiding (A-MCG) network, which consists of three branches: the support branch, the query branch, and the feature fusion branch. A key differentiator of A-MCG is the integration of multi-scale context features between the support and query branches, enforcing better guidance from the support set. In addition, we adopt spatial attention along the fusion branch to highlight context information from several scales, enhancing self-supervision in one-shot learning. To address the fusion problem in multi-shot learning, a Conv-LSTM is adopted to collaboratively integrate the sequential support features and elevate the final accuracy. Our architecture obtains state-of-the-art results on unseen classes in a variant of the PASCAL VOC12 dataset and performs favorably against previous work, with large gains of 1.1% and 1.4% mIoU in the 1-shot and 5-shot settings, respectively.
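
A hedged sketch of the spatial-attention idea along the fusion branch (a 1 × 1 convolution producing a per-location weight map over concatenated support and query features; the exact fusion and attention design in A-MCG may differ):

import torch
import torch.nn as nn

class SpatialAttentionFusion(nn.Module):
    # Toy spatial attention over fused support/query features: a 1x1 conv yields a
    # per-location weight map that re-weights the fused features, highlighting
    # informative spatial context. Only illustrates the idea.
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, support_feat, query_feat):
        fused = torch.cat([support_feat, query_feat], dim=1)  # channel-wise fusion
        weights = self.attn(fused)                            # (B, 1, H, W)
        return fused * weights                                # spatially re-weighted

support = torch.randn(1, 256, 32, 32)
query = torch.randn(1, 256, 32, 32)
print(SpatialAttentionFusion(512)(support, query).shape)     # torch.Size([1, 512, 32, 32])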


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5080
Author(s):  
Baohua Qiang ◽  
Ruidong Chen ◽  
Mingliang Zhou ◽  
Yuanchao Pang ◽  
Yijie Zhai ◽  
...  

In recent years, increasing amounts of image data have come from various sensors, and object detection plays a vital role in image understanding. For object detection in complex scenes, more detailed information in the image should be exploited to improve the accuracy of the detection task. In this paper, we propose an object detection algorithm that jointly performs semantic segmentation (SSOD) for images. First, we construct a feature extraction network that integrates an hourglass structure network with an attention mechanism layer to extract and fuse multi-scale features, generating high-level features with rich semantic information. Second, the semantic segmentation task is used as an auxiliary task so that the algorithm performs multi-task learning. Finally, multi-scale features are used to predict the location and category of each object. The experimental results show that our algorithm substantially enhances object detection performance, consistently outperforms the three other comparison algorithms, and reaches real-time detection speed, so it can be used for real-time detection.
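
A minimal sketch of the multi-task objective described above, assuming a standard detection loss (classification plus box regression) combined with an auxiliary per-pixel segmentation loss; the loss forms and the weighting factor are assumptions, not the paper's exact formulation:

import torch
import torch.nn.functional as F

def multitask_loss(det_cls_logits, det_cls_targets,
                   det_box_preds, det_box_targets,
                   seg_logits, seg_targets, seg_weight=0.5):
    # Detection terms: per-proposal classification and box regression.
    cls_loss = F.cross_entropy(det_cls_logits, det_cls_targets)
    box_loss = F.smooth_l1_loss(det_box_preds, det_box_targets)
    # Auxiliary semantic-segmentation term: per-pixel classification.
    seg_loss = F.cross_entropy(seg_logits, seg_targets)
    return cls_loss + box_loss + seg_weight * seg_loss

# Toy shapes: 8 proposals with 21 classes and 4 box coordinates; a 21-class seg map.
loss = multitask_loss(torch.randn(8, 21), torch.randint(0, 21, (8,)),
                      torch.randn(8, 4), torch.randn(8, 4),
                      torch.randn(2, 21, 64, 64), torch.randint(0, 21, (2, 64, 64)))
print(loss.item())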


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 870 ◽  
Author(s):  
Yuanyuan Guo ◽  
Yifan Xia ◽  
Jing Wang ◽  
Hui Yu ◽  
Rung-Ching Chen

Convolutional Neural Networks (CNNs) have become one of the state-of-the-art methods for various computer vision and pattern recognition tasks, including facial affective computing. Although impressive results have been obtained in facial affective computing using CNNs, the computational complexity of CNNs has also increased significantly, meaning that high-performance hardware is typically indispensable. Most existing CNNs are thus not suitable for mobile devices, where storage, memory, and computational power are limited. In this paper, we focus on the design and implementation of CNNs on mobile devices for real-time facial affective computing tasks. We propose a lightweight CNN architecture which balances performance and computational complexity well. The experimental results show that the proposed architecture achieves high performance while retaining low computational complexity compared with state-of-the-art methods. We demonstrate the feasibility of the architecture in terms of speed, memory, and storage consumption for mobile devices by implementing a real-time facial affective computing application on an actual mobile device.
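
As a hedged illustration of how such a performance/complexity trade-off is typically quantified, the helpers below count trainable parameters and approximate multiply-accumulate operations for a convolution, comparing a standard 3 × 3 convolution with a depthwise-separable pair (the layer sizes are illustrative, not the proposed architecture):

import torch
import torch.nn as nn

def count_params(model):
    # Number of trainable parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def conv_macs(conv, out_h, out_w):
    # Approximate multiply-accumulates for one Conv2d at a given output size.
    k_h, k_w = conv.kernel_size
    return (conv.in_channels // conv.groups) * k_h * k_w * conv.out_channels * out_h * out_w

std = nn.Conv2d(64, 64, 3, padding=1)               # standard 3x3 conv
dw = nn.Conv2d(64, 64, 3, padding=1, groups=64)     # depthwise 3x3 conv
pw = nn.Conv2d(64, 64, 1)                           # pointwise 1x1 conv
print(count_params(std), count_params(dw) + count_params(pw))
print(conv_macs(std, 56, 56), conv_macs(dw, 56, 56) + conv_macs(pw, 56, 56))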


2020 ◽  
Author(s):  
Bingyan Liu ◽  
Daru Pan ◽  
Hui Song

Abstract Background: Glaucoma is an eye disease that causes vision loss and even blindness. The cup to disc ratio (CDR) is an important indicator for glaucoma screening and diagnosis. Accurate segmentation of the optic disc and cup helps obtain the CDR. Although many deep learning-based methods have been proposed to segment the disc and cup in fundus images, achieving highly accurate segmentation performance is still a great challenge due to the heavy overlap between the optic disc and cup. Methods: In this paper, we propose a two-stage method in which the optic disc is first located and then the optic disc and cup are segmented jointly within the region of interest. We also treat the joint optic disc and cup segmentation task as a multi-category semantic segmentation task, for which a deep learning-based model named DDSC-Net (densely connected depthwise separable convolution network) is proposed. Specifically, we employ depthwise separable convolutional layers and an image pyramid input to form a deeper and wider network and improve segmentation performance. Finally, we evaluate our method on two publicly available datasets, Drishti-GS and REFUGE. Results: The experimental results show that the proposed method outperforms state-of-the-art methods, such as pOSAL, GL-Net, M-Net and Stack-U-Net, in terms of Dice coefficients, with scores of 0.9780 (optic disc) and 0.9123 (optic cup) on the Drishti-GS dataset, and scores of 0.9601 (optic disc) and 0.8903 (optic cup) on the REFUGE dataset. In particular, in the more challenging optic cup segmentation task, our method outperforms GL-Net by 0.7% on the Drishti-GS dataset and pOSAL by 0.79% on the REFUGE dataset. Conclusions: The promising segmentation performance reveals that our method has the potential to assist in the screening and diagnosis of glaucoma.
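
For reference, the Dice coefficient used to score disc and cup segmentations can be computed from binary masks as in the short sketch below (a standard definition, not code from the paper):

import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    # Dice coefficient between two binary masks, e.g. predicted vs. ground-truth
    # optic disc or cup. eps avoids division by zero on empty masks.
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

pred = np.zeros((256, 256), dtype=np.uint8); pred[60:200, 60:200] = 1
true = np.zeros((256, 256), dtype=np.uint8); true[70:210, 70:210] = 1
print(round(dice_coefficient(pred, true), 4))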


Nowadays, grasping robots play an important role in many automatic systems in industrial environments. An excellent grasping robot can detect, localize, and pick objects accurately, but achieving these tasks perfectly is still a challenge in the computer vision field. In particular, the segmentation task, understood here as both detection and localization, is the hardest problem. To deal with this problem, the state-of-the-art Mask Region Convolutional Neural Network (Mask R-CNN) was introduced and obtained exceptional results. However, this model does not always perform well when objects lie in difficult locations: edge and border regions are often mistaken for background, which leads to failures in localizing objects and producing a good grasping plan. Thus, in this paper, we introduce a novel method that combines the original Mask R-CNN pipeline with a 3D-algorithm branch to preserve and classify edge regions, improving the performance of Mask R-CNN in detailed segmentation. The significant improvement in difficult object-location scenarios is discussed in the experimental results section. Both the IoU and mAP indicators increase; specifically, mAP, which directly reflects the semantic segmentation ability of a model, rises from 0.39 to 0.46. This approach opens a better way to determine the object location and the grasping plan.
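
For reference, the mask IoU indicator mentioned above can be computed from two binary instance masks as in this short sketch (a standard definition, not code from the paper):

import numpy as np

def mask_iou(pred_mask, true_mask, eps=1e-7):
    # Intersection-over-Union of two binary instance masks.
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return (inter + eps) / (union + eps)

pred = np.zeros((128, 128), dtype=np.uint8); pred[20:80, 20:80] = 1
true = np.zeros((128, 128), dtype=np.uint8); true[30:90, 30:90] = 1
print(round(mask_iou(pred, true), 3))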


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Bingyan Liu ◽  
Daru Pan ◽  
Hui Song

Abstract Background: Glaucoma is an eye disease that causes vision loss and even blindness. The cup to disc ratio (CDR) is an important indicator for glaucoma screening and diagnosis. Accurate segmentation of the optic disc and cup helps obtain the CDR. Although many deep learning-based methods have been proposed to segment the disc and cup in fundus images, achieving highly accurate segmentation performance is still a great challenge due to the heavy overlap between the optic disc and cup. Methods: In this paper, we propose a two-stage method in which the optic disc is first located and then the optic disc and cup are segmented jointly within the region of interest. We also treat the joint optic disc and cup segmentation task as a multi-category semantic segmentation task, for which a deep learning-based model named DDSC-Net (densely connected depthwise separable convolution network) is proposed. Specifically, we employ depthwise separable convolutional layers and an image pyramid input to form a deeper and wider network and improve segmentation performance. Finally, we evaluate our method on two publicly available datasets, Drishti-GS and REFUGE. Results: The experimental results show that the proposed method outperforms state-of-the-art methods, such as pOSAL, GL-Net, M-Net and Stack-U-Net, in terms of Dice coefficients, with scores of 0.9780 (optic disc) and 0.9123 (optic cup) on the Drishti-GS dataset, and scores of 0.9601 (optic disc) and 0.8903 (optic cup) on the REFUGE dataset. In particular, in the more challenging optic cup segmentation task, our method outperforms GL-Net by 0.7% on the Drishti-GS dataset and pOSAL by 0.79% on the REFUGE dataset. Conclusions: The promising segmentation performance reveals that our method has the potential to assist in the screening and diagnosis of glaucoma.
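
As a hedged illustration of how the CDR can be derived from the two segmentations, the sketch below computes a vertical cup-to-disc ratio from binary masks; the paper may derive the CDR differently:

import numpy as np

def vertical_cdr(cup_mask, disc_mask):
    # Vertical cup-to-disc ratio from binary segmentation masks: the ratio of the
    # vertical extents of the cup and the disc. A common way to derive CDR from
    # segmentations; not necessarily the paper's procedure.
    cup_rows = np.where(cup_mask.any(axis=1))[0]
    disc_rows = np.where(disc_mask.any(axis=1))[0]
    if len(cup_rows) == 0 or len(disc_rows) == 0:
        return 0.0
    cup_height = cup_rows.max() - cup_rows.min() + 1
    disc_height = disc_rows.max() - disc_rows.min() + 1
    return cup_height / disc_height

disc = np.zeros((256, 256), dtype=np.uint8); disc[60:200, 60:200] = 1
cup = np.zeros((256, 256), dtype=np.uint8); cup[100:160, 100:160] = 1
print(round(vertical_cdr(cup, disc), 3))  # ~0.43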



2021 ◽  
Vol 40 (3) ◽  
pp. 1-13
Author(s):  
Lumin Yang ◽  
Jiajie Zhuang ◽  
Hongbo Fu ◽  
Xiangzhi Wei ◽  
Kun Zhou ◽  
...  

We introduce SketchGNN, a convolutional graph neural network for semantic segmentation and labeling of freehand vector sketches. We treat an input stroke-based sketch as a graph, with nodes representing the sampled points along the input strokes and edges encoding the stroke structure information. To predict the per-node labels, our SketchGNN uses graph convolution and a static-dynamic branching network architecture to extract features at three levels, i.e., point-level, stroke-level, and sketch-level. SketchGNN significantly improves the accuracy over the state-of-the-art methods for semantic sketch segmentation (by 11.2% in the pixel-based metric and 18.2% in the component-based metric on the large-scale challenging SPG dataset) and has orders of magnitude fewer parameters than both image-based and sequence-based methods.
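
A hedged sketch of the graph construction and a single graph-convolution step (nodes are sampled stroke points, edges connect consecutive points within a stroke, and aggregation is a simple mean over neighbors; SketchGNN's actual static-dynamic branching design is more involved):

import torch
import torch.nn as nn

def stroke_graph(strokes):
    # Build node features and an adjacency matrix from a stroke-based sketch:
    # nodes are sampled (x, y) points, edges link consecutive points of each stroke.
    # A simplified stand-in for the paper's graph construction.
    points, edges, offset = [], [], 0
    for stroke in strokes:                            # stroke: list of (x, y) points
        points.extend(stroke)
        edges += [(offset + i, offset + i + 1) for i in range(len(stroke) - 1)]
        offset += len(stroke)
    n = len(points)
    adj = torch.eye(n)                                # self-loops
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return torch.tensor(points, dtype=torch.float), adj

class GraphConv(nn.Module):
    # Minimal graph convolution: mean-normalized neighborhood aggregation
    # followed by a linear transform and ReLU.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        agg = adj @ x / adj.sum(dim=1, keepdim=True)
        return torch.relu(self.lin(agg))

feats, adj = stroke_graph([[(0.1, 0.2), (0.15, 0.25), (0.2, 0.3)], [(0.5, 0.5), (0.55, 0.52)]])
print(GraphConv(2, 32)(feats, adj).shape)             # torch.Size([5, 32])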


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Matthew D. Guay ◽  
Zeyad A. S. Emam ◽  
Adam B. Anderson ◽  
Maria A. Aronova ◽  
Irina D. Pokrovskaya ◽  
...  

Abstract Biologists who use electron microscopy (EM) images to build nanoscale 3D models of whole cells and their organelles have historically been limited to small numbers of cells and cellular features due to constraints in imaging and analysis. This has been a major factor limiting insight into the complex variability of cellular environments. Modern EM can produce gigavoxel image volumes containing large numbers of cells, but accurate manual segmentation of image features is slow and limits the creation of cell models. Segmentation algorithms based on convolutional neural networks can process large volumes quickly, but achieving the accuracy goals of EM tasks often challenges current techniques. Here, we define dense cellular segmentation as a multiclass semantic segmentation task for modeling cells and large numbers of their organelles, and give an example in human blood platelets. We present an algorithm using novel hybrid 2D–3D segmentation networks to produce dense cellular segmentations with accuracy levels that outperform baseline methods and approach those of human annotators. To our knowledge, this work represents the first published approach to automating the creation of cell models with this level of structural detail.
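
A hedged sketch of the general hybrid 2D–3D idea (a slice-wise 1 × 3 × 3 convolution followed by a full 3 × 3 × 3 convolution; this only illustrates 2D–3D mixing and is not the published network):

import torch
import torch.nn as nn

class Hybrid2D3DBlock(nn.Module):
    # Illustrative hybrid block: a 2D-style convolution applied slice-by-slice
    # (kernel 1x3x3) followed by a genuinely 3D convolution (3x3x3).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv2d_like = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.conv3d = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                              # x: (B, C, D, H, W) EM sub-volume
        return self.act(self.conv3d(self.act(self.conv2d_like(x))))

vol = torch.randn(1, 1, 16, 64, 64)                    # toy crop of a large EM volume
print(Hybrid2D3DBlock(1, 8)(vol).shape)                # torch.Size([1, 8, 16, 64, 64])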

