Earf-YOLO: A Model for Recognizing Zhuang Minority Patterns Based on YOLOv3

Author(s):  
Xin Wang ◽  
Jingke Yan ◽  
Qin Wang ◽  
Qin Qin ◽  
Jun Wang ◽  
...  

Addressing the limitations of YOLOv3 in recognizing symbols on Zhuang patterns, such as slow detection speed, an inability to detect small objects, and inaccurate positioning of bounding boxes, we propose a new model in this paper: Earf-YOLO (Efficient Attention Receptive Field You Only Look Once). In Earf-YOLO, we first present an attention module, CBEAM (Convolution Block Efficient Attention Module), which refines feature maps along both the channel and spatial dimensions. In the CBEAM module, a local cross-channel interaction strategy without dimensionality reduction is used to improve the performance of the convolutional neural network. We also put forward the SRFB (Strength Receptive Field Block) structure. During training, additional branch structures are generated to enrich the feature space of the convolutional block; during prediction, the multi-branch structures are reparameterized and fused into one main branch to improve the performance of the model. Finally, we adopt several advanced training techniques to further improve detection performance. Experiments on a dataset of Zhuang patterns and on the COCO dataset show that the Earf-YOLO model effectively reduces the error between the predicted box and the ground-truth box and decreases computation time. The mAP of the model reaches 82.1 (IoU = 0.5) on the Zhuang pattern dataset and 62.14 (IoU = 0.5) on the COCO dataset.
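
The abstract does not spell out CBEAM's internals, but its dimensionality-preserving local cross-channel interaction reads like ECA-style channel attention. A minimal PyTorch sketch of that idea (the module name, kernel size k, and the use of global average pooling are assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class LocalCrossChannelAttention(nn.Module):
    """Channel attention via a 1D convolution over the channel descriptor,
    avoiding the dimensionality reduction used by squeeze-and-excitation."""
    def __init__(self, k: int = 3):  # k is an assumed neighbourhood size
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> per-channel descriptor (B, C)
        y = x.mean(dim=(2, 3))
        # the 1D conv mixes each channel with only its k-1 neighbours
        y = self.conv(y.unsqueeze(1)).squeeze(1)
        w = torch.sigmoid(y).unsqueeze(-1).unsqueeze(-1)
        return x * w  # reweight the feature maps channel-wise

x = torch.randn(2, 64, 32, 32)
print(LocalCrossChannelAttention()(x).shape)  # torch.Size([2, 64, 32, 32])
```

The SRFB fusion step would plausibly follow the standard reparameterization pattern, in which parallel convolutional branches, being linear, are summed into a single equivalent kernel at inference time.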

Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2815
Author(s):  
Shih-Hung Yang ◽  
Yao-Mao Cheng ◽  
Jyun-We Huang ◽  
Yon-Ping Chen

Automatic fingerspelling recognition tackles the communication barrier between deaf and hearing individuals. However, the accuracy of fingerspelling recognition is reduced by high intra-class variability and low inter-class variability. Existing methods learn features with regular convolutional kernels, which have limited receptive fields (RFs) and often cannot detect subtle discriminative details. In this study, we propose a receptive field-aware network with finger attention (RFaNet) that highlights the finger regions and builds inter-finger relations. To highlight the discriminative details of the fingers, RFaNet reweights the low-level features of the hand depth image with those of the non-forearm image and improves finger localization, even when the wrist is occluded. RFaNet captures neighboring and inter-region dependencies between fingers in high-level features. An atrous convolution procedure enlarges the RFs at multiple scales, and a non-local operation computes the interactions between multi-scale feature maps, thereby facilitating the building of inter-finger relations. Thus, the representation of a sign is invariant to viewpoint changes, which are primarily responsible for intra-class variability. On an American Sign Language fingerspelling dataset, RFaNet achieved 1.77% higher classification accuracy than state-of-the-art methods. RFaNet also achieved effective transfer learning when the number of labeled depth images was insufficient: the fingerspelling representation of a depth image can be transferred from large- to small-scale datasets by highlighting the finger regions and building inter-finger relations, thereby reducing the need for expensive fingerspelling annotations.
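
As a rough illustration of the receptive-field enlargement step, the sketch below applies parallel atrous (dilated) convolutions at several rates and fuses the branches. The dilation rates, channel counts, and fusion by a 1 × 1 convolution are assumptions; RFaNet additionally applies a non-local operation across the resulting multi-scale maps, which is omitted here:

```python
import torch
import torch.nn as nn

class MultiScaleAtrous(nn.Module):
    """Parallel 3x3 convolutions with increasing dilation rates enlarge the
    receptive field at multiple scales without losing spatial resolution."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # concatenate the multi-scale branches, then fuse with a 1x1 conv
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 28, 28)
print(MultiScaleAtrous(64, 32)(x).shape)  # torch.Size([1, 32, 28, 28])
```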


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images while maintaining spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. The semantic segmentation obtained by convolution from the final feature map is compared with the ground truth provided by a database, and the loss is reduced by backpropagating through the modified dilated residual network to update its weights. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. It achieved mean intersection over union (mIoU) scores of 82.8 and 89.8 and frame rates of 61 and 64.3 frames per second (fps) on the Cityscapes and PASCAL VOC 2012 databases, respectively. These results demonstrate the applicability of the proposed method to natural AR applications at practical speeds, because the frame rate exceeds 60 fps.
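
The backpropagation step described here is the standard supervised training loop. A minimal sketch with a stand-in model (the 19-class setting mirrors Cityscapes; the 1 × 1 convolution stand-in and the optimizer choice are assumptions):

```python
import torch
import torch.nn as nn

# One training step, assuming `model` maps images to per-pixel class logits
# (shape B x num_classes x H x W) and `target` holds ground-truth class ids.
model = nn.Conv2d(3, 19, 1)           # stand-in for the full network
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(2, 3, 64, 64)
target = torch.randint(0, 19, (2, 64, 64))

logits = model(images)                 # segmentation from the final feature map
loss = criterion(logits, target)       # compare with the database ground truth
optimizer.zero_grad()
loss.backward()                        # backpropagate to update the weights
optimizer.step()
print(loss.item())
```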


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Xiaodong Huang ◽  
Hui Zhang ◽  
Li Zhuo ◽  
Xiaoguang Li ◽  
Jing Zhang

Extracting the tongue body accurately from a digital tongue image is a challenge for automated tongue diagnosis, owing to the blurred edge of the tongue body, interference from pathological details, and the large variation in the size and shape of the tongue. In this study, an automated tongue image segmentation method using an enhanced fully convolutional network with an encoder-decoder structure is presented. In the proposed network, a deep residual network is adopted as the encoder to obtain dense feature maps, and a Receptive Field Block is assembled behind the encoder. The Receptive Field Block can capture an adequate global contextual prior because of its multi-branch convolution layers with varying kernel sizes. Moreover, a Feature Pyramid Network is used as the decoder to fuse multiscale feature maps, gathering sufficient positional information to recover a clear contour of the tongue body. Quantitative evaluation of the segmentation results on 300 tongue images from the SIPL-tongue dataset showed that the average Hausdorff Distance, average Symmetric Mean Absolute Surface Distance, average Dice Similarity Coefficient, average precision, average sensitivity, and average specificity were 11.2963, 3.4737, 97.26%, 95.66%, 98.97%, and 98.68%, respectively. The proposed method achieved the best performance compared with four other deep-learning-based segmentation methods (SegNet, FCN, PSPNet, and DeepLab v3+), with similar results on the HIT-tongue dataset. The experimental results demonstrate that the proposed method achieves accurate tongue image segmentation and meets the practical requirements of automated tongue diagnosis.
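
Among the reported metrics, the Dice Similarity Coefficient is straightforward to reproduce. A minimal sketch for binary masks (the edge-case convention for two empty masks is an assumption):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0  # both empty -> perfect match

pred = np.zeros((100, 100)); pred[20:80, 20:80] = 1
gt = np.zeros((100, 100)); gt[25:85, 25:85] = 1
print(round(dice_coefficient(pred, gt), 4))  # 0.8403
```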


2020 ◽  
Vol 10 (23) ◽  
pp. 8434
Author(s):  
Peiran Peng ◽  
Ying Wang ◽  
Can Hao ◽  
Zhizhong Zhu ◽  
Tong Liu ◽  
...  

Fabric defect detection is very important in textile quality control. Current deep learning algorithms are not effective at detecting tiny defects or defects with extreme aspect ratios. In this paper, we propose a strong detection method, the Priori Anchor Convolutional Neural Network (PRAN-Net), to improve the detection and localization accuracy of fabric defects and decrease inspection time. First, we use a Feature Pyramid Network (FPN) with selected multi-scale feature maps to preserve more of the detailed information of tiny defects. Second, we propose a trick that generates sparse priori anchors from the ground-truth boxes of fabric defects, instead of using fixed anchors, to locate extreme defects more accurately and efficiently. Finally, a classification network classifies the fabric defects and refines their positions. The method was validated on two self-made fabric datasets. Experimental results indicate that our method significantly improves the accuracy and efficiency of detecting fabric defects and is better suited to automatic fabric defect detection.
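
The abstract does not detail the anchor-generation trick; a common way to derive priori anchors from ground-truth boxes is to cluster their width-height pairs, as popularized by YOLOv2. A simplified sketch using Euclidean distance (an IoU-based distance is the usual refinement; the function name, parameters, and synthetic data are assumptions):

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 5, iters: int = 100, seed: int = 0):
    """Cluster ground-truth (width, height) pairs into k priori anchors."""
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # assign each box to its nearest anchor, then recompute centroids
        d = np.linalg.norm(wh[:, None] - anchors[None], axis=2)
        assign = d.argmin(axis=1)
        anchors = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                            else anchors[i] for i in range(k)])
    return anchors

# Synthetic defect boxes with extreme aspect ratios (e.g., long thin scratches)
wh = np.abs(np.random.randn(200, 2)) * [120, 8] + [10, 2]
print(kmeans_anchors(wh, k=3))
```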


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Senbo Yan ◽  
Xiaowen Song ◽  
Guocong Liu

In recent years, research on salient object detection has been widely applied to industrial visual inspection tasks. Automated surface inspection (ASI) can be regarded as one of the most challenging tasks in computer vision because of its high cost of data acquisition, serious imbalance of test samples, and strict real-time requirements. Inspired by the requirements of industrial ASI and by methods of salient object detection (SOD), we propose a task mode of defect-type classification plus defect-area segmentation, together with a novel deeper and mixed supervision network (DMS) architecture. The backbone network, ResNeXt-101, was pretrained on ImageNet. First, we extract five multiscale feature maps from the backbone and concatenate them layer by layer. In addition, to obtain the classification prediction and saliency maps in one stage, the image-level and pixel-level ground truth are trained in the same side-output network. A supervision signal is imposed on each side layer to realize deeper and mixed training of the network. Furthermore, the DMS network is equipped with a residual refinement mechanism to refine the saliency maps of input images. We evaluate the DMS network on four open-access ASI datasets and compare it with 20 other methods, which indicates that mixed supervision can significantly improve the accuracy of saliency segmentation. Experimental results show that the proposed method achieves state-of-the-art performance.
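
A hypothetical sketch of a mixed-supervision loss in the spirit described: pixel-level losses on every side output plus an image-level classification loss. The loss choices, upsampling, and weighting term alpha are all assumptions:

```python
import torch
import torch.nn.functional as F

def mixed_supervision_loss(side_logits, cls_logits, mask, label, alpha=1.0):
    """Sum a pixel-level BCE over every side output plus an image-level
    classification loss, so supervision reaches both deep and shallow layers."""
    seg = sum(
        F.binary_cross_entropy_with_logits(
            F.interpolate(s, size=mask.shape[-2:], mode="bilinear",
                          align_corners=False),
            mask)
        for s in side_logits)
    cls = F.cross_entropy(cls_logits, label)
    return seg + alpha * cls

side_logits = [torch.randn(2, 1, r, r) for r in (16, 32, 64)]  # side outputs
mask = torch.rand(2, 1, 64, 64).round()       # pixel-level ground truth
label = torch.randint(0, 4, (2,))             # image-level defect type
print(mixed_supervision_loss(side_logits, torch.randn(2, 4), mask, label))
```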


2020 ◽  
Vol 10 (3) ◽  
pp. 724-730
Author(s):  
Chunjiang Fan ◽  
Zijian Wang ◽  
Gang Li ◽  
Jian Luo ◽  
Yang Cao ◽  
...  

Image segmentation technologies play a crucial role in medical diagnosis. This paper proposes a novel parallel structure based on the conventional 3D U-Net deep network to improve the performance of CT image segmentation. In our model architecture, a new connection channel from the analysis path to the synthesis path was constructed to exploit feature maps from deep spatial dimensions. Sixty CT scans of stroke patients were collected for lesion localization, of which 36 valid scans were selected for further analysis. The improved method led to better results on this task, which segments stroke CT scans into healthy and injured parts. The performance of our method on the test set was compared with that of other state-of-the-art U-Net models to demonstrate the effectiveness of our architecture. Furthermore, the results verified that the parallel structure aids the convergence of the loss curve.
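
The exact form of the new connection channel is not given; a minimal sketch of one plausible realization, fusing an extra analysis-path feature map into the synthesis path by concatenation (the channel counts and the single 3D convolution are assumptions):

```python
import torch
import torch.nn as nn

class ParallelSkipFusion(nn.Module):
    """Fuses a decoder (synthesis-path) feature map with a parallel
    connection from the encoder (analysis path) by channel concatenation."""
    def __init__(self, enc_ch: int, dec_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv3d(enc_ch + dec_ch, out_ch, 3, padding=1)

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor):
        return torch.relu(self.conv(torch.cat([enc_feat, dec_feat], dim=1)))

enc = torch.randn(1, 32, 16, 64, 64)   # analysis-path features
dec = torch.randn(1, 64, 16, 64, 64)   # upsampled synthesis-path features
print(ParallelSkipFusion(32, 64, 64)(enc, dec).shape)
# torch.Size([1, 64, 16, 64, 64])
```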


2017 ◽  
Author(s):  
Ghislain St-Yves ◽  
Thomas Naselaris

We introduce the feature-weighted receptive field (fwRF), an encoding model designed to balance expressiveness, interpretability and scalability. The fwRF is organized around the notion of a feature map—a transformation of visual stimuli into visual features that preserves the topology of visual space (but not necessarily the native resolution of the stimulus). The key assumption of the fwRF model is that activity in each voxel encodes variation in a spatially localized region across multiple feature maps. This region is fixed for all feature maps; however, the contribution of each feature map to voxel activity is weighted. Thus, the model has two separable sets of parameters: “where” parameters that characterize the location and extent of pooling over visual features, and “what” parameters that characterize tuning to visual features. The “where” parameters are analogous to classical receptive fields, while the “what” parameters are analogous to classical tuning functions. By treating these as separable parameters, the fwRF model’s complexity is independent of the resolution of the underlying feature maps. This makes it possible to estimate models with thousands of high-resolution feature maps from relatively small amounts of data. Once a fwRF model has been estimated from data, spatial pooling and feature tuning can be read off directly with no (or very little) additional post-processing or in-silico experimentation.

We describe an optimization algorithm for estimating fwRF models from data acquired during standard visual neuroimaging experiments. We then demonstrate the model’s application to two distinct sets of features: Gabor wavelets and features supplied by a deep convolutional neural network. We show that when Gabor feature maps are used, the fwRF model recovers receptive fields and spatial frequency tuning functions consistent with known organizational principles of the visual cortex. We also show that a fwRF model can be used to regress entire deep convolutional networks against brain activity. The ability to use whole networks in a single encoding model yields state-of-the-art prediction accuracy. Our results suggest a wide variety of uses for the feature-weighted receptive field model, from retinotopic mapping with natural scenes to regressing the activities of whole deep neural networks onto measured brain activity.
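
The model's form follows directly from the abstract: a single pooling field shared across all feature maps (“where”), weighted per map (“what”). A minimal NumPy sketch; a Gaussian pooling field is one natural choice for the “location and extent” parameterization assumed here, and the grid sizes are arbitrary:

```python
import numpy as np

def fwrf_predict(feature_maps, x0, y0, sigma, weights):
    """Feature-weighted receptive field: one pooling field ('where')
    shared by all feature maps, one weight per map ('what').
    feature_maps: (K, H, W); weights: (K,)."""
    K, H, W = feature_maps.shape
    ys, xs = np.mgrid[0:H, 0:W]
    g = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))
    g /= g.sum()                                   # normalized pooling field
    pooled = (feature_maps * g).sum(axis=(1, 2))   # (K,) pooled feature values
    return float(weights @ pooled)                 # predicted voxel activity

fmaps = np.random.rand(8, 32, 32)
print(fwrf_predict(fmaps, x0=16, y0=16, sigma=4.0, weights=np.ones(8) / 8))
```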


2020 ◽  
Vol 2020 ◽  
pp. 1-18 ◽  
Author(s):  
Nhat-Duy Nguyen ◽  
Tien Do ◽  
Thanh Duc Ngo ◽  
Duy-Dinh Le

Small object detection is an interesting topic in computer vision. With the rapid development of deep learning, it has drawn the attention of many researchers, whose innovations include region proposals, grid-cell division, multiscale feature maps, and new loss functions. As a result, the performance of object detection has recently improved significantly. However, most state-of-the-art detectors, in both the one-stage and two-stage approaches, struggle with detecting small objects. In this study, we evaluate current state-of-the-art deep-learning models from both approaches: Fast R-CNN, Faster R-CNN, RetinaNet, and YOLOv3. We provide a thorough assessment of the advantages and limitations of each model. Specifically, we run the models with different backbones on datasets with multiscale objects to find out which types of objects are suitable for each model and backbone. Extensive empirical evaluation was conducted on two standard datasets, namely a small object dataset and a filtered dataset from PASCAL VOC 2007. Finally, comparative results and analyses are presented.
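
When analysing which object sizes a detector handles well, the COCO area thresholds are the usual convention; whether this study uses exactly these thresholds is an assumption:

```python
def size_category(box_area: float) -> str:
    """COCO convention for object scale: small < 32^2, medium < 96^2 pixels."""
    if box_area < 32 ** 2:
        return "small"
    if box_area < 96 ** 2:
        return "medium"
    return "large"

boxes = [(12, 20), (50, 70), (120, 150)]           # (width, height) pairs
print([size_category(w * h) for w, h in boxes])    # ['small', 'medium', 'large']
```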


2020 ◽  
Vol 12 (1) ◽  
pp. 135
Author(s):  
Guofeng Tong ◽  
Yong Li ◽  
Dong Chen ◽  
Shaobo Xia ◽  
Jiju Peethambaran ◽  
...  

In outdoor Light Detection and Ranging (LiDAR) point cloud classification, finding discriminative features for point cloud perception and scene understanding is one of the great challenges. The features derived from raw, defect-laden (i.e., noisy, outlier-ridden, occluded, and irregular) outdoor LiDAR scans usually contain redundant and irrelevant information, which adversely affects the accuracy of point semantic labeling. Moreover, point cloud features from different views can express different attributes of the same point, and simply concatenating the features of different views cannot guarantee the applicability and effectiveness of the fused features. To solve these problems and achieve outdoor point cloud classification with fewer training samples, we propose a novel joint learning framework for multi-view features and classifiers. The proposed framework uses label consistency and the local distribution consistency of multi-space constraints for multi-view point cloud feature extraction and classification. In the framework, manifold learning is used to carry out subspace joint learning of multi-view features by introducing three kinds of constraints: local distribution consistency of the feature space and position space, label consistency between the multi-view predicted labels and the ground truth, and label consistency among the multi-view predicted labels. The proposed model can be trained well with fewer training points, and an iterative algorithm is used to solve the joint optimization of the multi-view feature projection matrices and linear classifiers. The multi-view features are then fused and used effectively for point cloud classification. We evaluate the proposed method on five different point cloud scenes, and the experimental results demonstrate that its classification performance is on par with or better than that of the compared algorithms.
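
Local-distribution-consistency constraints of this kind are typically realized with a graph Laplacian built over nearest neighbors; a sketch under that assumption (the Gaussian kernel and kNN construction are illustrative, not the paper's exact formulation):

```python
import numpy as np

def knn_laplacian(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Unnormalized graph Laplacian from a kNN graph, a common device for
    enforcing local distribution consistency (manifold regularization)."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    W = np.zeros_like(d)
    for i in range(len(X)):
        nbrs = np.argsort(d[i])[1:k + 1]            # skip the point itself
        W[i, nbrs] = np.exp(-d[i, nbrs] ** 2)       # Gaussian edge weights
    W = np.maximum(W, W.T)                          # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W               # L = D - W

X = np.random.rand(20, 3)                           # e.g., point positions
L = knn_laplacian(X)
# tr(F^T L F) is small when nearby points receive similar features F,
# which is the sense in which the constraint ties feature and position space.
```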


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 3122-3122
Author(s):  
Cory Batenchuk ◽  
Huan-Wei Chang ◽  
Peter Cimermancic ◽  
Eunhee S. Yi ◽  
Apaar Sadhwani ◽  
...  

Background: The current standard work-up for both diagnosis and predictive biomarker testing in metastatic non-small cell lung cancer (NSCLC) can exhaust an entire tumor specimen. Notably, gene mutation panels or tumor mutation burden (TMB) testing currently require 10 tissue slides and take 10 days to 3 weeks from sample acquisition to test result. As more companion diagnostic (CDx)-restricted drugs are developed for NSCLC, rapid, tissue-sparing tests are sorely needed. We investigated whether TMB, T-effector (TEFF) gene signature and PD-L1 status can be inferred from H&E images alone using a machine learning approach. Methods: Algorithm development included two steps. First, a neural network was trained to segment hand-annotated, pathologist-confirmed biological features from H&E images, such as tumor architecture and cell types. Second, these feature maps were fed into a classification model to predict the biomarker status. Ground-truth biomarker status of the H&E-associated tumor samples came from whole exome sequencing (WES) for TMB, RNA-seq for the TEFF gene signature, and a reverse-phase protein array for PD-L1. Digital H&E images of NSCLC adenocarcinoma for model development were obtained from The Cancer Genome Atlas (TCGA) and commercial sources. Results: This approach achieves > 75% accuracy in predicting TMB, TEFF and PD-L1 status, offers a way to interpret the model, and provides biological insights into the tumor-host microenvironment. Conclusions: These findings suggest that biomarker inference from H&E images is feasible and may be sufficiently accurate to supplement or replace current tissue-based tests in a clinical setting. Our approach uses biological features for inference and is thus robust, interpretable, and readily verifiable by pathologists. Finally, biomarker status inference from a single H&E image may enable testing in patients whose tumor tissue has been exhausted, spare further tissue use, and return test results within hours, enabling rapid treatment decision-making that maximizes patient benefit.
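
A toy sketch of the two-step pipeline described in the Methods: a stand-in segmentation network produces interpretable feature maps, whose pooled summaries feed a biomarker classifier. Every component here is a simplified placeholder for the actual models:

```python
import torch
import torch.nn as nn

# Step 1 (stand-in): a segmentation net turns an H&E image into per-class
# feature maps (tumor architecture, cell types, ...).
segmenter = nn.Conv2d(3, 8, 3, padding=1)

# Step 2: pool the feature maps into per-class area fractions, then classify
# biomarker status from this interpretable summary.
classifier = nn.Linear(8, 2)   # e.g., TMB-high vs TMB-low (assumed labels)

he_image = torch.randn(1, 3, 256, 256)
feature_maps = torch.softmax(segmenter(he_image), dim=1)
area_fractions = feature_maps.mean(dim=(2, 3))     # fraction of each class
print(torch.softmax(classifier(area_fractions), dim=1))
```

Because the classifier sees only pathologist-verifiable quantities such as per-class area fractions, its predictions remain interpretable, which matches the robustness claim in the Conclusions.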

