An Improved 2D U-Net Model Integrated Squeeze-and-Excitation Layer for Prostate Cancer Segmentation

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Bingshuai Liu ◽  
Jiawei Zheng ◽  
Hongwei Zhang ◽  
Peijie Chen ◽  
Shipeng Li ◽  
...  

In this paper, we propose an improved 2D U-Net model that integrates squeeze-and-excitation (SE) layers for prostate cancer segmentation. The proposed model combines a more complex 2D U-Net with the squeeze-and-excitation technique and consists of an encoder stage and a decoder stage. The encoder extracts features from the input using CONV blocks, SE layers, and max-pooling layers, which improve the feature extraction capability of the model. The decoder maps the extracted features back to the original image resolution using CONV blocks, SE layers, and upsampling layers. The SE layers are included so that the model learns both global and local features. Experiments on the public PROMISE12 dataset demonstrate that the proposed model achieves state-of-the-art segmentation performance compared with traditional methods.
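The SE layer the abstract describes follows the standard squeeze-excitation-scale pattern. A minimal NumPy sketch, assuming a channel-first (C, H, W) feature map, a reduction ratio of 2, and random matrices standing in for the learned fully connected weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_layer(feature_map, w1, w2):
    """Squeeze-and-excitation: recalibrate the channels of a (C, H, W) map."""
    # Squeeze: global average pooling -> one descriptor per channel
    z = feature_map.mean(axis=(1, 2))                # shape (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))        # shape (C,), values in (0, 1)
    # Scale: channel-wise reweighting of the input map
    return feature_map * s[:, None, None]

# Toy example: 4 channels, reduction ratio 2, random weights
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4))   # C -> C/r
w2 = rng.standard_normal((4, 2))   # C/r -> C
y = se_layer(x, w1, w2)
```

Each output channel is the input channel scaled by a single learned factor, which is how the layer emphasizes informative channels.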

2021 ◽  
Vol 13 (16) ◽  
pp. 3340
Author(s):  
Wei Wang ◽  
Jun Liu ◽  
Chenjie Wang ◽  
Bin Luo ◽  
Cheng Zhang

Self-driving cars have developed rapidly in the past few years, and Simultaneous Localization and Mapping (SLAM) is considered one of their basic capabilities. In this article, we propose a direct vision-LiDAR fusion SLAM framework that consists of three modules. First, a two-stage direct visual odometry module, comprising a frame-to-frame tracking step and an improved sliding-window-based thinning step, estimates the pose of the camera accurately while maintaining efficiency. Second, every time a keyframe is generated, a LiDAR mapping module that accounts for dynamic objects refines the pose of the keyframe to obtain higher positioning accuracy and better robustness. Finally, a Parallel Global and Local Search Loop Closure Detection (PGLS-LCD) module that combines a visual Bag of Words (BoW) and LiDAR-Iris features performs place recognition to correct the accumulated drift and maintain a globally consistent map. We conducted extensive experiments on a public dataset and our own mobile robot dataset to verify the effectiveness of each module in the framework. Experimental results show that the proposed algorithm achieves more accurate pose estimation than state-of-the-art methods.
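The place-recognition idea behind the loop closure module can be illustrated on the visual BoW side alone. A minimal sketch, assuming normalized visual-word histograms and a hand-picked similarity threshold; the actual PGLS-LCD module also fuses LiDAR-Iris descriptors and searches globally and locally in parallel:

```python
import numpy as np

def bow_similarity(h1, h2):
    """Cosine similarity between two bag-of-words histograms;
    a high score flags a loop-closure candidate."""
    a = h1 / h1.sum()
    b = h2 / h2.sum()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_loop_candidates(query_hist, keyframe_hists, threshold=0.8):
    """Return indices of keyframes whose appearance matches the query."""
    return [i for i, h in enumerate(keyframe_hists)
            if bow_similarity(query_hist, h) > threshold]

# Toy vocabulary of 4 visual words: keyframe 0 revisits the query's place
query = np.array([5.0, 0.0, 3.0, 2.0])
keyframes = [np.array([5.0, 0.0, 3.0, 2.0]),   # same scene
             np.array([0.0, 10.0, 0.0, 1.0])]  # different scene
candidates = detect_loop_candidates(query, keyframes)
```

A detected candidate would then be verified geometrically before the pose graph is corrected.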


2017 ◽  
Vol 9 (3) ◽  
pp. 58-72 ◽  
Author(s):  
Guangyu Wang ◽  
Xiaotian Wu ◽  
WeiQi Yan

The security of currency has attracted public attention. Despite the various anti-counterfeit methods applied to currency notes, counterfeiters are able to produce illegal copies and circulate them in the market without being detected. After reviewing related work in currency security, this paper conducts a comparative study of feature extraction and classification algorithms for currency note authentication. We extract various computational features from a dataset consisting of US Dollar (USD), Chinese Yuan (CNY), and New Zealand Dollar (NZD) notes and apply the classification algorithms to currency identification. Our contribution is to implement various algorithms from the existing literature and identify the best approaches for this task.
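To make the comparison concrete, here is a minimal sketch of one feature/classifier pairing of the kind such a study compares: a normalized intensity histogram as the computational feature, and a nearest-centroid classifier. The feature choice, bin count, and toy "note" images are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def intensity_histogram(img, bins=8):
    """A simple computational feature: normalized intensity histogram."""
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

def nearest_centroid(feature, centroids, labels):
    """Assign the class whose centroid is closest in feature space."""
    dists = [np.linalg.norm(feature - c) for c in centroids]
    return labels[int(np.argmin(dists))]

# Toy "notes": dark USD samples vs. bright NZD samples
usd = np.full((4, 4), 10)
nzd = np.full((4, 4), 210)
centroids = [intensity_histogram(usd), intensity_histogram(nzd)]
labels = ["USD", "NZD"]
pred = nearest_centroid(intensity_histogram(np.full((4, 4), 200)),
                        centroids, labels)
```

Swapping in other features (texture, color moments) or classifiers (k-NN, SVM) against the same interface is exactly the kind of comparison the paper performs.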



Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1593 ◽  
Author(s):  
Yanlei Gu ◽  
Huiyang Zhang ◽  
Shunsuke Kamijo

Image-based human behavior and activity understanding has been a hot topic in the fields of computer vision and multimedia. As an important part of it, skeleton estimation, also called pose estimation, has attracted a lot of interest. Most deep learning approaches to pose estimation focus mainly on joint features. However, joint features alone are not sufficient, especially when an image contains multiple people and poses are occluded or not fully visible. This paper proposes a novel multi-task framework for multi-person pose estimation. The proposed framework is built on Mask Region-based Convolutional Neural Networks (R-CNN) and extended to integrate joint features, body boundary, body orientation, and occlusion condition together. To further improve performance, this paper proposes organizing the different information in serial multi-task models instead of the widely used parallel multi-task networks. The proposed models are trained on the public Common Objects in Context (COCO) dataset, which is further augmented with ground truths for body orientation and mutual-occlusion masks. Experiments demonstrate the performance of the proposed method for multi-person pose estimation and body orientation estimation: it achieves a Percentage of Correct Keypoints (PCK) of 84.6% and a Correct Detection Rate (CDR) of 83.7%. Comparisons further show that the proposed model reduces over-detection relative to other methods.
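The reported PCK figure rests on a standard keypoint-matching criterion: a predicted joint counts as correct when it lies within a threshold fraction of a per-person reference length. A minimal sketch of generic PCK, with the reference length and the alpha fraction as illustrative assumptions (the paper's exact matching protocol on COCO may differ):

```python
import numpy as np

def pck(pred, gt, ref_size, alpha=0.5):
    """Percentage of Correct Keypoints over (N persons, J joints, 2) arrays:
    a joint is correct when its distance to the ground truth is below
    alpha * ref_size (a per-person reference length, e.g. head size)."""
    dists = np.linalg.norm(pred - gt, axis=-1)          # (N, J)
    correct = dists < alpha * ref_size[:, None]
    return correct.mean()

# Toy example: 2 people x 3 joints, every prediction off by 1 px in x
gt = np.array([[[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]],
               [[5.0, 5.0], [15.0, 5.0], [5.0, 15.0]]])
pred = gt + np.array([1.0, 0.0])
ref = np.array([10.0, 10.0])
score = pck(pred, gt, ref, alpha=0.5)   # 1 px error vs. 5 px threshold
```

Tightening alpha shrinks the tolerance and lowers the score, which is why PCK numbers are only comparable at a stated threshold.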


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Leilei Rong ◽  
Yan Xu ◽  
Xiaolei Zhou ◽  
Lisu Han ◽  
Linghui Li ◽  
...  

Vehicle re-identification (re-id) aims to match and identify the same vehicle across multiple surveillance cameras. For public security and intelligent transportation systems (ITS), it is extremely important to locate a target vehicle quickly and accurately in a massive vehicle database. However, re-id of the target vehicle is very challenging due to many factors, such as orientation variations, illumination changes, occlusion, low resolution, rapid vehicle movement, and the abundance of similar vehicle models. To overcome these difficulties and improve re-id accuracy, we propose an improved multi-branch network that combines global-local feature fusion, a channel attention mechanism, and weighted local features. First, global and local features are fused to capture more information about the vehicle and enhance the learning ability of the model. Second, a channel attention module is embedded in the feature extraction branch to extract the personalized features of the target vehicle. Finally, the influence of background and noise on feature extraction is suppressed by the weighted local features. Comprehensive experiments on the mainstream evaluation datasets VeRi-776, VRIC, and VehicleID indicate that our method effectively improves the accuracy of vehicle re-identification and is superior to state-of-the-art methods.
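The fusion-and-retrieval step described above can be sketched in a few lines. The concatenation scheme, the fixed per-part weights, and the cosine ranking below are simplifying assumptions standing in for the paper's learned branches:

```python
import numpy as np

def fuse_features(global_feat, local_feats, weights):
    """Concatenate the global embedding with weighted local (part)
    embeddings, down-weighting noisy parts (e.g. background)."""
    parts = [w * f for w, f in zip(weights, local_feats)]
    return np.concatenate([global_feat] + parts)

def rank_gallery(query, gallery):
    """Sort gallery indices by cosine similarity to the query descriptor."""
    g = np.stack(gallery)
    sims = g @ query / (np.linalg.norm(g, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)

# Toy descriptors: gallery entry 1 is the same vehicle as the query
query = fuse_features(np.array([1.0, 0.0]), [np.array([0.5, 0.5])], [0.5])
gallery = [fuse_features(np.array([0.0, 1.0]), [np.array([0.5, 0.0])], [0.5]),
           fuse_features(np.array([1.0, 0.1]), [np.array([0.5, 0.4])], [0.5])]
ranking = rank_gallery(query, gallery)
```

In a real re-id system the embeddings come from the trained network and the ranking is evaluated with mAP and CMC curves.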


2021 ◽  
Author(s):  
Leilei Rong ◽  
Yan Xu ◽  
Xiaolei Zhou ◽  
Lisu Han ◽  
Linghui Li ◽  
...  

Vehicle re-identification (Re-ID) aims to match and identify the same vehicle across multiple surveillance cameras. Finding a target vehicle quickly and accurately in a massive vehicle database is extremely important for public security, traffic surveillance, and smart city applications. However, it is very challenging due to orientation variations, illumination changes, occlusion, low resolution, rapid vehicle movement, and the abundance of similar vehicle models. To overcome these problems and improve the accuracy of vehicle re-identification, a multi-branch network is proposed that integrates global-local feature fusion, a channel attention mechanism, and weighted local features. First, global and local features are fused to obtain more complete information about the vehicle and enhance the learning ability of the model. Second, a channel attention module is embedded in the feature extraction branch to extract the personalized features of the vehicle. Finally, the influence of sky areas and noise on feature extraction is weakened by the weighted local features. Comprehensive experiments on the mainstream evaluation datasets VeRi-776, VRIC, and VehicleID indicate that our method effectively improves the accuracy of vehicle re-identification and is superior to state-of-the-art methods.


Author(s):  
Jianjun Wu ◽  
Ying Sha ◽  
Bo Jiang ◽  
Jianlong Tan

Structural representations of user social influence are critical for a variety of applications such as viral marketing and product recommendation. However, existing studies focus only on capturing and preserving the structure of relations, and ignore the diversity of influence-relation patterns among users. To this end, we propose a deep structural influence learning model that learns the social influence structure by mining rich features of each user, and fuses information from the aligned self-network component to preserve the global and local structure of the influence relations among users. Experiments on two real-world datasets demonstrate that the proposed model outperforms state-of-the-art algorithms for learning rich representations in a multi-label classification task.


2014 ◽  
Vol 24 (03) ◽  
pp. 1430010 ◽  
Author(s):  
JING HUO ◽  
YANG GAO ◽  
WANQI YANG ◽  
HUJUN YIN

In this paper, a novel method termed Multi-Instance Dictionary Learning (MIDL) is presented for detecting abnormal events in crowded video scenes. In the multi-instance learning formulation, each event (video clip) is modeled as a bag containing several sub-events (local observations), and each sub-event is regarded as an instance. MIDL jointly learns a dictionary for sparse representations of the sub-events (instances) and multi-instance classifiers for labeling events as normal or abnormal. We further adopt three different multi-instance models, yielding Max-Pooling-based MIDL (MP-MIDL), Instance-based MIDL (Inst-MIDL), and Bag-based MIDL (Bag-MIDL), for detecting both global and local abnormalities. MP-MIDL classifies observed events using bag features extracted via max-pooling over sparse representations, while Inst-MIDL and Bag-MIDL classify observed events by the predicted values of the corresponding instances. The proposed MIDL is evaluated against state-of-the-art methods for abnormal event detection on the UMN (global abnormalities) and UCSD (local abnormalities) datasets, and the results show that MP-MIDL and Bag-MIDL achieve comparable or improved detection performance. MIDL is also compared with other multi-instance learning methods on this task, with the MP-MIDL scheme obtaining superior results.
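The bag-feature step of MP-MIDL can be sketched directly: after sparse coding, each bag (video clip) is summarized by an element-wise max over the codes of its instances (sub-events). The sparse codes below are made-up toy values, not learned ones:

```python
import numpy as np

def bag_feature_maxpool(instance_codes):
    """MP-MIDL-style bag feature: element-wise max over the sparse
    codes of all instances (sub-events) in one bag (video clip)."""
    return np.max(instance_codes, axis=0)

# Toy bag: 3 instances, each with a 5-dim sparse code over the dictionary
codes = np.array([[0.0, 0.2, 0.0, 0.9, 0.0],
                  [0.5, 0.0, 0.0, 0.1, 0.0],
                  [0.0, 0.0, 0.7, 0.0, 0.0]])
bag = bag_feature_maxpool(codes)
```

The pooled vector records, per dictionary atom, the strongest activation anywhere in the clip, so a single strongly abnormal sub-event survives into the bag feature that the classifier sees.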


2020 ◽  
Vol 34 (07) ◽  
pp. 11077-11084
Author(s):  
Yung-Han Huang ◽  
Kuang-Jui Hsu ◽  
Shyh-Kang Jeng ◽  
Yen-Yu Lin

Video re-localization aims to localize a sub-sequence in an untrimmed reference video, called the target segment, that is similar to a given query video. In this work, we propose an attention-based model that accomplishes this task in a weakly supervised setting; that is, we train our CNN-based model without using the annotated locations of the target segments in reference videos. Our model contains three modules. First, it employs a pre-trained C3D network for feature extraction. Second, we design an attention mechanism to extract multiscale temporal features, which are used to estimate the similarity between the query video and a reference video. Third, a localization layer detects where the target segment is in the reference video by determining whether each frame of the reference video is consistent with the query video. The resulting CNN model is trained with the proposed co-attention loss, which discriminatively separates the target segment from the rest of the reference video: it maximizes the similarity between the query video and the target segment while minimizing the similarity between the target segment and the rest of the reference video. Our model can also be adapted to fully supervised re-localization. Evaluated on a public dataset, our method achieves state-of-the-art performance under both weakly supervised and fully supervised settings.
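The intuition of the co-attention loss, pulling the query toward the target segment while pushing it away from the rest of the reference video, can be written as a hinge-style objective. The margin value and the max over non-target segments below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def co_attention_loss(query, target, non_target_segments, margin=0.5):
    """Hinge-style sketch: reward similarity between query and target
    segment, penalize similarity to the hardest non-target segment."""
    pos = cosine(query, target)
    neg = max(cosine(query, seg) for seg in non_target_segments)
    return max(0.0, margin - pos + neg)

# Toy features: a well-localized target incurs zero loss ...
q = np.array([1.0, 0.0])
loss_good = co_attention_loss(q, np.array([1.0, 0.0]), [np.array([0.0, 1.0])])
# ... while a mismatched one is penalized
loss_bad = co_attention_loss(q, np.array([0.0, 1.0]), [np.array([1.0, 0.0])])
```

Minimizing such a loss needs no frame-level location labels, which is what makes the weakly supervised setting possible.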


Symmetry ◽  
2020 ◽  
Vol 12 (3) ◽  
pp. 358
Author(s):  
Miftah Bedru Jamal ◽  
Jiang Zhengang ◽  
Fang Ming

Person re-identification is the task of matching pedestrian images across a network of non-overlapping camera views. It poses aggregated challenges resulting from random human poses, background clutter, illumination variations, and other factors. A vast number of studies in recent years have reported promising success, yet key challenges remain inadequately addressed and continue to cause sub-optimal performance. Attention-based person re-identification has gained popularity for identifying discriminative features in person images, but its potential for extracting features common to a pair of person images across the feature extraction pipeline has not been fully exploited. In this paper, we propose a novel attention-based Siamese network driven by a mutual-attention module decomposed into spatial and channel components. The proposed mutual-attention module not only directs feature extraction toward the discriminative parts of individual images, but also fuses mutual features symmetrically across pairs of person images to capture informative regions common to both inputs. Our model simultaneously learns the feature embedding for discriminative cues and the similarity measure. The model is optimized with multi-task losses, namely classification and verification losses, and is further aided by the learnable mutual-attention module, which facilitates efficient and adaptive learning. The proposed model is thoroughly evaluated on the extensively used large-scale Market-1501 and DukeMTMC-ReID datasets. Our experimental results are competitive with state-of-the-art works and demonstrate the effectiveness of the mutual-attention module.
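The symmetric channel component of a mutual-attention scheme can be sketched as follows. Deriving one shared channel weighting from the product of the two images' channel descriptors is a simplifying assumption of this sketch; the paper's actual module is learnable and also includes a spatial component:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mutual_channel_attention(feat_a, feat_b):
    """Apply one shared channel weighting, computed from BOTH images'
    channel descriptors, to each (C, H, W) feature map, so channels
    on which the pair agrees are emphasized in both."""
    za = feat_a.mean(axis=(1, 2))      # channel descriptor of image A
    zb = feat_b.mean(axis=(1, 2))      # channel descriptor of image B
    s = sigmoid(za * zb)               # symmetric pairwise agreement
    return feat_a * s[:, None, None], feat_b * s[:, None, None]

rng = np.random.default_rng(1)
fa = rng.standard_normal((3, 4, 4))
fb = rng.standard_normal((3, 4, 4))
out_a, out_b = mutual_channel_attention(fa, fb)
```

Because the weighting depends on both inputs, swapping the pair swaps the outputs but leaves the shared attention unchanged, which is the symmetry the abstract refers to.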

