Automatic glaucoma detection based on transfer induced attention network

2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Xi Xu ◽  
Yu Guan ◽  
Jianqiang Li ◽  
Zerui Ma ◽  
Li Zhang ◽  
...  

Abstract Background: Glaucoma is one of the leading causes of irreversible vision loss. Automatic glaucoma detection based on fundus images has been widely studied in recent years. However, existing methods mainly depend on a considerable amount of labeled data to train the model, which is a serious constraint for real-world glaucoma detection. Methods: In this paper, we introduce a transfer learning technique that leverages fundus features learned from similar ophthalmic data to facilitate glaucoma diagnosis. Specifically, we propose a Transfer Induced Attention Network (TIA-Net) for automatic glaucoma detection, which extracts discriminative features that fully characterize glaucoma-related deep patterns under limited supervision. By integrating channel-wise attention and maximum mean discrepancy, our method achieves a smooth transition between general and specific features, thus enhancing feature transferability. Results: To delimit the boundary between general and specific features precisely, we first investigate how many layers should be transferred when training with the source dataset network. Next, we compare our proposed model to previously mentioned methods and analyze their performance. Finally, exploiting the model design, we provide a transparent and interpretable visualization of the transfer by highlighting the key specific features in each fundus image. Evaluated on two real clinical datasets, TIA-Net achieves an accuracy of 85.7%/76.6%, sensitivity of 84.9%/75.3%, specificity of 86.9%/77.2%, and AUC of 0.929/0.835, far better than other state-of-the-art methods. Conclusion: Unlike previous studies that applied classic CNN models to transfer features from non-medical datasets, we leverage knowledge from a similar ophthalmic dataset and propose an attention-based deep transfer learning model for the glaucoma diagnosis task. Extensive experiments on two real clinical datasets show that our TIA-Net outperforms other state-of-the-art methods, and it also holds medical value for the early diagnosis of other medical tasks.
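The abstract names two concrete ingredients, channel-wise attention and maximum mean discrepancy (MMD). The sketch below shows standard forms of both in PyTorch; the squeeze-and-excitation layout, single-Gaussian kernel, and sizes are illustrative assumptions, not TIA-Net's exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel-wise attention (assumed form)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (N, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # global average pool -> (N, C)
        return x * w.unsqueeze(-1).unsqueeze(-1)

def gaussian_mmd(source, target, sigma=1.0):
    """Biased MMD^2 estimate with a single Gaussian kernel over feature batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(source, source).mean() + k(target, target).mean() \
        - 2 * k(source, target).mean()
```

During transfer, a weighted MMD term between source-domain and target-domain feature batches would be added to the classification loss so that the transferred layers stay domain-aligned.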

Author(s):  
Partha Sarathi Mangipudi ◽  
Hari Mohan Pandey ◽  
Ankur Choudhary

Abstract Glaucoma is an ailment causing permanent vision loss, but it can be prevented through early detection. The optic cup-to-disc ratio is one of the key factors for glaucoma diagnosis, yet accurate segmentation of the disc and cup remains a challenge. To mitigate this challenge, an effective system for optic disc and cup segmentation using a deep learning architecture is presented in this paper. A modified ground truth, which fuses segmentation markings from multiple experts, is used to train the proposed model and helps improve system performance. Extensive computer simulations are conducted to test the efficiency of the proposed system on three standard benchmark datasets: DRISHTI-GS, DRIONS-DB, and RIM-ONE v3. The performance of the proposed system is validated against state-of-the-art methods. Results indicate average overlapping scores of 96.62%, 96.15%, and 98.42% on the three datasets, respectively, for optic disc segmentation, and an average overlapping score of 94.41% on DRISHTI-GS, a significant result for optic cup segmentation.
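For orientation, the quantities this abstract reports can be computed from any pair of binary masks; a minimal sketch follows, assuming NumPy arrays as masks. This is evaluation post-processing only, not the paper's segmentation architecture.

```python
import numpy as np

def overlap_score(pred, gt):
    """Overlapping (IoU) score between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

def vertical_cdr(disc_mask, cup_mask):
    """Vertical cup-to-disc ratio: ratio of the two masks' vertical extents."""
    def height(mask):
        rows = np.where(mask.any(axis=1))[0]
        return rows.max() - rows.min() + 1 if rows.size else 0
    disc_h = height(disc_mask)
    return height(cup_mask) / disc_h if disc_h else 0.0
```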


2021 ◽  
Vol 11 (8) ◽  
pp. 3636
Author(s):  
Faria Zarin Subah ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Takeshi Koshiba

Autism spectrum disorder (ASD) is a complex and degenerative neuro-developmental disorder. Most existing methods use functional magnetic resonance imaging (fMRI) to detect ASD with very limited datasets, which yields high accuracy but poor generalization. To overcome this limitation and enhance the performance of automated autism diagnosis, in this paper we propose an ASD detection model using functional connectivity features of resting-state fMRI data. Our proposed model utilizes two commonly used brain atlases, Craddock 200 (CC200) and Automated Anatomical Labelling (AAL), and two rarely used atlases, Bootstrap Analysis of Stable Clusters (BASC) and Power. A deep neural network (DNN) classifier performs the classification task. Simulation results indicate that the proposed model outperforms state-of-the-art methods in terms of accuracy: its mean accuracy was 88%, whereas the mean accuracy of state-of-the-art methods ranged from 67% to 85%. The sensitivity, F1-score, and area under the receiver operating characteristic curve (AUC) of the proposed model were 90%, 87%, and 96%, respectively. Comparative analysis across various scoring strategies shows the superiority of the BASC atlas over the other atlases in classifying ASD and controls.
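As a rough sketch of the pipeline this abstract describes, resting-state ROI time series extracted with an atlas can be reduced to functional connectivity features and fed to a small DNN. The 200-ROI count matches CC200; layer widths are assumptions.

```python
import numpy as np
import torch.nn as nn

def connectivity_features(timeseries):
    """timeseries: (timepoints, n_rois) array -> upper-triangle correlation vector."""
    corr = np.corrcoef(timeseries.T)          # (n_rois, n_rois) connectivity matrix
    iu = np.triu_indices_from(corr, k=1)      # keep each ROI pair once, skip diagonal
    return corr[iu].astype(np.float32)

n_rois = 200                                  # CC200 atlas
n_features = n_rois * (n_rois - 1) // 2       # 19900 pairwise correlations
classifier = nn.Sequential(                   # illustrative DNN classifier
    nn.Linear(n_features, 512), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(512, 64), nn.ReLU(),
    nn.Linear(64, 2))                         # ASD vs. control logits
```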


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4109 ◽  
Author(s):  
Hai Wang ◽  
Yijie Yu ◽  
Yingfeng Cai ◽  
Long Chen ◽  
Xiaobo Chen

Vehicle detection is a key component of environmental sensing systems for Intelligent Vehicles (IVs). Traditional shallow models and offline learning-based vehicle detection methods cannot cope with the real-world challenges of environmental complexity and scene dynamics. To address these problems, this work proposes a vehicle detection algorithm based on a multiple-feature-subspace-distribution deep model with online transfer learning. Based on the multiple feature subspace distribution hypothesis, a deep model is established in which multiple Restricted Boltzmann Machines (RBMs) construct the lower layers and a Deep Belief Network (DBN) composes the superstructure. An unsupervised feature extraction method based on sparse constraints is applied to this deep model. Then, a transfer learning method with online sample generation is proposed on top of the deep model, and finally the entire classifier is retrained online with supervised learning. Experiments are conducted on the KITTI road image datasets. The performance of the proposed method is compared with many state-of-the-art methods, demonstrating that the proposed deep transfer learning-based algorithm outperforms them.
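The building block here is the RBM. A compact sketch of one Bernoulli RBM trained with single-step contrastive divergence (CD-1) follows, with a hidden-unit sparsity target standing in for the "sparse constraints" the abstract mentions; all hyperparameters are assumptions.

```python
import torch

class RBM:
    """Bernoulli RBM with a CD-1 update and a hidden-unit sparsity target."""
    def __init__(self, n_vis, n_hid, lr=0.01, sparsity=0.05):
        self.W = torch.randn(n_vis, n_hid) * 0.01
        self.b_v = torch.zeros(n_vis)
        self.b_h = torch.zeros(n_hid)
        self.lr, self.sparsity = lr, sparsity

    def _p_h(self, v):
        return torch.sigmoid(v @ self.W + self.b_h)

    def _p_v(self, h):
        return torch.sigmoid(h @ self.W.t() + self.b_v)

    def cd1_step(self, v0):
        """One contrastive-divergence update on a batch v0 of shape (B, n_vis)."""
        p_h0 = self._p_h(v0)
        h0 = torch.bernoulli(p_h0)             # sample hidden states
        v1 = self._p_v(h0)                     # reconstruction
        p_h1 = self._p_h(v1)
        self.W += self.lr * (v0.t() @ p_h0 - v1.t() @ p_h1) / v0.size(0)
        self.b_v += self.lr * (v0 - v1).mean(0)
        self.b_h += self.lr * ((p_h0 - p_h1).mean(0)
                               + (self.sparsity - p_h0.mean(0)))  # sparsity pull
```

Several such RBMs, trained layer by layer, would supply the lower layers on which the DBN superstructure is stacked.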


Mathematics ◽  
2021 ◽  
Vol 9 (22) ◽  
pp. 2873
Author(s):  
Anusha Khan ◽  
Allah Bux Sargano ◽  
Zulfiqar Habib

Video super-resolution (VSR) aims at generating high-resolution (HR) video frames with plausible and temporally consistent details from their low-resolution (LR) counterparts and neighboring frames. The key challenge for VSR lies in the effective exploitation of intra-frame spatial relations and temporal dependencies between consecutive frames. Many existing techniques utilize spatial and temporal information separately and compensate motion via alignment; such methods cannot fully exploit the spatio-temporal information that significantly affects the quality of the resulting HR videos. In this work, a novel deformable spatio-temporal convolutional residual network (DSTnet) is proposed to overcome the issues of separate motion estimation and compensation methods for VSR. The proposed framework consists of 3D convolutional residual blocks decomposed into spatial and temporal (2+1)D streams. This decomposition simultaneously utilizes the input video's spatial and temporal features without a separate motion estimation and compensation module. Furthermore, deformable convolution layers are used in the proposed model, enhancing its motion-awareness capability. Our contribution is twofold: first, the proposed approach overcomes the challenges of modeling complex motion by efficiently using spatio-temporal information; second, the proposed model has fewer parameters to learn than state-of-the-art methods, making it a computationally lean and efficient framework for VSR. Experiments are conducted on the benchmark Vid4 dataset to evaluate the efficacy of the proposed approach. The results demonstrate that the proposed approach achieves superior quantitative and qualitative performance compared to state-of-the-art methods.
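The (2+1)D decomposition is concrete enough to sketch: a 3D convolution is factorized into a 2D spatial convolution followed by a 1D temporal convolution inside a residual block. Channel counts below are assumptions, and the deformable layers are omitted for brevity.

```python
import torch
import torch.nn as nn

class R2Plus1DBlock(nn.Module):
    """Residual block with a (2+1)D factorized convolution."""
    def __init__(self, channels, mid):
        super().__init__()
        # spatial stream: 1 x 3 x 3 kernel over (T, H, W) volumes
        self.spatial = nn.Conv3d(channels, mid, (1, 3, 3), padding=(0, 1, 1))
        # temporal stream: 3 x 1 x 1 kernel linking neighboring frames
        self.temporal = nn.Conv3d(mid, channels, (3, 1, 1), padding=(1, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                      # x: (N, C, T, H, W)
        out = self.relu(self.spatial(x))
        return self.relu(x + self.temporal(out))

block = R2Plus1DBlock(channels=64, mid=96)
frames = torch.randn(1, 64, 5, 32, 32)         # features of 5 LR frames
print(block(frames).shape)                     # torch.Size([1, 64, 5, 32, 32])
```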


2020 ◽  
Vol 34 (07) ◽  
pp. 12144-12151
Author(s):  
Guan-An Wang ◽  
Tianzhu Zhang ◽  
Yang Yang ◽  
Jian Cheng ◽  
Jianlong Chang ◽  
...  

RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key is to learn aligned features that bridge the RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. Such set-level alignment may misalign some instances, which limits performance for RGB-IR Re-ID. Different from existing methods, in this paper we propose to generate cross-modality paired images and perform both global set-level and fine-grained instance-level alignment. Our proposed method enjoys several merits. First, it performs set-level alignment by disentangling modality-specific and modality-invariant features; compared with conventional methods, ours explicitly removes the modality-specific features, so the modality variation can be better reduced. Second, given cross-modality unpaired images of a person, our method can generate cross-modality paired images by exchanging the disentangled features. With these pairs, we can directly perform instance-level alignment by minimizing the distance between every pair of images. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favourably against state-of-the-art methods. In particular, on the SYSU-MM01 dataset, our model achieves gains of 9.2% and 7.7% in Rank-1 accuracy and mAP, respectively. Code is available at https://github.com/wangguanan/JSIA-ReID.
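The two alignment levels reduce to simple losses once paired images exist. A hedged sketch, assuming (N, D) embedding batches where row i of each tensor belongs to the same generated pair:

```python
import torch.nn.functional as F

def instance_alignment_loss(rgb_feats, ir_feats):
    """Instance-level alignment: pull each generated RGB/IR pair together."""
    return F.mse_loss(rgb_feats, ir_feats)

def set_alignment_loss(rgb_feats, ir_feats):
    """Coarser set-level alignment: match the two modality centroids."""
    return (rgb_feats.mean(0) - ir_feats.mean(0)).pow(2).sum()
```

The abstract's point is that the instance-level term only becomes available after paired images are generated; without pairs, only the coarser set-level (centroid-style) alignment applies.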


2021 ◽  
Vol 11 (5) ◽  
pp. 2010
Author(s):  
Wei Huang ◽  
Yongying Li ◽  
Kunlin Zhang ◽  
Xiaoyu Hou ◽  
Jihui Xu ◽  
...  

Multi-scale lightweight networks and attention mechanisms have recently attracted attention in person re-identification (ReID), as they can improve a model's ability to process information at low computational cost. However, state-of-the-art methods mostly concentrate on spatial attention and large-block channel attention models with high computational complexity, and rarely investigate inside-block attention in lightweight networks, so they cannot meet the high-efficiency, low-latency requirements of practical ReID systems. In this paper, a novel lightweight person ReID model, the Multi-Scale Focusing Attention Network (MSFANet), is first designed to capture robust and elaborate multi-scale ReID features that require less floating-point computation while achieving higher performance. MSFANet combines a multi-branch depthwise separable convolution module with an inside-block attention module to extract and fuse multi-scale features independently. In addition, we design a multi-stage backbone with the '1-2-3' form, which significantly reduces computational cost. Furthermore, MSFANet is exceptionally lightweight and can be flexibly embedded in a ReID framework. Second, an efficient loss function combining softmax loss and TriHard loss, based on the proposed optimal data augmentation method, is designed for faster convergence and better model generalization. Finally, experimental results on two large ReID datasets (Market1501 and DukeMTMC) and two small ReID datasets (VIPeR, GRID) show that the proposed MSFANet achieves the best mAP performance and the lowest computational complexity compared with state-of-the-art methods, improving mAP by 2.3% and reducing complexity by 18.2%, respectively.
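A module in the spirit of this design might look as follows: parallel depthwise separable branches at several kernel sizes, fused and re-weighted by a small channel gate inside the block. Branch kernel sizes and the reduction factor are assumptions, not MSFANet's exact configuration.

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """Depthwise conv followed by a pointwise conv (the lightweight primitive)."""
    def __init__(self, c_in, c_out, k):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pw = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pw(self.dw(x))

class MultiScaleBlock(nn.Module):
    def __init__(self, channels, scales=(3, 5, 7)):
        super().__init__()
        branch_out = channels // len(scales)
        self.branches = nn.ModuleList(
            DepthwiseSeparable(channels, branch_out, k) for k in scales)
        fused = branch_out * len(scales)
        # inside-block attention: re-weight the fused multi-scale channels
        self.gate = nn.Sequential(
            nn.Linear(fused, fused // 4), nn.ReLU(inplace=True),
            nn.Linear(fused // 4, fused), nn.Sigmoid())

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        w = self.gate(y.mean(dim=(2, 3)))       # squeeze spatial dims -> (N, C)
        return y * w.unsqueeze(-1).unsqueeze(-1)
```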


2020 ◽  
Vol 29 (15) ◽  
pp. 2050250
Author(s):  
Xiongfei Liu ◽  
Bengao Li ◽  
Xin Chen ◽  
Haiyan Zhang ◽  
Shu Zhan

This paper proposes a novel method for person image generation with an arbitrary target pose. Given a person image and an arbitrary target pose, our proposed model can synthesize images of the same person in different poses. Generative Adversarial Networks (GANs) form the major part of the proposed model. Unlike traditional GANs, we add an attention mechanism to the generator to produce realistic-looking images, and we use content reconstruction with a pretrained VGG16 network to maintain content consistency between generated and target images. We evaluate our model on the DeepFashion and Market-1501 datasets. The experimental results show that the proposed network performs favorably against state-of-the-art methods.
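The content-reconstruction term is a standard perceptual loss. A minimal sketch follows, assuming torchvision >= 0.13 for the weights API; the chosen layer (up to relu3_3) and L1 distance are assumptions rather than the paper's stated settings.

```python
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# frozen pretrained VGG16 up to relu3_3 as the loss network
vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def content_loss(generated, target):
    """L1 distance in VGG16 feature space; inputs are (N, 3, H, W) images."""
    return F.l1_loss(vgg(generated), vgg(target))
```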


Author(s):  
Akshi Kumar ◽  
Victor Hugo C. Albuquerque

Sentiment analysis on social media relies on comprehending natural language and using a robust machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Cultural diversity, geographically limited trending hashtags, access to native-language keyboards, and conversational comfort in one's native language compound the linguistic challenges of sentiment analysis. This research evaluates the performance of cross-lingual contextual word embeddings and zero-shot transfer learning in projecting predictions from resource-rich English to the resource-poor Hindi language. A cross-lingual XLM-RoBERTa classification model is trained and fine-tuned on the English-language SemEval 2017 Task 4A benchmark dataset, and zero-shot transfer learning is then used to evaluate the classification model on two Hindi sentence-level sentiment analysis datasets, the IITP-Movie and IITP-Product review datasets. The proposed model compares favorably to state-of-the-art approaches, achieving an average accuracy of 60.93% across the two Hindi datasets, and offers an effective solution to sentence-level (tweet-level) sentiment analysis in a resource-poor scenario.
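The zero-shot setup is simple to express with the Hugging Face transformers library: fine-tune on English, then run Hindi sentences through the same model with no Hindi training step. The checkpoint name and three-way label scheme below are common choices, not necessarily the paper's exact configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3)   # negative / neutral / positive

# ... fine-tune on the English SemEval 2017 Task 4A data here ...

# zero-shot evaluation: Hindi input, no Hindi fine-tuning
hindi_review = "यह फ़िल्म शानदार थी"      # "This movie was wonderful"
inputs = tokenizer(hindi_review, return_tensors="pt")
with torch.no_grad():
    label = model(**inputs).logits.argmax(dim=-1)
```

The transfer works because XLM-RoBERTa's subword vocabulary and encoder were pretrained jointly on about 100 languages, so English fine-tuning also moves the Hindi representations.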


2020 ◽  
Vol 27 ◽  
Author(s):  
Ashish Kumar Sharma ◽  
Rajeev Srivastava

Background: The prediction of a protein's secondary structure from its amino acid sequence is an essential step towards predicting its 3-D structure. Prediction performance improves when homologous multiple sequence alignment information is incorporated, but such information is not available for all proteins; it is therefore necessary to predict protein secondary structure from single sequences. Objective and Methods: Protein secondary structure is predicted from primary sequences using n-gram word embeddings and a deep recurrent neural network. Secondary structure depends on both local and long-range neighboring residues in the primary sequence. In the proposed work, variable-length character n-gram words capture the local contextual information of amino acid residues, and each n-gram word is represented by an embedding vector. A bidirectional long short-term memory (Bi-LSTM) model then captures long-range context by extracting past and future residue information from the primary sequence. Results: The proposed model is evaluated on three public datasets: ss.txt, RS126, and CASP9, achieving Q3 accuracies of 92.57%, 86.48%, and 89.66%, respectively. Conclusion: Compared against state-of-the-art methods available in the literature, the proposed model performs better.
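A model in the shape this abstract describes is easy to sketch: embedded n-gram tokens pass through a stacked Bi-LSTM, and each residue position receives one of the three Q3 labels (helix / strand / coil). Vocabulary and layer sizes are assumptions.

```python
import torch.nn as nn

class BiLSTMSecondaryStructure(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden=128, n_states=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # n-gram word vectors
        self.bilstm = nn.LSTM(embed_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_states)         # per-residue Q3 logits

    def forward(self, token_ids):          # (N, L) n-gram token indices
        h, _ = self.bilstm(self.embed(token_ids))
        return self.out(h)                 # (N, L, 3): helix / strand / coil
```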


2019 ◽  
Vol 11 (24) ◽  
pp. 2921 ◽  
Author(s):  
Jingyu Li ◽  
Ying Li ◽  
Yayuan Xiao ◽  
Yunpeng Bai

To remove speckle noise from original synthetic aperture radar (SAR) images effectively and efficiently, this paper proposes a hybrid dilated residual attention network (HDRANet) with residual learning for SAR despeckling. Firstly, HDRANet employs hybrid dilated convolution (HDC) in a lightweight network architecture to enlarge the receptive field and aggregate global information. Then, a simple yet effective attention module, the convolutional block attention module (CBAM), is integrated into the proposed model to constitute a residual HDC attention block through a skip connection, which further enhances the representation power and performance of the model. Extensive experimental results on synthetic and real SAR images demonstrate the superior performance of HDRANet over state-of-the-art methods in terms of quantitative metrics and visual quality.
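A residual HDC attention block in this spirit can be sketched directly: stacked 3x3 convolutions with dilation rates 1, 2, and 5 (a gridding-free HDC pattern), a lightweight channel gate standing in for CBAM, and a skip connection. Rates and widths are assumptions.

```python
import torch.nn as nn

class HDCAttentionBlock(nn.Module):
    def __init__(self, channels, rates=(1, 2, 5)):
        super().__init__()
        self.convs = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.ReLU(inplace=True))
            for r in rates])                   # hybrid dilated convolutions
        self.gate = nn.Sequential(             # channel attention (CBAM-style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 8, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 8, channels, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.convs(x)
        return x + y * self.gate(y)            # attend, then skip-connect
```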

