Automatic Diabetic Retinopathy Grading via Self-Knowledge Distillation

Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1337
Author(s):  
Ling Luo ◽  
Dingyu Xue ◽  
Xinglong Feng

Diabetic retinopathy (DR) is a common fundus disease that leads to irreversible blindness and plagues the working-age population. Automatic medical imaging diagnosis provides a non-invasive method to assist ophthalmologists in the timely screening of suspected DR cases, preventing further deterioration. However, state-of-the-art deep-learning-based methods generally have a large number of model parameters, which makes large-scale clinical deployment time-consuming. Moreover, the severity of DR is associated with lesions, and it is difficult for a model to focus on these regions. In this paper, we propose a novel deep-learning technique for grading DR with only image-level supervision. Specifically, we first customize the model with the help of self-knowledge distillation to achieve a trade-off between model performance and time complexity. Secondly, CAM-Attention is used to allow the network to focus on discriminative zones, e.g., microaneurysms and soft/hard exudates. Considering that directly attaching a classifier after the Side branch would disrupt the hierarchical nature of convolutional neural networks, a Mimicking Module is employed that allows the Side branch to actively mimic the main-branch structure. Extensive experiments are conducted on two benchmark datasets: we obtain an AUC of 0.965 and an accuracy of 92.9% on the Messidor dataset and an accuracy of 67.96% on the challenging IDRID dataset, which demonstrates the superior performance of our proposed method.
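As a rough illustration of the self-knowledge-distillation objective sketched above, and only as an assumption about its form (the abstract does not spell out the loss, so the temperature, weighting factor, and function name below are hypothetical), a minimal PyTorch version could combine a supervised term on the side branch with a softened KL term that pushes it toward the main branch:

```python
import torch.nn.functional as F

def self_distillation_loss(main_logits, side_logits, labels, T=3.0, alpha=0.5):
    """Cross-entropy on the ground truth plus a KL term that pushes the side
    (student) branch toward the softened predictions of the main (teacher)
    branch; T and alpha are illustrative hyperparameters."""
    ce = F.cross_entropy(side_logits, labels)
    kl = F.kl_div(
        F.log_softmax(side_logits / T, dim=1),
        F.softmax(main_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kl
```

Detaching the main-branch logits keeps the distillation signal flowing only into the side branch, mirroring the teacher/student roles of the two branches.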

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Minyu Shi ◽  
Yongting Zhang ◽  
Huanhuan Wang ◽  
Junfeng Hu ◽  
Xiang Wu

The innovation of deep learning modeling schemes plays an important role in promoting research on complex problems handled with artificial intelligence in smart cities and in the development of the next generation of information technology. With the widespread use of smart interactive devices and systems, the exponential growth of data volume and increasingly complex modeling requirements raise the difficulty of deep learning modeling, and the classical centralized deep learning modeling scheme has encountered bottlenecks in improving model performance and diversifying smart application scenarios. Parallel processing systems in deep learning link the virtual information space with the physical world. Although distributed deep learning has become a crucial research concern thanks to its unique advantages in training efficiency, improving the availability of trained models and preventing privacy disclosure remain the main challenges faced by related research. To address these issues in distributed deep learning, this research developed a clonal selective optimization system based on the federated learning framework for model training on large-scale data. The system adopts a heuristic clonal selective strategy in local model optimization and optimizes the effect of federated training. This strategy enhances the adaptability and robustness of the federated learning scheme and improves modeling performance and training efficiency. Furthermore, this research attempts to improve the privacy-security defense capability of the federated learning scheme for big data through differential-privacy preprocessing. The simulation results show that the proposed clonal selection optimization system based on federated learning significantly improves basic model performance, stability, and privacy.
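The abstract does not detail the clonal operators or the aggregation rule, so the following is only a hedged sketch under assumed choices (Gaussian mutation of weight vectors, selection by a client-side loss function, and FedAvg-style averaging); names such as clonal_local_update and federated_round are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def clonal_local_update(weights, eval_loss, n_clones=5, sigma=0.01):
    """Clonal-selection-style local step: spawn mutated clones of the current
    weight vector and keep the candidate with the lowest local loss."""
    candidates = [weights] + [
        weights + rng.normal(0.0, sigma, size=weights.shape)
        for _ in range(n_clones)
    ]
    return min(candidates, key=eval_loss)

def federated_round(global_weights, client_loss_fns):
    """One federated round: every client refines the broadcast weights with the
    clonal step, then the server averages the local results (FedAvg-style)."""
    local = [clonal_local_update(global_weights.copy(), f) for f in client_loss_fns]
    return np.mean(local, axis=0)
```

A differential-privacy preprocessing step, as mentioned above, would sit on the client side before any update leaves the device; it is omitted here for brevity.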


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given the entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale, task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model and reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to classification accuracy improvements ranging from 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
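A minimal sketch of the pre-training idea, under assumptions not stated in the abstract (noise-based corruption, a small Transformer encoder, and an MSE reconstruction loss restricted to the corrupted timesteps; all layer sizes and the class name are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSITSPretrainer(nn.Module):
    """Reconstruct randomly contaminated observations of a per-pixel satellite
    time series with a Transformer encoder (BERT-style pre-training sketch)."""
    def __init__(self, n_bands=10, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_bands, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_bands)

    def forward(self, x, mask_ratio=0.15):
        # x: (batch, time, bands); corrupt a random subset of timesteps
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        corrupted = x.clone()
        corrupted[mask] = torch.randn_like(x)[mask]
        recon = self.head(self.encoder(self.embed(corrupted)))
        # loss is computed only on the contaminated positions
        return F.mse_loss(recon[mask], x[mask])
```

After pre-training on unlabeled pixels, the encoder weights would be reused and fine-tuned with a small classification head on the labeled SITS task.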


2021 ◽  
Author(s):  
Yingruo Fan ◽  
Jacqueline CK Lam ◽  
Victor On Kwok Li

Facial emotions are expressed through a combination of facial muscle movements, namely, the Facial Action Units (FAUs). FAU intensity estimation aims to estimate the intensity of a set of structurally dependent FAUs. In contrast to existing works that focus on improving FAU intensity estimation, this study investigates how knowledge distillation (KD) incorporated into the training process can improve FAU intensity estimation efficiency while achieving the same level of performance. Given the intrinsic structural characteristics of FAUs, it is desirable to distill deep structural relationships, namely DSR-FAU, using heatmap regression. Our methodology is as follows. First, a feature-map-level distillation loss was applied to ensure that the student network and the teacher network share similar feature distributions. Second, region-wise and channel-wise relationship distillation loss functions were introduced to penalize differences in structural relationships. Specifically, the region-wise relationship is represented by the structural correlations across the facial features, whereas the channel-wise relationship is represented by the implicit FAU co-occurrence dependencies. Third, we compared the model performance of DSR-FAU with state-of-the-art models on two benchmark datasets. Our proposed model achieves performance comparable to other baseline models while requiring fewer model parameters and lower computational complexity.
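To make the relationship terms concrete, here is a hedged sketch (the actual DSR-FAU losses, weightings, and feature shapes are not specified in this abstract; the channel-wise term below uses a Gram-matrix correlation as one plausible realization):

```python
import torch.nn.functional as F

def relationship_distill_loss(f_student, f_teacher):
    """Feature-map distillation plus a channel-wise relationship term that
    matches the channel correlation (Gram) structure of student and teacher;
    a region-wise term over facial regions would be added analogously."""
    feat = F.mse_loss(f_student, f_teacher)
    b, c, h, w = f_student.shape
    s = f_student.flatten(2)                      # (b, c, h*w)
    t = f_teacher.flatten(2)
    gram_s = s @ s.transpose(1, 2) / (h * w)      # channel-to-channel relations
    gram_t = t @ t.transpose(1, 2) / (h * w)
    return feat + F.mse_loss(gram_s, gram_t)
```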


2020 ◽  
Vol 34 (04) ◽  
pp. 6917-6924 ◽  
Author(s):  
Ya Zhao ◽  
Rui Xu ◽  
Xinchao Wang ◽  
Peng Hou ◽  
Haihong Tang ◽  
...  

Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading unfortunately remains inferior to that of its counterpart, speech recognition, owing to the ambiguous nature of lip actuations, which makes it challenging to extract discriminant features from lip movement videos. In this paper, we propose a new method, termed Lip by Speech (LIBS), whose goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues, which are difficult to obtain from the subtle movements of the lips, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we utilize an efficacious alignment scheme to handle the inconsistent lengths of the audio and video, as well as an innovative filtering strategy to refine the speech recognizer's prediction. The proposed method achieves new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by margins of 7.66% and 2.75% in character error rate, respectively.
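As a hedged illustration only (LIBS uses a learned alignment scheme and a filtering strategy that are not reproduced here; simple temporal interpolation and an MSE feature loss are stand-ins, and the function name is made up), cross-modal distillation from audio features to video features could be sketched as:

```python
import torch.nn.functional as F

def cross_modal_distill(video_feats, audio_feats):
    """Resample speech-recognizer features to the video frame rate and penalize
    the mismatch, so the lip reader mimics the audio representation.
    video_feats: (batch, T_video, d); audio_feats: (batch, T_audio, d)."""
    aligned = F.interpolate(
        audio_feats.transpose(1, 2), size=video_feats.size(1),
        mode="linear", align_corners=False,
    ).transpose(1, 2)
    return F.mse_loss(video_feats, aligned.detach())
```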


2021 ◽  
Author(s):  
Jingyi Wei ◽  
Peter Lotfy ◽  
Kian Faizi ◽  
Hugo Kitano ◽  
Patrick D. Hsu ◽  
...  

Transcriptome engineering requires flexible RNA-targeting technologies that can perturb mammalian transcripts in a robust and scalable manner. CRISPR systems that natively target RNA molecules, such as Cas13 enzymes, are enabling rapid progress in the investigation of RNA biology and the advancement of RNA therapeutics. Here, we sought to develop a Cas13 platform for high-throughput phenotypic screening and to elucidate the design principles underpinning its RNA-targeting efficiency. We employed the RfxCas13d (CasRx) system in a positive selection screen by tiling 55 known essential genes with single-nucleotide resolution. Leveraging this dataset of over 127,000 guide RNAs, we systematically compared a series of linear regression and machine learning algorithms to train a convolutional neural network (CNN) model that robustly predicts guide RNA performance based on guide sequence alone. We further incorporated secondary features including secondary structure, free energy, target site position, and target isoform percentage. To evaluate model performance, we conducted orthogonal screens via cell surface protein knockdown. The final CNN model predicts highly effective guide RNAs (gRNAs) within each transcript with >90% accuracy in this independent test set. To provide user interpretability, we evaluate feature contributions using both integrated gradients and SHapley Additive exPlanations (SHAP). We identify a specific sequence motif at guide positions 15-24, along with selected secondary features, as predictive of highly efficient guides. Taken together, we derive Cas13d guide design rules from large-scale screen data, release a guide design tool (http://RNAtargeting.org) to advance the RNA-targeting toolbox, and describe a path for the systematic development of deep learning models to predict CRISPR activity.
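As a hedged sketch of the sequence-only prediction setup (the published architecture, guide-length handling, and secondary-feature integration are richer than this; layer sizes and names below are illustrative), a guide could be one-hot encoded and scored with a small 1D CNN:

```python
import torch
import torch.nn as nn

BASES = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}  # map T/U to the same channel

def one_hot(seq):
    """Encode a guide sequence as a (4, len) tensor."""
    x = torch.zeros(4, len(seq))
    for i, b in enumerate(seq.upper()):
        x[BASES[b], i] = 1.0
    return x

class GuideEfficacyCNN(nn.Module):
    """Minimal 1D-CNN regressor from guide sequence to an efficiency score."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(4, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):          # x: (batch, 4, guide_len)
        return self.net(x).squeeze(-1)
```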


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given the entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale, task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model and reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to classification accuracy improvements ranging from 2.38% to 5.27%. The code and the pre-trained model will be available at https://github.com/linlei1214/SITS-BERT upon publication. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.


Symmetry ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 705
Author(s):  
Po-Chou Shih ◽  
Chun-Chin Hsu ◽  
Fang-Chih Tien

Silicon wafers are the most crucial material in the semiconductor manufacturing industry. Owing to limited resources, the reclamation of monitor and dummy wafers for reuse can dramatically lower costs and become a competitive edge in this industry. However, defects such as voids, scratches, particles, and contamination are found on the surfaces of reclaimed wafers. Most reclaimed wafers with an asymmetric distribution of defects, known as the "good (G)" reclaimed wafers, can be re-polished if their defects are not irreversible and if their thicknesses are sufficient for re-polishing. Currently, the "no good (NG)" reclaimed wafers must first be screened by experienced human inspectors to determine their re-usability through defect mapping. This screening task is tedious, time-consuming, and unreliable. This study presents a deep-learning-based defect classification approach for reclaimed wafers. Three neural networks, a multilayer perceptron (MLP), a convolutional neural network (CNN), and a Residual Network (ResNet), are adopted and compared for classification. These networks analyze the pattern of the defect map and determine not only whether a reclaimed wafer is suitable for re-polishing but also which defect category it belongs to. The open-source TensorFlow library was used to train the MLP, CNN, and ResNet networks using collected wafer images as input data. Based on the experimental results, we found that the CNN, with a proper design of kernels and structure, gave fast and superior performance in identifying defective wafers owing to its deep learning capability; the ResNet exhibited excellent accuracy on average; and the large-scale MLP networks also achieved good results with proper network structures.
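For orientation only, a minimal Keras CNN for defect-map classification might look as follows (the input resolution, number of defect classes, and layer sizes are placeholders, not the configuration used in the study):

```python
import tensorflow as tf

def build_wafer_cnn(input_shape=(64, 64, 1), n_classes=4):
    """Small CNN that maps a wafer defect map to a defect-category probability."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_wafer_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```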


2021 ◽  
Vol 17 (11) ◽  
pp. 155014772110570
Author(s):  
Yiting Li ◽  
Liyuan Sun ◽  
Jianping Gou ◽  
Lan Du ◽  
Weihua Ou

Deep neural networks have achieved great success in a variety of applications, such as self-driving cars and intelligent robotics. Meanwhile, knowledge distillation has received increasing attention as an effective model compression technique for training very efficient deep models. The performance of the student network obtained through knowledge distillation heavily depends on whether the transfer of the teacher's knowledge can effectively guide the student's training. However, most existing knowledge distillation schemes require a large teacher network pre-trained on large-scale datasets, which can increase the difficulty of knowledge distillation in different applications. In this article, we propose feature fusion-based collaborative learning for knowledge distillation. Specifically, during knowledge distillation, it enables networks to learn from each other using feature-based and response-based knowledge from different network layers. We concatenate the features learned by the teacher and student networks to obtain a more representative feature map for knowledge transfer. In addition, we introduce a network regularization method to further improve model performance by providing positive knowledge during training. Experiments and ablation studies on two widely used datasets demonstrate that the proposed method, feature fusion-based collaborative learning, significantly outperforms recent state-of-the-art knowledge distillation methods.
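A hedged sketch of the fusion idea (the paper's exact head design, loss weighting, and regularizer are not given in this abstract; the fusion head, temperature, and function name below are assumptions):

```python
import torch
import torch.nn.functional as F

def fusion_collaborative_loss(feat_t, feat_s, logits_t, logits_s,
                              fusion_head, labels, T=3.0):
    """Concatenate teacher and student features, classify the fused feature,
    and let both networks additionally learn from the fused prediction."""
    fused_logits = fusion_head(torch.cat([feat_t, feat_s], dim=1))
    ce = (F.cross_entropy(logits_t, labels)
          + F.cross_entropy(logits_s, labels)
          + F.cross_entropy(fused_logits, labels))
    soft = F.softmax(fused_logits.detach() / T, dim=1)
    kd = (F.kl_div(F.log_softmax(logits_t / T, dim=1), soft, reduction="batchmean")
          + F.kl_div(F.log_softmax(logits_s / T, dim=1), soft, reduction="batchmean")) * T * T
    return ce + kd
```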


Author(s):  
Raquel Rodríguez-Pérez ◽  
Jürgen Bajorath

Machine learning (ML) enables modeling of quantitative structure–activity relationships (QSAR) and compound potency predictions. Recently, multi-target QSAR models have been gaining increasing attention. Simultaneous compound potency predictions for multiple targets can be carried out using ensembles of independently derived target-based QSAR models or, in a more integrated and advanced manner, using multi-target deep neural networks (MT-DNNs). Herein, single-target and multi-target ML models were systematically compared on a large scale in compound potency value predictions for 270 human targets. By design, this large-scale evaluation is a special feature of our study. To this end, MT-DNN, single-target DNN (ST-DNN), support vector regression (SVR), and random forest regression (RFR) models were implemented. Different test systems were defined to benchmark these ML methods under conditions of varying complexity. Source compounds were divided into training and test sets in a compound- or analog-series-based manner, taking target information into account. Data partitioning approaches used for model training and evaluation were shown to influence the relative performance of ML methods, especially for the most challenging compound data sets. For example, MT-DNNs with per-target models yielded superior performance compared to single-target models. For a test compound or its analogs, the availability of potency measurements for multiple targets affected model performance, revealing the influence of ML synergies.
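To illustrate the MT-DNN formulation under stated assumptions (descriptor dimensionality, layer widths, and the masked-loss handling of unmeasured target potencies below are illustrative choices, not the study's configuration):

```python
import torch
import torch.nn as nn

class MultiTargetDNN(nn.Module):
    """Shared trunk over compound descriptors with one potency output per target."""
    def __init__(self, n_features=1024, n_targets=270):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        self.heads = nn.Linear(256, n_targets)

    def forward(self, x):
        return self.heads(self.trunk(x))

def masked_mse(pred, target, observed):
    """Only (compound, target) pairs with a measured potency contribute."""
    sq = (pred - target) ** 2 * observed
    return sq.sum() / observed.sum().clamp(min=1)
```

A single-target baseline would instead train one independent regressor per target on that target's compounds alone.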


2021 ◽  
Author(s):  
Hieu H. Pham ◽  
Dung V. Do ◽  
Ha Q. Nguyen

X-ray imaging in the Digital Imaging and Communications in Medicine (DICOM) format is the most commonly used imaging modality in clinical practice, resulting in vast, non-normalized databases. This poses an obstacle to deploying artificial intelligence (AI) solutions for analyzing medical images, which often requires identifying the correct body part before feeding an image into a specific AI model. This challenge raises the need for an automated and efficient approach to classifying body parts from X-ray scans. Unfortunately, to the best of our knowledge, there is no open tool or framework for this task to date. To fill this gap, we introduce a DICOM Imaging Router that deploys deep convolutional neural networks (CNNs) for categorizing unknown DICOM X-ray images into five anatomical groups: abdominal, adult chest, pediatric chest, spine, and others. To this end, a large-scale X-ray dataset consisting of 16,093 images was collected and manually classified. We then trained a set of state-of-the-art deep CNNs using a training set of 11,263 images. These networks were evaluated on an independent test set of 2,419 images and showed superior performance in classifying body parts. Specifically, our best performing model (MobileNet-V1) achieved a recall of 0.982 (95% CI, 0.977–0.988), a precision of 0.985 (95% CI, 0.975–0.989), and an F1-score of 0.981 (95% CI, 0.976–0.987), while requiring little computation for inference (0.0295 seconds per image). Our external validation on 1,000 X-ray images shows the robustness of the proposed approach across hospitals. These remarkable performances indicate that deep CNNs can accurately and effectively differentiate human body parts from X-ray scans, thereby providing potential benefits for a wide range of applications in clinical settings. The dataset, code, and trained deep learning models from this study will be made publicly available on our project website at https://vindr.ai/datasets/bodypartxr.
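As a hedged sketch of such a routing classifier (Keras's MobileNet corresponds to the V1 architecture named above, but the preprocessing, fine-tuning schedule, and hyperparameters here are assumptions rather than the study's settings):

```python
import tensorflow as tf

def build_body_part_router(n_classes=5, input_shape=(224, 224, 3)):
    """MobileNet backbone with a 5-way softmax head for the anatomical groups."""
    base = tf.keras.applications.MobileNet(
        include_top=False, weights="imagenet",
        input_shape=input_shape, pooling="avg")
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```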

