Multilevel Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data

2021 ◽  
Vol 19 (1) ◽  
pp. 344-373
Author(s):  
Zhihan Li ◽  
Yuwei Fan ◽  
Lexing Ying
Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1052
Author(s):  
Leang Sim Nguon ◽  
Kangwon Seo ◽  
Jung-Hyun Lim ◽  
Tae-Jun Song ◽  
Sung-Hyun Cho ◽  
...  

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study, we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients at two different hospitals. Data augmentation was used to enhance the size and quality of the training datasets. A fine-tuning approach was used: a pre-trained model was adopted via transfer learning and only selected layers were trained. The network was tested by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate its differentiation performance. The proposed network model achieved up to 82.75% accuracy and an area under the curve (AUC) of 0.88 (95% CI: 0.817–0.930). The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model accurately learned features from the cyst region. This study demonstrates the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.
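As a rough illustration of the fine-tuning setup described above (an ImageNet pre-trained ResNet50 with only selected layers trained), a PyTorch sketch might look as follows; which layers are unfrozen, the binary head, and the optimizer settings are our assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet pre-trained ResNet50 (transfer learning).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze everything, then unfreeze only the last residual stage; which
# layers the authors actually trained is not stated, so this is a guess.
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True

# New binary classification head: MCN vs. SCN (trainable by default).
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()
```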


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Young Jae Kim ◽  
Jang Pyo Bae ◽  
Jun-Won Chung ◽  
Dong Kyun Park ◽  
Kwang Gi Kim ◽  
...  

Abstract Colorectal cancer occurs in the gastrointestinal tract and is the third most common of the 27 major cancer types in South Korea and worldwide. Colorectal polyps are known to increase the risk of developing colorectal cancer, so detected polyps need to be resected. This research improved the performance of polyp classification by fine-tuning a Network-in-Network (NIN) after applying a model pre-trained on the ImageNet database. Random shuffling was performed 20 times on 1,000 colonoscopy images, with each shuffle divided into 800 training images and 200 test images, and accuracy was evaluated on the 200 test images in each of the 20 experiments. Three comparison methods were constructed from AlexNet by transferring weights trained on three different state-of-the-art databases; a plain AlexNet-based method without transfer learning was also compared. The accuracy of the proposed method was statistically significantly higher than that of the four other state-of-the-art methods, showing an 18.9% improvement over the plain AlexNet-based method. The area under the curve was approximately 0.930 ± 0.020, and the recall rate was 0.929 ± 0.029. Given its high recall rate and accuracy, an automatic algorithm can assist endoscopists in identifying adenomatous polyps, enabling the timely resection of polyps at an early stage.
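The evaluation protocol (20 random shuffles of the 1,000 images into 800/200 splits, with accuracy averaged across runs) could be sketched as follows; `images`, `labels`, `train_nin`, and `evaluate` are hypothetical placeholders for the dataset and the fine-tuned NIN model.

```python
import numpy as np

# 20 random shuffles of the 1,000 colonoscopy images, each split into
# 800 training and 200 test images, mirroring the protocol above.
# `images`, `labels`, `train_nin`, and `evaluate` are hypothetical
# placeholders, not names from the paper.
rng = np.random.default_rng(seed=0)
accuracies = []
for _ in range(20):
    idx = rng.permutation(1000)
    train_idx, test_idx = idx[:800], idx[800:]
    model = train_nin(images[train_idx], labels[train_idx])
    accuracies.append(evaluate(model, images[test_idx], labels[test_idx]))

print(f"mean accuracy over 20 runs: {np.mean(accuracies):.3f} "
      f"± {np.std(accuracies):.3f}")
```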


2020 ◽  
Author(s):  
Dongyu Xue ◽  
Han Zhang ◽  
Dongling Xiao ◽  
Yukang Gong ◽  
Guohui Chuai ◽  
...  

Abstract In silico modelling and analysis of small molecules substantially accelerates drug development. Representing and understanding molecules is the fundamental step for various in silico molecular analysis tasks, which have traditionally been investigated individually and separately. In this study, we present X-MOL, which applies large-scale pre-training on 1.1 billion molecules for molecular understanding and representation, followed by carefully designed fine-tuning to accommodate diverse downstream molecular analysis tasks, including molecular property prediction, chemical reaction analysis, drug-drug interaction prediction, de novo generation of molecules and molecule optimization. X-MOL achieves state-of-the-art results on all of these tasks with good model interpretability. Collectively, taking advantage of super-large-scale pre-training data and supercomputing power, our study demonstrates the practical utility of the idea that “mass makes miracles” in molecular representation learning and downstream in silico molecular analysis, indicating the great potential of using large-scale unlabelled data with carefully designed pre-training and fine-tuning strategies to unify existing molecular analysis tasks and substantially enhance the performance of each task.
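The pattern the abstract describes, one large pre-trained molecular encoder shared across task-specific fine-tuned heads, might be sketched as below; the toy Transformer dimensions and the two example heads are illustrative and not X-MOL's actual configuration.

```python
import torch.nn as nn

class MoleculeEncoder(nn.Module):
    """Toy stand-in for a large pre-trained molecular language model;
    the real X-MOL is far larger and pre-trained on 1.1 billion molecules."""
    def __init__(self, vocab_size=128, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, tokens):
        # Pool token states into one molecule-level representation.
        return self.encoder(self.embed(tokens)).mean(dim=1)

# One shared pre-trained encoder, with a separate head fine-tuned per
# downstream task (property prediction, DDI prediction, and so on).
encoder = MoleculeEncoder()
property_head = nn.Linear(256, 1)   # molecular property regression
ddi_head = nn.Linear(2 * 256, 2)    # drug-drug interaction (pair input)
```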


Author(s):  
Shaolei Wang ◽  
Zhongyuan Wang ◽  
Wanxiang Che ◽  
Sendong Zhao ◽  
Ting Liu

Spoken language is fundamentally different from written language in that it contains frequent disfluencies, i.e., parts of an utterance that are corrected by the speaker. Disfluency detection (removing these disfluencies) is desirable to clean the input for downstream NLP tasks. Most existing approaches to disfluency detection rely heavily on human-annotated data, which is scarce and expensive to obtain in practice. To tackle this training data bottleneck, we investigate methods for combining self-supervised learning and active learning for disfluency detection. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled data and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words and (ii) a sentence classification task to distinguish original sentences from grammatically incorrect ones. We then combine these two tasks to jointly pre-train a neural network, which is subsequently fine-tuned on human-annotated disfluency detection training data. The self-supervised learning method captures task-specific knowledge for disfluency detection and achieves better performance than other supervised methods when fine-tuned on a small annotated dataset. However, because the pseudo training data are generated from simple heuristics and cannot fully cover all disfluency patterns, a performance gap remains compared to supervised models trained on the full training dataset. We further explore how to bridge this gap by integrating active learning during the fine-tuning process. Active learning strives to reduce annotation costs by choosing the most critical examples to label, and can address the weakness of self-supervised learning with a small annotated dataset. We show that by combining self-supervised learning with active learning, our model matches state-of-the-art performance with only about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.
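The pseudo-data construction step is concrete enough to sketch: words are randomly inserted into unlabeled sentences (and labeled as noise for the tagging task) or randomly deleted. The probabilities and noise vocabulary below are illustrative assumptions, not the paper's settings.

```python
import random

def make_pseudo_example(sentence, noise_vocab, p_add=0.15, p_del=0.15):
    """Build a disfluency-detection pseudo example by randomly inserting
    noise words (tagged 1) and randomly deleting words, following the
    self-supervised data construction described above. The probabilities
    and noise vocabulary are illustrative guesses."""
    tokens, labels = [], []
    for word in sentence.split():
        if random.random() < p_add:
            tokens.append(random.choice(noise_vocab))  # inserted noise word
            labels.append(1)                           # tagging task target
        if random.random() < p_del:
            continue                                   # randomly drop a word
        tokens.append(word)
        labels.append(0)                               # original word, kept
    return tokens, labels

tokens, labels = make_pseudo_example(
    "i want to book a flight to boston",
    noise_vocab=["uh", "well", "the", "to"],
)
```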


2019 ◽  
Vol 18 ◽  
pp. 153601211986353 ◽  
Author(s):  
Rui Zhang ◽  
Chao Cheng ◽  
Xuehua Zhao ◽  
Xuechen Li

Positron emission tomography (PET) imaging is one of the most capable methods for diagnosing various malignancies, such as lung tumors. However, as PET scans are used more frequently, radiologists become considerably overburdened; computer-aided diagnosis is therefore being explored to curtail these heavy workloads. In this article, we propose a multiscale Mask Region-Based Convolutional Neural Network (Mask R-CNN)-based method that uses PET imaging to detect lung tumors. First, we produced three Mask R-CNN models for lung tumor candidate detection by fine-tuning Mask R-CNN on training data consisting of images at three different scales; each training set included 594 slices with lung tumors. The three models were then integrated using a weighted voting strategy to reduce false-positive outcomes. A total of 134 PET slices were employed as the test set. The precision, recall, and F score of the proposed method were 0.90, 1.00, and 0.95, respectively. The experimental results strongly support the effectiveness of this method in detecting lung tumors, identifying healthy chest patterns, and greatly reducing incorrect tumor identifications.
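A plausible reading of the weighted voting step, fusing overlapping detections from the three scale-specific models, is sketched below; the weights, IoU cutoff, and vote threshold are guesses, as the abstract does not state them, and candidates are drawn from the first model only as a simplification.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def weighted_vote(per_model_detections, weights, thresh=0.5):
    """Fuse detections from the three scale-specific models: a candidate
    box is kept only if the weighted sum of its own score and the
    best-overlapping scores (IoU > 0.5) from the other models clears
    `thresh`. Each model's detections are (box, score) pairs."""
    kept = []
    for box, score in per_model_detections[0]:
        vote = weights[0] * score
        for dets, w in zip(per_model_detections[1:], weights[1:]):
            overlaps = [s for b, s in dets if iou(box, b) > 0.5]
            if overlaps:
                vote += w * max(overlaps)
        if vote >= thresh:
            kept.append(box)
    return kept
```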


Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Tao Chen ◽  
Mingfen Wu ◽  
Hexi Li

Abstract The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most current deep learning approaches to medical relation extraction require large-scale training data to prevent overfitting. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. First, we describe the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) to fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical-disease relation corpus, the traditional Chinese medicine literature corpus, and the i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (relative improvements in F1 score of 22.2%, 7.77%, and 38.5%, respectively, compared with a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.
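The BERT + 1d-CNN combination could look roughly like this in PyTorch with Hugging Face Transformers; the kernel size, channel count, max-pooling, and choice of bert-base-uncased are our assumptions about details the abstract leaves unstated.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertCnnRelationClassifier(nn.Module):
    """Rough sketch of the BERT + 1d-CNN fine-tuning setup: BERT token
    states pass through a 1-D convolution, are max-pooled over the
    sequence, and classified into relation types."""
    def __init__(self, num_relations, hidden=768, channels=128):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.conv = nn.Conv1d(hidden, channels, kernel_size=3, padding=1)
        self.classifier = nn.Linear(channels, num_relations)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # (batch, seq, hidden) -> (batch, hidden, seq) for Conv1d.
        feats = torch.relu(self.conv(states.transpose(1, 2)))
        pooled, _ = feats.max(dim=2)  # max-pool over the sequence
        return self.classifier(pooled)
```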


2019 ◽  
Vol 5 (1) ◽  
pp. 239-244
Author(s):  
Jingrui Yu ◽  
Roman Seidel ◽  
Gangolf Hirtz

Abstract We propose a one-step person detector for top-view omnidirectional indoor scenes based on convolutional neural networks (CNNs). While state-of-the-art person detectors reach competitive results on perspective images, the lack of CNN architectures and training data that follow the distortion of omnidirectional images makes current approaches inapplicable to our data. Our method predicts bounding boxes of multiple persons directly in omnidirectional images without perspective transformation, which reduces pre- and post-processing overhead and enables real-time performance. The basic idea is to use transfer learning to fine-tune CNNs trained on perspective images, with data augmentation techniques, for detection in omnidirectional images. We fine-tune two variants of the Single Shot MultiBox Detector (SSD): the first uses MobileNet v1 FPN as the feature extractor (moSSD), the second ResNet50 v1 FPN (resSSD). Both models are pre-trained on the Microsoft Common Objects in Context (COCO) dataset. We fine-tune both models on the PASCAL VOC07 and VOC12 datasets, specifically on the person class. Random 90-degree rotation and random vertical flipping are used for data augmentation, in addition to the methods proposed for the original SSD. We reach an average precision (AP) of 67.3% with moSSD and 74.9% with resSSD on the evaluation dataset. To enhance the fine-tuning process, we add a subset of the HDA Person dataset and a subset of the PIROPO database and restrict the perspective images to PASCAL VOC07; the AP then rises to 83.2% for moSSD and 86.3% for resSSD. The average inference speed is 28 ms per image for moSSD and 38 ms per image for resSSD on an Nvidia Quadro P6000. Our method is applicable to other CNN-based object detectors and can potentially generalize to detecting other objects in omnidirectional images.
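The omnidirectional-specific augmentations (random 90-degree rotation and random vertical flipping, both of which preserve the radial distortion of a top-view fisheye image) are simple to express. A possible PIL implementation follows, with the 0.5 flip probability assumed; note the corresponding bounding-box transforms are omitted.

```python
import random
from PIL import Image

def augment_omnidirectional(img: Image.Image) -> Image.Image:
    """Random multiple-of-90-degree rotation plus a random vertical flip;
    both keep a top-view fisheye image's distortion pattern valid.
    Assumes a square fisheye frame, so 90-degree rotation preserves size."""
    img = img.rotate(90 * random.randint(0, 3))
    if random.random() < 0.5:  # flip probability is an assumption
        img = img.transpose(Image.Transpose.FLIP_TOP_BOTTOM)
    return img
```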


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2639
Author(s):  
Quan T. Ngo ◽  
Seokhoon Yoon

Facial expression recognition (FER) is a challenging problem in pattern recognition and computer vision. The recent success of convolutional neural networks (CNNs) in object detection and object segmentation tasks has shown promise for building automatic deep CNN-based FER models. However, in real-world scenarios, performance degrades dramatically owing to the great diversity of factors unrelated to facial expressions, a lack of training data, and an intrinsic imbalance in existing facial emotion datasets. To tackle these problems, this paper not only applies deep transfer learning techniques but also proposes a novel loss function, called weighted-cluster loss, which is used during the fine-tuning phase. Specifically, the weighted-cluster loss function simultaneously improves intra-class compactness and inter-class separability by learning a class center for each emotion class. It also accounts for the imbalance in a facial expression dataset by giving each emotion class a weight based on its proportion of the total number of images. In addition, a recent, successful deep CNN architecture, pre-trained on the face identification task with the VGGFace2 database from the Visual Geometry Group at Oxford University, is employed and fine-tuned using the proposed loss function to recognize eight basic facial emotions from the AffectNet database of facial expression, valence, and arousal computing in the wild. Experiments on the AffectNet real-world facial dataset demonstrate that our method outperforms baseline CNN models that use either weighted-softmax loss or center loss.
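Based only on the description above, a hedged reconstruction of the weighted-cluster loss might pull each feature toward its learned class center while weighting classes by inverse frequency; the paper's exact formulation (for instance, how inter-class separability enters the loss) may well differ.

```python
import torch
import torch.nn as nn

class WeightedClusterLoss(nn.Module):
    """Hedged reconstruction of the weighted-cluster loss idea: squared
    distance of each feature to its learned class center (intra-class
    compactness), scaled by an inverse-frequency class weight to counter
    dataset imbalance. This follows the abstract's description and the
    spirit of center loss, not a published formula."""
    def __init__(self, num_classes, feat_dim, class_counts):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        # Classes with fewer images get proportionally larger weights.
        self.register_buffer("weights", counts.sum() / (len(counts) * counts))

    def forward(self, features, labels):
        diffs = features - self.centers[labels]
        per_sample = self.weights[labels] * (diffs ** 2).sum(dim=1)
        return per_sample.mean()
```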


Author(s):  
M. Buyukdemircioglu ◽  
R. Can ◽  
S. Kocaman

Abstract. Automatic detection, segmentation and reconstruction of buildings in urban areas from Earth Observation (EO) data remain challenging for many researchers. The roof is one of the most important elements of a building model. Three-dimensional geographical information system (3D GIS) applications generally require the roof type and roof geometry to perform various analyses on the models, such as energy efficiency. Conventional segmentation and classification methods are often based on features like corners, edges and line segments. In parallel with developments in computer hardware and artificial intelligence (AI) methods, including deep learning (DL), image features can now be extracted automatically. As a DL technique, convolutional neural networks (CNNs) can be used for image classification tasks, but they require a large amount of high-quality training data to obtain accurate results. The main aim of this study was to generate a roof type dataset from very high-resolution (10 cm) orthophotos of Cesme, Turkey, and to classify the roof types using a shallow CNN architecture. The training dataset consists of 10,000 roof images and their labels, covering six roof type classes (flat, hip, half-hip, gable, pyramid and complex) in the study area. The prediction performance of the shallow CNN model was compared with results obtained from fine-tuning three well-known pre-trained networks, i.e., VGG-16, EfficientNetB4 and ResNet-50. The results show that although our CNN has slightly lower overall accuracy, it is still acceptable for many applications using sparse data.
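A shallow CNN of the kind the study compares against the fine-tuned pre-trained networks might look like the following; the abstract gives no architecture details, so the depth, channel counts, and the assumed 128x128 RGB input are guesses.

```python
import torch.nn as nn

# A plausible "shallow CNN" for the six roof classes; layer sizes and
# the assumed 128x128 RGB input are illustrative, not from the paper.
shallow_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(128 * 16 * 16, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 6),  # flat, hip, half-hip, gable, pyramid, complex
)
```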


2021 ◽  
pp. 1-55
Author(s):  
Daniel Loureiro ◽  
Kiamehr Rezaee ◽  
Mohammad Taher Pilehvar ◽  
Jose Camacho-Collados

Abstract Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, little is known about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that, in some cases, language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter is more robust with respect to sense bias and better exploits limited available training data. In fact, the simple feature-extraction strategy of averaging contextualized embeddings proves robust even when using only three training sentences per word sense, with minimal improvements obtained by increasing the size of this training data.
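The feature-extraction strategy the article finds most robust (averaging the contextualized embeddings of a word's annotated occurrences into per-sense centroids, then assigning the nearest centroid at test time) can be sketched with Hugging Face Transformers; the example sentences, sense keys, and the use of bert-base-uncased are our choices for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed_target(sentence, target="bank"):
    """Contextualized embedding of the target word, averaged over its
    WordPiece tokens (first occurrence in the sentence)."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = bert(**enc).last_hidden_state.squeeze(0)
    ids = enc["input_ids"].squeeze(0).tolist()
    piece_ids = tok(target, add_special_tokens=False)["input_ids"]
    for i in range(len(ids) - len(piece_ids) + 1):
        if ids[i:i + len(piece_ids)] == piece_ids:
            return states[i:i + len(piece_ids)].mean(dim=0)
    raise ValueError(f"{target!r} not found in sentence")

# Sense centroids from only three training sentences per sense, the
# low-shot setting the article reports; sentences are our own examples.
examples = {
    "bank%finance": ["she deposited cash at the bank",
                     "the bank raised its interest rates",
                     "he opened an account at the bank"],
    "bank%river":   ["they sat on the grassy bank of the river",
                     "the boat drifted toward the bank",
                     "reeds grew along the muddy bank"],
}
centroids = {sense: torch.stack([embed_target(s) for s in sents]).mean(dim=0)
             for sense, sents in examples.items()}

def disambiguate(sentence):
    query = embed_target(sentence)
    return max(centroids,
               key=lambda s: torch.cosine_similarity(query, centroids[s], dim=0))

print(disambiguate("fish swam close to the bank"))  # expected: bank%river
```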

