A Dynamic Improvement of a Training Dataset for Source Code Classification Using Deep Learning approach

In recent years, various methods for source code classification using deep learning have been proposed. The classification accuracy of such methods is strongly influenced by the training data set. It should therefore be possible to build a more accurate model by improving how the training data set is constructed. In this study, we propose a method that dynamically improves the training data set for source code classification using deep learning. In the proposed method, we first train and validate the source code classification model using the training data set. Next, we reconstruct the training data set based on the validation result. By repeating this cycle of training and reconstruction, we improve the training data set and obtain a highly accurate model. In the evaluation experiment, a source code classification model was trained using the proposed method, and its classification accuracy was compared with that of three baseline methods. The model trained using the proposed method achieved the highest classification accuracy. We also confirmed that the proposed method improves the classification accuracy of the model from 0.64 to 0.96.
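
The train, validate, and reconstruct cycle described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the data set is reconstructed, so the reconstruction rule is passed in as a caller-supplied function, and the names `improve_dataset`, `toy_train`, and `toy_rebuild` are hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (source code text, label); a hypothetical representation
Model = Callable[[str], str]


def improve_dataset(
    train_set: List[Sample],
    validation_set: List[Sample],
    train_fn: Callable[[List[Sample]], Model],
    rebuild_fn: Callable[[List[Sample], List[Sample]], List[Sample]],
    rounds: int = 5,
) -> Tuple[List[Sample], List[float]]:
    """Repeat train -> validate -> reconstruct and record accuracy per round."""
    history: List[float] = []
    for _ in range(rounds):
        model = train_fn(train_set)
        # Validation: collect the samples the current model misclassifies.
        errors = [(x, y) for x, y in validation_set if model(x) != y]
        history.append(1.0 - len(errors) / len(validation_set))
        if not errors:
            break
        # Reconstruction: the rule itself is an assumption supplied by the caller.
        train_set = rebuild_fn(train_set, errors)
    return train_set, history


def toy_train(samples: List[Sample]) -> Model:
    # Toy "model": remember which label goes with the first token of each sample.
    table = {code.split()[0]: label for code, label in samples}
    return lambda code: table.get(code.split()[0], "unknown")


def toy_rebuild(train_set: List[Sample], errors: List[Sample]) -> List[Sample]:
    # Toy rule: add the misclassified samples to the training set.
    return train_set + errors


train = [("def f(): pass", "python")]
val = [("def g(): pass", "python"), ("fn main() {}", "rust")]
final_set, history = improve_dataset(train, val, toy_train, toy_rebuild)
print(history)
```

The toy rebuild rule copies validation samples into the training set only to keep the example short; in a real experiment the data used for the final accuracy figure would have to stay separate from anything fed back into training.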

Improving generalization of deep learning models for diagnostic pathology by increasing variability in training data: experiments on osteosarcoma subtypes

10.1101/2020.09.10.20192294 ◽

2020 ◽

Author(s):

Haiming Tang ◽

Nanfei Sun ◽

Steven Shen

Keyword(s):

Deep Learning ◽

Model Performance ◽

High Variability ◽

Training Data ◽

Classification Model ◽

Training Dataset ◽

Learning Models ◽

Diagnostic Pathology ◽

Model Generalization ◽

Histopathological Images

Artificial intelligence (AI) is making progress in diagnostic pathology. A large number of studies applying deep learning models to histopathological images have been published in recent years. While many studies claim high accuracies, they may fall into the pitfalls of overfitting and lack of generalization because of the high variability of histopathological images. We use the example of osteosarcoma to illustrate these pitfalls and to show how adding variability to the model input can help improve model performance. We use the publicly available osteosarcoma dataset to retrain a previously published classification model for osteosarcoma. We partition the same set of images into training and testing datasets differently from the original study: the test dataset consists of images from one patient, while the training dataset consists of images from all other patients. The performance of the model on the test set under the new partition schema declines dramatically, indicating overfitting and a lack of model generalization. We also show the influence of training data variability on model performance by collecting a minimal dataset of 10 osteosarcoma subtypes as well as benign tissues and benign bone tumors of differentiation. We show that adding more subtypes to the training data step by step under the same model schema yields a series of coherent models with increasing performance. In conclusion, we put forward data preprocessing and collection tactics for highly variable histopathological images to avoid the pitfalls of overfitting and to build deep learning models that generalize better.
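
The patient-level partition described here (one patient held out for testing, all other patients used for training) might look like the sketch below. It assumes each image comes with a patient identifier; the function name `leave_one_patient_out` and the sample identifiers are hypothetical and do not come from the study.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def leave_one_patient_out(
    image_patient: List[Tuple[str, str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_patient, train_images, test_images) for each patient."""
    by_patient: Dict[str, List[str]] = defaultdict(list)
    for image_id, patient_id in image_patient:
        by_patient[patient_id].append(image_id)
    for held_out in sorted(by_patient):
        test = list(by_patient[held_out])
        # Every image from every other patient goes to training, so no patient
        # contributes images to both sides of the split.
        train = [
            img
            for pid in sorted(by_patient)
            if pid != held_out
            for img in by_patient[pid]
        ]
        yield held_out, train, test


images = [("img1", "A"), ("img2", "A"), ("img3", "B"), ("img4", "C")]
splits = list(leave_one_patient_out(images))
for patient, train, test in splits:
    print(patient, train, test)
```

Splitting by patient rather than by image keeps near-duplicate tissue from the same person out of the test set, which is the leakage the abstract points to.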

Effects of data count and image scaling on Deep Learning training

PeerJ Computer Science ◽

10.7717/peerj-cs.312 ◽

2020 ◽

Vol 6 ◽

pp. e312

Author(s):

Daisuke Hirahara ◽

Eichi Takaya ◽

Taro Takahara ◽

Takuya Ueda

Keyword(s):

Deep Learning ◽

Classification Accuracy ◽

Data Augmentation ◽

Interpolation Method ◽

Training Data ◽

Image Size ◽

Bilinear Method ◽

Data Set ◽

Interpolation Methods ◽

Average Classification Accuracy

Background: Deep learning using convolutional neural networks (CNNs) has achieved significant results in various fields that use images. Deep learning can automatically extract features from data, and a CNN extracts image features by convolution processing. We assumed that increasing the image size using interpolation methods would result in effective feature extraction. To investigate how the effect of interpolation methods changes as the amount of data increases, we examined and compared the effectiveness of data augmentation by inversion or rotation with that of image augmentation by interpolation when the amount of training image data was small. Further, we clarified whether image augmentation by interpolation was useful for CNN training. To examine the usefulness of interpolation methods for medical images, we used the Gender01 data set, a sex classification data set of chest radiographs. To compare image enlargement using an interpolation method with data augmentation by inversion and rotation, we examined the results of two- and four-fold enlargement using a Bilinear method.
Results: The average classification accuracy improved when the image size was expanded using the interpolation method. The largest improvement was observed with 100 training samples: the average classification accuracy of the model trained on the original data was 0.563, whereas increasing the image size four-fold using the interpolation method improved it significantly to 0.715. Compared with data augmentation by inversion and rotation, the model trained using the Bilinear method showed an improvement in the average classification accuracy of 0.095 with 100 training samples and 0.015 with 50,000 training samples. Comparisons of the average classification accuracy on the chest X-ray images showed a stable and high average classification accuracy using the interpolation method.
Conclusion: Training a CNN on images enlarged using the interpolation method is useful. In the future, we aim to conduct additional verifications using various medical images to further clarify why image size is important.
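
As a rough illustration of enlarging an image with bilinear interpolation, the sketch below upsamples a small grayscale image stored as nested lists. It is not the paper's code: the corner-aligned coordinate mapping is one of several conventions, the function name `bilinear_resize` is hypothetical, and treating the enlargement factor as applying to each side is an assumption.

```python
from typing import List


def bilinear_resize(img: List[List[float]], factor: int) -> List[List[float]]:
    """Enlarge a 2-D grayscale image by an integer factor per side."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * factor, w * factor
    out: List[List[float]] = []
    for i in range(out_h):
        # Map the output row to a fractional source row (corner-aligned).
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row: List[float] = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


small = [[0.0, 1.0], [2.0, 3.0]]
large = bilinear_resize(small, 2)
for r in large:
    print([round(v, 3) for v in r])
```

In practice an image library would do this resizing; the explicit loops only show where the new pixel values come from.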

DEEP MULTI-TASK LEARNING FOR TREE GENERA CLASSIFICATION

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-iv-2-153-2018 ◽

2018 ◽

Vol IV-2 ◽

pp. 153-159 ◽

Cited By ~ 4

Author(s):

C. Ko ◽

J. Kang ◽

G. Sohn

Keyword(s):

Deep Learning ◽

Classification Accuracy ◽

Network Performance ◽

Training Sample ◽

Training Data ◽

Training Dataset ◽

Main Task ◽

Data Generation ◽

Multiple View

The goal of our paper is to classify tree genera using airborne Light Detection and Ranging (LiDAR) data with a Convolutional Neural Network (CNN) – Multi-task Network (MTN) implementation. Unlike a Single-task Network (STN), where only one task is assigned to the learning outcome, an MTN is a deep learning architecture that learns a main task (classification of tree genera) simultaneously with other tasks (in our study, classification of coniferous and deciduous trees), with shared classification features. The main contribution of this paper is to improve classification accuracy from CNN-STN to CNN-MTN. This is achieved by introducing a concurrence loss (L_cd) to the designed MTN. This term regulates the overall network performance by minimizing the inconsistencies between the two tasks. Results show that we can increase the classification accuracy from 88.7% to 91.0% (from STN to MTN). The second goal of this paper is to solve the problem of small training sample size through multiple-view data generation. The motivation is to address one of the most common problems in implementing deep learning architectures: an insufficient amount of training data. We address this problem by simulating the training dataset with a multiple-view approach. The promising results from this paper provide a basis for classifying larger datasets and a larger number of classes in the future.
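
To make the idea of a loss term that penalizes inconsistency between the two tasks more concrete, here is a small sketch. The abstract does not define the concurrence loss, so the squared-difference penalty below is only one plausible form and is an assumption, as are the names `consistency_penalty`, `multitask_loss`, and the `genus_to_type` mapping.

```python
import math
from typing import List


def cross_entropy(probs: List[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])


def consistency_penalty(
    genus_probs: List[float], type_probs: List[float], genus_to_type: List[int]
) -> float:
    """Penalize disagreement between the two heads (assumed form, not the paper's)."""
    # Sum the genus probabilities per type (e.g. 0 = coniferous, 1 = deciduous).
    implied = [0.0] * len(type_probs)
    for genus, p in enumerate(genus_probs):
        implied[genus_to_type[genus]] += p
    return sum((a - b) ** 2 for a, b in zip(implied, type_probs))


def multitask_loss(
    genus_probs: List[float],
    genus_label: int,
    type_probs: List[float],
    type_label: int,
    genus_to_type: List[int],
    weight: float = 1.0,
) -> float:
    """Main-task loss + auxiliary-task loss + weighted consistency term."""
    return (
        cross_entropy(genus_probs, genus_label)
        + cross_entropy(type_probs, type_label)
        + weight * consistency_penalty(genus_probs, type_probs, genus_to_type)
    )


mapping = [0, 0, 1]  # hypothetical: genera 0 and 1 are coniferous, genus 2 is deciduous
agree = consistency_penalty([0.7, 0.2, 0.1], [0.9, 0.1], mapping)
disagree = consistency_penalty([0.7, 0.2, 0.1], [0.5, 0.5], mapping)
total = multitask_loss([0.7, 0.2, 0.1], 0, [0.9, 0.1], 0, mapping)
print(agree, disagree, total)
```

When the genus head and the coniferous/deciduous head imply the same type distribution the penalty is close to zero, and it grows as the two heads contradict each other.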

Deep Learning Approaches for Whiteboard Image Quality Enhancement

Color and Imaging Conference ◽

10.2352/j.imagingsci.technol.2019.63.4.040404 ◽

2019 ◽

Vol 2019 (1) ◽

pp. 360-368

Author(s):

Mekides Assefa Abebe ◽

Jon Yngve Hardeberg

Keyword(s):

Deep Learning ◽

Image Quality ◽

Image Data ◽

Quality Enhancement ◽

Network Architectures ◽

Learning Approaches ◽

Data Set ◽

Image Quality Enhancement ◽

Processing Techniques ◽

White Balancing

Various whiteboard image degradations greatly reduce the legibility of pen-stroke content as well as the overall quality of the images. Consequently, researchers have addressed the problem through a range of image enhancement techniques. Most state-of-the-art approaches apply common image processing techniques such as background-foreground segmentation, text extraction, contrast and color enhancement, and white balancing. However, such conventional enhancement methods are incapable of recovering severely degraded pen-stroke content and produce artifacts in the presence of complex pen-stroke illustrations. To overcome these problems, the authors propose a deep learning-based solution. They contribute a new whiteboard image data set and adopt two deep convolutional neural network architectures for whiteboard image quality enhancement. Their evaluations of the trained models demonstrate superior performance over the conventional methods.

Effectiveness of transfer learning for enhancing tumor classification with a convolutional neural network on frozen sections

Scientific Reports ◽

10.1038/s41598-020-78129-0 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Young-Gon Kim ◽

Sungchul Kim ◽

Cristina Eunbee Cho ◽

In Hye Song ◽

Hee Jin Lee ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Transfer Learning ◽

Frozen Section ◽

Medical Center ◽

External Validation ◽

Model Performance ◽

Classification Model ◽

Training Dataset

Fast and accurate confirmation of metastasis on the frozen tissue section of an intraoperative sentinel lymph node biopsy is an essential tool for critical surgical decisions. However, accurate diagnosis by pathologists is difficult within the time limitations. Training a robust and accurate deep learning model is also difficult owing to the limited number of frozen datasets with high-quality labels. To overcome these issues, we validated the effectiveness of transfer learning from CAMELYON16 to improve the performance of a convolutional neural network (CNN)-based classification model on our frozen dataset (N = 297) from Asan Medical Center (AMC). Among the 297 whole slide images (WSIs), 157 and 40 WSIs were used to train deep learning models with different dataset ratios at 2, 4, 8, 20, 40, and 100%. The remaining 100 WSIs were used to validate model performance in terms of patch- and slide-level classification. An additional 228 WSIs from Seoul National University Bundang Hospital (SNUBH) were used for external validation. Models with three types of initial weights, i.e., scratch-based (random initialization), ImageNet-based, and CAMELYON16-based, were used to validate their effectiveness in external validation. In the patch-level classification results on the AMC dataset, CAMELYON16-based models trained with a small dataset (up to 40%, i.e., 62 WSIs) showed a significantly higher area under the curve (AUC) of 0.929 than those of the scratch- and ImageNet-based models at 0.897 and 0.919, respectively, while CAMELYON16-based and ImageNet-based models trained with 100% of the training dataset showed comparable AUCs at 0.944 and 0.943, respectively. For the external validation, CAMELYON16-based models showed higher AUCs than those of the scratch- and ImageNet-based models. The feasibility of transfer learning for enhancing model performance was validated for frozen section datasets with limited numbers.
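
The dataset ratios quoted above can be turned into subset sizes with a few lines of arithmetic. The sketch assumes that the ratios apply to the 157 training WSIs and that sizes are rounded down, which is consistent with the stated "40%, i.e., 62 WSIs" but is not confirmed by the abstract; `subset_sizes` and `nested_subsets` are hypothetical names, and drawing nested subsets from one shuffled order is likewise an assumption.

```python
import random
from typing import Dict, List


def subset_sizes(total: int, percentages: List[int]) -> Dict[int, int]:
    """Number of slides for each ratio, rounded down (integer arithmetic)."""
    return {p: total * p // 100 for p in percentages}


def nested_subsets(
    items: List[str], percentages: List[int], seed: int = 0
) -> Dict[int, List[str]]:
    """Shuffle once and take prefixes, so smaller subsets sit inside larger ones."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    sizes = subset_sizes(len(items), percentages)
    return {p: shuffled[: sizes[p]] for p in percentages}


ratios = [2, 4, 8, 20, 40, 100]
sizes = subset_sizes(157, ratios)
print(sizes)

slides = [f"wsi_{i:03d}" for i in range(157)]  # placeholder slide identifiers
subsets = nested_subsets(slides, ratios)
print({p: len(s) for p, s in subsets.items()})
```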

Deep Learning-Based Hepatocellular Carcinoma Histopathology Image Classification: Accuracy versus Training Dataset Size

IEEE Access ◽

10.1109/access.2021.3060765 ◽

2021 ◽

pp. 1-1

Author(s):

Yu-Shiang Lin ◽

Pei-Hsin Huang ◽

Yung-Yaw Chen

Keyword(s):

Hepatocellular Carcinoma ◽

Deep Learning ◽

Image Classification ◽

Classification Accuracy ◽

Training Dataset ◽

Dataset Size

Deep Learning of Appearance Affinity for Multi-Object Tracking and Re-Identification: A Comparative View

Electronics ◽

10.3390/electronics9111757 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1757

Author(s):

María J. Gómez-Silva ◽

Arturo de la Escalera ◽

José M. Armingol

Keyword(s):

Deep Learning ◽

Object Tracking ◽

Loss Function ◽

Neural Model ◽

Training Data ◽

Learning Approaches ◽

The Core ◽

Triplet Loss ◽

Affinity Model

Recognizing the identity of a query individual in a surveillance sequence is the core of Multi-Object Tracking (MOT) and Re-Identification (Re-Id) algorithms. Both tasks can be addressed by measuring the appearance affinity between observations of people with a deep neural model. Nevertheless, the differences in their specifications and, consequently, in the characteristics and constraints of the training data available for each task make it necessary to employ different learning approaches for each of them. This article offers a comparative view of the Double-Margin-Contrastive and Triplet loss functions, and analyzes the benefits and drawbacks of applying each of them to learn an Appearance Affinity model for Tracking and Re-Identification. A batch of experiments has been conducted, and the results support the hypothesis drawn from the presented study: the Triplet loss function is more effective than the Contrastive one when a Re-Id model is learnt and, conversely, in the MOT domain, the Contrastive loss can better discriminate between pairs of images showing the same person or not.
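
For readers unfamiliar with the two loss functions being compared, the sketch below shows one common textbook formulation of each on plain embedding vectors. The exact variants, margins, and distance measure used in the article are not given in the abstract, so the squared hinge terms, the Euclidean distance, and the margin values here are assumptions.

```python
import math
from typing import Sequence


def double_margin_contrastive(
    a: Sequence[float],
    b: Sequence[float],
    same_identity: bool,
    margin_pos: float = 0.5,
    margin_neg: float = 2.0,
) -> float:
    """Pairwise loss: pull matching pairs inside margin_pos, push others past margin_neg."""
    d = math.dist(a, b)
    if same_identity:
        return max(0.0, d - margin_pos) ** 2
    return max(0.0, margin_neg - d) ** 2


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Relative loss: the positive must be closer to the anchor than the negative by a margin."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


anchor, positive = [0.0, 0.0], [0.0, 1.0]
far_negative, near_negative = [3.0, 4.0], [0.0, 1.5]

print(double_margin_contrastive(anchor, positive, same_identity=True))
print(double_margin_contrastive(anchor, positive, same_identity=False))
print(triplet_loss(anchor, positive, far_negative))
print(triplet_loss(anchor, positive, near_negative))
```

The contrastive loss scores each pair against fixed distance thresholds, while the triplet loss only constrains the relative ordering of distances, which is the distinction the comparison in the abstract rests on.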

Deep Learning-Based Computer-Aided Diagnosis System for Gastroscopy Image Classification Using Synthetic Data

Applied Sciences ◽

10.3390/app11020760 ◽

2021 ◽

Vol 11 (2) ◽

pp. 760

Author(s):

Yun-ji Kim ◽

Hyun Chin Cho ◽

Hyun-chong Cho

Keyword(s):

Deep Learning ◽

Disease Diagnosis ◽

Computer Aided Diagnosis ◽

Medical Data ◽

Classification Model ◽

Training Dataset ◽

Generative Adversarial Networks ◽

Original Training ◽

Computer Aided ◽

Aided Diagnosis

Gastric cancer has a high mortality rate worldwide, but it can be prevented with early detection through regular gastroscopy. Herein, we propose a deep learning-based computer-aided diagnosis (CADx) system that applies data augmentation to help doctors classify gastroscopy images as normal or abnormal. To improve the performance of deep learning, a large amount of training data is required. However, collecting medical data is, owing to their nature, highly expensive and time consuming. Therefore, data were generated through deep convolutional generative adversarial networks (DCGAN), and 25 augmentation policies optimized for the CIFAR-10 dataset were applied through AutoAugment to augment the data. The gastroscopy images were augmented, only high-quality images were selected through an image quality-measurement method, and the gastroscopy images were classified as normal or abnormal through the Xception network. We compared the performance of the original training dataset, which was not augmented, the dataset generated through the DCGAN, the dataset augmented through the augmentation policies of CIFAR-10, and the dataset combining the two methods. The dataset combining the two methods delivered the best performance in terms of accuracy (0.851) and achieved an improvement of 0.06 over the original training dataset. We confirmed that augmenting data through the DCGAN and CIFAR-10 augmentation policies is most suitable for the classification model for normal and abnormal gastric endoscopy images. The proposed method not only solves the medical-data problem but also improves the accuracy of gastric disease diagnosis.

Deep Learning based Tomato’s Ripe and Unripe Classification System

International Journal of Software Innovation ◽

10.4018/ijsi.292023 ◽

2022 ◽

Vol 10 (1) ◽

pp. 0-0

Keyword(s):

Deep Learning ◽

Classification Accuracy ◽

Ccd Camera ◽

Agricultural Products ◽

Training Data ◽

Maturity Level ◽

Agriculture Sector ◽

The Past ◽

State Of Art

Effective productivity estimates of freshly produced crops are essential for efficient farming, commercial planning, and logistical support. In the past ten years, machine learning (ML) algorithms have been widely used for grading and classification of agricultural products in the agriculture sector. However, precise and accurate assessment of the maturity level of tomatoes using ML algorithms is still quite challenging because these algorithms rely on hand-crafted features. Hence, in this paper we propose a deep learning-based tomato maturity grading system that helps to increase the accuracy and adaptability of maturity grading tasks with less training data. The performance of the proposed system is assessed on real tomato datasets collected from open fields using a Nikon D3500 CCD camera. The proposed approach achieved an average maturity classification accuracy of 99.8%, which is promising in comparison with other state-of-the-art methods.

Tobacco Leaf Grading Based on Deep Convolutional Neural Networks and Machine Vision

Journal of the ASABE ◽

10.13031/ja.14537 ◽

2021 ◽

Vol 65 (1) ◽

pp. 11-22

Author(s):

Mengyao Lu ◽

Shuwen Jiang ◽

Cong Wang ◽

Dong Chen ◽

Tian’en Chen

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Transfer Learning ◽

Convolutional Neural Networks ◽

Classification Accuracy ◽

Classification Model ◽

List Type ◽

Tobacco Leaves ◽

Tobacco Leaf ◽

Grading Model

Highlights

A classification model for the front and back sides of tobacco leaves was developed for application in industry.

A tobacco leaf grading method that combines a CNN with double-branch integration was proposed.

The A-ResNet network was proposed and compared with other classic CNN networks.

The grading accuracy of eight different grades was 91.30% and the testing time was 82.180 ms, showing a relatively high classification accuracy and efficiency.

Abstract. Flue-cured tobacco leaf grading is a key step in the production and processing of Chinese-style cigarette raw materials, directly affecting cigarette blend and quality stability. At present, manual grading of tobacco leaves is dominant in China, resulting in unsatisfactory grading quality and consuming considerable material and financial resources. In this study, for fast, accurate, and non-destructive tobacco leaf grading, 2,791 flue-cured tobacco leaves of eight different grades in south Anhui Province, China, were chosen as the study sample, and a tobacco leaf grading method that combines convolutional neural networks and double-branch integration was proposed. First, a classification model for the front and back sides of tobacco leaves was trained by transfer learning. Second, two processing methods (equal-scaled resizing and cropping) were used to obtain global images and local patches from the front sides of tobacco leaves. A global image-based tobacco leaf grading model was then developed using the proposed A-ResNet-65 network, and a local patch-based tobacco leaf grading model was developed using the ResNet-34 network. These two networks were compared with classic deep learning networks, such as VGGNet, GoogLeNet-V3, and ResNet. Finally, the grading results of the two grading models were integrated to realize tobacco leaf grading. The tobacco leaf classification accuracy of the final model, for eight different grades, was 91.30%, and grading of a single tobacco leaf required 82.180 ms. The proposed method achieved a relatively high grading accuracy and efficiency. It provides a method for industrial implementation of the tobacco leaf grading and offers a new approach for the quality grading of other agricultural products.

Keywords: Convolutional neural network, Deep learning, Image classification, Transfer learning, Tobacco leaf grading
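
The final step, in which the outputs of the global-image model and the local-patch model are integrated, could be sketched as a weighted average of class probabilities. The abstract does not state the integration rule, so the averaging, the equal weighting, and the names `average` and `integrate` are assumptions made purely for illustration (two grades are used instead of eight to keep the example short).

```python
from typing import List, Tuple


def average(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of several probability vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def integrate(
    global_probs: List[float],
    patch_probs: List[List[float]],
    global_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Fuse one global prediction with several patch predictions (assumed rule)."""
    # First pool the patch-level predictions into one local prediction.
    local_probs = average(patch_probs)
    fused = [
        global_weight * g + (1.0 - global_weight) * l
        for g, l in zip(global_probs, local_probs)
    ]
    # The predicted grade is the index with the highest fused probability.
    grade = max(range(len(fused)), key=fused.__getitem__)
    return grade, fused


global_branch = [0.6, 0.4]
local_branch = [[0.2, 0.8], [0.4, 0.6]]
grade, fused = integrate(global_branch, local_branch)
print(grade, [round(p, 3) for p in fused])
```

Here the global branch alone would pick grade 0, but the patch evidence shifts the fused decision to grade 1, which shows how a second branch can change the outcome.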
