Cyclical Learning Rates (CLRs) for Improving Training Accuracies and Lowering Computational Cost

Author(s):  
Rushikesh Chopade ◽  
Aditya Stanam ◽  
Anand Narayanan ◽  
Shrikant Pawar

Prediction of different lung pathologies from chest X-ray images is a challenging task requiring robust training and testing accuracies. In this article, one-class classifier (OCC) and binary classification algorithms were tested to classify 14 different diseases (atelectasis, cardiomegaly, consolidation, effusion, edema, emphysema, fibrosis, hernia, infiltration, mass, nodule, pneumonia, pneumothorax, and pleural thickening). We utilized three different neural network architectures (MobileNetV1, AlexNet, and DenseNet-121) with three different optimizers (SGD, Adam, and RMSProp) to compare the best possible accuracies. Cyclical learning rate (CLR), a hyperparameter tuning technique, was found to yield faster convergence of the cost towards the minimum of the cost function. Here, we present a unique approach of re-training binary classification models, previously trained with a learning rate decay technique, using CLRs. In doing so, we found a significant improvement in training accuracies for each of the selected conditions. Thus, utilizing CLRs in callback functions appears to be a promising strategy for image classification problems.
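The abstract does not state which CLR policy was used; as an illustration, the common triangular policy can be sketched as follows (the function name and parameter values are illustrative, not taken from the article):

```python
import math

def triangular_clr(iteration, step_size, base_lr, max_lr):
    """Triangular cyclical learning rate.

    The learning rate ramps linearly from base_lr up to max_lr and
    back down, completing one full cycle every 2 * step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

A function like this can be passed to a per-iteration learning-rate callback, so the optimizer's rate oscillates between the two bounds instead of decaying monotonically.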

2017 ◽  
Vol 29 (12) ◽  
pp. 3353-3380 ◽  
Author(s):  
Shao-Bo Lin ◽  
Jinshan Zeng ◽  
Xiangyu Chang

This letter aims at a refined error analysis for binary classification using support vector machines (SVMs) with the Gaussian kernel and convex losses. Our first result shows that for some loss functions, such as the truncated quadratic loss and the quadratic loss, SVM with the Gaussian kernel can reach an almost optimal learning rate provided the regression function is smooth. Our second result shows that, for a large class of loss functions, under a Tsybakov noise assumption, if the regression function is infinitely smooth, then SVM with the Gaussian kernel can achieve a learning rate of order [Formula: see text], where [Formula: see text] is the number of samples.


2020 ◽  
Vol 11 (1) ◽  
pp. 19-34 ◽  
Author(s):  
Stefania Bellavia ◽  
Nataša Krklec Jerinkić ◽  
Greta Malaspina

This paper deals with subsampled spectral gradient methods for minimizing finite sums. Subsampled function and gradient approximations are employed in order to reduce the overall computational cost of classical spectral gradient methods. Global convergence is enforced by a nonmonotone line search procedure and is proved provided that functions and gradients are approximated with increasing accuracy. R-linear convergence and worst-case iteration complexity are investigated in the case of a strongly convex objective function. Numerical results on well-known binary classification problems are given to show the effectiveness of this framework and to analyze the effect of different spectral coefficient approximations arising from the variable-sample nature of this procedure.
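The spectral step length at the core of such methods is the Barzilai-Borwein (BB) rule, computed from successive iterates and their (possibly subsampled) gradients. The sketch below shows the first BB rule with a simple clipping safeguard; the safeguards and subsampling schedule analyzed in the paper may differ:

```python
import numpy as np

def bb_step(x_prev, x_curr, g_prev, g_curr, lam_min=1e-10, lam_max=1e10):
    """First Barzilai-Borwein (spectral) step length.

    s = change in iterates, y = change in (subsampled) gradients;
    the step (s.s)/(s.y) approximates the inverse curvature along s.
    """
    s = x_curr - x_prev
    y = g_curr - g_prev
    sy = float(s @ y)
    if sy <= 0:
        return lam_max  # safeguard when curvature estimate is non-positive
    return float(np.clip((s @ s) / sy, lam_min, lam_max))
```

On a quadratic with curvature a, the rule recovers the exact inverse curvature 1/a, which is what makes spectral methods competitive without a full line search at every step.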


2021 ◽  
Vol 13 (21) ◽  
pp. 4407
Author(s):  
Yuchao Feng ◽  
Jianwei Zheng ◽  
Mengjie Qin ◽  
Cong Bai ◽  
Jinglin Zhang

Owing to their outstanding feature extraction capability, convolutional neural networks (CNNs) have been widely applied to hyperspectral image (HSI) classification problems and have achieved impressive performance. However, it is well known that 2D convolution neglects spectral information, while 3D convolution incurs a huge computational cost. In addition, the cost of labeling and the limitation of computing resources make it urgent to improve the generalization performance of models trained on scarcely labeled samples. To address these issues, we design an end-to-end 3D octave and 2D vanilla mixed CNN, namely Oct-MCNN-HS, based on the typical 3D-2D mixed CNN (MCNN). Notably, two feature fusion operations are deliberately constructed to maximize discriminative features and practical performance: 2D vanilla convolution merges the feature maps generated by 3D octave convolutions along the channel direction, and homology shifting aggregates the information of pixels located at the same spatial position. Extensive experiments are conducted on four publicly available HSI datasets to evaluate the effectiveness and robustness of our model, and the results verify the superiority of Oct-MCNN-HS in both efficacy and efficiency.


2021 ◽  
Vol 13 (9) ◽  
pp. 1623
Author(s):  
João E. Batista ◽  
Ana I. R. Cabral ◽  
Maria J. P. Vasconcelos ◽  
Leonardo Vanneschi ◽  
Sara Silva

Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement in the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications, and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures on the binary classification problems with that of features created by other feature construction methods, such as FFX and EFS.
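For context, the spectral indices used as the baseline comparison are all simple normalized band differences; NDVI, for example, can be computed as below (the epsilon guard and band values are illustrative, not from the article):

```python
def normalized_difference(a, b, eps=1e-12):
    """Generic normalized-difference index (a - b) / (a + b)."""
    return (a - b) / (a + b + eps)

def ndvi(nir, red):
    """NDVI: normalized difference of near-infrared and red reflectance."""
    return normalized_difference(nir, red)
```

NDWI and NBR follow the same formula with different band pairs; M3GP hyperfeatures generalize this idea by evolving arbitrary band combinations instead of a fixed ratio.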


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 134
Author(s):  
Loai Abdallah ◽  
Murad Badarna ◽  
Waleed Khalifa ◽  
Malik Yousef

In the computational biology community, many problems are treated as multi-one-class classification problems. Examples include the classification of multiple tumor types, protein fold recognition, and the molecular classification of multiple cancer types. In all of these cases, appropriately characterized real-world negative cases or outliers are impractical to obtain, and the positive cases might consist of different clusters, which in turn might lead to accuracy degradation. In this paper we present a novel algorithm named MultiKOC, a multi-one-class classifier based on k-means, to deal with this problem. The main idea is to execute a clustering algorithm over the positive samples to capture the hidden sub-structure of the given positive data, and then to build a one-class classifier for the examples of each cluster separately; in other words, to train an OC classifier on each piece of sub-data. For a given new sample, the generated classifiers are applied: if it is rejected by all of those classifiers, the sample is considered negative; otherwise, it is considered positive. The results of MultiKOC are compared with traditional one-class, multi-one-class, ensemble one-class, and two-class methods, yielding a significant improvement over the one-class methods and performance comparable to the two-class methods.
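The MultiKOC idea described above (cluster the positives, train one one-class classifier per cluster, and label a sample negative only if every classifier rejects it) can be sketched as follows. This pairing of k-means with a one-class SVM is for illustration only; the paper's exact one-class classifier and hyperparameters may differ:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

class MultiKOC:
    """Sketch of a multi-one-class classifier built on k-means."""

    def __init__(self, k=3, nu=0.1):
        self.k = k      # number of positive sub-clusters
        self.nu = nu    # one-class SVM outlier fraction bound

    def fit(self, X_pos):
        # Cluster the positive samples to expose hidden sub-data,
        # then train one one-class classifier per cluster.
        labels = KMeans(n_clusters=self.k, n_init=10, random_state=0).fit_predict(X_pos)
        self.models_ = [
            OneClassSVM(nu=self.nu, gamma="scale").fit(X_pos[labels == c])
            for c in range(self.k)
        ]
        return self

    def predict(self, X):
        # Each model votes +1 (accept) or -1 (reject); a sample is
        # negative only if all cluster models reject it.
        votes = np.stack([m.predict(X) for m in self.models_])
        return np.where((votes == 1).any(axis=0), 1, -1)
```

Training requires only positive samples, which matches the setting where characterized negatives are impractical to obtain.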


Author(s):  
Kanae Takahashi ◽  
Kouji Yamamoto ◽  
Aya Kuchiba ◽  
Tatsuki Koyama

A binary classification problem is common in the medical field, and we often use sensitivity, specificity, accuracy, and negative and positive predictive values as measures of the performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity). As a single summary measure of a classifier's performance, the F1 score, defined as the harmonic mean of precision and recall, is widely used in the context of information retrieval and information extraction evaluation, since it possesses favorable characteristics, especially when the prevalence is low. Some statistical methods for inference have been developed for the F1 score in binary classification problems; however, they have not been extended to the problem of multi-class classification. There are three types of F1 scores, and the statistical properties of these F1 scores have hardly ever been discussed. We propose methods based on the large-sample multivariate central limit theorem for estimating F1 scores with confidence intervals.
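As a concrete illustration of common multi-class F1 variants, per-class, macro-averaged, and micro-averaged F1 can all be computed from a confusion matrix as follows (function and variable names are illustrative; the definitions assume every class has at least one true and one predicted instance):

```python
import numpy as np

def f1_scores(conf):
    """Per-class, macro-, and micro-averaged F1 from a confusion
    matrix, where conf[i, j] counts true class i predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                 # true positives per class
    precision = tp / conf.sum(axis=0)  # column sums: predicted counts
    recall = tp / conf.sum(axis=1)     # row sums: actual counts
    per_class = 2 * precision * recall / (precision + recall)
    macro = per_class.mean()           # unweighted mean of per-class F1
    micro = tp.sum() / conf.sum()      # for single-label data, micro-F1 equals accuracy
    return per_class, macro, micro
```

For single-label multi-class data, total false positives equal total false negatives, so micro-averaged F1 reduces to overall accuracy, while macro-F1 weights every class equally regardless of prevalence.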

