Novel loss functions for ensemble-based medical image classification

Medical images commonly exhibit multiple abnormalities. Predicting them requires multi-class classifiers whose training and desired reliable performance can be affected by a combination of factors, such as, dataset size, data source, distribution, and the loss function used to train deep neural networks. Currently, the cross-entropy loss remains the de-facto loss function for training deep learning classifiers. This loss function, however, asserts equal learning from all classes, leading to a bias toward the majority class. Although the choice of the loss function impacts model performance, to the best of our knowledge, we observed that no literature exists that performs a comprehensive analysis and selection of an appropriate loss function toward the classification task under study. In this work, we benchmark various state-of-the-art loss functions, critically analyze model performance, and propose improved loss functions for a multi-class classification task. We select a pediatric chest X-ray (CXR) dataset that includes images with no abnormality (normal), and those exhibiting manifestations consistent with bacterial and viral pneumonia. We construct prediction-level and model-level ensembles to improve classification performance. Our results show that compared to the individual models and the state-of-the-art literature, the weighted averaging of the predictions for top-3 and top-5 model-level ensembles delivered significantly superior classification performance (p < 0.05) in terms of MCC (0.9068, 95% confidence interval (0.8839, 0.9297)) metric. Finally, we performed localization studies to interpret model behavior and confirm that the individual models and ensembles learned task-specific features and highlighted disease-specific regions of interest. The code is available at https://github.com/sivaramakrishnan-rajaraman/multiloss_ensemble_models.

Download Full-text

Assessing the Impact of the Loss Function, Architecture and Image Type for Deep Learning-Based Wildfire Segmentation

Applied Sciences ◽

10.3390/app11157046 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7046

Author(s):

Jorge Francisco Ciprián-Sánchez ◽

Gilberto Ochoa-Ruiz ◽

Lucile Rossi ◽

Frédéric Morandini

Keyword(s):

Deep Learning ◽

Loss Function ◽

State Of The Art ◽

Fire Detection ◽

Loss Functions ◽

Wildfire Spread ◽

Combine Information ◽

The Impact ◽

Image Type ◽

Segmentation Models

Wildfires stand as one of the most relevant natural disasters worldwide, particularly more so due to the effect of climate change and its impact on various societal and environmental levels. In this regard, a significant amount of research has been done in order to address this issue, deploying a wide variety of technologies and following a multi-disciplinary approach. Notably, computer vision has played a fundamental role in this regard. It can be used to extract and combine information from several imaging modalities in regard to fire detection, characterization and wildfire spread forecasting. In recent years, there has been work pertaining to Deep Learning (DL)-based fire segmentation, showing very promising results. However, it is currently unclear whether the architecture of a model, its loss function, or the image type employed (visible, infrared, or fused) has the most impact on the fire segmentation results. In the present work, we evaluate different combinations of state-of-the-art (SOTA) DL architectures, loss functions, and types of images to identify the parameters most relevant to improve the segmentation results. We benchmark them to identify the top-performing ones and compare them to traditional fire segmentation techniques. Finally, we evaluate if the addition of attention modules on the best performing architecture can further improve the segmentation results. To the best of our knowledge, this is the first work that evaluates the impact of the architecture, loss function, and image type in the performance of DL-based wildfire segmentation models.

Download Full-text

Learning Large Logic Programs By Going Beyond Entailment

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/287 ◽

2020 ◽

Author(s):

Andrew Cropper ◽

Sebastijan Dumančic

Keyword(s):

Logic Programming ◽

Loss Function ◽

Inductive Logic Programming ◽

State Of The Art ◽

Inductive Logic ◽

Program Synthesis ◽

Loss Functions ◽

Logic Programs ◽

Binary Decision ◽

Best First Search

A major challenge in inductive logic programming (ILP) is learning large programs. We argue that a key limitation of existing systems is that they use entailment to guide the hypothesis search. This approach is limited because entailment is a binary decision: a hypothesis either entails an example or does not, and there is no intermediate position. To address this limitation, we go beyond entailment and use 'example-dependent' loss functions to guide the search, where a hypothesis can partially cover an example. We implement our idea in Brute, a new ILP system which uses best-first search, guided by an example-dependent loss function, to incrementally build programs. Our experiments on three diverse program synthesis domains (robot planning, string transformations, and ASCII art), show that Brute can substantially outperform existing ILP systems, both in terms of predictive accuracies and learning times, and can learn programs 20 times larger than state-of-the-art systems.

Download Full-text

Eigenloss: Combined PCA-Based Loss Function for Polyp Segmentation

Mathematics ◽

10.3390/math8081316 ◽

2020 ◽

Vol 8 (8) ◽

pp. 1316

Author(s):

Luisa F. Sánchez-Peralta ◽

Artzai Picón ◽

Juan Antonio Antequera-Barroso ◽

Juan Francisco Ortega-Morán ◽

Francisco M. Sánchez-Margallo ◽

...

Keyword(s):

Deep Learning ◽

Loss Function ◽

Survival Rates ◽

Principal Component ◽

Loss Functions ◽

Cancer Death ◽

Linear Combinations ◽

The Individual ◽

Deep Learning Model ◽

Death Causes

Colorectal cancer is one of the leading cancer death causes worldwide, but its early diagnosis highly improves the survival rates. The success of deep learning has also benefited this clinical field. When training a deep learning model, it is optimized based on the selected loss function. In this work, we consider two networks (U-Net and LinkNet) and two backbones (VGG-16 and Densnet121). We analyzed the influence of seven loss functions and used a principal component analysis (PCA) to determine whether the PCA-based decomposition allows for the defining of the coefficients of a non-redundant primal loss function that can outperform the individual loss functions and different linear combinations. The eigenloss is defined as a linear combination of the individual losses using the elements of the eigenvector as coefficients. Empirical results show that the proposed eigenloss improves the general performance of individual loss functions and outperforms other linear combinations when Linknet is used, showing potential for its application in polyp segmentation problems.

Download Full-text

Improved Loss Function for Image Classification

Computational Intelligence and Neuroscience ◽

10.1155/2021/6660961 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Chenrui Wen ◽

Xinhao Yang ◽

Ke Zhang ◽

Jiahui Zhang

Keyword(s):

Image Classification ◽

Loss Function ◽

Classification Performance ◽

Cross Entropy ◽

Loss Functions ◽

Network Architectures ◽

Sampling Function ◽

Network Training ◽

Sampling Procedures ◽

Loss Experiment

An improved loss function free of sampling procedures is proposed to improve the ill-performed classification by sample shortage. Adjustable parameters are used to expand the loss scope, minimize the weight of easily classified samples, and further substitute the sampling function, which are added to the cross-entropy loss and the SoftMax loss. Experiment results indicate that improvements in all classification performance of our loss function are shown in various network architectures and on different datasets. To summarize, compared with traditional loss functions, our improved version not only elevates classification performance but also lowers the difficulty of network training.

Download Full-text

Deep Attentive Ranking Networks for Learning to Order Sentences

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6323 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8115-8122

Author(s):

Pawan Kumar ◽

Dhanajit Brahma ◽

Harish Karnick ◽

Piyush Rai

Keyword(s):

Loss Function ◽

State Of The Art ◽

Loss Functions ◽

Evaluation Metrics ◽

Invariant Representation ◽

Art Methods ◽

Ranking Loss ◽

Order Invariant

We present an attention-based ranking framework for learning to order sentences given a paragraph. Our framework is built on a bidirectional sentence encoder and a self-attention based transformer network to obtain an input order invariant representation of paragraphs. Moreover, it allows seamless training using a variety of ranking based loss functions, such as pointwise, pairwise, and listwise ranking. We apply our framework on two tasks: Sentence Ordering and Order Discrimination. Our framework outperforms various state-of-the-art methods on these tasks on a variety of evaluation metrics. We also show that it achieves better results when using pairwise and listwise ranking losses, rather than the pointwise ranking loss, which suggests that incorporating relative positions of two or more sentences in the loss function contributes to better learning.

Download Full-text

BVDT: A Boosted Vector Decision Tree Algorithm for Multi-Class Classification Problems

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001417500161 ◽

2017 ◽

Vol 31 (05) ◽

pp. 1750016 ◽

Cited By ~ 3

Author(s):

Kaiyuan Wu ◽

Zhiming Zheng ◽

Shaoting Tang

Keyword(s):

Decision Tree ◽

Loss Function ◽

Nearest Neighbor ◽

State Of The Art ◽

Support Vector ◽

Classification Problems ◽

Regression Problem ◽

Weak Learner ◽

Boosting Algorithms ◽

Multi Class Classification

In this paper, we propose a powerful weak learner (Vector Decision Tree (VDT)) and a new Boosted Vector Decision Tree (BVDT) algorithm framework for the task of multi-class classification. Unlike the traditional scalar valued boosting algorithms, the BVDT algorithm directly maps the feature space to the decision space in the multi-class setting, which facilitates convenient implementations of the multi-class classification algorithms using diverse loss functions. By viewing the explicit hard threshold on the leaf node value applied in the LogitBoost as a constraint optimization problem, we further develop two new variants of the BVDT algorithm: the [Formula: see text]-BVDT and the [Formula: see text]-BVDT. The performance of the proposed algorithm is evaluated on different datasets and compared with three state-of-the-art boosting algorithms, [Formula: see text]-Nearest Neighbor (KNN) and Support Vector Machine (SVM). The results show that the performance of the proposed algorithm ranks first in all but one dataset and reduces the test error rate by 4% up to 58% with respect to the state-of-the-art boosting algorithms based on the scalar-valued weak learner. Furthermore, we present a case study on the Abalone dataset by designing a new loss function that combines the negative log-likelihood loss function of classification problem and square loss function of regression problem.

Download Full-text

PatentNet: multi-label classification of patent documents using deep learning based language understanding

Scientometrics ◽

10.1007/s11192-021-04179-4 ◽

2021 ◽

Author(s):

Arousha Haghighian Roudsari ◽

Jafar Afshar ◽

Wookey Lee ◽

Suan Lee

Keyword(s):

Deep Learning ◽

Language Processing ◽

State Of The Art ◽

Classification Performance ◽

Fine Tuning ◽

Language Models ◽

Classification Task ◽

Domain Experts ◽

Patent Classification ◽

Patent Documents

AbstractPatent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way to efficiently convey knowledge. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents, facilitating reliable search, retrieval, and further patent analysis tasks. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we focus on investigating the effect of fine-tuning the pre-trained language models, namely, BERT, XLNet, RoBERTa, and ELECTRA, for the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification. We use various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for conducting experiments. We conclude that fine-tuning the pre-trained language models on the patent text improves the multi-label patent classification performance. Our findings indicate that XLNet performs the best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, as well as coverage error, and LRAP.

Download Full-text

Double Additive Margin Softmax Loss for Face Recognition

Applied Sciences ◽

10.3390/app10010060 ◽

2019 ◽

Vol 10 (1) ◽

pp. 60 ◽

Cited By ~ 1

Author(s):

Shengwei Zhou ◽

Caikou Chen ◽

Guojiang Han ◽

Xielian Hou

Keyword(s):

Neural Networks ◽

Face Recognition ◽

Loss Function ◽

State Of The Art ◽

Feature Learning ◽

Loss Functions ◽

Deep Convolutional Neural Networks ◽

Large Margin ◽

Face Features ◽

Geometrical Explanation

Learning large-margin face features whose intra-class variance is small and inter-class diversity is one of important challenges in feature learning applying Deep Convolutional Neural Networks (DCNNs) for face recognition. Recently, an appealing line of research is to incorporate an angular margin in the original softmax loss functions for obtaining discriminative deep features during the training of DCNNs. In this paper we propose a novel loss function, termed as double additive margin Softmax loss (DAM-Softmax). The presented loss has a clearer geometrical explanation and can obtain highly discriminative features for face recognition. Extensive experimental evaluation of several recent state-of-the-art softmax loss functions are conducted on the relevant face recognition benchmarks, CASIA-Webface, LFW, CALFW, CPLFW, and CFP-FP. We show that the proposed loss function consistently outperforms the state-of-the-art.

Download Full-text

A Machine Learning Methodology for Identification and Triage of Heart Failure Exacerbations

Journal of Cardiovascular Translational Research ◽

10.1007/s12265-021-10151-7 ◽

2021 ◽

Author(s):

James Morrill ◽

Klajdi Qirko ◽

Jacob Kelly ◽

Andrew Ambrosy ◽

Botros Toro ◽

...

Keyword(s):

Machine Learning ◽

Heart Failure ◽

Real Time ◽

Model Performance ◽

Classification Performance ◽

Self Awareness ◽

Consensus Opinion ◽

The Usa ◽

The Individual ◽

At Home

Abstract Inadequate at-home management and self-awareness of heart failure (HF) exacerbations are known to be leading causes of the greater than 1 million estimated HF-related hospitalizations in the USA alone. Most current at-home HF management protocols include paper guidelines or exploratory health applications that lack rigor and validation at the level of the individual patient. We report on a novel triage methodology that uses machine learning predictions for real-time detection and assessment of exacerbations. Medical specialist opinions on statistically and clinically comprehensive, simulated patient cases were used to train and validate prediction algorithms. Model performance was assessed by comparison to physician panel consensus in a representative, out-of-sample validation set of 100 vignettes. Algorithm prediction accuracy and safety indicators surpassed all individual specialists in identifying consensus opinion on existence/severity of exacerbations and appropriate treatment response. The algorithms also scored the highest sensitivity, specificity, and PPV when assessing the need for emergency care. Lay summary Here we develop a machine-learning approach for providing real-time decision support to adults diagnosed with congestive heart failure. The algorithm achieves higher exacerbation and triage classification performance than any individual physician when compared to physician consensus opinion. Graphical abstract

Download Full-text

Valence electron spectroscopy of inhomogeneous media

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100130389 ◽

1992 ◽

Vol 50 (2) ◽

pp. 1150-1151

Author(s):

A. Howie ◽

D.W. McComb

Keyword(s):

Specific Gravity ◽

Loss Function ◽

Electron Spectroscopy ◽

Valence Electron ◽

Peak Height ◽

Loss Functions ◽

Loss Peak ◽

High Energies ◽

Valence Electron Density ◽

Electron Microscopist

The bulk loss function Im(-l/ε (ω)), a well established tool for the interpretation of valence loss spectra, is being progressively adapted to the wide variety of inhomogeneous samples of interest to the electron microscopist. Proportionality between n, the local valence electron density, and ε-1 (Sellmeyer's equation) has sometimes been assumed but may not be valid even in homogeneous samples. Figs. 1 and 2 show the experimentally measured bulk loss functions for three pure silicates of different specific gravity ρ - quartz (ρ = 2.66), coesite (ρ = 2.93) and a zeolite (ρ = 1.79). Clearly, despite the substantial differences in density, the shift of the prominent loss peak is very small and far less than that predicted by scaling e for quartz with Sellmeyer's equation or even the somewhat smaller shift given by the Clausius-Mossotti (CM) relation which assumes proportionality between n (or ρ in this case) and (ε - 1)/(ε + 2). Both theories overestimate the rise in the peak height for coesite and underestimate the increase at high energies.

Download Full-text