Efficient pan-cancer whole-slide image classification and outlier detection using convolutional neural networks

Abstract. Visual analysis of solid tissue mounted on glass slides is currently the primary method used by pathologists to determine the stage, type and subtype of cancer. Although whole-slide images are usually large (tens to hundreds of thousands of pixels wide), an exhaustive, if time-consuming, assessment is necessary to reduce the risk of misdiagnosis. In an effort to address the many diagnostic challenges faced by trained experts, recent research has focused on developing automatic prediction systems for this multi-class classification problem. Typically, complex convolutional neural network (CNN) architectures, such as Google's Inception, are used to tackle this problem. Here, we introduce a greatly simplified CNN architecture, PathCNN, which allows for more efficient use of computational resources and better classification performance. Using this improved architecture, we trained simultaneously on whole-slide images from multiple tumor sites and the corresponding non-neoplastic tissue. Dimensionality reduction analysis of the weights of the network's last layer captures groups of images that faithfully represent the different types of cancer, while also highlighting differences in staining and capturing outliers, artifacts and misclassification errors. Our code is available online at: https://github.com/sedab/PathCNN.
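The abstract does not name the dimensionality reduction method, so the following is only a minimal sketch under stated assumptions: each slide is summarized by a feature vector taken from the network's last layer, the vectors are projected onto their top two principal components (PCA by power iteration, an assumed choice), and a slide is flagged as a candidate outlier when its distance to its class centroid is several times the class median distance. The toy vectors, the `factor` threshold and all function names are hypothetical, not PathCNN outputs or code.

```python
import math
import random


def top_components(X, k=2, iters=200, seed=0):
    """Top-k principal directions of the rows of X (power iteration + deflation)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    # Sample covariance matrix of the centred feature vectors.
    C = [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in X) / (n - 1)
          for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Deflate so the next power iteration converges to the next component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, comps


def project(X, mean, comps):
    """Coordinates of each centred row of X along the given components."""
    return [[sum((x - m) * c for x, m, c in zip(row, mean, comp)) for comp in comps] for row in X]


def flag_outliers(points, labels, factor=3.0):
    """Indices whose distance to their class centroid exceeds factor x the class median distance."""
    flagged = []
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        centroid = [sum(points[i][j] for i in idx) / len(idx) for j in range(len(points[0]))]
        dist = {i: math.dist(points[i], centroid) for i in idx}
        median = sorted(dist.values())[len(idx) // 2]
        flagged.extend(i for i in idx if dist[i] > factor * median)
    return sorted(flagged)


# Hypothetical last-layer feature vectors for two tissue classes; index 5 sits far from class A.
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0.5, 0.5, 0], [10, 10, 0],
     [20, 20, 0], [21, 20, 0], [20, 21, 0], [21, 21, 0]]
labels = ["A"] * 6 + ["B"] * 4
mean, comps = top_components(X, k=2)
coords = project(X, mean, comps)
print(flag_outliers(coords, labels))  # [5]
```

In practice the 2D coordinates would also be plotted and inspected visually; the median-based rule is just one simple, robust way to surface off-cluster slides for review.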

Signal Classification Algorithms over Time Selective Channels

Electronics, 2021, Vol 10 (14), pp. 1714. DOI: 10.3390/electronics10141714

Author(s): Mohamed Marey, Hala Mostafa

Keyword(s): Block Code; Classification Problem; Classification Performance; Signal Classification; Final Decision; Time Block; Channel Response; Over Time; Parallel Fashion

In this work, we propose a general framework for designing signal classification algorithms over time-selective channels for wireless communications applications. We derive an upper bound on the maximum number of observation samples over which the channel response is essentially invariant. The proposed framework divides the received signal into blocks, each shorter than this bound. These blocks are then fed in parallel into a number of classifiers, and a final decision is made by a well-designed combiner and detector. As a case study, we apply the framework to a space-time block-code classification problem by developing two combiners and detectors. Monte Carlo simulations show that the proposed framework achieves excellent classification performance over time-selective channels compared with conventional algorithms.
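The block-and-combine structure can be illustrated with a minimal sketch. It assumes that each per-block classifier returns a log-likelihood for every candidate class and that the combiner simply sums these scores across blocks (a joint maximum-likelihood decision if blocks are treated as independent); the paper's own combiners and detectors are not reproduced here. `block_len` stands in for the derived bound, and `toy_classifier` with its two Gaussian classes is purely hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(samples, block_len):
    """Consecutive blocks of at most block_len samples (block_len plays the role of the bound)."""
    if block_len < 1:
        raise ValueError("block_len must be a positive integer")
    return [samples[i:i + block_len] for i in range(0, len(samples), block_len)]


def classify_over_blocks(samples, block_len, block_classifier, classes):
    """Score every block in parallel, then combine by summing per-class log-likelihoods."""
    blocks = split_into_blocks(samples, block_len)
    with ThreadPoolExecutor() as pool:
        per_block = list(pool.map(block_classifier, blocks))
    combined = {c: sum(scores[c] for scores in per_block) for c in classes}
    return max(combined, key=combined.get), combined


def toy_classifier(block):
    """Hypothetical per-block scorer: unit-variance Gaussian log-likelihood, constants dropped."""
    means = {"A": 0.0, "B": 1.0}
    return {c: -0.5 * sum((x - m) ** 2 for x in block) for c, m in means.items()}


rx = [0.9, 1.1, 0.8, 1.2, 1.0, 0.7, 1.3]
decision, scores = classify_over_blocks(rx, 3, toy_classifier, ["A", "B"])
print(decision)  # B
```

Keeping every block shorter than the bound is what lets each classifier treat the channel as constant within its block; the combiner then only has to fuse per-block evidence.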

Data Augmentation and Spectral Structure Features for Limited Samples Hyperspectral Classification

Remote Sensing, 2021, Vol 13 (4), pp. 547. DOI: 10.3390/rs13040547

Author(s): Wenning Wang, Xuebin Liu, Xuanqin Mou

Keyword(s): Classification Accuracy; Data Augmentation; Classification Problem; Classification Performance; Spectral Structure; Limited Sample; Sample Classification; Training Samples; Traditional Classification; Hyperspectral Classification

For both traditional classifiers and currently popular deep learning methods, classification with limited samples is very challenging, and the lack of samples is an important factor limiting classification performance. Our work has two parts. First, unsupervised data augmentation of all hyperspectral samples greatly improves classification accuracy through the newly added training samples, and further improves the classifier's accuracy by optimizing the augmented test samples. Second, we design an effective spectral structure extraction method, and the resulting spectral structure features yield higher classification accuracy than the true spectral features.

Confidence interval for micro-averaged F1 and macro-averaged F1 scores

Applied Intelligence, 2021. DOI: 10.1007/s10489-021-02635-5

Author(s): Kanae Takahashi, Kouji Yamamoto, Aya Kuchiba, Tatsuki Koyama

Keyword(s): Binary Classification; Classification Problem; Classification Problems; Summary Measure; Medical Field; Predictive Values; Binary Classification Problem; Multi Class Classification; Sensitivity Specificity; Measures Of Performance

Abstract. Binary classification problems are common in the medical field, where sensitivity, specificity, accuracy, and negative and positive predictive values are often used to measure the performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity). As a single summary measure of a classifier's performance, the F1 score, defined as the harmonic mean of precision and recall, is widely used in information retrieval and information extraction evaluation because it has favorable characteristics, especially when prevalence is low. Some statistical inference methods have been developed for the F1 score in binary classification problems; however, they have not been extended to multi-class classification. There are three types of F1 score, and their statistical properties have rarely been discussed. We propose methods based on the large-sample multivariate central limit theorem for estimating F1 scores with confidence intervals.
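To make the averaged scores concrete, here is a minimal sketch that computes micro-averaged F1 (pooling TP, FP and FN over classes) and macro-averaged F1 (taken here as the mean of per-class F1 scores, one common definition) from true and predicted labels. For the interval it uses a percentile bootstrap, which is a simpler, swapped-in technique and not the central-limit-theorem-based method the paper proposes. Function names and the toy labels are hypothetical.

```python
import random


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = dict.fromkeys(classes, 0)
    fp = dict.fromkeys(classes, 0)
    fn = dict.fromkeys(classes, 0)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # wrongly predicted as p
            fn[t] += 1  # missed instance of t

    def f1(tp_, fp_, fn_):
        # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro


def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap intervals for (micro_f1, macro_f1), resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(f1_scores([y_true[i] for i in idx], [y_pred[i] for i in idx]))

    def interval(values):
        values = sorted(values)
        return values[int(alpha / 2 * (n_boot - 1))], values[int((1 - alpha / 2) * (n_boot - 1))]

    return interval([d[0] for d in draws]), interval([d[1] for d in draws])


y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
print(f1_scores(y_true, y_pred))  # micro = 4/6, macro = mean of 0.8, 0.5 and 2/3
print(bootstrap_ci(y_true, y_pred))
```

For single-label multi-class data every error adds exactly one FP and one FN, so micro-F1 equals accuracy (4/6 here). In this sketch a class missing from a bootstrap resample simply drops out of that resample's macro average, one of several rough edges that motivate analytic intervals.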

SEMANTIC SEGMENTATION OF BENTHIC COMMUNITIES FROM ORTHO-MOSAIC MAPS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2019, Vol XLII-2/W10, pp. 151-158. DOI: 10.5194/isprs-archives-xlii-2-w10-151-2019. Cited by ~5.

Author(s): G. Pavoni, M. Corsini, M. Callieri, M. Palma, R. Scopigno

Keyword(s): Visual Analysis; Marine Organism; Benthic Communities; Semantic Segmentation; Classification Performance; Training Dataset; Non Invasive; Visual Sampling; Organism Identification

Abstract. Visual sampling techniques are a valuable resource for rapid, non-invasive data acquisition for underwater monitoring. Long-term monitoring projects usually require the collection of large quantities of data, and in this context visual analysis by a human expert remains a very time-consuming task. It has been estimated that only 1-2% of the acquired images are later analyzed by scientists (Beijbom et al., 2012). Strategies for the automatic recognition of benthic communities are required to exploit all the information contained in visual data effectively. Supervised learning methods, the most promising classification techniques in this field, are commonly affected by two recurring issues: the wide diversity of marine organisms and the small amount of labeled data. In this work, we discuss the advantages of using annotated high-resolution ortho-mosaics of the seabed to classify and segment the investigated specimens, and we suggest several strategies to achieve good per-pixel classification performance despite using a reduced training dataset composed of a single ortho-mosaic. The proposed methodology can be applied to a large number of different species, making marine organism identification a highly adaptable procedure.
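Per-pixel classification performance is commonly reported as overall pixel accuracy together with per-class intersection-over-union (IoU). The abstract does not state which metrics the authors use, so this is a minimal, generic sketch assuming the predicted and reference segmentations are equal-sized 2D lists of class IDs; the tiny label maps are hypothetical.

```python
def pixel_metrics(pred, ref, classes):
    """Overall pixel accuracy and per-class IoU for two equal-sized 2D label maps."""
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("label maps must have the same shape")
    correct = total = 0
    inter = dict.fromkeys(classes, 0)
    union = dict.fromkeys(classes, 0)
    for prow, rrow in zip(pred, ref):
        for p, r in zip(prow, rrow):
            total += 1
            correct += p == r
            for c in classes:
                inter[c] += p == c and r == c  # pixel labelled c in both maps
                union[c] += p == c or r == c   # pixel labelled c in either map
    # IoU is undefined (NaN) for a class absent from both maps.
    iou = {c: inter[c] / union[c] if union[c] else float("nan") for c in classes}
    return correct / total, iou


ref = [[0, 0, 1], [0, 1, 1]]
pred = [[0, 1, 1], [0, 1, 0]]
print(pixel_metrics(pred, ref, [0, 1]))  # (0.666..., {0: 0.5, 1: 0.5})
```

IoU is often preferred over plain accuracy for benthic maps because small, rare classes barely move overall pixel accuracy but do move their own IoU.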

A Comparative Study of Associative Classifiers in Mesenchymal Stem Cell Differentiation Analysis

Data Mining, 2013, pp. 970-990. DOI: 10.4018/978-1-4666-2455-9.ch049

Author(s): Weiqi Wang, Yanbo J. Wang, Qin Xin, René Bañares-Alcántara, Frans Coenen, ...

Keyword(s): Stem Cell; Association Rules; Stem Cell Differentiation; Classification Problem; General Context; Mesenchymal Stem Cell Differentiation; Multiple Association; Multi Class Classification; Associative Classifiers

Discovering how Mesenchymal Stem Cells (MSCs) can be differentiated is an important topic in stem cell therapy and tissue engineering. In a general context, such differentiation analysis can be modeled as a classification problem in data mining, specifically a single-label multi-class classification task. Previous studies on this topic suggested Associative Classification (AC) over alternative classification techniques and presented classification results based on the CMAR (Classification based on Multiple Association Rules) associative classifier. Other AC algorithms include CBA (Classification Based on Associations), PRM (Predictive Rule Mining), CPAR (Classification based on Predictive Association Rules) and TFPC (Total From Partial Classification). The main aim of this chapter is to compare the performance of different associative classifiers for MSC differentiation analysis in terms of classification accuracy, efficiency, the number of rules generated, the quality of those rules, and the maximum number of attributes in rule antecedents.
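As an illustration of how an associative classifier turns mined rules into predictions, here is a minimal sketch in the spirit of CBA: class association rules are ranked by confidence, then support, with remaining ties kept in generation order, and a record receives the class of the first rule whose antecedent it satisfies, otherwise a default class. Rule mining itself is omitted, and CMAR, CPAR and the other algorithms named above combine matching rules differently. The attributes, classes and rule statistics are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # (attribute, value) items that must all hold
    label: str             # predicted class
    confidence: float
    support: float


def rank_rules(rules):
    """Highest confidence first, then highest support; sorted() is stable, so other ties keep input order."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support))


def classify(record, ranked_rules, default_label):
    """Return the label of the first ranked rule whose antecedent the record satisfies."""
    items = set(record.items())
    for rule in ranked_rules:
        if rule.antecedent <= items:
            return rule.label
    return default_label


# Hypothetical mined rules over hypothetical attributes.
rules = rank_rules([
    Rule(frozenset({("marker1", "high")}), "class_1", 0.80, 0.30),
    Rule(frozenset({("marker1", "high"), ("marker2", "high")}), "class_2", 0.90, 0.10),
    Rule(frozenset({("marker2", "low")}), "class_3", 0.85, 0.20),
])
print(classify({"marker1": "high", "marker2": "low"}, rules, "default"))  # class_3
```

Because only the single best-ranked matching rule fires, the rule ordering largely determines accuracy, which is one reason the number and quality of generated rules are worth comparing across algorithms.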

Intelligent Neural Network Schemes for Multi-Class Classification

Applied Sciences, 2019, Vol 9 (19), pp. 4036. DOI: 10.3390/app9194036. Cited by ~1.

Author(s): You, Wu, Lee, Liu

Keyword(s): Neural Network; Clustering Algorithm; Classification Problem; Machine Learning Techniques; Training Dataset; Reduction Techniques; Learning Techniques; Benchmark Datasets; Dimensionality Reduction Techniques; Multi Class Classification

Multi-class classification is a very important technique in engineering applications, e.g., mechanical systems, mechanics and design innovations, and applied materials in nanotechnology. Much research has addressed single-label classification, where each object is associated with a single category. However, in many application domains an object can belong to two or more categories, and multi-label classification is needed. Traditionally, statistical methods were used; more recently, machine learning techniques, in particular neural networks, have been proposed to solve the multi-class classification problem. In this paper, we develop radial basis function (RBF)-based neural network schemes for single-label and multi-label classification, respectively. The number of hidden nodes and the parameters of the basis functions are determined automatically by applying an iterative self-constructing clustering algorithm to the given training dataset, and the biases and weights are derived optimally by least squares. Dimensionality reduction techniques are integrated to help reduce the overfitting associated with RBF networks. Experimental results on benchmark datasets show the effectiveness of the proposed schemes.
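The least-squares step can be shown with a minimal single-label sketch. It assumes the hidden-node centers and a shared Gaussian width are already given (the paper obtains them with its iterative self-constructing clustering, which is not reproduced here), encodes classes as one-hot targets, and solves the normal equations for the output weights and biases. The small ridge term is an added assumption to keep the system well conditioned; all names and data are hypothetical.

```python
import math


def rbf_features(X, centers, width):
    """Gaussian activations for each center, plus a constant 1.0 column for the bias."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2 * width ** 2)) for c in centers]
            + [1.0] for x in X]


def solve(A, B):
    """Solve A W = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(a) + list(b) for a, b in zip(A, B)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_rbf(X, y, centers, width, classes, ridge=1e-6):
    """Least-squares output weights (last row = biases) for one-hot class targets."""
    H = rbf_features(X, centers, width)
    T = [[1.0 if label == c else 0.0 for c in classes] for label in y]
    k = len(H[0])
    # Regularised normal equations: (H^T H + ridge * I) W = H^T T
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(k)] for i in range(k)]
    B = [[sum(h[i] * t[j] for h, t in zip(H, T)) for j in range(len(classes))] for i in range(k)]
    return solve(A, B)


def predict_rbf(X, W, centers, width, classes):
    """Pick the class whose output unit has the largest response."""
    preds = []
    for h in rbf_features(X, centers, width):
        scores = [sum(hi * W[i][j] for i, hi in enumerate(h)) for j in range(len(classes))]
        preds.append(classes[max(range(len(classes)), key=scores.__getitem__)])
    return preds


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["a", "a", "a", "b", "b", "b"]
centers = [[0.33, 0.33], [5.33, 5.33]]
W = fit_rbf(X, y, centers, 1.0, ["a", "b"])
print(predict_rbf([[0.5, 0.5], [5.5, 5.5]], W, centers, 1.0, ["a", "b"]))  # ['a', 'b']
```

Once the centers are fixed the output layer is linear in its parameters, which is why the weights can be obtained in closed form instead of by gradient descent.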

Detection of Tampering by Image Resizing Using Local Tchebichef Moments

Applied Sciences, 2019, Vol 9 (15), pp. 3007. DOI: 10.3390/app9153007

Author(s): Dengyong Zhang, Shanshan Wang, Jin Wang, Arun Kumar Sangaiah, Feng Li, ...

Keyword(s): Texture Classification; Binary Classification; Classification Problem; Seam Carving; Spatial Correlations; Image Resizing; Tchebichef Moments; Output Code; Universal Detection; Multi Class Classification

There are many image resizing techniques, including scaling, scale-and-stretch and seam carving. Each has its own advantages and suits different application scenarios, so a universal method for detecting tampering by image resizing is more practical. In preliminary experiments, we found that, whichever image resizing technique is adopted, it destroys local texture and spatial correlations among adjacent pixels to some extent. Motivated by the excellent performance of local Tchebichef moments (LTM) in texture classification, in this paper we present a method for detecting tampering by image resizing using LTM. The tampered images are obtained by removing pixels from original images using image resizing (scaling, scale-and-stretch and seam carving). First, the residual is obtained by image pre-processing. Then, histogram features of LTM are extracted from the residual. Finally, an error-correcting output code (ECOC) strategy is adopted for ensemble learning, which turns the multi-class classification problem into binary classification sub-problems. Experimental results show that the proposed approach achieves acceptable detection accuracy for the three content-aware image retargeting techniques.
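The error-correcting output code (ECOC) step can be sketched as follows, assuming a hypothetical four-class setup (original image plus the three resizing operations), an exhaustive 7-bit code matrix, one binary classifier per code column, and decoding to the class whose codeword is nearest in Hamming distance. The code matrix and class names are illustrative assumptions, not the paper's configuration, and the binary classifiers themselves are left out.

```python
from itertools import combinations

# Hypothetical exhaustive code for 4 classes: every pair of rows differs in 4 of 7 bits.
CODEBOOK = {
    "original":          (1, 1, 1, 1, 1, 1, 1),
    "scaling":           (0, 0, 0, 0, 1, 1, 1),
    "scale_and_stretch": (0, 0, 1, 1, 0, 0, 1),
    "seam_carving":      (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def column_targets(labels, j, codebook=CODEBOOK):
    """Binary training targets for sub-problem j: bit j of each example's codeword."""
    return [codebook[label][j] for label in labels]


def ecoc_decode(bits, codebook=CODEBOOK):
    """Class whose codeword is closest in Hamming distance to the binary classifiers' outputs."""
    return min(codebook, key=lambda c: hamming(bits, codebook[c]))


min_dist = min(hamming(a, b) for a, b in combinations(CODEBOOK.values(), 2))
# One binary classifier (bit 6) got it wrong; decoding still recovers the class.
print(ecoc_decode((0, 0, 1, 1, 0, 1, 1)))  # scale_and_stretch
```

With a minimum pairwise distance of 4, any single wrong binary decision is corrected, which is the point of using an error-correcting code rather than plain one-vs-rest voting.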

Semi-supervised learning using autodidactic interpolation on sparse representation-based multiple one-dimensional embedding

International Journal of Wavelets Multiresolution and Information Processing, 2019, Vol 17 (03), pp. 1950013. DOI: 10.1142/s0219691319500139. Cited by ~3.

Author(s): Hao Deng, Chao Ma, Lijun Shen, Chuanwu Yang

Keyword(s): Sparse Representation; Euclidean Distance; Main Idea; Classification Problem; Classification Performance; One Dimensional; Adaptive Interpolation; Shortest Distance; The Common; Sample Set

In this paper, we present a novel semi-supervised classification method based on sparse representation (SR) and multiple one-dimensional embedding-based adaptive interpolation (M1DEI). The main idea of M1DEI is to embed the data into multiple one-dimensional (1D) manifolds such that connected samples have the shortest distance. In this way, the high-dimensional data classification problem is transformed into a 1D classification problem. By alternating interpolation and averaging on the multiple 1D manifolds, the labeled sample set grows gradually. A proper metric enables more accurate embedding and thus helps improve classification performance, so we develop an SR-based metric that measures the affinity between samples more accurately than the common Euclidean distance. Experimental results on several databases show the effectiveness of this improvement.
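A toy sketch of the core idea, embedding samples into a 1D chain and spreading labels along it, under loudly simplifying assumptions: a single greedy nearest-neighbour ordering stands in for the paper's multiple 1D embeddings, plain Euclidean distance replaces the SR-based metric, and label scores of unlabeled samples are repeatedly replaced by the average of their chain neighbours while labeled samples stay fixed. None of the function names come from the paper.

```python
import math


def greedy_chain(X, start=0):
    """Order samples into a 1D chain by repeatedly stepping to the nearest unvisited sample."""
    order, unvisited = [start], set(range(len(X))) - {start}
    while unvisited:
        last = X[order[-1]]
        nxt = min(unvisited, key=lambda i: (math.dist(last, X[i]), i))  # index breaks ties
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def propagate_on_chain(order, labels, classes, iters=100):
    """labels[i] is a class or None; unlabeled scores become averages of chain neighbours."""
    scores = {i: [1.0 if labels[i] == c else 0.0 for c in classes] for i in order}
    for _ in range(iters):
        new = {}
        for pos, i in enumerate(order):
            if labels[i] is not None:
                new[i] = scores[i]  # labeled samples keep their one-hot scores
                continue
            nbrs = [order[p] for p in (pos - 1, pos + 1) if 0 <= p < len(order)]
            new[i] = [sum(scores[j][k] for j in nbrs) / len(nbrs) for k in range(len(classes))]
        scores = new
    return [classes[max(range(len(classes)), key=scores[i].__getitem__)] for i in range(len(labels))]


X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
labels = ["a", None, None, None, None, "b"]
order = greedy_chain(X)
print(propagate_on_chain(order, labels, ["a", "b"]))  # ['a', 'a', 'a', 'b', 'b', 'b']
```

Between two labeled ends the averaging converges to linear interpolation of the class scores along the chain, which is the 1D analogue of the interpolation step described above; using several differently seeded chains and averaging their scores would move the sketch closer to the multiple-embedding idea.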

Analysis of Vocal Tract Characteristics for Near-term Suicidal Risk Assessment

Methods of Information in Medicine, 2004, Vol 43 (01), pp. 36-38. DOI: 10.1055/s-0038-1633420. Cited by ~17.

Author(s): A. Ozdas, D. M. Wilkes, M. K. Silverman, S. E. Silverman, R. G. Shiavi

Keyword(s): Vocal Tract; Classification Performance; Gaussian Mixtures; Suicidal Risk; Depressed Patients; Discriminating Power; Cepstral Coefficients; Near Term; The Many; Suicidal Patients

Summary Objectives: Among the many clinical decisions that psychiatrists must make, assessing a patient's risk of committing suicide is among the most important, complex and demanding. Reviewing his clinical experience, one of the authors observed that successful predictions of suicidality were often based on the patient's voice, independent of content. The voices of suicidal patients exhibited unique qualities that distinguished them from those of non-suicidal patients. In this study we investigated the discriminating power of lower-order mel-cepstral coefficients among suicidal, major depressed, and non-suicidal patients. Methods: Our sample consisted of 10 near-term suicidal patients, 10 major depressed patients, and 10 non-depressed control subjects. Gaussian mixtures were employed to model the class distributions of the extracted features. Results and Conclusions: In two-sample maximum-likelihood (ML) classification analyses, the first four mel-cepstral coefficients yielded exceptional classification performance, with correct classification scores of 80% between near-term suicidal patients and non-depressed controls, 75% between depressed patients and non-depressed controls, and 80% between near-term suicidal patients and depressed patients.
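The maximum-likelihood decision with Gaussian-mixture class models can be sketched as below. It assumes diagonal covariances, mixture parameters already fitted (for example with EM) to each group's four-dimensional mel-cepstral feature vectors, and a recording represented by several feature frames scored independently. Every parameter value, group name and frame here is a hypothetical placeholder, not an estimate from the study.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def gmm_loglik(x, gmm):
    """gmm: list of (weight, mean, var) components; log-sum-exp for numerical stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify_frames(frames, models):
    """Sum per-frame log-likelihoods under each class model and return the ML class."""
    totals = {name: sum(gmm_loglik(f, gmm) for f in frames) for name, gmm in models.items()}
    return max(totals, key=totals.get)


# Hypothetical 4-D models (first four coefficients) for a two-sample comparison.
models = {
    "group_1": [(0.5, [0, 0, 0, 0], [1, 1, 1, 1]), (0.5, [1, 1, 1, 1], [1, 1, 1, 1])],
    "group_2": [(1.0, [3, 3, 3, 3], [1, 1, 1, 1])],
}
frames = [[0.2, 0.1, 0.0, 0.3], [0.9, 1.1, 1.0, 0.8]]
print(classify_frames(frames, models))  # group_1
```

Each two-sample analysis in the abstract corresponds to running this decision with only the two groups being compared in `models`.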

EBOC: Ensemble-Based Ordinal Classification in Transportation

Journal of Advanced Transportation, 2019, Vol 2019, pp. 1-17. DOI: 10.1155/2019/7482138

Author(s): Pelin Yıldırım, Ulaş K. Birant, Derya Birant

Keyword(s): Historical Data; Classification Problem; Classification Performance; Classification Algorithms; Ordinal Classification; Adaboost Algorithm; Education And Health; Transportation Sector; C4.5 Decision Tree; Target Attribute

Efficiently learning the latent patterns in historical data to model a system's behaviour is essential for making the right decisions. Machine learning has already shown promising results for this purpose in transportation, as well as in many other areas such as marketing, finance, education, and health. However, many classification algorithms in the literature assume that the target attribute values in a dataset are unordered, and therefore ignore the inherent order among the class values. To overcome this problem, this study proposes a novel ensemble-based ordinal classification (EBOC) approach that applies bagging and boosting (the AdaBoost algorithm) to ordinal classification problems in the transportation sector. The article also compares the proposed EBOC approach with an ordinal class classifier and with traditional tree-based classification algorithms (i.e., C4.5 decision tree, RandomTree, and REPTree) in terms of accuracy. The results indicate that EBOC achieves better classification performance than these conventional solutions.
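The ordinal class classifier used as a baseline above most likely refers to Frank and Hall's decomposition; that attribution is an assumption, and how EBOC plugs bagging or AdaBoost into such a scheme is not specified in the abstract. A minimal sketch of the decomposition: K ordered classes become K-1 binary problems "is the class above level k?", and the binary probabilities are recombined into class probabilities. The binary models are stubbed out with hypothetical probabilities, and the clipping at zero is an added safeguard, not part of the original method.

```python
def binary_targets(y, levels):
    """For each threshold k (all but the last level), label 1 if y is above levels[k]."""
    rank = {lvl: i for i, lvl in enumerate(levels)}
    return [[1 if rank[v] > k else 0 for v in y] for k in range(len(levels) - 1)]


def combine(p_greater, levels):
    """p_greater[k] = P(y > levels[k]) for k = 0..K-2; returns class probabilities."""
    K = len(levels)
    probs = [1 - p_greater[0]]
    for k in range(1, K - 1):
        # Clip at zero: independently trained binary models may violate monotonicity.
        probs.append(max(p_greater[k - 1] - p_greater[k], 0.0))
    probs.append(p_greater[K - 2])
    return dict(zip(levels, probs))


def predict(p_greater, levels):
    """Most probable ordinal class."""
    probs = combine(p_greater, levels)
    return max(probs, key=probs.get)


levels = ["low", "medium", "high"]
print(binary_targets(["low", "high", "medium"], levels))  # [[0, 1, 1], [0, 1, 0]]
# Hypothetical outputs of two binary models (e.g. bagged or boosted trees) for one record.
print(predict([0.9, 0.3], levels))  # medium
```

Each binary sub-problem can be solved by any probabilistic learner, which is what makes it natural to use ensembles such as bagging or boosting as the underlying models.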