Deep learning algorithms out-perform veterinary pathologists in detecting the mitotically most active tumor region

Marc Aubreville; Christof A. Bertram; Christian Marzahl; Corinne Gurtner; Martina Dettwiler; Anja Schmidt; Florian Bartenschlager; Sophie Merz; Marco Fragoso; Olivia Kershaw; Robert Klopfleisch; Andreas Maier

doi:10.1038/s41598-020-73246-2

Scientific Reports ◽

10.1038/s41598-020-73246-2 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Marc Aubreville ◽

Christof A. Bertram ◽

Christian Marzahl ◽

Corinne Gurtner ◽

Martina Dettwiler ◽

...

Keyword(s):

Deep Learning ◽

Ground Truth ◽

Mitotic Count ◽

Mitotic Figure ◽

Two Stage ◽

Data Set ◽

Area Selection ◽

Tumor Region ◽

Cutaneous Mast Cell ◽

Mitotic Figures

Manual count of mitotic figures, which is determined in the tumor region with the highest mitotic activity, is a key parameter of most tumor grading schemes. It can, however, depend strongly on the area selection because mitotic figures are unevenly distributed across the tumor section. We aimed to assess how strongly the area selection affects the mitotic count, which is known to have high inter-rater disagreement. On a data set of 32 whole slide images of H&E-stained canine cutaneous mast cell tumor, fully annotated for mitotic figures, we asked eight veterinary pathologists (five board-certified, three in training) to select a field of interest for the mitotic count. To assess the potential effect on the mitotic count, we compared the mitotic count of the selected regions to the overall distribution on the slide. Additionally, we evaluated three deep learning-based methods for identifying the region of highest mitotic density. In the first approach, the model directly predicts the mitotic count for the presented image patches as a regression task. The second method derives a segmentation mask for mitotic figures, which is then used to obtain a mitotic density. Finally, we evaluated a two-stage object-detection pipeline based on state-of-the-art architectures to identify individual mitotic figures. We found that the predictions by all models were, on average, better than those of the experts. The two-stage object detector performed best and outperformed most of the human pathologists on the majority of tumor cases. The correlation between the predicted and the ground truth mitotic count was also best for this approach (0.963–0.979). Further, we found considerable differences in position selection between pathologists, which could partially explain the high variance that has been reported for the manual mitotic count. To achieve better inter-rater agreement, we propose to use a computer-based area selection to support the pathologist in the manual mitotic count.
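The area-selection problem described in this abstract can be illustrated with a small sketch. Assuming mitotic figure annotations are available as (x, y) coordinates, the code below bins them into a coarse grid and slides a square window over the grid to find the position containing the most figures. The function name `densest_window`, the tile size, and the toy coordinates are hypothetical; this is not the paper's pipeline, which relied on regression, segmentation, and object-detection models.

```python
from collections import Counter


def densest_window(points, tile, window_tiles):
    """Return (count, (col, row)) for the square window holding the most points.

    points: iterable of (x, y) annotation coordinates (hypothetical pixel units)
    tile: side length of one grid tile, in the same units as the coordinates
    window_tiles: window side length, measured in tiles
    """
    # Bin every annotation into a coarse grid tile.
    grid = Counter((int(x // tile), int(y // tile)) for x, y in points)
    if not grid:
        return 0, None
    cols = [c for c, _ in grid]
    rows = [r for _, r in grid]
    best_count, best_origin = 0, None
    # Try every window origin whose window could cover at least one occupied tile.
    for c0 in range(min(cols) - window_tiles + 1, max(cols) + 1):
        for r0 in range(min(rows) - window_tiles + 1, max(rows) + 1):
            count = sum(
                grid.get((c, r), 0)
                for c in range(c0, c0 + window_tiles)
                for r in range(r0, r0 + window_tiles)
            )
            # Strict comparison keeps the first window found on ties.
            if count > best_count:
                best_count, best_origin = count, (c0, r0)
    return best_count, best_origin


# Toy data: a cluster near (1000, 1000) plus a few scattered figures.
figures = [(1010, 990), (1040, 1020), (1100, 1080), (1150, 1010),
           (200, 300), (2500, 400), (3000, 2900)]
count, origin = densest_window(figures, tile=100, window_tiles=3)
```

The exhaustive scan is chosen for readability; on a real whole slide image a summed-area table or a convolution over the count grid would be the more usual choice.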

Computerized Calculation of Mitotic Count Distribution in Canine Cutaneous Mast Cell Tumor Sections: Mitotic Count Is Area Dependent

Veterinary Pathology ◽

10.1177/0300985819890686 ◽

2019 ◽

Vol 57 (2) ◽

pp. 214-226 ◽

Cited By ~ 6

Author(s):

Christof A. Bertram ◽

Marc Aubreville ◽

Corinne Gurtner ◽

Alexander Bartel ◽

Sarah M. Corner ◽

...

Keyword(s):

Mast Cell ◽

High Power ◽

Ground Truth ◽

Mitotic Count ◽

Automated Image Analysis ◽

High Power Field ◽

Borderline Cases ◽

Area Selection ◽

Cutaneous Mast Cell ◽

Mitotic Figures

Mitotic count (MC) is an important element for grading canine cutaneous mast cell tumors (ccMCTs) and is determined in 10 consecutive high-power fields with the highest mitotic activity. However, there is variability in area selection between pathologists. In this study, the MC distribution and the effect of area selection on the MC were analyzed in ccMCTs. Two pathologists independently annotated all mitotic figures in whole-slide images of 28 ccMCTs (ground truth). Automated image analysis was used to examine the ground truth distribution of the MC throughout the tumor section area, which was compared with the manual MCs of 11 pathologists. Computerized analysis demonstrated high variability of the MC within different tumor areas. There were 6 MCTs with consistently low MCs (MC<7 in all tumor areas), 13 cases with mostly high MCs (MC ≥7 in ≥75% of 10 high-power field areas), and 9 borderline cases with variable MCs around 7, which is a cutoff value for ccMCT grading. There was inconsistency among pathologists in identifying the areas with the highest density of mitotic figures throughout the 3 ccMCT groups; only 51.9% of the counts were consistent with the highest 25% of the ground truth MC distribution. Regardless, there was substantial agreement between pathologists in detecting tumors with MC ≥7. Falsely low MCs below 7 mainly occurred in 4 of 9 borderline cases that had very few ground truth areas with MC ≥7. The findings of this study highlight the need to further standardize how to select the region of the tumor in which to determine the MC.
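The three case groups reported in this abstract can be expressed as a simple rule over the MC values of all candidate areas in one tumor. The sketch below uses the cutoff of 7 and the 75% threshold stated in the abstract; treating every remaining case as "borderline" is an assumption made here for illustration, and the function name `classify_case` is hypothetical.

```python
def classify_case(area_counts, cutoff=7, high_fraction=0.75):
    """Group a tumor by the distribution of its per-area mitotic counts.

    area_counts: mitotic counts, one per candidate 10-HPF area of the tumor
    cutoff: grading cutoff from the abstract (MC >= 7)
    high_fraction: share of areas at or above the cutoff needed for "high"
    """
    if not area_counts:
        raise ValueError("at least one area count is required")
    at_or_above = sum(1 for mc in area_counts if mc >= cutoff)
    if at_or_above == 0:
        return "low"  # MC below the cutoff in all areas
    if at_or_above / len(area_counts) >= high_fraction:
        return "high"  # MC at or above the cutoff in most areas
    return "borderline"  # assumption: everything in between


groups = [
    classify_case([0, 1, 3, 6]),
    classify_case([7, 9, 12, 2]),
    classify_case([2, 3, 8, 5]),
]
```

In the third toy case a single area reaches the cutoff, so the grade a pathologist assigns would depend on whether that area is the one selected.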

Ground-truth uncertainty-aware metrics for machine learning applications on seismic image interpretation: Application to faults and horizon extraction

The Leading Edge ◽

10.1190/tle39100734.1 ◽

2020 ◽

Vol 39 (10) ◽

pp. 734-741

Author(s):

Sébastien Guillon ◽

Frédéric Joncour ◽

Pierre-Emmanuel Barrallon ◽

Laurent Castanié

Keyword(s):

Deep Learning ◽

Image Interpretation ◽

Ground Truth ◽

Learning Model ◽

Seismic Interpretation ◽

Data Set ◽

Seismic Image ◽

Machine Learning Applications ◽

Deep Learning Model

We propose new metrics to measure the performance of a deep learning model applied to seismic interpretation tasks such as fault and horizon extraction. Faults and horizons are thin geologic boundaries (1 pixel thick on the image) for which a small prediction error could lead to inappropriately large variations in common metrics (precision, recall, and intersection over union). Through two examples, we show how classical metrics could fail to indicate the true quality of fault or horizon extraction. Measuring the accuracy of reconstruction of thin objects or boundaries requires introducing a tolerance distance between ground truth and prediction images to manage the uncertainties inherent in their delineation. We therefore adapt our metrics by introducing a tolerance function and illustrate their ability to manage uncertainties in seismic interpretation. We compare classical and new metrics through different examples and demonstrate the robustness of our metrics. Finally, we show on a 3D West African data set how our metrics are used to tune an optimal deep learning model.
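The sensitivity of classical metrics to small offsets in thin objects can be shown with a minimal sketch. The code below computes precision and recall over pixel sets, counting a pixel as matched when a pixel of the other set lies within a given distance. The hard Chebyshev-distance threshold is a simplification assumed here; the abstract describes a tolerance function whose exact form is not given, and the name `tolerant_precision_recall` is hypothetical.

```python
def tolerant_precision_recall(truth, pred, tol):
    """Precision and recall with a distance tolerance.

    truth, pred: sets of (row, col) pixels belonging to the thin object
    tol: maximum Chebyshev distance (in pixels) still counted as a match
    """
    def near(pixel, others):
        r, c = pixel
        return any(
            (r + dr, c + dc) in others
            for dr in range(-tol, tol + 1)
            for dc in range(-tol, tol + 1)
        )

    matched_pred = sum(1 for p in pred if near(p, truth))
    matched_truth = sum(1 for t in truth if near(t, pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# A 1-pixel-thick horizontal line and a prediction shifted down by one row.
truth = {(5, c) for c in range(10)}
pred = {(6, c) for c in range(10)}
strict = tolerant_precision_recall(truth, pred, tol=0)
relaxed = tolerant_precision_recall(truth, pred, tol=1)
```

With zero tolerance the one-row shift yields no matches at all, while a one-pixel tolerance recovers full precision and recall, which is the failure mode of classical metrics that the abstract describes.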

Deep learning-driven velocity model building workflow

The Leading Edge ◽

10.1190/tle38110872a1.1 ◽

2019 ◽

Vol 38 (11) ◽

pp. 872a1-872a9 ◽

Cited By ~ 4

Author(s):

Mauricio Araya-Polo ◽

Stuart Farris ◽

Manuel Florez

Keyword(s):

Deep Learning ◽

Seismic Data ◽

Model Building ◽

Ground Truth ◽

Velocity Model ◽

Training Data ◽

Quality Data ◽

Generative Adversarial Network ◽

Data Set ◽

Velocity Models

Exploration seismic data are heavily manipulated before human interpreters are able to extract meaningful information regarding subsurface structures. This manipulation adds modeling and human biases and is limited by methodological shortcomings. Alternatively, using seismic data directly is becoming possible thanks to deep learning (DL) techniques. A DL-based workflow is introduced that uses analog velocity models and realistic raw seismic waveforms as input and produces subsurface velocity models as output. When insufficient data are used for training, DL algorithms tend to overfit or fail. Gathering large amounts of labeled and standardized seismic data sets is not straightforward. This shortage of quality data is addressed by building a generative adversarial network (GAN) to augment the original training data set, which is then used by DL-driven seismic tomography as input. The DL tomographic operator predicts velocity models with high statistical and structural accuracy after being trained with GAN-generated velocity models. Beyond the field of exploration geophysics, the use of machine learning in earth science is challenged by the lack of labeled data or properly interpreted ground truth, since we seldom know what truly exists beneath the earth's surface. The unsupervised approach (using GANs to generate labeled data) illustrates a way to mitigate this problem and opens geology, geophysics, and planetary sciences to more DL applications.

Mitotic Figure Recognition: Agreement among Pathologists and Computerized Detector

Analytical Cellular Pathology ◽

10.1155/2012/385271 ◽

2012 ◽

Vol 35 (2) ◽

pp. 97-100 ◽

Cited By ~ 24

Author(s):

Christopher Malon ◽

Elena Brachtel ◽

Eric Cosatto ◽

Hans Peter Graf ◽

Atsushi Kurata ◽

...

Keyword(s):

Ground Truth ◽

Mitotic Figure ◽

Mitosis Detection ◽

Prognostic Importance ◽

Hematoxylin And Eosin ◽

Mitotic Figures ◽

Grade 3 ◽

The Individual ◽

Computerized System ◽

Level Of Agreement

Despite the prognostic importance of mitotic count as one of the components of the Bloom-Richardson grade [3], several studies ([2, 9, 10]) have found that pathologists' agreement on the mitotic grade is fairly modest. Collecting a set of more than 4,200 candidate mitotic figures, we evaluate pathologists' agreement on individual figures and train a computerized system for mitosis detection, comparing its performance to the classifications of three pathologists. The system's and the pathologists' classifications are based on evaluation of digital micrographs of hematoxylin and eosin stained breast tissue. On figures where the majority of pathologists agree on a classification, we compare the performance of the trained system to that of the individual pathologists. We find that the level of agreement of the pathologists ranges from slight to moderate, with strong biases, and that the system performs competitively in rating the ground truth set. This study is a step towards automatic mitosis count to accelerate a pathologist's work and improve reproducibility.
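Agreement between two raters on individual candidate figures is commonly summarized with Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The abstract does not state which statistic the authors used, so the sketch below is an illustration under that assumption; the function name and the toy labels (1 = mitotic figure, 0 = not) are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each rater labeled at random with their own rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


rater_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
kappa = cohens_kappa(rater_a, rater_b)
```

In this toy case the raters agree on 7 of 10 figures, but half of that agreement is expected by chance, so kappa is about 0.4 rather than 0.7.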

Computer-Assisted Mitotic Count Using a Deep Learning-based Algorithm Improves Inter-Observer Reproducibility and Accuracy in canine cutaneous mast cell tumors

10.1101/2021.06.04.446287 ◽

2021 ◽

Author(s):

Christof A Bertram ◽

Marc Aubreville ◽

Taryn A Donovan ◽

Alexander Bartel ◽

Frauke Wilm ◽

...

Keyword(s):

Mast Cell ◽

High Performance ◽

Ground Truth ◽

Mitotic Count ◽

Computer Assisted ◽

Malignant Neoplasms ◽

Mast Cell Tumors ◽

Computer Assistance ◽

Phosphohistone H3 ◽

Cutaneous Mast Cell

The mitotic count (MC) is an important histological parameter for prognostication of malignant neoplasms. However, it has inter- and intra-observer discrepancies due to difficulties in selecting the region of interest (MC-ROI) and in identifying/classifying mitotic figures (MFs). Recent progress in the field of artificial intelligence has allowed the development of high-performance algorithms that may improve standardization of the MC. As algorithmic predictions are not flawless, computer-assisted review by pathologists may ensure reliability. In the present study we compared partial (MC-ROI preselection) and full (additional visualization of MF candidate proposals and display of algorithmic confidence values) computer-assisted MC analysis to the routine (unaided) MC analysis by 23 pathologists for whole slide images of 50 canine cutaneous mast cell tumors (ccMCTs). Algorithmic predictions aimed to assist pathologists in detecting mitotic hotspot locations, reducing omission of MFs, and improving classification against imposters. The inter-observer consistency for the MC significantly increased with computer assistance (interobserver correlation coefficient, ICC = 0.92) compared to the unaided approach (ICC = 0.70). Classification into prognostic stratifications had a higher accuracy with computer assistance. The algorithmically preselected MC-ROIs had consistently higher MCs than the manually selected MC-ROIs. Compared to a ground truth (developed with immunohistochemistry for phosphohistone H3), pathologist performance in detecting individual MFs was augmented when using computer assistance (F1-score of 0.68 increased to 0.79) with a reduction in false negatives by 38%. The results of this study demonstrate that computer assistance may lead to more reproducible and accurate MCs in ccMCTs.

Analysis of Tracheobronchial Diverticula Based on Semantic Segmentation of CT Images via the Dual-Channel Attention Network

Frontiers in Public Health ◽

10.3389/fpubh.2021.813717 ◽

2022 ◽

Vol 9 ◽

Author(s):

Maoyi Zhang ◽

Changqing Ding ◽

Shuli Guo

Keyword(s):

Deep Learning ◽

Ground Truth ◽

Semantic Segmentation ◽

Rapid Identification ◽

Diagnostic Process ◽

Automated Identification ◽

Data Set ◽

Attention Model ◽

Physiological Indicators ◽

Dual Channel

Tracheobronchial diverticula (TD) is a common cystic lesion that can be easily neglected; hence accurate and rapid identification is critical for later diagnosis. There is a strong need to automate this diagnostic process because traditional manual observations are time-consuming and laborious. However, most studies have focused only on case reports or listed the relationship between the disease and other physiological indicators, and few have adopted advanced technologies such as deep learning for automated identification and diagnosis. To fill this gap, this study interpreted TD recognition as semantic segmentation and proposed a novel attention-based network for TD semantic segmentation. Since the area of a TD lesion is small and similar to surrounding organs, we designed the atrous spatial pyramid pooling (ASPP) and attention mechanisms, which can efficiently complete the segmentation of TD with robust results. The proposed attention model can selectively gather features from different branches according to the amount of information they contain. Besides, to the best of our knowledge, no public research data are available yet. For efficient network training, we constructed a data set containing 218 TD and related ground truth (GT). We evaluated different models on the proposed data set, among which the highest MIOU reached 0.92. The experiments show that our model can outperform state-of-the-art methods, indicating that the deep learning method has great potential for TD recognition.
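The MIOU figure quoted in this abstract refers to mean intersection over union, which averages the per-class overlap between predicted and ground-truth label maps. A minimal sketch follows, assuming label maps are given as nested lists of integer class ids; the function name and the toy 2×3 maps are hypothetical, and classes absent from both maps are skipped, which is one common convention rather than necessarily the authors' choice.

```python
def mean_iou(truth, pred, num_classes):
    """Mean intersection over union across classes present in either map."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if t == p:
                # A correctly labeled pixel counts once for its class.
                intersection[t] += 1
                union[t] += 1
            else:
                # A mislabeled pixel enlarges the union of both classes.
                union[t] += 1
                union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("label maps are empty")
    return sum(ious) / len(ious)


# Toy maps: 0 = background, 1 = lesion; one background pixel is mislabeled.
truth_map = [[0, 0, 1],
             [0, 1, 1]]
pred_map = [[0, 1, 1],
            [0, 1, 1]]
miou = mean_iou(truth_map, pred_map, num_classes=2)
```

Here the background IoU is 2/3 and the lesion IoU is 3/4, giving a mean of 17/24. For small lesions a single mislabeled pixel moves the lesion IoU noticeably, which is why the per-class average is more informative than pixel accuracy.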

A large-scale dataset for mitotic figure assessment on whole slide images of canine cutaneous mast cell tumor

Scientific Data ◽

10.1038/s41597-019-0290-4 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 5

Author(s):

Christof A. Bertram ◽

Marc Aubreville ◽

Christian Marzahl ◽

Andreas Maier ◽

Robert Klopfleisch

Keyword(s):

Mast Cell ◽

Large Scale ◽

Region Of Interest ◽

Mitotic Figure ◽

Low Grade ◽

Detection Algorithms ◽

Large Scale Dataset ◽

Cutaneous Mast Cell ◽

Mitotic Figures ◽

Whole Slide Images

We introduce a novel, large-scale dataset for microscopy cell annotations. The dataset includes 32 whole slide images (WSI) of canine cutaneous mast cell tumors, selected to include both low grade and high grade cases. The slides have been completely annotated for mitotic figures and we provide secondary annotations for neoplastic mast cells, inflammatory granulocytes, and mitotic figure look-alikes. In addition to a blinded two-expert manual annotation with consensus, we provide an algorithm-aided dataset, where potentially missed mitotic figures were detected by a deep neural network and subsequently assessed by two human experts. We included 262,481 annotations in total, out of which 44,880 represent mitotic figures. For algorithmic validation, we used a customized RetinaNet approach, followed by a cell classification network. We find F1-Scores of 0.786 and 0.820 for the manually labelled and the algorithm-aided dataset, respectively. The dataset provides, for the first time, WSIs completely annotated for mitotic figures and thus enables assessment of mitosis detection algorithms on complete WSIs as well as region of interest detection algorithms.

Development of a Deep Learning Algorithm for Periapical Disease Detection in Dental Radiographs

Diagnostics ◽

10.3390/diagnostics10060430 ◽

2020 ◽

Vol 10 (6) ◽

pp. 430 ◽

Cited By ~ 1

Author(s):

Michael G. Endres ◽

Florian Hillen ◽

Marios Salloumis ◽

Ahmad R. Sedaghat ◽

Stefan M. Niehues ◽

...

Keyword(s):

Deep Learning ◽

Learning Algorithm ◽

Ground Truth ◽

True Positive Rate ◽

Radiographic Images ◽

Data Set ◽

Radiographic Findings ◽

Panoramic Radiographs ◽

Deep Learning Algorithm ◽

The Mean

Periapical radiolucencies, which can be detected on panoramic radiographs, are one of the most common radiographic findings in dentistry and have a differential diagnosis including infections, granuloma, cysts and tumors. In this study, we seek to investigate the ability with which 24 oral and maxillofacial (OMF) surgeons assess the presence of periapical lucencies on panoramic radiographs, and we compare these findings to the performance of a predictive deep learning algorithm that we have developed using a curated data set of 2902 de-identified panoramic radiographs. The mean diagnostic positive predictive value (PPV) of OMF surgeons based on their assessment of panoramic radiographic images was 0.69 (±0.13), indicating that dentists on average falsely diagnose 31% of cases as radiolucencies. However, the mean diagnostic true positive rate (TPR) was 0.51 (±0.14), indicating that on average 49% of all radiolucencies were missed. We demonstrate that the deep learning algorithm achieves a better performance than 14 of 24 OMF surgeons within the cohort, exhibiting an average precision of 0.60 (±0.04), and an F1 score of 0.58 (±0.04) corresponding to a PPV of 0.67 (±0.05) and TPR of 0.51 (±0.05). The algorithm, trained on limited data and evaluated on clinically validated ground truth, has potential to assist OMF surgeons in detecting periapical lucencies on panoramic radiographs.
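The figures in this abstract are linked by the definition of the F1 score as the harmonic mean of PPV (precision) and TPR (recall). The short check below reproduces the algorithm's reported F1 from its reported PPV and TPR, and derives the surgeons' false-diagnosis and miss percentages. The helper name `f1_score` is hypothetical, and an F1 computed from mean PPV and mean TPR need not equal a mean of per-run F1 scores in general, so the agreement here holds only at the reported rounding.

```python
def f1_score(ppv, tpr):
    """Harmonic mean of positive predictive value and true positive rate."""
    if ppv + tpr == 0:
        return 0.0
    return 2 * ppv * tpr / (ppv + tpr)


# Algorithm values from the abstract: PPV 0.67, TPR 0.51.
algorithm_f1 = f1_score(0.67, 0.51)

# Surgeon values from the abstract: PPV 0.69, TPR 0.51.
false_diagnosis_share = 1 - 0.69  # positive calls that were wrong
missed_share = 1 - 0.51           # true radiolucencies that were not found
```

Rounded to two decimals, `algorithm_f1` is 0.58, matching the reported score, and the two complements give the 31% and 49% quoted in the abstract.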

Pavement Image Datasets: A New Benchmark Dataset to Classify and Densify Pavement Distresses

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198120907283 ◽

2020 ◽

Vol 2674 (2) ◽

pp. 328-339 ◽

Cited By ~ 4

Author(s):

Hamed Majidifard ◽

Peng Jin ◽

Yaw Adu-Gyamfi ◽

William G. Buttlar

Keyword(s):

Deep Learning ◽

Ground Truth ◽

Application Programming Interface ◽

Assessment System ◽

Top Down ◽

Research Activity ◽

Data Set ◽

Ground Truth Data ◽

Pavement Distresses ◽

Learning Frameworks

Automated pavement distress detection using road images remains a challenging topic in the computer vision research community. Recent developments in deep learning have led to considerable research activity directed towards improving the efficacy of automated pavement distress identification and rating. Deep learning models require a large ground truth data set, which is often not readily available in the case of pavements. In this study, a labeled dataset approach is introduced as a first step towards a more robust, easy-to-deploy pavement condition assessment system. The technique is termed herein the pavement image dataset (PID) method. The dataset consists of images captured from two camera views of an identical pavement segment, that is, a wide view and a top-down view. The wide-view images were used to classify the distresses and to train the deep learning frameworks, while the top-down-view images allowed calculation of distress density, which will be used in future studies aimed at automated pavement rating. For the wide-view group dataset, 7,237 images were manually annotated and distresses classified into nine categories. Images were extracted using the Google application programming interface (API), selecting street-view images using Python-based code developed for this project. The new dataset was evaluated using two mainstream deep learning frameworks: You Only Look Once (YOLO v2) and Faster Region Convolution Neural Network (Faster R-CNN). Accuracy scores using the F1 index were found to be 0.84 for YOLO v2 and 0.65 for the Faster R-CNN model runs, both quite acceptable considering the convenience of utilizing Google Maps images.

Computer-assisted mitotic count using a deep learning–based algorithm improves interobserver reproducibility and accuracy

Veterinary Pathology ◽

10.1177/03009858211067478 ◽

2021 ◽

pp. 030098582110674

Author(s):

Christof A. Bertram ◽

Marc Aubreville ◽

Taryn A. Donovan ◽

Alexander Bartel ◽

Frauke Wilm ◽

...

Keyword(s):

High Performance ◽

Region Of Interest ◽

Ground Truth ◽

Mitotic Count ◽

Computer Assisted ◽

Malignant Neoplasms ◽

Mast Cell Tumors ◽

Computer Assistance ◽

Phosphohistone H3 ◽

Cutaneous Mast Cell

The mitotic count (MC) is an important histological parameter for prognostication of malignant neoplasms. However, it has inter- and intraobserver discrepancies due to difficulties in selecting the region of interest (MC-ROI) and in identifying or classifying mitotic figures (MFs). Recent progress in the field of artificial intelligence has allowed the development of high-performance algorithms that may improve standardization of the MC. As algorithmic predictions are not flawless, computer-assisted review by pathologists may ensure reliability. In the present study, we compared partial (MC-ROI preselection) and full (additional visualization of MF candidates and display of algorithmic confidence values) computer-assisted MC analysis to the routine (unaided) MC analysis by 23 pathologists for whole-slide images of 50 canine cutaneous mast cell tumors (ccMCTs). Algorithmic predictions aimed to assist pathologists in detecting mitotic hotspot locations, reducing omission of MFs, and improving classification against imposters. The interobserver consistency for the MC significantly increased with computer assistance (interobserver correlation coefficient, ICC = 0.92) compared to the unaided approach (ICC = 0.70). Classification into prognostic stratifications had a higher accuracy with computer assistance. The algorithmically preselected hotspot MC-ROIs had consistently higher MCs than the manually selected MC-ROIs. Compared to a ground truth (developed with immunohistochemistry for phosphohistone H3), pathologist performance in detecting individual MFs was augmented when using computer assistance (F1-score of 0.68 increased to 0.79) with a reduction in false negatives by 38%. The results of this study demonstrate that computer assistance may lead to more reproducible and accurate MCs in ccMCTs.
