Abstract 893: Batch effects in tumor biomarker studies using tissue microarrays: Extent, impact, and remediation

Author(s):  
Konrad H. Stopsack ◽  
Molin Wang ◽  
Svitlana Tyekucheva ◽  
Travis A. Gerke ◽  
J. Bailey Vaselkiv ◽  
...  
2021 ◽  
Author(s):  
Konrad H. Stopsack ◽  
Svitlana Tyekucheva ◽  
Molin Wang ◽  
Travis A. Gerke ◽  
J. Bailey Vaselkiv ◽  
...  

Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. The extent to which batch effects (measurement error in biomarker levels between slides) affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1,448 primary prostate cancers. In half of the biomarkers, more than 10% of biomarker variance was attributable to between-TMA differences (range, 1–48%). We implemented different methods to mitigate batch effects (R package batchtma) and tested them in plasmode simulation. Biomarker levels corrected by the different mitigation approaches were more similar to each other than to the uncorrected values. For some biomarkers, associations with clinical features changed substantially after addressing batch effects. Batch effects and the resulting bias are not an error of an individual study but an inherent feature of TMA-based protein biomarker studies. They always need to be considered during study design and addressed analytically in studies using more than one TMA.
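To make the variance figure concrete, the sketch below computes one simple measure of the share of a biomarker's variance attributable to between-TMA differences (the between-TMA sum of squares divided by the total sum of squares) and applies the most basic correction, shifting each TMA so that its mean equals the overall mean. This is an illustrative Python sketch, not the estimator used in the study and not the batchtma API; the function names and toy values are hypothetical. Naive mean-centering also removes any genuine between-TMA differences in case mix, so it is only reasonable when cases were allocated to TMAs roughly at random.

```python
from statistics import mean


def _group(values, batches):
    """Collect biomarker values by batch (TMA) label."""
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    return groups


def between_batch_fraction(values, batches):
    """Share of total variance explained by batch membership (eta-squared).

    Hypothetical helper: between-batch sum of squares / total sum of squares.
    Raises ZeroDivisionError if all values are identical.
    """
    grand = mean(values)
    groups = _group(values, batches)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total


def mean_center_by_batch(values, batches):
    """Naive correction: shift each batch so that its mean equals the grand mean.

    Assumes cases are allocated to TMAs independently of their true levels;
    otherwise genuine differences between TMAs are removed as well.
    """
    grand = mean(values)
    batch_means = {b: mean(g) for b, g in _group(values, batches).items()}
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]


# Hypothetical toy data: two TMAs whose levels differ by a constant offset.
levels = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
tmas = ["TMA1", "TMA1", "TMA1", "TMA2", "TMA2", "TMA2"]
print(round(between_batch_fraction(levels, tmas), 3))  # 0.857
corrected = mean_center_by_batch(levels, tmas)
print(corrected)  # [3.0, 4.0, 5.0, 3.0, 4.0, 5.0]
print(between_batch_fraction(corrected, tmas))  # 0.0
```

In this toy case the entire offset between the two TMAs is removed; with real data, part of that offset may be biology rather than batch.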


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Konrad H Stopsack ◽  
Svitlana Tyekucheva ◽  
Molin Wang ◽  
Travis A Gerke ◽  
J Bailey Vaselkiv ◽  
...  



Biomolecules ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 1786
Author(s):  
Aurelia Bustos ◽  
Artemio Payá ◽  
Andrés Torrubia ◽  
Rodrigo Jover ◽  
Xavier Llor ◽  
...  

The prediction of microsatellite instability (MSI) using deep learning (DL) techniques could have significant benefits, including reducing cost and increasing MSI testing of colorectal cancer (CRC) patients. Nonetheless, batch effects or systematic biases are not well characterized in digital histology models and lead to overoptimistic estimates of model performance. Methods are needed that not only palliate but directly abrogate these biases. We present a multiple-bias-rejecting DL system based on adversarial networks for the prediction of MSI in CRC from tissue microarrays (TMAs), trained and validated in 1788 patients from EPICOLON and HGUA. The system consists of an end-to-end image preprocessing module that tiles samples at multiple magnifications and a tissue classification module linked to the bias-rejecting MSI predictor. We detected three biases associated with the learned representations of a baseline model: the project of origin of the samples, the patient's spot, and the TMA glass slide on which each spot was placed. The system was trained to directly avoid learning the batch effects of those variables. The features learned by the bias-ablated model achieved maximal discriminative power for the task and minimal statistical mean dependence on the biases. We analyze the impact of different magnifications and tissue types, and compare model performance at tile vs. patient level. At tile level, including all three selected tissues (tumor epithelium, mucin, and lymphocytic regions) and 4 magnifications, the AUC was 0.87 ± 0.03; at patient level, it increased to 0.9 ± 0.03. To the best of our knowledge, this is the first work that incorporates a multiple-bias ablation technique into the DL architecture in digital pathology, and the first using TMAs for the MSI prediction task.
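The tile-versus-patient comparison can be illustrated with a small sketch: a rank-based AUC (the probability that a randomly chosen positive outscores a randomly chosen negative, ties counted as one half), computed once over tiles and once after averaging each patient's tile scores. Averaging is an assumed aggregation rule chosen for illustration; the abstract does not state how tile predictions were combined, and the patient IDs and scores below are hypothetical.

```python
from statistics import mean


def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs where the positive
    scores higher; ties count as half a win. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def patient_level(tile_scores, tile_patients, patient_labels):
    """Average each patient's tile scores (assumed aggregation rule) and
    return (scores, labels) ordered by patient ID."""
    by_patient = {}
    for score, pid in zip(tile_scores, tile_patients):
        by_patient.setdefault(pid, []).append(score)
    pids = sorted(by_patient)
    return [mean(by_patient[p]) for p in pids], [patient_labels[p] for p in pids]


# Hypothetical example: p1 and p2 are MSI (1), p3 and p4 are MSS (0).
labels = {"p1": 1, "p2": 1, "p3": 0, "p4": 0}
tile_patients = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"]
tile_scores = [0.9, 0.4, 0.8, 0.7, 0.5, 0.1, 0.3, 0.2]
tile_labels = [labels[p] for p in tile_patients]  # tiles inherit the patient label

print(auc(tile_scores, tile_labels))  # 0.9375
patient_scores, patient_labels = patient_level(tile_scores, tile_patients, labels)
print(auc(patient_scores, patient_labels))  # 1.0
```

Averaging smooths out individual misleading tiles (here the low-scoring 0.4 tile of p1), which is one way patient-level AUC can exceed tile-level AUC.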


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Huijuan Ge ◽  
Yaoxin Xiao ◽  
Guangqi Qin ◽  
Yanzi Gu ◽  
Xu Cai ◽  
...  

Background: Ovarian clear cell carcinoma (OCCC) is the second subtype of ovarian epithelial carcinoma reported to be closely related to Lynch syndrome (LS). ARID1A mutation is an important pathogenetic mechanism in OCCC that leads to loss of ARID1A expression in approximately half of OCCCs. However, the correlation between mismatch repair (MMR) status and ARID1A deficiency is unclear. The current study aimed to identify the clinical and histopathological characteristics of OCCC associated with MMR deficiency (dMMR) and to further explore the association between dMMR and ARID1A deficiency.
Methods: A cohort of 176 patients with primary OCCC was enrolled. The review covered histological characteristics (nuclear atypia, necrosis, mitosis, stromal hyalinization, and background precursors) and host inflammatory response (tumor-infiltrating lymphocytes, peritumoral lymphocytes, intratumoral stromal inflammation, and plasma cell infiltration). Immunohistochemical staining of MLH1, PMS2, MSH2, MSH6, and ARID1A was performed on tissue microarrays.
Results: dMMR was detected in 10/176 tumors (6%): loss of MSH2/MSH6 in 6/176, MLH1/PMS2 in 3/176, and MSH6 alone in 1/176. Patients with dMMR were younger on average than patients with intact MMR (46 vs. 53 years). Diffuse intratumoral stromal inflammation remained significantly associated with dMMR after multivariate analysis. ARID1A expression was absent in 8 of 10 patients with dMMR (80%), a significantly higher frequency than in patients with intact MMR (43.2%).
Conclusions: Our study indicates that diffuse intratumoral stromal inflammation of OCCCs is associated with dMMR, with loss of MSH2/MSH6 expression being most frequent. dMMR is strongly associated with loss of ARID1A expression in OCCC.


2021 ◽  
Vol 9 (7) ◽  
pp. e002197
Author(s):  
Janis M Taube ◽  
Kristin Roman ◽  
Elizabeth L Engle ◽  
Chichung Wang ◽  
Carmen Ballesteros-Merino ◽  
...  

Background: Emerging data suggest that predictive biomarkers based on the spatial arrangement of cells or coexpression patterns in tissue sections will play an important role in precision immuno-oncology. Multiplexed immunofluorescence (mIF) is ideally suited to such assessments. Standardization and validation of an end-to-end workflow that supports multisite trials and clinical laboratory processes are vital. Six institutions collaborated to (1) optimize an automated six-plex assay focused on the PD-1/PD-L1 axis, and (2) assess intersite and intrasite reproducibility of staining, using a locked-down image analysis algorithm to measure tumor cell and immune cell (IC) subset densities, %PD-L1 expression on tumor cells (TCs) and ICs, and PD-1/PD-L1 proximity.
Methods: A six-plex mIF panel (PD-L1, PD-1, CD8, CD68, FOXP3, and CK) was rigorously optimized, as determined by quantitative equivalence to immunohistochemistry (IHC) chromogenic assays. Serial sections from tonsil and breast carcinoma and non-small cell lung cancer (NSCLC) tissue microarrays (TMAs), TSA-Opal fluorescent detection reagents, and antibodies were distributed to the six sites equipped with a Leica Bond Rx autostainer and a Vectra Polaris multispectral imaging platform. Tissue sections were stained and imaged at each site and delivered to a single site for analysis. Intersite and intrasite reproducibility were assessed by linear fits to plots of cell densities, including %PD-L1 expression by TCs and ICs in the breast and NSCLC TMAs.
Results: Comparison of the percentage of positive cells for each marker between mIF and IHC revealed that enhanced amplification in the mIF assay was required to detect low-level expression of PD-1, PD-L1, FOXP3, and CD68. Following optimization, an average equivalence of 90% was achieved between mIF and IHC across all six assay markers. Intersite and intrasite cell density assessments showed average concordances of R²=0.75 (slope=0.92) and R²=0.88 (slope=0.93), respectively, for breast carcinoma, and R²=0.72 (slope=0.86) and R²=0.81 (slope=0.68), respectively, for NSCLC. Intersite concordance for %PD-L1+ ICs had an average R² of 0.88 and slope of 0.92. Assessments of PD-1/PD-L1 proximity also showed strong concordance (R²=0.82; slope=0.75).
Conclusions: Assay optimization yielded highly sensitive, reproducible mIF characterization of the PD-1/PD-L1 axis across multiple sites. High concordance was observed across sites for measures of the density of specific IC subsets, and for measures of coexpression and proximity with single-cell resolution.
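The concordance figures are summaries of linear fits. A minimal sketch, assuming an ordinary least-squares fit with an intercept of one site's cell densities on another's (the abstract does not specify the exact fitting procedure, e.g. whether the line was forced through the origin), shows how the slope and R² are obtained. The function name and density values are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared). With an intercept in the model,
    R^2 equals the squared Pearson correlation: Sxy^2 / (Sxx * Syy).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical cell densities (cells/mm^2) for the same four TMA cores,
# stained and imaged at a reference site (x) and at a second site (y).
site_a = [100.0, 200.0, 300.0, 400.0]
site_b = [100.0, 300.0, 200.0, 400.0]
slope, intercept, r2 = linear_fit(site_a, site_b)
print(round(slope, 2), round(intercept, 1), round(r2, 2))  # 0.8 50.0 0.64
```

R² measures how tightly the points scatter around the line, while a slope near 1 indicates that one site does not systematically read higher or lower than the other; the two can diverge, which is why both are reported.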


GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Alexandra J Lee ◽  
YoSon Park ◽  
Georgia Doing ◽  
Deborah A Hogan ◽  
Casey S Greene

Motivation: In the past two decades, scientists in different laboratories have assayed gene expression from millions of samples. These experiments can be combined into compendia and analyzed collectively to extract novel biological patterns. Technical variability, or "batch effects," may result from combining samples collected and processed at different times and in different settings. Such variability may distort our ability to extract true underlying biological patterns. As more integrative analysis methods arise and data collections get bigger, we must determine how technical variability affects our ability to detect desired patterns when many experiments are combined.
Objective: We sought to determine the extent to which an underlying signal was masked by technical variability by simulating compendia comprising data aggregated across multiple experiments.
Method: We developed a generative multi-layer neural network to simulate compendia of gene expression experiments from large-scale microbial and human datasets. We compared simulated compendia before and after introducing varying numbers of sources of undesired variability.
Results: The signal from a baseline compendium was obscured when the number of added sources of variability was small. Applying statistical correction methods rescued the underlying signal in these cases. However, as the number of sources of variability increased, it became easier to detect the original signal even without correction. In these cases, statistical correction actually reduced our power to detect the underlying signal.
Conclusion: When combining a modest number of experiments, it is best to correct for experiment-specific noise. However, when many experiments are combined, statistical correction reduces our ability to extract underlying patterns.


Cancers ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 292
Author(s):  
Simona Crosta ◽  
Renzo Boldorini ◽  
Francesca Bono ◽  
Virginia Brambilla ◽  
Emanuele Dainese ◽  
...  

Immune checkpoint inhibitors blocking the programmed cell death protein 1 (PD-1)/programmed death-ligand 1 (PD-L1) axis are now available for squamous cell carcinoma of the head and neck (HNSCC) in relapsing and/or metastatic settings. In this work, we compared the combined positive score (CPS) of PD-L1 obtained with alternative methods adopted in routine clinical practice and determined the level of diagnostic agreement and inter-observer reliability in this setting. The study applied 5 different protocols on 40 tissue microarrays from HNSCC. The error rate of the individual protocols ranged from 7% to 21%, the sensitivity from 79% to 96%, and the specificity from 50% to 100%. In the intermediate group (1 ≤ CPS < 20), most errors consisted of an underestimation of PD-L1 expression. Among strong expressors, 5 of 14 samples (36%) were correctly evaluated by all protocols, but no protocol correctly identified all strong expressors. The overall inter-observer agreement in PD-L1 CPS reached 87%. The inter-observer reliability was moderate, with an ICC of 0.774 (95% CI, 0.651–0.871). In conclusion, our study showed moderate inter-observer reliability among different protocols. To improve performance, adequate specific training in evaluating PD-L1 by CPS in the HNSCC setting should be coordinated.
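The CPS itself is a simple ratio: PD-L1-positive tumor cells plus PD-L1-positive lymphocytes and macrophages, divided by the number of viable tumor cells, multiplied by 100 and capped at 100. The sketch below computes it, assigns the categories used above (assuming that "strong expressors" means CPS ≥ 20), and derives sensitivity, specificity, and error rate for one protocol against a reference read at a single cutoff. The cutoff of 1, the binary comparison, and all function names and values are assumptions for illustration; the study's error rate may have been defined over the three categories rather than a single threshold.

```python
def combined_positive_score(pdl1_tumor_cells, pdl1_immune_cells, viable_tumor_cells):
    """CPS = 100 * (PD-L1+ tumor cells + PD-L1+ lymphocytes/macrophages)
    / viable tumor cells, capped at 100."""
    if viable_tumor_cells <= 0:
        raise ValueError("CPS requires at least one viable tumor cell")
    raw = 100.0 * (pdl1_tumor_cells + pdl1_immune_cells) / viable_tumor_cells
    return min(raw, 100.0)


def cps_category(cps):
    """Groups used in the text; 'strong' assumes the threshold CPS >= 20."""
    if cps < 1:
        return "negative"
    if cps < 20:
        return "intermediate"
    return "strong"


def agreement(reference, protocol, cutoff=1.0):
    """Sensitivity, specificity and error rate of a protocol's CPS calls
    against reference CPS values, dichotomized at an assumed cutoff."""
    tp = fp = tn = fn = 0
    for ref, test in zip(reference, protocol):
        ref_pos, test_pos = ref >= cutoff, test >= cutoff
        if ref_pos and test_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif test_pos:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # undefined if the reference has no positives
    specificity = tn / (tn + fp)  # undefined if the reference has no negatives
    error_rate = (fp + fn) / (tp + fp + tn + fn)
    return sensitivity, specificity, error_rate


print(combined_positive_score(30, 10, 200))  # 20.0
print(cps_category(combined_positive_score(30, 10, 200)))  # strong
# Hypothetical CPS reads for five cases: reference vs one protocol.
sens, spec, err = agreement([0, 5, 25, 30, 0.5], [0, 0.5, 25, 10, 2])
print(round(sens, 2), spec, err)  # 0.67 0.5 0.4
```

Note that a strong expressor read as intermediate (30 vs. 10 above) still counts as correct at a cutoff of 1, so a single-threshold error rate can hide exactly the kind of underestimation the study describes.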

