MBV: a method to solve sample mislabeling and detect technical bias in large combined genotype and sequencing assay datasets

2017 ◽  
Vol 33 (12) ◽  
pp. 1895-1897 ◽  
Author(s):  
Alexandre Fort ◽  
Nikolaos I Panousis ◽  
Marco Garieri ◽  
Stylianos E Antonarakis ◽  
Tuuli Lappalainen ◽  
...  

2013 ◽  
Vol 12 ◽  
pp. CIN.S12862 ◽  
Author(s):  
Martin Lauss ◽  
Ilhami Visne ◽  
Albert Kriegner ◽  
Markus Ringnér ◽  
Göran Jönsson ◽  
...  

High-dimensional datasets can be confounded by variation from technical sources, such as batches. Undetected batch effects can have severe consequences for the validity of a study's conclusion(s). We evaluate high-throughput RNAseq and miRNAseq as well as DNA methylation and gene expression microarray datasets, mainly from the Cancer Genome Atlas (TCGA) project, with respect to technical and biological annotations. We observe technical bias in these datasets and discuss corrective interventions. We then suggest a general procedure to control study design, detect technical bias using linear regression of principal components, correct for batch effects, and re-evaluate principal components. This procedure is implemented in the R package swamp and as graphical user interface software. In conclusion, high-throughput platforms that generate continuous measurements are sensitive to various forms of technical bias. For such data, monitoring of technical variation is an important analysis step.
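The detection step above, regressing principal components on technical annotations, can be illustrated with a minimal Python sketch. It assumes a samples × features matrix and one categorical batch label per sample; the function name `pc_batch_r2` and its parameters are hypothetical and are not the swamp package's API. For each top principal component it reports the R² of a linear regression on batch indicators, so a value near 1 flags a component dominated by batch.

```python
import numpy as np


def pc_batch_r2(data, batch, n_pcs=3):
    """Regress each of the top principal components on a batch annotation.

    data:  samples x features matrix (rows are samples).
    batch: one categorical label per sample (hypothetical input format).
    Returns the R^2 of each PC score regressed on batch indicators;
    values near 1 suggest that PC is driven by batch.
    """
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                      # center each feature
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]           # PC scores per sample

    labels = np.asarray(batch)
    levels = np.unique(labels)
    # Design matrix: intercept plus one indicator per non-reference batch level.
    columns = [np.ones(len(labels))]
    columns += [(labels == level).astype(float) for level in levels[1:]]
    design = np.column_stack(columns)

    r2 = []
    for k in range(scores.shape[1]):
        y = scores[:, k]
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        ss_res = float(np.sum((y - design @ coef) ** 2))
        ss_tot = float(np.sum((y - y.mean()) ** 2))
        r2.append(1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0)
    return np.array(r2)


# Usage: 20 simulated samples, the second batch shifted by a constant offset.
rng = np.random.default_rng(0)
batch = ["A"] * 10 + ["B"] * 10
data = rng.normal(size=(20, 50))
data[10:] += 5.0
print(pc_batch_r2(data, batch))
```

With the simulated offset, the first component's R² should be close to 1. Re-running the same function after a batch correction is one way to carry out the "re-evaluate principal components" step the abstract describes.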


Author(s):  
Yanan Xue ◽  
Yinan Xue ◽  
Zhengcai Wang ◽  
Yongzhen Mo ◽  
Pinyan Wang ◽  
...  

Abstract Background: We aimed to identify an immune-related signature for predicting cutaneous melanoma (CM) prognosis. Methods: We used TCGA samples (n=471) to develop a prognostic signature of the 23 best immune-related gene pairs (23-IRGP) and divided patients into high- and low-immune-risk groups in the TCGA dataset and in three validation datasets: GSE65904 (n=214), GSE59455 (n=141), and GSE22153 (n=79). Results: The 23-IRGP signature showed precise prognostic ability in CM, with the high-risk groups showing poor prognosis, and it also had significant predictive power in immune microenvironment and biological analyses. Conclusions: We established a novel, promising prognostic model for CM that links the immune microenvironment to patient outcomes. This approach can be applied to discover signatures in other diseases without technical bias from different platforms.


EMBO Reports ◽  
2021 ◽  
Vol 22 (2) ◽  
Author(s):  
Philip Hunter

2020 ◽  
Vol 48 (8) ◽  
pp. e46-e46 ◽  
Author(s):  
Michael Scherer ◽  
Almut Nebel ◽  
Andre Franke ◽  
Jörn Walter ◽  
Thomas Lengauer ◽  
...  

Abstract DNA methylation is an epigenetic mark with important regulatory roles in cellular identity and can be quantified at base resolution using bisulfite sequencing. Most studies are limited to the average DNA methylation levels of individual CpGs and thus neglect heterogeneity within the profiled cell populations. To assess this within-sample heterogeneity (WSH), several window-based scores that quantify variability in DNA methylation in sequencing reads have been proposed. We performed the first systematic comparison of four published WSH scores based on simulated and publicly available datasets. Moreover, we propose two new scores and provide guidelines for selecting appropriate scores to address cell-type heterogeneity, cellular contamination and allele-specific methylation. Most of the measures were sensitive in detecting DNA methylation heterogeneity in these scenarios, although they differed in their susceptibility to technical bias. Using recently published DNA methylation profiles of Ewing sarcoma samples, we show that DNA methylation heterogeneity provides information complementary to the DNA methylation level. WSH scores are powerful tools for estimating variance in DNA methylation patterns and have the potential to detect novel disease-associated genomic loci not captured by established statistics. We provide an R package implementing the WSH scores for integration into analysis workflows.
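The abstract does not spell out how the window-based scores are computed. As one illustration, the sketch below assumes the proportion of discordant reads (PDR) formulation: among reads covering at least a minimum number of CpGs, it counts the fraction whose CpGs are neither all methylated nor all unmethylated. The function name, the 0/1 input encoding and the default of four CpGs are assumptions, not the interface of the authors' R package.

```python
def proportion_discordant_reads(reads, min_cpgs=4):
    """Proportion of discordant reads (PDR) for one genomic window.

    reads: per-read methylation calls at the CpGs each read covers,
           e.g. [1, 1, 0, 1] with 1 = methylated, 0 = unmethylated
           (hypothetical input encoding).
    Only reads covering at least `min_cpgs` CpGs are counted.
    A read is discordant if it contains both methylated and unmethylated calls.
    Returns NaN when no read qualifies.
    """
    qualifying = [r for r in reads if len(r) >= min_cpgs]
    if not qualifying:
        return float("nan")
    discordant = sum(1 for r in qualifying if len(set(r)) > 1)
    return discordant / len(qualifying)


reads = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 1], [1, 1]]
print(proportion_discordant_reads(reads))  # prints 0.3333333333333333
```

Here the two-CpG read is excluded, and one of the three remaining reads mixes methylated and unmethylated calls.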


2000 ◽  
Vol 64 (1) ◽  
pp. 43-57 ◽  
Author(s):  
W. Compston

Abstract Ion probe data for zircons from tuffs within the Llyfnant flags (Arenig) and the Serw Formation (lower Llanvirn) of north Wales have been revised using better statistical methods for separating detrital ages, making allowance for recently found variability in radiogenic 206Pb/238U in the reference zircon SL13, and testing the sensitivity of the ages to the secondary ion discrimination slope. The revised age for the Llyfnant flags is either 469.2 ± 2.1 (σ) or 472.9 ± 2.9 Ma, depending on the mixture modelling, and that for the Serw Formation is 465.3 ± 1.4 Ma. All ages are within error of previous SHRIMP results, and the Serw age now has the same numerical value as a previous MSID age for the same sample. It is shown that an MSID age of 483 ± 0.5 Ma with interpreted Pb loss for a late Tremadoc bentonite is dependent on the correction for common Pb, and that a slightly more radiogenic choice for the common Pb composition places nearly all data on Concordia. This would indicate that the bentonite might contain two zircon populations: inherited grains at 482 Ma and magmatic grains of the tuff at 473 Ma, which is more compatible with the SHRIMP Arenig result. Interpretations of other MSID zircon ages from the Ordovician are also sensitive to the choice of common Pb, and raise the likelihood that many multigrain ages might be too old owing to admixture with slightly older inherited zircon. A supposed 1–2% technical bias of SHRIMP 206Pb/238U ages relative to MSID is refuted.


The Breast ◽  
2013 ◽  
Vol 22 (5) ◽  
pp. 974-979 ◽  
Author(s):  
Jose A. Pérez-Fidalgo ◽  
Pilar Eroles ◽  
Jaime Ferrer ◽  
Ana Bosch ◽  
Octavio Burgués ◽  
...  

2015 ◽  
Author(s):  
Stephane E Castel ◽  
Ami Levy-Moonshine ◽  
Pejman Mohammadi ◽  
Eric Banks ◽  
Tuuli Lappalainen

Allelic expression (AE) analysis has become an important tool for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. In this paper, we systematically analyze the properties of AE read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, and technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting and filtering for such errors, and show that the resulting AE data has extremely low technical noise. Finally, we introduce novel software for high-throughput production of AE data from RNA-sequencing data, implemented in the GATK framework. These improved tools and best practices for AE analysis yield higher quality AE data by reducing technical bias. This provides a practical framework for wider adoption of AE analysis by the genomics community.


2017 ◽  
Author(s):  
MD Giraldez ◽  
RM Spengler ◽  
A Etheridge ◽  
PM Godoy ◽  
AJ Barczak ◽  
...  

Abstract Small RNA-seq is increasingly being used for profiling of small RNAs. Quantitative characteristics of long RNA-seq have been extensively described, but small RNA-seq involves fundamentally different methods for library preparation, with distinct protocols and technical variations that have not been fully and systematically studied. We report here the results of a study using common references (synthetic RNA pools of defined composition, as well as plasma-derived RNA) to evaluate the accuracy, reproducibility and bias of small RNA-seq library preparation for five distinct protocols and across nine different laboratories. We observed protocol-specific and sequence-specific bias, which was ameliorated using adapters for ligation with randomized end-nucleotides, and computational correction factors. Despite this technical bias, relative quantification using small RNA-seq was remarkably accurate and reproducible, even across multiple laboratories using different methods. These results provide strong evidence for the feasibility of reproducible cross-laboratory small RNA-seq studies, even those involving analysis of data generated using different protocols.


Lab on a Chip ◽  
2015 ◽  
Vol 15 (8) ◽  
pp. 1822-1834 ◽  
Author(s):  
Christian Dusny ◽  
Alexander Grünberger ◽  
Christopher Probst ◽  
Wolfgang Wiechert ◽  
Dietrich Kohlheyer ◽  
...  

The cross-platform comparison of three different single-cell cultivation methods demonstrates technical influences on biological key parameters like specific growth rate, division rate and cellular morphology.

