Spurious Results: Recently Published Documents

Total documents: 125 (last five years: 31)
H-index: 19 (last five years: 2)

2021, Vol 22 (1)
Author(s): Meifang Qi, Utthara Nayar, Leif S. Ludwig, Nikhil Wagle, Esther Rheinbay

Background: Exogenous cDNA introduced into an experimental system, whether intentionally or accidentally, can appear as added read coverage over the corresponding gene in next-generation sequencing libraries derived from that system. If not properly recognized and managed, this cross-contamination with exogenous signal can lead to incorrect interpretation of research results. Yet this problem is not routinely addressed in current sequence processing pipelines. Results: We present cDNA-detector, a computational tool to identify and remove exogenous cDNA contamination in DNA sequencing experiments. We demonstrate that cDNA-detector can identify cDNAs quickly and accurately from alignment files. A source inference step attempts to separate endogenous cDNAs (retrocopied genes) from potentially cloned, exogenous cDNAs. cDNA-detector provides a mechanism to remove detected cDNAs from the alignment. Simulation studies show that cDNA-detector is highly sensitive and specific, outperforming existing tools. We apply cDNA-detector to several highly cited public databases (TCGA, ENCODE, NCBI SRA) and show that contaminant genes appear in sequencing experiments, where they lead to incorrect coverage peak calls. Conclusions: cDNA-detector is a user-friendly and accurate tool to detect and remove cDNA contamination from NGS libraries. Its two-step detect-then-decontaminate design reduces the risk of removing true variants because it allows for manual review of candidates. We find that contamination with intentionally and accidentally introduced cDNAs is an underappreciated problem even in widely used consortium datasets, where it can lead to spurious results. Our findings highlight the importance of sensitive detection and removal of contaminant cDNA from NGS libraries before downstream analysis.
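
The abstract describes the tell-tale signal that cloned or retrocopied cDNA leaves in an alignment: extra read coverage over exons with no matching intronic coverage. The sketch below illustrates that idea only; it is not cDNA-detector's actual algorithm, and the gene coordinates, cutoff, and function names are hypothetical. It assumes pysam and an indexed BAM file.

```python
# Minimal illustration (not cDNA-detector itself): flag a gene whose exons
# carry reads while its introns are essentially empty, the coverage
# signature that exogenous or retrocopied cDNA leaves in an alignment.
import pysam

def mean_depth(bam, contig, intervals):
    """Average read count per base over a list of (start, end) intervals."""
    bases = sum(end - start for start, end in intervals)
    reads = sum(bam.count(contig, start, end) for start, end in intervals)
    return reads / bases if bases else 0.0

def looks_like_cdna(bam_path, contig, exons, introns, ratio_cutoff=20.0):
    """Heuristic: exon coverage far exceeding intron coverage suggests cDNA."""
    with pysam.AlignmentFile(bam_path, "rb") as bam:  # requires a .bai index
        exon_depth = mean_depth(bam, contig, exons)
        intron_depth = mean_depth(bam, contig, introns)
    return exon_depth > ratio_cutoff * max(intron_depth, 1e-6)

# Example call with hypothetical coordinates:
# looks_like_cdna("sample.bam", "chr17",
#                 exons=[(7675000, 7675200)], introns=[(7675200, 7676000)])
```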


2021, Vol 12 (5), pp. 1-26
Author(s): Yiqun Xie, Xiaowei Jia, Shashi Shekhar, Han Bao, Xun Zhou

Cluster detection is important and widely used in a variety of applications, including public health, public safety, transportation, and so on. Given a collection of data points, we aim to detect density-connected spatial clusters with varying geometric shapes and densities, under the constraint that the clusters are statistically significant. The problem is challenging because many societal applications and domain science studies have low tolerance for spurious results, and clusters may have arbitrary shapes and varying densities. As a classical topic in data mining and learning, a myriad of techniques have been developed to detect clusters with both varying shapes and densities (e.g., density-based, hierarchical, spectral, or deep clustering methods). However, the vast majority of these techniques do not consider statistical rigor and are susceptible to detecting spurious clusters formed as a result of natural randomness. On the other hand, scan statistic approaches explicitly control the rate of spurious results, but they typically assume a single “hotspot” of over-density, and many rely on further assumptions such as a tessellated input space. To unite the strengths of both lines of work, we propose a statistically robust formulation of a multi-scale DBSCAN, namely Significant DBSCAN+, to identify significant clusters that are density-connected. As we will show, incorporation of statistical rigor is a powerful mechanism that allows the new Significant DBSCAN+ to outperform state-of-the-art clustering techniques in various scenarios. We also propose computational enhancements to speed up the proposed approach. Experimental results show that Significant DBSCAN+ can simultaneously improve the success rate of true cluster detection (e.g., 10–20% increases in absolute F1 scores) and substantially reduce the rate of spurious results (e.g., from hundreds or thousands of spurious detections to none or just a few across 100 datasets), and the acceleration methods can improve the efficiency for both clustered and non-clustered data.
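
To make the idea of "density-based clustering with statistical rigor" concrete, here is a minimal sketch that pairs ordinary DBSCAN with a Monte Carlo test against complete spatial randomness, discarding clusters no larger than what randomness alone produces. This is an illustration of the general principle, not the authors' Significant DBSCAN+ algorithm; the parameters and the max-cluster-size test statistic are assumptions.

```python
# Illustrative only: DBSCAN plus a simple Monte Carlo significance filter.
import numpy as np
from sklearn.cluster import DBSCAN

def significant_clusters(points, eps=0.05, min_samples=10, n_sim=99, alpha=0.01):
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    has_clusters = (labels >= 0).any()
    observed_sizes = np.bincount(labels[labels >= 0]) if has_clusters else np.array([])

    # Null distribution: largest cluster size DBSCAN finds in data simulated
    # under complete spatial randomness in the same bounding box.
    lo, hi = points.min(axis=0), points.max(axis=0)
    rng = np.random.default_rng(0)
    null_max = []
    for _ in range(n_sim):
        sim = rng.uniform(lo, hi, size=points.shape)
        sim_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(sim)
        sizes = np.bincount(sim_labels[sim_labels >= 0]) if (sim_labels >= 0).any() else [0]
        null_max.append(max(sizes))
    threshold = np.quantile(null_max, 1 - alpha)

    # Relabel clusters that are no bigger than random artifacts as noise (-1).
    keep = {c for c, size in enumerate(observed_sizes) if size > threshold}
    return np.array([lab if lab in keep else -1 for lab in labels])
```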


2021
Author(s): Danfeng Chen, Katherine Tashman, Duncan S Palmer, Benjamin Neale, Kathryn Roeder, et al.

The use of external controls in genome-wide association studies (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, aggregating controls from multiple sources is challenging because of batch effects, difficulty in identifying genotyping errors, and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed. We propose a unified data harmonization pipeline that includes an iterative approach to quality control (QC) and imputation, implemented before and after merging cohorts and arrays. We apply this harmonization pipeline to aggregate 27,517 European control samples from 16 collections within dbGaP. We leverage these harmonized controls to conduct a GWAS of Crohn’s disease, demonstrate a boost in power over using the cohort samples alone, and show that our procedure yields summary statistics free of significant batch effects. This harmonization pipeline for aggregating genotype data from multiple sources can also serve other applications that require individual-level genotypes rather than summary statistics.
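
For readers unfamiliar with what "quality control" means at the variant level, the sketch below shows the generic per-variant filters (call rate, minor allele frequency, Hardy-Weinberg equilibrium) that pipelines of this kind typically apply before and after merging cohorts. The thresholds, data layout, and function names are illustrative assumptions, not the authors' published procedure.

```python
# Generic per-variant QC filters of the kind used when merging genotype data.
import numpy as np
from scipy.stats import chi2

def hwe_pvalue(genotypes):
    """Chi-square Hardy-Weinberg test on 0/1/2 allele-count genotypes (NaN = missing)."""
    g = genotypes[~np.isnan(genotypes)]
    n = len(g)
    obs = np.array([(g == 0).sum(), (g == 1).sum(), (g == 2).sum()], dtype=float)
    p = (2 * obs[2] + obs[1]) / (2 * n)                      # alt allele frequency
    exp = n * np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
    stat = ((obs - exp) ** 2 / np.maximum(exp, 1e-12)).sum()
    return chi2.sf(stat, df=1)

def variant_passes_qc(genotypes, min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-6):
    """Keep a variant only if it is well-genotyped, common enough, and in HWE."""
    call_rate = 1 - np.isnan(genotypes).mean()
    g = genotypes[~np.isnan(genotypes)]
    freq = g.mean() / 2
    maf = min(freq, 1 - freq)
    return call_rate >= min_call_rate and maf >= min_maf and hwe_pvalue(genotypes) >= min_hwe_p
```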


Author(s): Oliver Gutiérrez-Hernández, Luis Ventura García

Multiplicity arises when data analysis involves multiple simultaneous inferences, increasing the chance of spurious findings. It is a widespread problem frequently ignored by researchers. In this paper, we perform an exploratory analysis of the Web of Science database for COVID-19 observational studies. We examined the 100 top-cited COVID-19 peer-reviewed articles based on p-values; these articles reported up to 7100 simultaneous tests, with 50% reporting more than 34 tests and 20% reporting more than 100. We found that the larger the number of tests performed, the larger the number of significant results (r = 0.87, p < 10⁻⁶). The number of p-values in the abstracts was not related to the number of p-values in the papers. However, the number of highly significant results (p < 0.001) in the abstracts was strongly correlated (r = 0.61, p < 10⁻⁶) with the number of p < 0.001 significances in the papers. Furthermore, the abstracts included a higher proportion of significant results (0.91 vs. 0.50), and 80% reported only significant results. Only one reviewed paper addressed multiplicity-induced type I error inflation, pointing to potentially spurious results bypassing the peer-review process. We conclude that special attention must be paid to the increased chance of false discoveries in observational studies, including non-replicated striking discoveries with a potentially large social impact. We propose some easy-to-implement measures to assess and limit the effects of multiplicity.
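
Standard multiplicity corrections of this kind are one line of code in common statistics packages; the example below uses statsmodels' multipletests with Bonferroni (family-wise error control) and Benjamini-Hochberg (false discovery rate control). The p-values are made up for illustration, and these are generic corrections rather than necessarily the specific measures the authors propose.

```python
# Adjust a family of p-values for multiplicity with two standard methods.
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.008, 0.02, 0.04, 0.049, 0.2, 0.7]  # hypothetical results

# Bonferroni controls the family-wise error rate; "fdr_bh" (Benjamini-Hochberg)
# controls the false discovery rate and is less conservative.
for method in ("bonferroni", "fdr_bh"):
    reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, [(round(p, 3), bool(r)) for p, r in zip(adjusted, reject)])
```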


2021, Vol 108 (Supplement_6)
Author(s): C Vincenti, V Bhattacharya, N Kansal

Aim: Limb amputations have many post-op complications, including pain. The evidence supporting the use of nerve sheath catheters (NSC) to manage post-op pain is mixed: current literature suggests that NSC reduce post-op opioid requirements but do not reduce pain scores, phantom limb pain or chronic stump pain. This study compared post-op pain in patients with and without NSC after above-knee amputations (AKA) and below-knee amputations (BKA). Method: Retrospective data from April 2014 to March 2017 were reviewed. Information was collected on indication, anaesthetic, morphine requirement at 72 hours, phantom limb pain, chronic limb pain and a pain scale (1-10) at 24, 48 and 72 hours. Results: 32 patients were included in the study, 11 of whom had NSC for pain control. Of the patients without NSC, 43% experienced no pain; in comparison, 33% of those with NSC experienced no post-op pain. Phantom limb pain was experienced by a higher proportion of patients with NSC (18%) and of those with AKA (11%). 18% of patients with NSC experienced chronic limb pain, compared with 33% without NSC. 62% of patients with NSC required morphine at 72 hours, and at higher dosages, compared to those without; however, two patients used large amounts of morphine, potentially giving spurious results. Conclusions: Although limited by the small patient group, this study found that patients with NSC were more likely to require morphine at 72 hours, and at higher dosages, but were less likely to experience chronic limb pain, supporting a role for NSC in post-op pain control.


Author(s): Maria Squires, Helen Wise, Heather Holmes, Katie Hadfield

Background: Spuriously high results from the Abbott Architect enzymatic creatinine assay were noted to be particularly associated with very small sample volumes. This led us to investigate the effect of under-filling lithium heparin tubes on the measured enzymatic creatinine result. Methods: Blood was provided by 5 laboratory personnel and then decanted into 5 × 1.2 mL Sarstedt S-Monovette tubes, giving final blood volumes of 200, 400, 600, 800 and 1200 μL. Plasma was analysed using the Abbott Architect Jaffe and enzymatic creatinine assays, the Beckman Coulter (AU500) enzymatic creatinine assay and the Roche (Cobas c702) enzymatic creatinine assay. Saline was also added to Sarstedt 1.2 mL and Teklab 2 mL tubes and analysed using the Abbott Jaffe and enzymatic creatinine methods. Results: Increasing degrees of under-fill were associated with greater over-estimation of creatinine using the Abbott enzymatic assay, but no difference was noted using the Jaffe methodology on the same platform or the enzymatic assays provided by Roche or Beckman. On average, creatinine was 40.6% (+27.7 μmol/L) higher when only 200 μL of blood was present in the tube. Small volumes of saline added to lithium heparin tubes gave significant measured creatinine concentrations with the Abbott enzymatic method. Conclusions: Lithium heparin directly interferes in the Abbott Architect enzymatic creatinine assay. Under-filling lithium heparin tubes can lead to clinically significant over-estimation of creatinine results by this assay. Users of this assay should be aware of the potential for spurious results in small sample volumes collected into lithium heparin tubes and implement robust procedures for identifying and reporting results on these samples.


2021
Author(s): Matthew F. Glasser, Timothy S. Coalson, Michael P. Harms, Graham L. Baum, Joonas A. Autio, et al.

T1-weighted divided by T2-weighted (T1w/T2w) myelin maps were initially developed for neuroanatomical analyses such as identifying cortical areas, but they are increasingly used in statistical comparisons across individuals and groups with other variables of interest. Existing T1w/T2w myelin maps contain residual radiofrequency transmit field (B1+) biases, which may be correlated with these variables of interest, leading to potentially spurious results. Here we propose multiple methods for correcting these transmit field biases, using either explicit measures of the transmit field or, alternatively, a 'pseudo-transmit' approach that is highly correlated with the transmit field. We find that the resulting corrected T1w/T2w myelin maps are both better neuroanatomical measures (e.g., for use in cross-species comparisons) and more appropriate for statistical comparisons across individuals and groups (e.g., by sex, age, or body mass index). We recommend that investigators who use the T1w/T2w approach for mapping cortical myelin use these B1+ transmit-field-corrected myelin maps going forward.
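
The T1w/T2w ratio itself is a simple voxel-wise division; the sketch below shows one plausible way to remove a transmit-field bias from it by regressing the ratio on a measured (or pseudo-) B1+ map across cortical locations. This is an illustrative assumption about how such a correction could look, not the specific methods the authors propose; the array layout and function name are hypothetical.

```python
# Illustrative transmit-field correction of a T1w/T2w myelin map.
import numpy as np

def corrected_myelin_map(t1w, t2w, b1_plus):
    """t1w, t2w, b1_plus: 1-D arrays over cortical vertices or voxels."""
    ratio = t1w / t2w                               # the standard T1w/T2w myelin map
    # Fit ratio ~ a + b * B1+ and keep the residual plus the mean, so that
    # transmit-field variation no longer masquerades as myelin content.
    design = np.column_stack([np.ones_like(b1_plus), b1_plus])
    coeffs, *_ = np.linalg.lstsq(design, ratio, rcond=None)
    return ratio - design @ coeffs + ratio.mean()
```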


2021
Author(s): James Wanliss, Grace Wanliss

Higuchi’s method of determining fractal dimension (HFD) is an important research tool that, compared to many other methods, gives rapid, efficient, and accurate estimates across the range of possible fractal dimensions. One major difficulty in applying the method is the correct choice of the tuning parameter (kmax), as a poor choice can generate spurious results. We analyze synthetic fractional Brownian motion to offer a general equation that allows the best value of the tuning parameter to be determined a priori, given a data set of a particular length.
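
For reference, a standard implementation of Higuchi's method is sketched below: curve lengths L(k) are computed for scales k = 1..kmax and the fractal dimension is the slope of log L(k) against log(1/k). The authors' contribution, a formula for choosing kmax from the data length, is not reproduced here.

```python
# Standard Higuchi fractal dimension estimate for a 1-D time series.
import numpy as np

def higuchi_fd(x, kmax):
    """Estimate the fractal dimension of series x; kmax must be << len(x)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lengths = []
    for k in range(1, kmax + 1):
        lk = []
        for m in range(k):                       # k sub-series with offsets m
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            # Higuchi's normalized curve length for offset m and scale k
            lk.append(dist * (n - 1) / ((len(idx) - 1) * k * k))
        lengths.append(np.mean(lk))
    # Slope of log L(k) versus log(1/k) gives the fractal dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, kmax + 1)), np.log(lengths), 1)
    return slope
```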


2021
Author(s): Paul S Scotti, Jiageng Chen, Julie D Golomb

Inverted encoding models have recently become popular as a method for decoding stimuli and investigating neural representations. Here we present a novel modification to inverted encoding models that improves the flexibility and interpretability of stimulus reconstructions, addresses some key issues inherent in the standard inverted encoding model procedure, and provides trial-by-trial stimulus predictions and goodness-of-fit estimates. The standard approach estimates channel responses (or reconstructions), which are averaged and aligned across trials and then typically evaluated using a metric such as slope or amplitude. We discuss how this standard procedure can produce spurious results and other interpretation issues. Our modifications are not susceptible to these methodological issues and offer further advantages: the decoding metric takes into account the choice of population-level tuning functions and is based on prediction error, making it directly comparable across experiments. Our modifications also allow researchers to obtain trial-by-trial confidence estimates, independent of prediction error, which can be used to threshold reconstructions and increase statistical power. We validate and demonstrate the improved utility of our modified inverted encoding model procedure across three real fMRI datasets, and we additionally offer a Python package for easy implementation of our approach.
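
As background for the procedure being critiqued, here is a bare-bones version of the standard inverted encoding model: channel weights are estimated by least squares on training data and then inverted to reconstruct channel responses on held-out trials. The basis shape, channel count, and function names are illustrative assumptions; the authors' modified procedure and their package are not reproduced here.

```python
# Minimal standard inverted encoding model (train weights, invert on test data).
import numpy as np

def make_basis(stim_deg, n_channels=8, kappa=4.0):
    """Von Mises-shaped tuning channels tiling a 0-360 degree circular feature."""
    stim_deg = np.asarray(stim_deg, dtype=float)
    centers = np.arange(0, 360, 360 / n_channels)
    d = np.deg2rad(stim_deg[:, None] - centers[None, :])
    return np.exp(kappa * (np.cos(d) - 1))            # trials x channels

def fit_iem(train_bold, train_stim):
    """train_bold: trials x voxels; returns weights W (channels x voxels)."""
    C = make_basis(train_stim)                        # trials x channels
    W, *_ = np.linalg.lstsq(C, train_bold, rcond=None)
    return W

def reconstruct(W, test_bold):
    """Invert the weights: C_test = B_test W^T (W W^T)^-1 (trials x channels)."""
    return test_bold @ W.T @ np.linalg.inv(W @ W.T)
```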


2021, Vol 16 (2), pp. 138-150
Author(s): Hyun Kang

Systematic reviews and meta-analyses rank highest in the evidence hierarchy. However, they can still produce spurious results when they include too few studies and participants. The use of trial sequential analysis (TSA) has increased recently because it provides additional information on the precision and uncertainty of meta-analysis results, making it a powerful tool for clinicians to assess the conclusiveness of a meta-analysis. TSA also provides monitoring boundaries and futility boundaries, helping clinicians avoid unnecessary further trials. The use and interpretation of TSA should be based on an understanding of the principles and assumptions behind it, which can provide more accurate, precise, and unbiased information to clinicians, patients, and policymakers. In this article, the history, background, principles, and assumptions behind TSA are described to support its better understanding, implementation, and interpretation.
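
The core mechanic of TSA can be illustrated with a small sketch: the cumulative Z-score of the meta-analysis is tracked as trials accrue and compared with a monitoring boundary that is stricter early on. The boundary below is a simplified O'Brien-Fleming-type shape (critical z divided by the square root of the information fraction); real TSA software also computes the required information size and adjusts the final critical value, which is not done here, and the numbers in the example are hypothetical.

```python
# Illustrative sequential monitoring of a cumulative meta-analysis Z-score.
import numpy as np
from scipy.stats import norm

def tsa_monitoring(cumulative_z, information_fractions, alpha=0.05):
    """Return the boundary at each look and whether the Z-curve crosses it."""
    z_crit = norm.ppf(1 - alpha / 2)
    boundary = z_crit / np.sqrt(np.asarray(information_fractions, dtype=float))
    crossed = np.abs(np.asarray(cumulative_z)) >= boundary
    return boundary, crossed

# Hypothetical example: cumulative Z after each successive trial, observed at
# 20%, 40%, ..., 100% of the required information size.
boundary, crossed = tsa_monitoring(
    cumulative_z=[1.1, 1.9, 2.4, 2.6, 2.3],
    information_fractions=[0.2, 0.4, 0.6, 0.8, 1.0],
)
```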

