A field-wide assessment of differential high throughput sequencing reveals widespread bias

2021 ◽  
Author(s):  
Taavi Päll ◽  
Hannes Luidalepp ◽  
Tanel Tenson ◽  
Ülo Maiväli

Abstract. Here we assess reproducibility and inferential quality in the field of differential HT-seq, based on analysis of datasets submitted 2008-2019 to the NCBI GEO data repository. Analysis of GEO submission file structures places an overall 59% upper limit on reproducibility. We further show that only 23% of experiments resulted in theoretically expected p value histogram shapes, although both reproducibility and p value distributions show marked improvement over time. Uniform p value histogram shapes, indicative of <100 true effects, were extremely rare. Our calculations of π0, the fraction of true nulls, showed that 36% of experiments have π0 < 0.5, meaning that in over a third of experiments most RNAs were estimated to change their expression level upon experimental treatment. Both the fraction of different p value histogram types and π0 values are strongly associated with the software used for calculating these p values by the original authors, indicating widespread bias.
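
The abstract does not say how π0 was calculated. As a minimal sketch, assuming a Storey-type estimator (a common choice, swapped in here purely for illustration), π0 can be estimated from the share of p values that fall above a cutoff. The function name `estimate_pi0`, the cutoff `lam`, and the toy data are all hypothetical.

```python
def estimate_pi0(p_values, lam=0.5):
    """Storey-type estimate of pi0, the fraction of true nulls.

    Null p values are uniform on [0, 1], so the share of p values above
    `lam` is roughly pi0 * (1 - lam). Assumption: this is an illustrative
    estimator, not necessarily the one used in the study.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    if not p_values:
        raise ValueError("p_values must be non-empty")
    above = sum(1 for p in p_values if p > lam)
    pi0 = above / (len(p_values) * (1.0 - lam))
    return min(1.0, pi0)  # pi0 is a proportion, so cap it at 1


# Toy data (made up): 60 p values piled up near zero (apparent effects)
# plus 40 spread evenly over (0, 1) (apparent nulls).
effects = [0.001] * 60
nulls = [(i + 0.5) / 40 for i in range(40)]
print(estimate_pi0(effects + nulls))  # 0.4
```

In this toy set the estimate is below 0.5, the situation the abstract reports for 36% of experiments: most features would be declared as changing.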

2015 ◽  
Vol 9 ◽  
pp. BBI.S24066 ◽  
Author(s):  
Monica B. Assumpção ◽  
Fabiano C. Moreira ◽  
Igor G. Hamoy ◽  
Leandro Magalhães ◽  
Amanda Vidal ◽  
...  

Field effect in cancer, also called “field cancerization”, attempts to explain the development of multiple primary tumors and locally recurrent cancer. The concept of field effect in cancer has been reinforced, since molecular alterations were found in tumor-adjacent tissues with normal histopathological appearances. With the aim of investigating field effects in gastric cancer (GC), we conducted a high-throughput sequencing of the miRnome of four GC samples and their respective tumor-adjacent tissues and compared them with the miRnome of a gastric antrum sample from patients without GC, assuming that tumor-adjacent tissues could not be considered as normal tissues. The global number of miRNAs and read counts was highest in tumor samples, followed by tumor-adjacent and normal samples. In the expression profile of tumor-adjacent miRNAs, hsa-miR-3131, hsa-miR-664, hsa-miR-483, and hsa-miR-150 were significantly downregulated compared with the antrum without tumor tissue (P-value < 0.01; fold-change < 5). Additionally, hsa-miR-3131, hsa-miR-664, and hsa-miR-150 were downregulated (P-value < 0.001) in all paired samples of tumor and tumor-adjacent tissues, compared with antrum without tumor mucosa. The field effect was clearly demonstrated in gastric carcinogenesis by an epigenetics-based approach, and potential biomarkers of the GC field effect were identified. The elevated expression of miRNAs in adjacent tissues and tumor tissues may indicate that a cascade of events takes place during gastric carcinogenesis, reinforcing the notion of field effects. This phenomenon seems to be linked to DNA methylation patterns in cancer and suggests the involvement of an epigenetic network mechanism.


2016 ◽  
Author(s):  
Jay T Lennon ◽  
Kenneth J Locey

In a recent commentary, Amann and Rosselló-Móra summarize how the census of Bacteria and Archaea has changed over time (1). For decades, the number of recognized microbial taxa was underestimated owing to limitations associated with culture-based methods and the rules of nomenclature. The authors describe a "quantum leap" in the estimates of global microbial diversity following advances in high-throughput sequencing technology. Despite this, Amann and Rosselló-Móra project that a complete census of microbial diversity will be reached within a few years culminating in the lower millions of taxa (1). While perhaps attractively optimistic to some, this presumption is misleading for the following reasons.


2013 ◽  
Vol 121 (7) ◽  
pp. 377-386 ◽  
Author(s):  
Mauro Ajaj Saieg ◽  
William R. Geddie ◽  
Scott L. Boerner ◽  
Denis Bailey ◽  
Michael Crump ◽  
...  

2021 ◽  
Author(s):  
Willem M Otte ◽  
Christiaan H Vinkers ◽  
Philippe Habets ◽  
David G P van IJzendoorn ◽  
Joeri K Tijdink

Abstract. Objective: To quantitatively map how non-significant outcomes are reported in randomised controlled trials (RCTs) over the last thirty years. Design: Quantitative analysis of English full-texts containing 567,758 RCTs recorded in PubMed (81.5% of all published RCTs). Methods: We determined the exact presence of 505 pre-defined phrases denoting results that do not reach formal statistical significance (P<0.05) in 567,758 RCT full texts between 1990 and 2020 and manually extracted associated P values. Phrase data were modeled with Bayesian linear regression. Evidence for temporal change was obtained through Bayes-factor analysis. In a randomly sampled subset, the associated P values were manually extracted. Results: We identified 61,741 phrases indicating close to significant results in 49,134 (8.65%; 95% confidence interval (CI): 8.58–8.73) RCTs. The overall prevalence of these phrases remained stable over time, with the most prevalent phrases being ‘marginally significant’ (in 7,735 RCTs), ‘all but significant’ (7,015), ‘a nonsignificant trend’ (3,442), ‘failed to reach statistical significance’ (2,578) and ‘a strong trend’ (1,700). The strongest evidence for a temporal prevalence increase was found for ‘a numerical trend’, ‘a positive trend’, ‘an increasing trend’ and ‘nominally significant’. The phrases ‘all but significant’, ‘approaches statistical significance’, ‘did not quite reach statistical significance’, ‘difference was apparent’, ‘failed to reach statistical significance’ and ‘not quite significant’ decreased over time. In the randomly sampled subset, the 11,926 identified P values ranged between 0.05 and 0.15 (68.1%; CI: 67.3–69.0; median 0.06). Conclusions: Our results demonstrate that phrases describing marginally significant results are regularly used in RCTs to report P values close to but above the dominant 0.05 cut-off. The phrase prevalence remained stable over time, despite all efforts to change the focus from P < 0.05 to reporting effect sizes and corresponding confidence intervals. To improve transparency and enhance responsible interpretation of RCT results, researchers, clinicians, reviewers, and editors need to abandon the focus on formal statistical significance thresholds and stimulate reporting of exact P values with corresponding effect sizes and confidence intervals.
Significance statement: The power of language to modify the reader’s perception of how to interpret biomedical results should not be underestimated. Misreporting and misinterpretation are urgent problems in RCT output. This may be at least partially related to the statistical paradigm of the 0.05 significance threshold. Clinical researchers sometimes use creative and inventive strategies, such as describing their results as ‘almost significant’, to get their data published. This phrasing may convince readers about the value of their work. Since 2005 there has been increasing concern that most current published research findings are false, and it has been generally advised to switch from null hypothesis significance testing to using effect sizes, estimation, and cumulation of evidence. Whether this ‘new statistics’ approach has worked out well should be reflected in the phrases describing non-significant results of RCTs, in particular in changing patterns of phrases describing P values just above 0.05. More than five hundred phrases potentially suited to report or discuss non-significant results were searched in over half a million published RCTs. A stable overall prevalence of these phrases (10.87%, CI: 10.79–10.96; N: 61,741), with associated P values close to 0.05, was found in the last three decades, with strong increases or decreases in individual phrases describing these near-significant results. The pressure to pass the scientific peer-review barrier may function as an incentive to use effective phrases to mask non-significant results in RCTs. However, this keeps researchers preoccupied with hypothesis testing rather than presenting outcome estimations with uncertainty. The effect of language on getting RCT results published should ideally be minimal, to steer evidence-based medicine away from overselling of research results and unsubstantiated claims about the efficacy of certain RCTs, and to prevent an over-reliance on P value cutoffs. Our exhaustive search suggests that presenting RCT findings remains a struggle when P values approach the carved-in-stone threshold of 0.05.
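
The reported prevalence of 8.65% (95% CI: 8.58–8.73) can be checked from the two counts given in the Results. The abstract does not state which interval method was used; the sketch below assumes a simple Wald (normal-approximation) interval, which happens to reproduce the published figures. The function name `wald_interval` is hypothetical.

```python
import math


def wald_interval(successes, total, z=1.96):
    """Proportion with a Wald (normal-approximation) confidence interval.

    Assumption: the interval method is not stated in the abstract; the Wald
    interval is used here only because it is the simplest candidate.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, p - half_width, p + half_width


# Counts taken from the abstract: 49,134 of 567,758 RCTs contained a phrase.
p, low, high = wald_interval(49_134, 567_758)
print(f"{100 * p:.2f}% (95% CI: {100 * low:.2f}-{100 * high:.2f})")
# 8.65% (95% CI: 8.58-8.73)
```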


2021 ◽  
Author(s):  
Tanner Roy Wiegand ◽  
Aidan McVey ◽  
Anna Nemudraia ◽  
Artem Nemudryi ◽  
Blake Wiedenheft

In late December of 2019, high throughput sequencing technologies enabled rapid identification of SARS-CoV-2 as the etiological agent of COVID-19, and global sequencing efforts are now a critical tool for monitoring the ongoing spread and evolution of this virus. Here, we analyze a subset (n=87,032) of all publicly available SARS-CoV-2 genomes (n=~5.6 million) that were randomly selected, but equally distributed over the course of the pandemic. We plot the appearance of new variants of concern (VOCs) over time and show that the mutation rates in Omicron viruses are significantly greater than those in previously identified SARS-CoV-2 variants. Mutations in Omicron are primarily restricted to the spike protein, while 25 other viral proteins—including those involved in SARS-CoV-2 replication—are highly conserved. Collectively, this suggests that the genetic distinction of Omicron primarily arose from selective pressures on the spike, and that the fidelity of replication of this variant has not been altered.


10.2196/21345 ◽  
2020 ◽  
Vol 22 (8) ◽  
pp. e21345 ◽  
Author(s):  
Marcus Bendtsen

When should a trial stop? Such a seemingly innocent question evokes concerns of type I and II errors among those who believe that certainty can be the product of uncertainty and among researchers who have been told that they need to carefully calculate sample sizes, consider multiplicity, and not spend P values on interim analyses. However, the endeavor to dichotomize evidence into significant and nonsignificant has led the basic driving force of science, namely uncertainty, to take a back seat. In this viewpoint we argue that if testing the null hypothesis is the ultimate goal of science, then we need not worry about writing protocols, considering ethics, applying for funding, or running any experiments at all, since all null hypotheses will be rejected at some point; everything has an effect. The job of science should be to unearth the uncertainties of the effects of treatments, not to test their difference from zero. We also show the fickleness of P values: one day they may point to statistically significant results, and after a few more participants have been recruited, the once statistically significant effect suddenly disappears. We show plots that we hope intuitively highlight that all assessments of evidence fluctuate over time. Finally, we discuss the remedy in the form of Bayesian methods, where uncertainty leads and which allow for continuous decision making to stop or continue recruitment as new data from a trial accumulate.
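
The fickleness of P values described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the viewpoint's own analysis: outcomes are simulated as normal with a known standard deviation of 1 and a small true effect of 0.1, and a z-test against zero is recomputed after every new participant. All names and parameter values are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def running_p_values(differences, sigma=1.0):
    """P value of a z-test of 'mean = 0' after each additional participant.

    Assumption: the outcome standard deviation `sigma` is known.
    """
    p_values = []
    total = 0.0
    for n, d in enumerate(differences, start=1):
        total += d
        z = (total / n) / (sigma / math.sqrt(n))
        p_values.append(two_sided_p(z))
    return p_values


random.seed(1)  # fixed seed so the run is repeatable
data = [random.gauss(0.1, 1.0) for _ in range(500)]  # simulated outcomes
p_path = running_p_values(data)

# Count how often the running P value switches sides of the 0.05 threshold.
crossings = sum(
    1 for a, b in zip(p_path, p_path[1:]) if (a < 0.05) != (b < 0.05)
)
print(f"P after 500 participants: {p_path[-1]:.3f}; crossings: {crossings}")
```

Plotting `p_path` against the number of participants gives the kind of fluctuating trace the viewpoint refers to.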


2019 ◽  
Vol 35 (20) ◽  
pp. 4196-4199 ◽  
Author(s):  
David S Robertson ◽  
Jan Wildenhain ◽  
Adel Javanmard ◽  
Natasha A Karp

Abstract. Summary: In many areas of biological research, hypotheses are tested in a sequential manner, without having access to future P-values or even the number of hypotheses to be tested. A key setting where this online hypothesis testing occurs is in the context of publicly available data repositories, where the family of hypotheses to be tested is continually growing as new data is accumulated over time. Recently, Javanmard and Montanari proposed the first procedures that control the FDR for online hypothesis testing. We present an R package, onlineFDR, which implements these procedures and provides wrapper functions to apply them to a historic dataset or a growing data repository. Availability and implementation: The R package is freely available through Bioconductor (http://www.bioconductor.org/packages/onlineFDR). Supplementary information: Supplementary data are available at Bioinformatics online.
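
To make the idea of online testing concrete, here is a minimal Python sketch of a LOND-style rule, on the assumption that it is among the Javanmard and Montanari procedures the abstract refers to. It is not the onlineFDR package's API (the package is written in R), and the function name `lond` and the choice of the β sequence are illustrative assumptions.

```python
import math


def lond(p_values, alpha=0.05):
    """Online FDR testing with a LOND-style rule.

    Each hypothesis is tested once, in arrival order, at level
    alpha_i = beta_i * (discoveries so far + 1), where the beta_i are a
    fixed sequence summing to alpha. No future P-values are needed.
    """
    decisions = []
    discoveries = 0
    for i, p in enumerate(p_values, start=1):
        # Illustrative choice (assumption): beta_i proportional to 1 / i^2,
        # scaled so that the infinite sum equals alpha.
        beta_i = alpha * 6.0 / (math.pi ** 2 * i ** 2)
        alpha_i = beta_i * (discoveries + 1)
        reject = p <= alpha_i
        decisions.append(reject)
        if reject:
            discoveries += 1
    return decisions


# Made-up stream of P-values arriving one at a time.
stream = [0.0001, 0.2, 0.004, 0.6, 0.0005]
print(lond(stream))  # [True, False, True, False, True]
```

Each earlier discovery raises the level available to later tests, which is what lets the procedure keep testing as a repository grows.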


Life ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1055
Author(s):  
Elena S. Vashukova ◽  
Polina Y. Kozyulina ◽  
Roman A. Illarionov ◽  
Natalya O. Yurkina ◽  
Olga V. Pachuliia ◽  
...  

Although circulating microRNAs (miRNAs) in maternal blood may play an important role in regulation of pregnancy progression and serve as non-invasive biomarkers for different gestation complications, little is known about their profile in blood during normally developing pregnancy. In this study we evaluated the miRNA profiles in paired plasma and serum samples from pregnant women without health or gestational abnormalities at three time points using high-throughput sequencing technology. Sequencing revealed that the percentage of miRNA reads in plasma and serum decreased by a third compared to first and second trimesters. We found two miRNAs in plasma (hsa-miR-7853-5p and hsa-miR-200c-3p) and 10 miRNAs in serum (hsa-miR-203a-5p, hsa-miR-495-3p, hsa-miR-4435, hsa-miR-340-5p, hsa-miR-4417, hsa-miR-1266-5p, hsa-miR-4494, hsa-miR-134-3p, hsa-miR-5008-5p, and hsa-miR-6756-5p) that exhibit level changes during pregnancy (adjusted p-value < 0.05). In addition, we observed differences for 36 miRNAs between plasma and serum (adjusted p-value < 0.05), which should be taken into consideration when comparing the results between studies performed using different biosample types. The results were verified by analysis of three miRNAs using qRT-PCR (p < 0.05). The present study confirms that the circulating miRNA profile in blood changes during gestation. Our results set the basis for further investigation of molecular mechanisms involved in the regulation of pregnancy and the search for biomarkers of gestation abnormalities.


2010 ◽  
Vol 192 (22) ◽  
pp. 6045-6055 ◽  
Author(s):  
Haruo Suzuki ◽  
Hirokazu Yano ◽  
Celeste J. Brown ◽  
Eva M. Top

ABSTRACT Despite the important contribution of self-transmissible plasmids to bacterial evolution, little is understood about the range of hosts in which these plasmids have evolved. Our goal was to infer this so-called evolutionary host range. The nucleotide composition, or genomic signature, of plasmids is often similar to that of the chromosome of their current host, suggesting that plasmids acquire their hosts’ signature over time. Therefore, we examined whether the evolutionary host range of plasmids could be inferred by comparing their trinucleotide composition to that of all completely sequenced bacterial chromosomes. The diversity of candidate hosts was determined using taxonomic classification and genetic distance. The method was first tested using plasmids from six incompatibility (Inc) groups whose host ranges are generally thought to be narrow (IncF, IncH, and IncI) or broad (IncN, IncP, and IncW) and then applied to other plasmid groups. The evolutionary host range was found to be broad for IncP plasmids, narrow for IncF and IncI plasmids, and intermediate for IncH and IncN plasmids, which corresponds with their known host range. The IncW plasmids as well as several plasmids from the IncA/C, IncP, IncQ, IncU, and PromA groups have signatures that were not similar to any of the chromosomal signatures, raising the hypothesis that these plasmids have not been ameliorated in any host due to their promiscuous nature. The inferred evolutionary host range of IncA/C, IncP-9, and IncL/M plasmids requires further investigation. In this era of high-throughput sequencing, this genomic signature method is a useful tool for predicting the host range of novel mobile elements.
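
The genomic signature comparison described above can be sketched as follows. This is a simplified illustration, not the authors' method: it uses plain Euclidean distance between trinucleotide frequency vectors as a stand-in metric, ignores the reverse-complement strand, and runs on made-up toy sequences. The function names and host labels are hypothetical.

```python
from collections import Counter
from itertools import product

# All 64 possible trinucleotides, in a fixed order.
TRINUCLEOTIDES = ["".join(t) for t in product("ACGT", repeat=3)]


def signature(sequence):
    """Relative frequency of each trinucleotide, using overlapping windows."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    # Windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[t] for t in TRINUCLEOTIDES)
    if total == 0:
        raise ValueError("sequence has no unambiguous trinucleotides")
    return [counts[t] / total for t in TRINUCLEOTIDES]


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures (assumption: stand-in metric)."""
    return sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)) ** 0.5


def rank_hosts(plasmid_seq, chromosomes):
    """Candidate hosts ordered from most to least similar signature."""
    plasmid_sig = signature(plasmid_seq)
    scored = [
        (distance(plasmid_sig, signature(seq)), name)
        for name, seq in chromosomes.items()
    ]
    return [name for _, name in sorted(scored)]


# Toy sequences (made up, far too short for real use).
chromosomes = {"host_A": "ATATATATATATATAT", "host_B": "GCGCGCGCGCGCGCGC"}
print(rank_hosts("ATATATGCATATATAT", chromosomes))  # ['host_A', 'host_B']
```

A plasmid whose signature is far from every chromosome, as reported for the IncW group, would show large distances to all candidates rather than a clear nearest host.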

