read coverage

Diagnostics ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 59
Author(s):  
Tom Mokveld ◽  
Zaid Al-Ars ◽  
Erik A. Sistermans ◽  
Marcel Reinders

In prenatal diagnostics, NIPT screening utilizing read coverage-based profiles obtained from shallow WGS data is routinely used to detect fetal CNVs. From this same data, fragment size distributions of fetal and maternal DNA fragments can be derived, which are known to differ and are often used to infer fetal fractions. We argue that the fragment size has the potential to aid in the detection of CNVs. By integrating, in parallel, fragment size and read coverage in a within-sample normalization approach, it is possible to construct a reference set encompassing both data types. This reference then allows the detection of CNVs within queried samples, utilizing both data sources. We present a new methodology, WisecondorFF, which improves sensitivity, while maintaining specificity, relative to existing approaches. WisecondorFF increases the robustness of detected CNVs and can reliably detect them even at low fetal fractions (<2%).
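As a rough illustration of the within-sample idea described in the abstract above, the sketch below scores one genomic bin against reference bins taken from the same sample, once for read coverage and once for fragment size, and then merges the two z-scores. The function names, the toy values, and the use of Stouffer's method to combine the scores are assumptions made for this example; the abstract does not state how WisecondorFF combines the two data types.

```python
import math
import statistics
from typing import Sequence


def within_sample_z(target: float, reference: Sequence[float]) -> float:
    """Z-score of a target bin against reference bins from the same sample."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation
    return (target - mean) / sd


def combine_z(z_scores: Sequence[float]) -> float:
    """Combine z-scores with Stouffer's method (an assumption, not from the paper)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


# Toy, made-up normalized values for one target bin and its reference bins.
coverage_z = within_sample_z(1.5, [1.0, 1.1, 0.9, 1.0])
fragment_z = within_sample_z(1.2, [1.0, 1.05, 0.95, 1.0])
combined_z = combine_z([coverage_z, fragment_z])
```

In this toy case both data types deviate in the same direction, so the combined score is larger than either score alone, which is one way two data sources could reinforce a call.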


2021 ◽  
Vol 12 ◽  
Author(s):  
Delphine Girlich ◽  
Rémy A. Bonnin ◽  
Alexis Proust ◽  
Thierry Naas ◽  
Laurent Dortet

The differential expression of VIM-1 in Atlantibacter hermannii WEB-2 and Enterobacter hormaechei ssp. hoffmannii WEB-1 clinical isolates from a rectal swab of a hospitalized patient in France was investigated. A. hermannii WEB-2 was resistant to all β-lactams except carbapenems. It produced ESBL SHV-12, but the Carba NP test failed to detect any carbapenemase activity despite the production of VIM-1. Conversely, E. hormaechei WEB-1, previously recovered from the same patient, was positive for the detection of carbapenemase activity. The blaVIM-1 gene was located on a plasmid and embedded within a class 1 integron. Both plasmids were of the same IncA incompatibility group and conferred the same resistance pattern when electroporated in Escherichia coli TOP10 or Enterobacter cloacae CIP7933. Quantitative RT-PCR experiments indicated a weaker replication of pWEB-2 in A. hermannii as compared to E. hormaechei. An isogenic mutant of A. hermannii WEB-2 selected after sequential passages with increased concentrations of imipenem possessed higher MICs for carbapenems and cephalosporins including cefiderocol, higher levels of the blaVIM-1 gene transcripts, and detectable carbapenemase activity using the Carba NP test. Assessment of read coverage demonstrated that a duplication of the region surrounding the blaVIM-1 gene occurred in the A. hermannii mutant with detectable carbapenemase activity. The lack of detection of the VIM-1 carbapenemase activity in the A. hermannii WEB-2 isolate was likely due to a weak replication of the IncA plasmid harboring the blaVIM-1 gene. Imipenem as selective pressure led to a duplication of this gene on the plasmid and to the restoration of a significant carbapenem-hydrolyzing phenotype.


2021 ◽  
Author(s):  
Meifang Qi ◽  
Utthara Nayar ◽  
Leif Ludwig ◽  
Nikhil Wagle ◽  
Esther Rheinbay

Exogenous cDNA introduced into an experimental system, either intentionally or accidentally, can appear as added read coverage over that gene in next-generation sequencing libraries derived from this system. If not properly recognized and managed, this cross-contamination with exogenous signal can lead to incorrect interpretation of research results. Yet, this problem is not routinely addressed in current sequence processing pipelines. Here, we present cDNA-detector, a computational tool to identify and remove exogenous cDNA contamination in DNA sequencing experiments. We apply cDNA-detector to several highly-cited public databases (TCGA, ENCODE, NCBI SRA) and show that contaminant genes appear in sequencing experiments where they lead to incorrect coverage peak calls. Our findings highlight the importance of sensitive detection and removal of contaminant cDNA from NGS libraries before downstream analysis.


2021 ◽  
Vol 22 (9) ◽  
pp. 4468
Author(s):  
Naima Ahmed Fahmi ◽  
Heba Nassereddeen ◽  
Jaewoong Chang ◽  
Meeyeon Park ◽  
Hsinsung Yeh ◽  
...  

(1) Background: A simplistic understanding of the central dogma falls short in correlating the number of genes in the genome to the number of proteins in the proteome. Post-transcriptional alternative splicing contributes to the complexity of the proteome and is critical in understanding gene expression. mRNA-sequencing (RNA-seq) has been widely used to study the transcriptome and provides an opportunity to detect alternative splicing events among different biological conditions. Despite the popularity of studying transcriptome variants with RNA-seq, few efficient and user-friendly bioinformatics tools have been developed for the genome-wide detection and visualization of alternative splicing events. (2) Results: We propose AS-Quant (Alternative Splicing Quantitation), a robust program to identify alternative splicing events from RNA-seq data. We then extended AS-Quant to visualize the splicing events with short-read coverage plots along with complete gene annotation. The tool works in three major steps: (i) calculate the read coverage of the potential spliced exons and the corresponding gene; (ii) categorize the events into five different categories according to the annotation, and assess the significance of the events between two biological conditions; (iii) generate the short-read coverage plot for user-specified splicing events. Our extensive experiments on simulated and real datasets demonstrate that AS-Quant outperforms three widely used baselines, SUPPA2, rMATS, and diffSplice, for detecting alternative splicing events. Moreover, the significant alternative splicing events identified by AS-Quant between two biological contexts were validated by RT-PCR experiments. (3) Availability: AS-Quant is implemented in Python 3.0. Source code and a comprehensive user’s manual are freely available online.
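Step (i) above, computing read coverage for a candidate exon and for its gene, can be sketched in a few lines. This is only an illustrative example, not AS-Quant's code: the interval representation (half-open `[start, end)` coordinates), the function names, and the toy reads are all assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def per_base_coverage(reads: List[Interval], start: int, end: int) -> List[int]:
    """Per-base read depth over [start, end), built with a difference array."""
    diff = [0] * (end - start + 1)
    for r_start, r_end in reads:
        lo = max(r_start, start)
        hi = min(r_end, end)
        if lo < hi:  # the read overlaps the region
            diff[lo - start] += 1
            diff[hi - start] -= 1
    depth, running = [], 0
    for delta in diff[:-1]:
        running += delta
        depth.append(running)
    return depth


def mean_coverage(reads: List[Interval], start: int, end: int) -> float:
    """Average depth over the region; 0.0 for an empty region."""
    depth = per_base_coverage(reads, start, end)
    return sum(depth) / len(depth) if depth else 0.0


# Toy reads on a hypothetical gene spanning [0, 20) with an exon at [5, 10).
reads = [(0, 10), (5, 15), (12, 20)]
exon_mean = mean_coverage(reads, 5, 10)
gene_mean = mean_coverage(reads, 0, 20)
```

Comparing the exon's mean depth with the gene's mean depth across two conditions is the kind of quantity a significance test in step (ii) could operate on; the test itself is not shown here.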


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Yanyu Liang ◽  
François Aguet ◽  
Alvaro N. Barbeira ◽  
Kristin Ardlie ◽  
Hae Kyung Im

Genetic studies of the transcriptome help bridge the gap between genetic variation and phenotypes. To maximize the potential of such studies, efficient methods to identify expression quantitative trait loci (eQTLs) and perform fine-mapping and genetic prediction of gene expression traits are needed. Current methods that leverage both total read counts and allele-specific expression to identify eQTLs are generally computationally intractable for large transcriptomic studies. Here, we describe a unified framework, mixQTL, that addresses these needs and is scalable to thousands of samples. Using simulations and data from GTEx, we demonstrate its calibration and performance. For example, mixQTL shows a power gain equivalent to a 29% increase in sample size for genes with sufficient allele-specific read coverage. To showcase the potential of mixQTL, we apply it to 49 GTEx tissues and find 20% additional eQTLs (FDR < 0.05, per tissue) that are significantly more enriched among trait-associated variants and candidate cis-regulatory elements compared to the standard approach.


Author(s):  
Xiaohui Wu ◽  
Tao Liu ◽  
Congting Ye ◽  
Wenbin Ye ◽  
Guoli Ji

Alternative polyadenylation (APA) generates diverse mRNA isoforms, which contributes to transcriptome diversity and gene expression regulation by affecting mRNA stability, translation and localization in cells. The rapid development of 3′ tag-based single-cell RNA-sequencing (scRNA-seq) technologies, such as CEL-seq and 10x Genomics, has led to the emergence of computational methods for identifying APA sites and profiling APA dynamics at single-cell resolution. However, existing methods fail to detect the precise location of poly(A) sites or sites with low read coverage. Moreover, they rely on a priori genome annotation and can only detect poly(A) sites located within or near annotated genes. Here we propose a tool called scAPAtrap for detecting poly(A) sites at the whole genome level in individual cells from 3′ tag-based scRNA-seq data. scAPAtrap incorporates peak identification and poly(A) read anchoring, enabling the identification of the precise location of poly(A) sites, even for sites with low read coverage. Moreover, scAPAtrap can identify poly(A) sites without using a priori genome annotation, which helps locate novel poly(A) sites in previously overlooked regions and improve genome annotation. We compared scAPAtrap with two recent methods, scAPA and Sierra, using scRNA-seq data from different experimental technologies and species. Results show that scAPAtrap identified poly(A) sites with higher accuracy and sensitivity than competing methods and could be used to explore APA dynamics among cell types or the heterogeneous APA isoform expression in individual cells. scAPAtrap is available at https://github.com/BMILAB/scAPAtrap.


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A61-A61
Author(s):  
Lee McDaniel ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Gabor Bartha ◽  
John West ◽  
...  

Background: Precision immuno-oncology is increasingly relevant to cancer therapy given the ascendance of immunotherapy. While next-generation sequencing (NGS) based algorithms may elucidate immunotherapeutic response, many such algorithms require highly accurate Class I HLA typing. One major challenge of HLA type derivation resides in highly polymorphic HLA allelic diversity, which conventional exome sequencing technologies poorly capture. Further, accurate HLA typing requires definitive distinction between thousands of potential HLA alleles. These challenges may cause widely used NGS HLA typing tools, such as Polysolver and Optitype, to perform inaccurate HLA typing. Poor HLA coverage poses the risk of silently mistyping HLA alleles, yielding inaccurate downstream HLA loss of heterozygosity (LOH) detection and neoepitope predictions. Methods: We designed the ImmunoID NeXT Platform® to more comprehensively profile the HLA region. To evaluate the accuracy of conventional NGS-based Class I HLA typing, a widely used dbGaP project (phs000452, n=160) of melanoma NGS data was evaluated alongside a set of over 500 solid tumor cancer patient samples sequenced on the ImmunoID NeXT Platform. Read coverage was derived from both GRCh38 and HLA allele database alignments. To test whether Polysolver overrepresents specific HLA alleles under reduced read conditions, a Monte Carlo bootstrap approach predicted theoretical allele frequency ranges. Results: Below 20x read coverage, nearly 50% of Polysolver HLA calls (phs000452) are homozygous, representing a divergence from typical HLA homozygous rates of between 10–20%, with p < 10⁻¹⁵ (Fisher’s Exact) compared to reference 1000 Genomes homozygous rates. Polysolver’s homozygous, heterozygous, and no-calls demonstrated a statistically significant difference in coverage (p < 10⁻⁶, Kruskal-Wallis) across all Class I HLA genes per Polysolver and public exome data (phs000452). The Personalis ImmunoID NeXT™ cohort did not demonstrate such a trend despite a similar exome-wide sequencing depth. Further, sixteen rare HLA alleles were identified with sample frequencies greater than expected from the dbGaP data set, with no such alleles identified from the Personalis ImmunoID NeXT data set. Conclusions: HLA typing may silently fail in the context of reduced read coverage without HLA-specific platform augmentation. This silent failure can have large implications for accurate neoantigen prediction and HLA LOH detection, both of which are becoming increasingly important for immuno-oncology treatment modalities such as personalized cancer vaccines, adoptive cell therapies, and blockade therapy response biomarkers. Studies utilizing neoepitope and HLA LOH prediction require careful validation for HLA calls, including assessments of coverage and homozygous rates, and may benefit from increased HLA locus coverage.
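The Monte Carlo step mentioned in the methods can be illustrated with a simplified stand-in. The sketch below does not reproduce the authors' procedure: it assumes random pairing of alleles drawn from made-up population frequencies and simulates the range of homozygous rates one would expect in a cohort of a given size. The allele labels, frequencies, cohort size, and function names are hypothetical.

```python
import random
from typing import Dict, Tuple


def simulate_homozygous_rate(freqs: Dict[str, float], n_samples: int,
                             rng: random.Random) -> float:
    """Fraction of simulated individuals carrying two identical alleles."""
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]
    homozygous = 0
    for _ in range(n_samples):
        first, second = rng.choices(alleles, weights=weights, k=2)
        if first == second:
            homozygous += 1
    return homozygous / n_samples


def homozygous_rate_range(freqs: Dict[str, float], n_samples: int,
                          n_replicates: int = 500, seed: int = 0) -> Tuple[float, float]:
    """Approximate central 95% range of the homozygous rate across replicates."""
    rng = random.Random(seed)
    rates = sorted(simulate_homozygous_rate(freqs, n_samples, rng)
                   for _ in range(n_replicates))
    lo = rates[int(0.025 * n_replicates)]
    hi = rates[int(0.975 * n_replicates) - 1]
    return lo, hi


# Made-up allele frequencies for illustration only (not real HLA data).
toy_freqs = {"allele_1": 0.5, "allele_2": 0.3, "allele_3": 0.2}
low, high = homozygous_rate_range(toy_freqs, n_samples=200)
```

Under these assumptions, an observed homozygous rate that falls well outside the simulated range would be a reason to inspect coverage at the locus before trusting the calls.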


2020 ◽  
Author(s):  
Benjamin Kellman ◽  
Hratch Baghdassarian ◽  
Tiziano Pramparo ◽  
Isaac Shamie ◽  
Vahid Gazestani ◽  
...  

RNA-Seq is ubiquitous, but depending on the study, sub-optimal sample handling may be required, resulting in repeated freeze-thaw cycles. However, little is known about how each cycle impacts downstream analyses, due to a lack of study and known limitations in common RNA quality metrics, e.g., RIN, at quantifying RNA degradation following repeated freeze-thaws. Here we quantify the impact of repeated freeze-thaw on the reliability of RNA-Seq. To do so, we developed a method to estimate the relative noise between technical replicates independently of RIN. Using this approach we inferred the effect of both RIN and the number of freeze-thaw cycles on sample noise. We find that RIN is unable to fully account for the change in sample noise due to freeze-thaw cycles. Additionally, freeze-thaw is detrimental to sample quality and to differential expression (DE) reproducibility, which approaches zero after three cycles for poly(A)-enriched samples, where the inherent 3′ bias in read coverage is exacerbated by freeze-thaw cycles, whereas ribosome-depleted samples are less affected by freeze-thaws. The use of poly(A)-enrichment for RNA sequencing is pervasive in library preparation of frozen tissue, and thus, it is important during experimental design and data analysis to consider the impact of repeated freeze-thaw cycles on reproducibility.


2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Stuart Lee ◽  
Albert Y Zhang ◽  
Shian Su ◽  
Ashley P Ng ◽  
Aliaksei Z Holik ◽  
...  

RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method, where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software, which quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had similar coverage levels to those of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general.


2020 ◽  
Vol 202 (20) ◽  
Author(s):  
Christopher E. Wozniak ◽  
Jordan J. Hendriksen ◽  
Baldomero M. Olivera ◽  
John R. Roth ◽  
Kelly T. Hughes

Abstract: A mutant of Salmonella enterica serovar Typhimurium was isolated that simultaneously affected two metabolic pathways: NAD metabolism and DNA repair. The mutant was isolated as resistant to a nicotinamide analog and as temperature-sensitive for growth on minimal glucose medium. In this mutant, Salmonella's 94-kb virulence plasmid pSLT had recombined into the chromosome upstream of the NAD salvage pathway gene pncA. This insertion blocked most transcription of pncA, which reduced uptake of the nicotinamide analog. The pSLT insertion mutant also exhibited phenotypes associated with induction of the SOS DNA repair system, including an increase in filamentous cells, higher exonuclease III and catalase activities, and derepression of SOS gene expression. Genome sequencing revealed increased read coverage extending out from the site of pSLT insertion. The two pSLT replication origins are likely initiating replication of the chromosome near the normal replication terminus. Too much replication initiation at the wrong site is probably causing the observed growth defects. Accordingly, deletion of both pSLT replication origins restored growth at higher temperatures. Importance: In studies that insert a second replication origin into the chromosome, both origins are typically active at the same time. In contrast, the integrated pSLT plasmid initiated replication in stationary phase after normal chromosomal replication had finished. The gradient in read coverage extending out from a single site could be a simple but powerful tool for studying replication and detecting chromosomal rearrangements. This technique may be of particular value when a genome has been sequenced for the first time to verify correct assembly.
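One simple way to quantify a coverage gradient of the kind described above is to bin per-base depth moving away from a site of interest and fit a line to the log2 of the binned depth. The sketch below is an assumption-laden illustration, not the authors' analysis: the bin size, the synthetic depth values, and the function names are invented for the example.

```python
import math
from typing import List, Sequence


def bin_means(depth: Sequence[float], bin_size: int) -> List[float]:
    """Mean depth in consecutive bins; the last bin may be shorter."""
    means = []
    for i in range(0, len(depth), bin_size):
        chunk = depth[i:i + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def coverage_gradient(depth: Sequence[float], site: int, bin_size: int) -> float:
    """Slope of log2 binned depth per bin, moving rightward from `site`."""
    bins = bin_means(depth[site:], bin_size)
    # Skip empty bins, since log2(0) is undefined.
    points = [(i, math.log2(b)) for i, b in enumerate(bins) if b > 0]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return least_squares_slope(xs, ys)


# Synthetic depth that halves every four bases away from position 0.
toy_depth = [64] * 4 + [32] * 4 + [16] * 4 + [8] * 4
gradient = coverage_gradient(toy_depth, site=0, bin_size=4)
```

A clearly negative slope on both sides of a position is consistent with replication initiating there; a flat profile gives a slope near zero.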

