Sequencing depth: Recently Published Documents

Total documents: 164 (five years: 102)
H-index: 18 (five years: 5)

2022
Author(s): Lenore Pipes, Zihao Chen, Svetlana Afanaseva, Rasmus Nielsen

Wastewater surveillance has become essential for monitoring the spread of SARS-CoV-2. The quantity of SARS-CoV-2 RNA in wastewater correlates with the COVID-19 caseload in a community. However, estimating the proportions of different SARS-CoV-2 strains has remained technically difficult. We present a method for estimating the relative proportions of SARS-CoV-2 strains from wastewater samples. The method uses an initial step to remove unlikely strains, imputes missing nucleotides using the global SARS-CoV-2 phylogeny, and applies an Expectation-Maximization (EM) algorithm to obtain maximum likelihood estimates of the proportions of different strains in a sample. Using simulations with a reference database of >3 million SARS-CoV-2 genomes, we show that the estimated proportions accurately reflect the true proportions given sufficiently high sequencing depth, and that the phylogenetic imputation is highly accurate and substantially improves the reference database.
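A minimal sketch of the EM step described above, assuming a precomputed matrix of per-read strain likelihoods (the function and variable names are hypothetical, and this illustrates the standard mixture-weight updates rather than the authors' implementation):

```python
import numpy as np

def em_strain_proportions(read_likelihoods, n_iter=200, tol=1e-8):
    """Maximum likelihood strain proportions from per-read likelihoods.

    read_likelihoods: (n_reads, n_strains) array where entry [i, j] is
    P(read i | strain j), e.g. derived from per-base match/mismatch rates
    against the (imputed) reference genomes.
    """
    n_reads, n_strains = read_likelihoods.shape
    props = np.full(n_strains, 1.0 / n_strains)            # uniform start
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each strain
        weighted = read_likelihoods * props
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average responsibilities
        new_props = resp.mean(axis=0)
        if np.abs(new_props - props).max() < tol:
            break
        props = new_props
    return props

# Toy check: reads simulated from a 70/30/0 mixture of three candidate strains
rng = np.random.default_rng(0)
strain_of_read = rng.choice(3, size=5000, p=[0.7, 0.3, 0.0])
lik = np.full((5000, 3), 0.01)
lik[np.arange(5000), strain_of_read] = 0.99
print(em_strain_proportions(lik))   # approximately [0.7, 0.3, 0.0]
```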


2022
Author(s): Fernando Mejia, Francisco Avilés Jiménez, Alfonso Méndez Tenorio

Microorganisms are the most abundant form of life. Next Generation Sequencing technologies provide the capacity to study complex bacterial communities, and both sequencing depth and the choice of bioinformatic tools can influence the results. In this work we explored two different protocols for bacterial classification and abundance estimation, using 10 bacterial genomes in a simulated sample at different sequencing depths. Protocol A consisted of metagenome assembly with Megahit and Ray Meta followed by taxonomic classification with Kraken2 and Centrifuge; protocol B consisted of taxonomic classification only. In both protocols, rarefaction, relative abundance and beta diversity were analyzed. In protocol A, Megahit produced a mean contig length of 1,128 nucleotides and Ray Meta of 8,893 nucleotides. The number of species correctly classified across all depth assays was 6 out of 10 for protocol A and 9 out of 10 for protocol B. The rarefaction analysis showed an overestimation of the number of species in almost all assays regardless of the protocol, and the beta diversity analysis indicated significant differences in all comparisons. Protocol A was more efficient for diversity analysis, while protocol B estimated relative abundance more precisely. Our results do not allow us to suggest an optimal sequencing depth at the species level.
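As a concrete illustration of the rarefaction analysis mentioned above, the following sketch (not the authors' pipeline; the community composition and depths are invented) subsamples per-read species labels, such as those produced by Kraken2 or Centrifuge, at increasing depths and counts the species observed:

```python
import numpy as np

def rarefaction_curve(read_taxa, depths, n_repeats=10, seed=0):
    """Mean number of distinct species observed when subsampling reads."""
    rng = np.random.default_rng(seed)
    read_taxa = np.asarray(read_taxa)
    curve = []
    for d in depths:
        observed = [
            len(np.unique(rng.choice(read_taxa, size=d, replace=False)))
            for _ in range(n_repeats)
        ]
        curve.append(float(np.mean(observed)))
    return curve

# Toy community: 10 species with uneven abundances, 100,000 classified reads
rng = np.random.default_rng(1)
abundances = np.array([30, 20, 15, 10, 8, 7, 5, 3, 1.5, 0.5]) / 100
reads = rng.choice(10, size=100_000, p=abundances)
# Rare species only appear once the subsampling depth is high enough
print(rarefaction_curve(reads, depths=[100, 1_000, 10_000, 100_000]))
```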


2021
Author(s): Zhe Liu, Weijin Qiu, Shujin Fu, Xia Zhao, Jun Xia, ...

Sequencing depth has always played an important role in the accurate detection of low-frequency mutations. Increasing the sequencing depth and setting a reasonable detection threshold can maximize the probability of detecting true positive mutations, i.e., the sensitivity. Here, we found that when the threshold was set as a fixed number of mutation-supporting reads, the probability of both true and false positive calls increased with depth. However, when the number of required mutation-supporting reads increased in proportion to depth (i.e., the threshold was transformed from a fixed read count into a fixed percentage of mutated reads), the true positive probability still increased while the false positive probability decreased. Through binomial distribution simulations and experimental tests, we found that the "fidelity" of the detected variant allele fractions (VAFs) explains this phenomenon. First, we used the binomial distribution to construct a model that readily calculates the relationship between sequencing depth and the probability of a true positive (or false positive) call, which can be used to standardize the minimum sequencing depth required for detecting different low-frequency mutations. Second, examining the effect of sequencing depth on the fidelity of NA12878 with 3% mutation frequency and of circulating tumor DNA (ctDNA at 1%, 3% and 5%) showed that increasing sequencing depth reduced the fluctuation of detected VAFs around the expected VAFs; that is, fidelity improved. Finally, in our experiments, the concordance of single-nucleotide variants (SNVs) between paired fresh-frozen (FF) and FFPE mouse samples increased with increasing depth, suggesting that increasing depth can improve the precision and sensitivity of low-frequency mutation detection.
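The contrast between a fixed read-count threshold and a fixed-percentage threshold can be reproduced with a simple binomial model. The sketch below illustrates the idea with assumed values (a 0.1% per-read error rate and a 1% true mutation frequency); it is not the authors' code:

```python
from scipy.stats import binom

def detection_probability(depth, allele_fraction, min_reads=None, min_vaf=None):
    """P(calling a variant) = P(supporting reads >= threshold) at a given depth.

    allele_fraction is the true VAF for a real mutation, or the per-read
    error rate when evaluating false positives. Use min_reads for a fixed
    read-count threshold, or min_vaf for a threshold that scales with depth.
    """
    k = min_reads if min_reads is not None else int(round(min_vaf * depth))
    return binom.sf(k - 1, depth, allele_fraction)   # P(X >= k)

error_rate, true_vaf = 0.001, 0.01
for depth in (500, 2000, 10000):
    print(depth,
          # fixed threshold of 5 supporting reads: both probabilities rise with depth
          round(detection_probability(depth, true_vaf, min_reads=5), 4),
          round(detection_probability(depth, error_rate, min_reads=5), 6),
          # threshold of 0.5% of reads: true positives rise, false positives fall
          round(detection_probability(depth, true_vaf, min_vaf=0.005), 4),
          round(detection_probability(depth, error_rate, min_vaf=0.005), 8))
```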


2021
Author(s): B. Gemeinholzer, O. Rupp, A. Becker, M. Strickert, C.-M. Müller

Abstract: The important worldwide forage crop red clover (Trifolium pratense L.) is widely cultivated as cattle feed and for soil improvement. Wild populations and landraces harbour great natural diversity that could be used to improve cultivated red clover. However, to date, there is still insufficient knowledge about the natural genetic and phenotypic diversity of the species. Here, we developed a low-cost, reduced-complexity transcriptome analysis (mRNA-GBS) and compared the results with population genetic (GBS) and previously published mRNA-Seq data, to assess whether intraspecific variation within and between populations and transcriptome responses can be analyzed simultaneously. The mRNA-GBS approach was successful. SNP analyses from the mRNA-GBS approach revealed patterns comparable to the GBS results, but it was not possible to link the reduced-complexity, lower-depth transcriptome analyses to previously published greenhouse and field expression studies. Using short sequences upstream of the poly(A) tail of mRNA to reduce complexity is a promising approach to combine population genetics and expression profiling, allowing many individuals with trait differences to be analyzed simultaneously and cost-effectively, even in non-model species. However, our mRNA-GBS approach recovered too many additional short mRNA sequences, which hampered alignment depth and SNP recovery; we discuss possible optimizations. Our study design across different regions in Germany was also challenging, as differential expression analysis with reduced complexity, in which mRNA is fragmented at specific sites rather than randomly, is most likely confounded under natural conditions by highly complex plant responses at low sequencing depth.


2021, Vol 12 (1)
Author(s): Haowen Zhang, Li Song, Xiaotao Wang, Haoyu Cheng, Chenfei Wang, ...

Abstract: As the sequencing depth of chromatin studies continually grows deeper for sensitive profiling of regulatory elements or chromatin spatial structures, aligning and preprocessing these sequencing data have become the bottleneck for analysis. Here we present Chromap, an ultrafast method for aligning and preprocessing high-throughput chromatin profiles. Chromap is comparable to BWA-MEM and Bowtie2 in alignment accuracy and is over 10 times faster than traditional workflows on bulk ChIP-seq/Hi-C profiles and than 10x Genomics' CellRanger v2.0.0 pipeline on single-cell ATAC-seq profiles.


Blood, 2021, Vol 138 (Supplement 1), pp. 954-954
Author(s): Laxminath Tumburu, Maliha Maryam Ahmad, Chunyu Liu, Clifton L. Dalgard, Mehdi Pirooznia, ...

Abstract
Background: The simple point mutation that causes sickle cell disease (SCD) belies the extensive systemic damage it can cause. While sickle pathology is initiated by polymerization of HbS, the multiple end-organ damage is inflicted by years of ongoing inflammation and vasculopathy. An emerging marker of inflammation is the accumulation of acquired heteroplasmic mutations in mitochondrial DNA (mtDNA). Given the underlying chronic inflammation in SCD, we hypothesized that SCD patients display increased rates of mtDNA mutations, as we previously confirmed (1). Here, we performed further in-depth analyses of ethnically matched normal (HbAA) and sickle cell trait (HbAS) subjects from another independent cohort, the Jackson Heart Study (JHS).
Methods: We analyzed and compared whole genome sequencing (WGS) data from the NIH cohort of 676 SCD patients of African ancestry with that of 621 ethnically matched individuals from the 1000 Genomes Project (1KG) and 3,580 individuals from the JHS cohort. The NIH SCD cohort included 561 HbSS and HbSβ0-thalassemia (combined), 90 HbSC, and 25 HbSβ+-thalassemia genotypes; the 1KG cohort included 516 HbAA and 105 HbAS; and the JHS cohort included 3,200 HbAA, 89 HbAC (hemoglobin C trait), and 291 HbAS. Additionally, to assess any potential sequencing depth bias, and to compare the two patient cohorts (NIH SCD and JHS) with underlying conditions that may influence heteroplasmy, we downsampled 300 NIH cohort HbSS samples to a sequencing depth similar to the JHS cohort and compared their heteroplasmy burden. Mitochondrial sequences extracted from the cleaned WGS data of these three cohorts were analyzed for heteroplasmic and homoplasmic variants using mitoCaller from the mitoAnalyzer package.
Results: The average depth per locus was ~6,671X for the NIH SCD cohort, ~2,879X for the 1KG cohort, and ~2,169X for the JHS cohort. We compared the number of heteroplasmic variants across the NIH SCD genotypes with the 1KG (HbAA and HbAS) and JHS (HbAA, HbAC and HbAS) genotypic groups. The median number of heteroplasmic variants per individual increased progressively from HbAA to HbAS, HbSβ+-thalassemia, and HbSC, with the highest median number of 118 in HbSS and HbSβ0 (Fig 1A) in the NIH SCD cohort. Notably, the median mtDNA heteroplasmy in HbAA individuals in the 1KG cohort was significantly lower than in the JHS cohort (table insert in Fig 1A), which may be related to underlying cardiovascular disease in the JHS cohort, whereas the similar heteroplasmy burden in HbAS individuals between these two cohorts may underscore the genotype (HbAS) as the driver of heteroplasmy. We compared the heteroplasmy burden of the downsampled subset (n=300) of NIH HbSS samples with that of the JHS HbAA, HbAC and HbAS genotypes (Fig 1B). Although the ~70% reduction in sequencing depth resulted in a slight reduction in heteroplasmy burden, we noticed higher heteroplasmy variability (standard deviation) in this subset of NIH HbSS patients, which may be attributable to the extreme variation in SCD phenotypic severity. We then applied a cumulative distribution function to this downsampled subset and compared it with the JHS genotypes, finding that the NIH HbSS patients have a disproportionately higher proportion of heteroplasmic variants (Fig 1D) compared to the JHS genotypes (HbAA, HbAC, and HbAS).
Conclusion: We conclude that there is an increased prevalence of heteroplasmic mtDNA variants in SCD compared to ethnically matched normal (HbAA) populations. Normal HbAA individuals in the JHS cohort have a significantly higher heteroplasmic burden than those in the 1KG cohort, suggesting underlying cardiovascular disease in the JHS cohort as a driving factor. Within both the 1KG and JHS cohorts, individuals with sickle cell trait (HbAS) have similar heteroplasmy burdens, which are higher than those of HbAA individuals, highlighting the potential significance of this genotype. Reducing the sequencing depth by >70% (downsampling) filtered out heteroplasmic variants that would have been discovered at the original deeper sequencing depth of ~7,300X; nonetheless, downsampled HbSS samples still retained a disproportionately higher heteroplasmy burden compared to non-SCD subjects. We are currently investigating whether mtDNA heteroplasmy burden correlates with the severity of clinical phenotypes among SCD patients.
1. Ahmad, MM et al., Blood 136 (1): 11-11 (2020).
Figure 1.
Disclosures: No relevant conflicts of interest to declare.


2021, Vol 12 (1)
Author(s): Georgette Tanner, David R. Westhead, Alastair Droop, Lucy F. Stead

Abstract: Intratumour heterogeneity provides tumours with the ability to adapt and acquire treatment resistance. The development of more effective and personalised treatments for cancers, therefore, requires accurate characterisation of the clonal architecture of tumours, enabling evolutionary dynamics to be tracked. Many methods exist for achieving this from bulk tumour sequencing data, involving identifying mutations and performing subclonal deconvolution, but there is a lack of systematic benchmarking to inform researchers on which are most accurate, and how dataset characteristics impact performance. To address this, we use the most comprehensive tumour genome simulation tool available for such purposes to create 80 bulk tumour whole exome sequencing datasets of differing depths, tumour complexities, and purities, and use these to benchmark subclonal deconvolution pipelines. We conclude that (i) tumour complexity does not impact accuracy, (ii) increasing either purity or purity-corrected sequencing depth improves accuracy, and (iii) the optimal pipeline consists of Mutect2, FACETS and PyClone-VI. We have made our benchmarking datasets publicly available for future use.
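For orientation, a small sketch of purity-corrected depth and the expected VAF of a clonal mutation under common assumptions (the exact correction used in the benchmark may differ):

```python
def purity_corrected_depth(total_depth, purity):
    """Expected coverage contributed by tumour-derived reads at a locus
    (assumed here to be nominal depth scaled by tumour purity)."""
    return total_depth * purity

def expected_vaf(purity, ccf=1.0, multiplicity=1, tumour_cn=2):
    """Expected VAF of a somatic SNV in a bulk sample with a diploid normal:
    VAF = p * ccf * m / (2 * (1 - p) + p * CN_tumour)."""
    return purity * ccf * multiplicity / (2 * (1 - purity) + purity * tumour_cn)

# A 250x exome at 40% purity gives ~100x tumour-derived coverage, and a
# clonal heterozygous SNV on a diploid segment is expected at VAF = 0.2
print(purity_corrected_depth(250, 0.4), round(expected_vaf(0.4), 3))
```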


PeerJ, 2021, Vol 9, pp. e12177
Author(s): Vasco Elbrecht, Sarah J. Bourlat, Thomas Hörren, Angie Lindner, Adriana Mordente, ...

Background: Small and rare specimens can remain undetected when metabarcoding is applied to bulk samples with high specimen size heterogeneity. This is especially critical for Malaise trap samples, where most of the biodiversity is contributed by small taxa with low biomass. Separating samples into different size fractions for downstream analysis is one possibility to increase detection of small and rare taxa. However, experiments systematically testing different size sorting approaches and subsequent proportional pooling of fractions are lacking, but would provide important information for the optimization of metabarcoding protocols. We set out to find a size sorting strategy for Malaise trap samples that maximizes taxonomic recovery but remains scalable and time efficient.
Methods: Three Malaise trap samples were sorted into four size classes using dry sieving. Each fraction was homogenized and lysed. The corresponding lysates were pooled to simulate unsorted samples. Pooling was additionally conducted in equal proportions and in four different proportions enriching the small size fraction of the samples. DNA from the individual size classes as well as the pooled fractions was extracted and metabarcoded using the FwhF2 and Fol-degen-rev primer set. Additionally, alternative wet sieving strategies were explored.
Results: The small size fractions harboured the highest diversity and were best represented when pooling in favour of small specimens. Metabarcoding of unsorted samples decreased taxon recovery compared to size-sorted samples. A size separation into only two fractions (below and above 4 mm) can double taxon recovery compared to no size sorting. However, increasing the sequencing depth 3- to 4-fold can also raise taxon recovery to levels comparable with size sorting, although it remains biased towards biomass-rich taxa in the sample.
Conclusion: We demonstrate that size fractionation of Malaise trap bulk samples can increase taxon recovery. While the results show distinct patterns, the lack of statistical support due to the limited number of samples processed is a limitation. Due to increased speed and a lower risk of cross-contamination and specimen damage, we recommend wet sieving and proportional pooling of the lysates in favour of the small size fraction (80–90% of the volume). However, for large-scale projects with time constraints, increasing sequencing depth is an alternative solution.


2021, Vol 11 (1)
Author(s): Yasemin Guenay-Greunke, David A. Bohan, Michael Traugott, Corinna Wallinger

Abstract: High-throughput sequencing platforms are increasingly being used for targeted amplicon sequencing because they enable cost-effective sequencing of large sample sets. For meaningful interpretation of targeted amplicon sequencing data and comparison between studies, it is critical that bioinformatic analyses do not introduce artefacts and rely on detailed protocols to ensure that all methods are properly performed and documented. The analysis of large sample sets and the use of predefined indexes create challenges, such as adjusting the sequencing depth across samples and taking sequencing errors or index hopping into account. However, the potential biases these factors introduce into high-throughput amplicon sequencing data sets, and how they may be overcome, have rarely been addressed. Using a nested metabarcoding analysis of 1,920 carabid beetle regurgitates to assess plant feeding as an example, we investigated: (i) the variation in sequencing depth of individually tagged samples and the effect of library preparation on the data output; (ii) the influence of sequencing errors within index regions and their consequences for demultiplexing; and (iii) the effect of index hopping. Our results demonstrate that, despite library quantification, large variation in read counts and sequencing depth occurred among samples, and that accounting for the sequencing error rate in bioinformatic software is essential for accurate adapter/primer trimming and demultiplexing. Moreover, setting an index hopping threshold to avoid incorrect assignment of samples is highly recommended.
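A small sketch of mismatch-tolerant demultiplexing combined with a read-count threshold for suspected index hopping; the index sequences and thresholds are invented for illustration and do not reflect the study's actual settings:

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(read_indexes, sample_indexes, max_mismatch=1):
    """Assign each read to the unique sample index within max_mismatch;
    reads matching zero or multiple samples are left unassigned."""
    counts = Counter()
    for idx in read_indexes:
        hits = [s for s, ref in sample_indexes.items()
                if hamming(idx, ref) <= max_mismatch]
        counts[hits[0] if len(hits) == 1 else "unassigned"] += 1
    return counts

def apply_hopping_threshold(counts, min_reads=50):
    """Drop sample assignments with very few reads as likely index hopping."""
    return {s: n for s, n in counts.items() if n >= min_reads or s == "unassigned"}

samples = {"sample_A": "ACGTACGT", "sample_B": "TTGCAAGC"}
reads = ["ACGTACGT"] * 500 + ["ACGTACGA"] * 20 + ["TTGCAAGC"] * 3 + ["GGGGGGGG"] * 5
counts = demultiplex(reads, samples)
print(counts)                           # 1-mismatch reads still reach sample_A
print(apply_hopping_threshold(counts))  # sample_B's 3 reads dropped as suspected hopping
```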

