scholarly journals BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing

2009 ◽  
Vol 19 (10) ◽  
pp. 1884-1895 ◽  
Author(s):  
W.-C. Kao ◽  
K. Stevens ◽  
Y. S. Song
PLoS ONE ◽  
2008 ◽  
Vol 3 (10) ◽  
pp. e3495 ◽  
Author(s):  
Katherine Sorber ◽  
Charles Chiu ◽  
Dale Webster ◽  
Michelle Dimon ◽  
J. Graham Ruby ◽  
...  

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
David Redin ◽  
Tobias Frick ◽  
Hooman Aghelpasand ◽  
Max Käller ◽  
Erik Borgström ◽  
...  

AbstractThe future of human genomics is one that seeks to resolve the entirety of genetic variation through sequencing. The prospect of utilizing genomics for medical purposes require cost-efficient and accurate base calling, long-range haplotyping capability, and reliable calling of structural variants. Short-read sequencing has lead the development towards such a future but has struggled to meet the latter two of these needs. To address this limitation, we developed a technology that preserves the molecular origin of short sequencing reads, with an insignificant increase to sequencing costs. We demonstrate a novel library preparation method for high throughput barcoding of short reads where millions of random barcodes can be used to reconstruct megabase-scale phase blocks.


2021 ◽  
Author(s):  
Wesley Marin ◽  
Ravi Dandekar ◽  
Danillo G. Augusto ◽  
Tasneem Yusufali ◽  
Bianca Heyn ◽  
...  

The killer-cell immunoglobulin-like receptor ( KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as having bearing on pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes makes short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING provides KIR gene copy number classification functionality for all KIR genes through use of a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To determine copy number and genotype determination accuracy, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding in interpretation of short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering high-resolution KIR genotypes that are highly accurate, enabling high-quality, high-throughput KIR genotyping for disease and population studies.


2021 ◽  
Author(s):  
Martin Philpott ◽  
Jonathan Watson ◽  
Anjan Thakurta ◽  
Tom Brown ◽  
Tom Brown ◽  
...  

AbstractDroplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneities within tissues. However, these approaches only allow for the measurement of the distal parts of a transcript following short-read sequencing. Therefore, splicing and sequence diversity information is lost for the majority of the transcript. The application of long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Although several approaches that use additional short-read sequencing to error-correct the barcode and UMI sequences have been developed, these techniques are limited by the requirement to sequence a library using both short- and long-read sequencing. Here we introduce a novel approach termed single-cell Barcode UMI Correction sequencing (scBUC-seq) to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized by using blocks of dimeric nucleotides. The method can be applied to correct either short-read or long-read sequencing, thereby allowing users to recover more reads per cell and permits direct single-cell Nanopore sequencing for the first time. We illustrate our method by using species-mixing experiments to evaluate barcode assignment accuracy and evaluate differential isoform usage and fusion transcripts using myeloma and sarcoma cell line models.


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Thomas Gregory ◽  
Apollinaire Ngankeu ◽  
Shelley Orwick ◽  
Esko A Kautto ◽  
Jennifer A Woyach ◽  
...  

Abstract High-throughput short-read sequencing relies on fragmented DNA for optimal sampling of input nucleic acid. Several vendors now offer proprietary enzyme cocktails as a cheaper and more streamlined method of fragmentation when compared to acoustic shearing. We have discovered that these enzymes induce the formation of library molecules containing regions of nearby DNA from opposite strands. Sequencing reads derived from these molecules can lead to artifact-derived variant calls appearing at variant allele frequencies <5%. We present Fragmentation Artifact Detection and Elimination (FADE), software to remove these artifacts from mapped reads and mitigate artifact-related effects on downstream analysis. We find that the artifacts principally affect downstream analyses that are sensitive to a 1–3% artifact bias in the sequencing reads, such as targeted resequencing and rare variant discovery.


Sign in / Sign up

Export Citation Format

Share Document