Reliable variant calling during runtime of Illumina sequencing

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Tobias P. Loka ◽  
Simon H. Tausch ◽  
Bernhard Y. Renard

Abstract The sequential paradigm of data acquisition and analysis in next-generation sequencing leads to high turnaround times for the generation of interpretable results. We combined a novel real-time read mapping algorithm with fast variant calling to obtain reliable variant calls while sequencing is still in progress. Our new algorithm produces accurate read mapping results for intermediate cycles and supports large reference genomes such as the complete human reference. This enables the combination of real-time read mapping results with complex follow-up analysis. In this study, we showed the accuracy and scalability of our approach by applying real-time read mapping and variant calling to seven publicly available human whole exome sequencing datasets. Up to 89% of all detected SNPs were already identified after 40 sequencing cycles, with precision similar to that at the end of sequencing. Final results showed accuracy similar to that of conventional post-hoc analysis methods. Compared to standard routines, our live approach enables considerably faster interventions in clinical applications and infectious disease outbreaks. Besides variant calling, our approach can be adapted for a plethora of other mapping-based analyses.

2018 ◽  
Author(s):  
Tobias P. Loka ◽  
Simon H. Tausch ◽  
Bernhard Y. Renard

Abstract The sequential paradigm of data acquisition and analysis in next-generation sequencing leads to high turnaround times for the generation of interpretable results. We combined a novel real-time read mapping algorithm with fast variant calling to obtain reliable variant calls while sequencing is still in progress. Our new algorithm produces accurate read mapping results for intermediate cycles and supports large reference genomes such as the complete human reference. This enables the combination of real-time read mapping results with complex follow-up analysis. In this study, we showed the accuracy and scalability of our approach by applying real-time read mapping and variant calling to seven publicly available human whole exome sequencing datasets. Up to 89% of all detected SNPs were already identified after 40 sequencing cycles, with precision similar to that at the end of sequencing. Final results showed accuracy similar to that of conventional post-hoc analysis methods. Compared to standard routines, our live approach enables considerably faster interventions in clinical applications and infectious disease outbreaks. Besides variant calling, our approach can be adapted for a plethora of other mapping-based analyses.


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Guilherme de Sena Brandine ◽  
Andrew D Smith

Abstract DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole genome bisulfite sequencing is the gold standard to interrogate cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. This encoding substantially reduces the entropy of nucleotide frequencies in the resulting reference genome. Within the paradigm of read mapping by first filtering possible candidate alignments, reduced entropy in the sequence space can increase the required computing effort. We introduce another bisulfite mapping algorithm (abismal), based on the idea of encoding a four-letter DNA sequence as only two letters, one for purines and one for pyrimidines. We show that this encoding can lead to greater specificity compared to existing encodings used to map bisulfite sequencing reads. Through the two-letter encoding, the abismal software tool maps reads in less time and using less memory than most bisulfite sequencing read mapping software tools, while attaining similar accuracy. This allows in silico methylation analysis to be performed in a wider range of computing machines with limited hardware settings.
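The two encodings described in this abstract can be illustrated with a minimal Python sketch. It is not the abismal implementation: the function names, the letters R/Y/N, and the toy sequences are assumptions chosen for illustration. It shows why both encodings make a bisulfite-converted read comparable to the reference: an unmethylated C is read as T, and both are pyrimidines.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def encode_two_letter(seq):
    """Map purines to 'R' and pyrimidines to 'Y' (anything else to 'N')."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        else:
            out.append("N")
    return "".join(out)


def encode_three_letter(seq):
    """Conventional reduction for C-to-T converted reads: merge C into T."""
    return seq.upper().replace("C", "T")


def bisulfite_convert(seq, methylated=frozenset()):
    """Toy conversion: every C becomes T unless its index is methylated."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


# Hypothetical reference fragment; the C at index 1 is treated as methylated.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated={1})
```

Under either encoding the converted read and the reference fragment produce the same string, so candidate locations can be found by exact matching in the reduced alphabet. The sketch covers only C-to-T conversion on one strand; reads from the complementary strand show G-to-A changes, which also stay within one class (purines).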


2021 ◽  
Author(s):  
Tatsuki Onishi ◽  
Naoki Honda ◽  
Yasunobu Igarashi

Coronavirus disease 2019 (COVID-19) is an emerging threat to the whole world, and every government is seeking an optimal solution. However, none of them have succeeded, and they have only provided a series of natural experiments. Although simulation studies seem to be helpful, there is no model that addresses how much testing should be conducted to minimise emerging infectious disease outbreaks. In this study, we develop a testing susceptible, exposed, infectious, recovered, and dead (testing-SEIRD) model using two discrete populations inside and outside hospitals. The populations that tested positive were isolated. Through the simulations, we examined the infectious spread, represented by the number of cumulative deaths, hospitalisations, and positive tests, depending on examination strategies, testing characteristics, and hospitalisation capacity. We found all-or-none responses of either expansion or extinction of the infectious spread, depending on the rates of follow-up and mass testing, which represent testing people identified as close contacts of infected patients through follow-up surveys and testing people with symptoms, respectively. We also demonstrated that there were optimal and worst examination strategies, which were determined by the total resources and testing costs. The testing-SEIRD model is useful in making decisions on examination strategies for emerging infectious disease outbreaks.
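The all-or-none behaviour described in this abstract can be illustrated with a generic SEIRD simulation in which testing moves infectious people into isolation. This is a minimal sketch and not the authors' testing-SEIRD model: the compartment layout, the single `test_rate` parameter, and all numeric values are assumptions chosen for illustration only.

```python
def simulate_seird_with_testing(days, test_rate, beta=0.3, sigma=0.2,
                                gamma=0.1, mu=0.01, n=1_000_000.0,
                                i0=10.0, dt=0.1):
    """Forward-Euler SEIRD with an isolated compartment H (all values hypothetical).

    S: susceptible, E: exposed, I: infectious in the community,
    H: tested positive and isolated (no onward transmission),
    R: recovered, D: dead.
    """
    s, e, i, h, r, d = n - i0, 0.0, i0, 0.0, 0.0, 0.0
    for _ in range(int(days / dt)):
        new_exposed = beta * s * i / n * dt     # only community cases transmit
        new_infectious = sigma * e * dt
        isolated = test_rate * i * dt           # positive tests lead to isolation
        recovered_i, died_i = gamma * i * dt, mu * i * dt
        recovered_h, died_h = gamma * h * dt, mu * h * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - isolated - recovered_i - died_i
        h += isolated - recovered_h - died_h
        r += recovered_i + recovered_h
        d += died_i + died_h
    return {"S": s, "E": e, "I": i, "H": h, "R": r, "D": d}


no_testing = simulate_seird_with_testing(days=300, test_rate=0.0)
heavy_testing = simulate_seird_with_testing(days=300, test_rate=0.5)
```

With these illustrative parameters, the run without testing ends with far more cumulative deaths than the run with a high testing rate, because isolation shortens the time an infectious person spends in the community. The real model distinguishes follow-up from mass testing and includes hospital capacity, which this sketch omits.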


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Maurilio Monsu ◽  
Matteo Comin

Abstract Sequencing technologies have provided the basis of most modern genome sequencing studies due to their high base-level accuracy and relatively low cost. One of the most demanding steps is mapping reads to the human reference genome. The reliance on a single reference human genome could introduce substantial biases in downstream analyses. Pangenomic graph reference representations offer an attractive approach for storing genetic variations. Moreover, it is possible to include known variants in the reference in order to make read mapping, variant calling, and genotyping variant-aware. Only recently has a framework for variation graphs, vg [Garrison E, Adam MN, Siren J, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875–9], improved variation-aware alignment and variant calling in general. The major bottleneck of vg is the high cost of mapping reads to a variation graph. In this paper we study the problem of SNP calling on a variation graph and we present a fast read alignment tool, named VG SNP-Aware. VG SNP-Aware is able to align reads exactly to a variation graph and detect SNPs based on these aligned reads. The results show that VG SNP-Aware can efficiently map reads to a variation graph with a speedup of 40× with respect to vg and similar accuracy in SNP detection.


2021 ◽  
Author(s):  
Giulio Formenti ◽  
Arang Rhie ◽  
Brian P Walenz ◽  
Francoise Thibaud-Nissen ◽  
Kishwar Shafin ◽  
...  

Read mapping and variant calling approaches have been widely used for accurate genotyping and improving consensus quality assembled from noisy long reads. Variant calling accuracy relies heavily on the read quality, the precision of the read mapping algorithm and variant caller, and the criteria adopted to filter the calls. However, it is impossible to define a single set of optimal parameters, as they vary depending on the quality of the read set, the variant caller of choice, and the quality of the unpolished assembly. To overcome this issue, we have devised a new tool called Merfin (k-mer based finishing tool), a k-mer based variant filtering algorithm for improved genotyping and polishing. Merfin evaluates the accuracy of a call based on expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller internal score. Moreover, we introduce novel assembly quality and completeness metrics that account for the expected genomic copy numbers. Merfin significantly increased the precision of a variant call and reduced frameshift errors when applied to PacBio HiFi, PacBio CLR, or Nanopore long read based assemblies. We demonstrate the utility while polishing the first complete human genome, a fully phased human genome, and non-human high-quality genomes.
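The idea of judging a variant call by the k-mers it creates, rather than by alignment or caller scores, can be illustrated with a minimal Python sketch. It is not Merfin's algorithm: Merfin works with expected k-mer multiplicities, whereas this simplification only checks whether k-mers are present in the reads at all. The function names and the toy sequences are assumptions.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every k-mer occurring in a collection of read sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def spanning_kmers(seq, pos, length, k):
    """Return the k-mers of seq that overlap the interval [pos, pos + length)."""
    start = max(0, pos - k + 1)
    end = min(len(seq) - k, pos + length - 1)
    return [seq[i:i + k] for i in range(start, end + 1)]


def missing_kmers(seq, pos, length, k, read_counts):
    """Number of spanning k-mers that never occur in the reads."""
    return sum(1 for kmer in spanning_kmers(seq, pos, length, k)
               if read_counts[kmer] == 0)


def keep_variant(assembly, pos, ref, alt, k, read_counts):
    """Keep a call only if applying it reduces the unsupported k-mers."""
    edited = assembly[:pos] + alt + assembly[pos + len(ref):]
    before = missing_kmers(assembly, pos, len(ref), k, read_counts)
    after = missing_kmers(edited, pos, len(alt), k, read_counts)
    return after < before


# Toy data: the assembly carries one wrong base (index 5) relative to the reads.
reads = ["ACGTTGCA", "TGCAAGCT"]
assembly = "ACGTTCCAAGCT"
read_counts = count_kmers(reads, k=4)
fixes_error = keep_variant(assembly, 5, "C", "G", 4, read_counts)
```

In this toy case the correction replaces four k-mers that are absent from the reads with four that are present, so the call is kept; the reverse edit on a correct sequence would be rejected. A multiplicity-aware filter would additionally compare how often each k-mer is seen against how often the assembly uses it.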


2020 ◽  
Author(s):  
Miguel Betancourt-Cravioto ◽  
Jorge Falcón-Lezama ◽  
Fernando Rojas-Estrella ◽  
Rodrigo Saucedo-Martínez ◽  
Roberto Tapia-Conyer

In fighting infectious disease outbreaks, a basic epidemiological principle is to detect cases quickly and to isolate each case in order to interrupt transmission. This principle has been the cornerstone of the Carso Group (CG) COVID Protocol, a systematic blueprint for the reopening of workplace operations in the context of ongoing disease transmission in Mexico. The CG comprises over 50 companies with approximately 180,000 employees engaged in economic activities including telecommunications, retail, construction, banking, mining, and manufacturing, among others. To cope with the COVID-19 pandemic within the CG, the Carlos Slim Foundation designed, developed and implemented MONITOR, a digital health ecosystem comprising a mobile phone application, web portal, and analytics platform, to assess the infection risk of each employee, follow up on their health status, and detect early symptoms of COVID-19. MONITOR provides daily notifications of any suspected cases and triggers a COVID-19 testing request and follow-up of results. This intervention helps to rapidly identify and isolate suspected cases, and to follow up work and family contacts, in order to prevent further outbreaks. Use of MONITOR has thus enabled containment of COVID-19 in workplaces and a safe return to work. MONITOR is an example of the implementation of public health practices in workplaces and can serve as the basis for larger deployment in population-wide settings.


Author(s):  
Paolo Buonanno ◽  
Marcello Puca

Abstract Real-time tracking of infectious disease outbreaks helps policymakers to make timely data-driven decisions. Official mortality data, whenever available, may be incomplete and published with a substantial delay. We report the results of using newspaper obituaries to nowcast the mortality levels observed in Italy during the COVID-19 outbreak between February 24, 2020 and April 15, 2020. We find that the mortality levels predicted using newspaper obituaries outperform forecasts based on past mortality according to several performance metrics, making obituaries a potentially powerful alternative source of information for real-time tracking of infectious disease outbreaks.


2019 ◽  
Vol 28 (3) ◽  
pp. 1039-1052
Author(s):  
Reva M. Zimmerman ◽  
JoAnn P. Silkes ◽  
Diane L. Kendall ◽  
Irene Minkina

Purpose A significant relationship between verbal short-term memory (STM) and language performance in people with aphasia has been found across studies. However, very few studies have examined the predictive value of verbal STM in treatment outcomes. This study aims to determine if verbal STM can be used as a predictor of treatment success. Method Retrospective data from 25 people with aphasia in a larger randomized controlled trial of phonomotor treatment were analyzed. Digit and word spans from immediately pretreatment were run in multiple linear regression models to determine whether they predict magnitude of change from pre- to posttreatment and follow-up naming accuracy. Pretreatment, immediately posttreatment, and 3 months posttreatment digit and word span scores were compared to determine if they changed following a novel treatment approach. Results Verbal STM, as measured by digit and word spans, did not predict magnitude of change in naming accuracy from pre- to posttreatment nor from pretreatment to 3 months posttreatment. Furthermore, digit and word spans did not change from pre- to posttreatment or from pretreatment to 3 months posttreatment in the overall analysis. A post hoc analysis revealed that only the less impaired group showed significant changes in word span scores from pretreatment to 3 months posttreatment. Discussion The results suggest that digit and word spans do not predict treatment gains. In a less severe subsample of participants, digit and word span scores can change following phonomotor treatment; however, the overall results suggest that span scores may not change significantly. The implications of these findings are discussed within the broader purview of theoretical and empirical associations between aphasic language and verbal STM processing.

