FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units

2014 · Vol 10 (10) · pp. e1003880 · Author(s): Gang Peng, Yu Fan, Wenyi Wang

2020 · Vol 36 (11) · pp. 3549-3551 · Author(s): Eddie K K Ip, Clinton Hadinata, Joshua W K Ho, Eleni Giannoulatou

Abstract

Motivation: In 2018, Google published an innovative variant caller, DeepVariant, which converts pileups of sequence reads into images and uses a deep neural network to identify single-nucleotide variants and small insertions/deletions from next-generation sequencing data. This approach outperforms existing state-of-the-art tools. However, DeepVariant was designed to call variants within a single sample. In disease sequencing studies, the ability to examine a family trio (father-mother-affected child) provides greater power for disease mutation discovery.

Results: To further improve DeepVariant's variant calling accuracy in family-based sequencing studies, we have developed a family-based variant calling pipeline, dv-trio, which incorporates trio information via a Mendelian genetic model into variant calling based on DeepVariant.

Availability and implementation: dv-trio is available under an open-source BSD3 license on GitHub (https://github.com/VCCRI/dv-trio/).

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
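The core idea behind trio-aware calling can be sketched directly: per-sample genotype likelihoods are combined with a Mendelian transmission prior, so candidate calls that would violate inheritance are down-weighted unless the read evidence is strong. The following minimal Python sketch assumes a single biallelic site and uses illustrative function names; it is not dv-trio's actual implementation.

```python
import itertools

GENOTYPES = ("0/0", "0/1", "1/1")  # biallelic: hom-ref, het, hom-alt


def transmission_prob(child, father, mother):
    """P(child genotype | parental genotypes) under Mendelian inheritance."""
    # Probability of transmitting allele 0 or 1, given a parent's genotype.
    allele_p = {"0/0": {0: 1.0, 1: 0.0},
                "0/1": {0: 0.5, 1: 0.5},
                "1/1": {0: 0.0, 1: 1.0}}
    p = 0.0
    for a_f, a_m in itertools.product((0, 1), repeat=2):
        gt = "/".join(map(str, sorted((a_f, a_m))))
        if gt == child:
            p += allele_p[father][a_f] * allele_p[mother][a_m]
    return p


def trio_posterior(lik_child, lik_father, lik_mother):
    """Joint posterior over (child, father, mother) genotypes.

    Each lik_* maps genotype -> P(reads | genotype), e.g. derived from a
    caller's per-sample, per-genotype likelihoods (the PL field in a VCF).
    """
    post = {}
    for c, f, m in itertools.product(GENOTYPES, repeat=3):
        post[(c, f, m)] = (transmission_prob(c, f, m)
                           * lik_child[c] * lik_father[f] * lik_mother[m])
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}
```

For instance, a marginal heterozygous call in the child gains posterior support when both parents are confidently heterozygous, whereas a de novo candidate (a Mendelian violation) is retained only if its single-sample likelihood strongly dominates.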


2013 · Vol 61 (4) · pp. 989-992 · Author(s): W. Frohmberg, M. Kierzynka, J. Blazewicz, P. Gawron, P. Wojciechowski

Abstract

DNA/RNA sequencing has recently become a primary way researchers generate biological data for further analysis. Assembly algorithms are an integral part of this process. However, some of them require pairwise alignment to be applied to a great number of reads. Although several efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units), none of them directly targets high-throughput sequencing data. As a result, a need arose to create software that could handle such data as effectively as possible. G-DNA (GPU-based DNA aligner) is the first highly parallel solution optimized to process nucleotide reads (DNA/RNA) from modern sequencing machines. Results show that the software reaches up to 89 GCUPS (Giga Cell Updates Per Second) on a single GPU, making it the fastest tool in its class. Moreover, it scales well on multi-GPU systems, including MPI-based computational clusters, where its performance is measured in TCUPS (tera CUPS).
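For context on the GCUPS figure: aligning two sequences of lengths m and n fills an m × n dynamic-programming matrix, and CUPS is simply m·n divided by elapsed time, so 89 GCUPS corresponds to 89 billion cell updates per second. Below is a minimal single-threaded Python sketch of the Smith-Waterman recurrence with a linear gap penalty (an illustrative simplification; a GPU aligner parallelizes exactly these cell updates, and production tools typically use affine gap models):

```python
import time


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score via the Smith-Waterman recurrence.

    Every (i, j) pair is one DP 'cell update'; throughput is reported
    as CUPS = (len(a) * len(b)) / elapsed seconds (GCUPS = 1e9 CUPS).
    """
    m, n = len(a), len(b)
    prev = [0] * (n + 1)
    best = 0
    for i in range(1, m + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,
                          prev[j - 1] + s,    # diagonal: (mis)match
                          prev[j] + gap,      # up: gap in b
                          curr[j - 1] + gap)  # left: gap in a
            best = max(best, curr[j])
        prev = curr
    return best


start = time.perf_counter()
score = smith_waterman("GATTACA" * 100, "GCATGCT" * 100)
elapsed = time.perf_counter() - start
cells = (7 * 100) ** 2  # one update per matrix cell
print(f"score={score}, {cells / elapsed / 1e9:.6f} GCUPS")
```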


2019 · Vol 11 (1) · Author(s): Raquel Dias, Ali Torkamani

Abstract

Artificial intelligence (AI) is the development of computer systems that are able to perform tasks that normally require human intelligence. Advances in AI software and hardware, especially deep learning algorithms and the graphics processing units (GPUs) that power their training, have led to a recent and rapidly increasing interest in medical AI applications. In clinical diagnostics, AI-based computer vision approaches are poised to revolutionize image-based diagnostics, while other AI subtypes have begun to show similar promise in various diagnostic modalities. In some areas, such as clinical genomics, a specific type of AI algorithm known as deep learning is used to process large and complex genomic datasets. In this review, we first summarize the main classes of problems that AI systems are well suited to solve and describe the clinical diagnostic tasks that benefit from these solutions. Next, we focus on emerging methods for specific tasks in clinical genomics, including variant calling, genome annotation and variant classification, and phenotype-to-genotype correspondence. Finally, we end with a discussion on the future potential of AI in individualized medicine applications, especially for risk prediction in common complex diseases, and the challenges, limitations, and biases that must be carefully addressed for the successful deployment of AI in medical applications, particularly those utilizing human genetics and genomics data.
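As a concrete example of the deep-learning approach to variant calling mentioned above, a DeepVariant-style caller first encodes the read pileup around a candidate site as an image-like tensor that a convolutional network then classifies (hom-ref, het, or hom-alt). The channel layout and dimensions below are illustrative assumptions, not any published tool's exact encoding:

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_pileup(ref, reads, max_reads=32):
    """Encode a read pileup as a (max_reads, width, 3) tensor.

    Channels (an illustrative choice):
      0: base identity, scaled to (0, 1]
      1: base quality, scaled by a nominal maximum of Q40
      2: matches-reference flag
    Each read is (sequence, qualities), aligned to the window start.
    """
    width = len(ref)
    img = np.zeros((max_reads, width, 3), dtype=np.float32)
    for r, (seq, quals) in enumerate(reads[:max_reads]):
        for j, (base, q) in enumerate(zip(seq, quals)):
            if j >= width or base not in BASES:
                continue
            img[r, j, 0] = (BASES[base] + 1) / len(BASES)
            img[r, j, 1] = min(q, 40) / 40.0
            img[r, j, 2] = 1.0 if base == ref[j] else 0.0
    return img


# Tiny example: three reads over a 7-bp window with a candidate SNV.
ref = "ACGTACG"
reads = [("ACGTACG", [38] * 7),
         ("ACGAACG", [35] * 7),   # A at position 3: possible variant
         ("ACGAACG", [30] * 7)]
print(encode_pileup(ref, reads).shape)  # (32, 7, 3)
```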


Author(s): Steven J. Lind, Benedict D. Rogers, Peter K. Stansby

This paper presents a review of the progress of smoothed particle hydrodynamics (SPH) towards high-order converged simulations. As a mesh-free Lagrangian method suitable for complex flows with interfaces and multiple phases, SPH has developed considerably in the past decade. While original applications were in astrophysics, early engineering applications showed the versatility and robustness of the method without emphasis on accuracy and convergence. The early method was of weakly compressible form, resulting in noisy pressures due to spurious pressure waves. This was effectively removed in the incompressible (divergence-free) form which followed; since then the weakly compressible form has been advanced, reducing pressure noise. Numerical convergence studies are now standard. While the method is computationally demanding on conventional processors, it is well suited to massively parallel processing on high-performance clusters and graphics processing units. Applications are diverse and encompass wave–structure interaction, geophysical flows due to landslides, nuclear sludge flows, welding, gearbox flows and many others. In the state of the art, convergence is typically between the first- and second-order theoretical limits. Recent advances improving convergence to fourth order (and higher) are also outlined; such accuracy can be necessary to resolve multi-scale aspects of turbulent flow.
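The method's core approximation is worth stating: a field quantity at particle i is a kernel-weighted sum over neighbouring particles, e.g. the density summation ρ_i = Σ_j m_j W(r_i − r_j, h), where W is a compactly supported smoothing kernel and h the smoothing length. A minimal 1-D Python sketch using the standard cubic spline kernel follows; the particle spacing, masses and smoothing length are illustrative choices:

```python
import numpy as np


def cubic_spline_kernel(r, h):
    """1-D cubic spline smoothing kernel W(r, h) (Monaghan's M4 spline)."""
    q = np.abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1-D normalization constant
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
        np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return sigma * w


def sph_density(x, m, h):
    """Density at each particle: rho_i = sum_j m_j * W(x_i - x_j, h)."""
    dx = x[:, None] - x[None, :]  # all pairwise separations
    return (m[None, :] * cubic_spline_kernel(dx, h)).sum(axis=1)


# Uniformly spaced unit-density particles: the summation should recover
# rho ~ 1 in the interior (it drops near boundaries, a known SPH artifact).
x = np.linspace(0.0, 1.0, 101)
m = np.full_like(x, 0.01)  # mass = density * spacing = 1.0 * 0.01
rho = sph_density(x, m, h=2.0 * (x[1] - x[0]))
print(rho[50])  # interior density, approximately 1.0
```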


2021 · Vol 3 (1) · Author(s): Gundula Povysil, Monika Heinzl, Renato Salazar, Nicholas Stoler, Anton Nekrutenko, ...

Abstract

Duplex sequencing is currently the most reliable method to identify ultra-low-frequency DNA variants; it groups sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses tag and family composition in order to understand data loss and to implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we developed a tool that re-examines variant calls from raw reads and provides summary data that categorizes the confidence level of a variant call in a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
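The family-building step whose yield the toolset quantifies can be sketched as follows: reads sharing a molecular tag and strand orientation form a family, each sufficiently large family is collapsed to a single-strand consensus, and only tags observed on both strands produce a duplex consensus. The names, thresholds and equal-read-length assumption below are illustrative, not the published pipeline's code:

```python
from collections import Counter, defaultdict


def base_consensus(seqs, min_frac=0.75):
    """Column-wise majority consensus; 'N' where agreement is too low.

    Assumes all sequences are trimmed to the same length.
    """
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_frac else "N")
    return "".join(out)


def duplex_consensus(reads, min_family_size=3):
    """Build duplex consensus sequences (DCS) from tagged reads.

    reads: iterable of (tag, strand, sequence), strand in {"ab", "ba"}.
    Returns tag -> DCS for tags whose families on both strands are large
    enough; everything else is the data loss the toolset quantifies.
    """
    families = defaultdict(list)
    for tag, strand, seq in reads:
        families[(tag, strand)].append(seq)

    # Single-strand consensus sequences (SSCS), one per large family.
    sscs = {}
    for (tag, strand), seqs in families.items():
        if len(seqs) >= min_family_size:
            sscs[(tag, strand)] = base_consensus(seqs)

    # A duplex consensus requires both strands of the same tag to agree.
    dcs = {}
    for tag in {tag for tag, _ in sscs}:
        if (tag, "ab") in sscs and (tag, "ba") in sscs:
            dcs[tag] = base_consensus([sscs[(tag, "ab")],
                                       sscs[(tag, "ba")]])
    return dcs
```

With two sequences per duplex comparison and min_frac=0.75, any position where the strands disagree is masked to "N", which is the strand-confirmation property that makes duplex calls so reliable.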

