variant calling
Recently Published Documents


2022 ◽  
Vol 12 ◽  
Author(s):  
Paolo Malune ◽  
Giovanna Piras ◽  
Maria Monne ◽  
Maura Fiamma ◽  
Rosanna Asproni ◽  
...  

Background: The SARS-CoV-2 pandemic stimulated an outstanding global sequencing effort, which made it possible to monitor viral circulation and evolution. Nuoro province (Sardinia, Italy), characterized by a relatively isolated geographical location and a low population density, was severely hit and displayed a high incidence of infection. Methods: Amplicon-based next-generation sequencing and subsequent variant calling were performed on 92 respiratory samples from SARS-CoV-2-infected patients involved in infection clusters from March 2020 to May 2021. Results: Phylogenetic analysis displayed a coherent distribution of sequences in terms of lineage and temporal evolution of the pandemic. Circulating lineage/clade characterization highlighted a growing diversity over time, with an increasing number of mutations and growing variability of the spike and nucleocapsid proteins, while viral RdRp appeared to be more conserved. A total of 384 different mutations were detected, of which 196 were missense and 147 synonymous. Mapping mutations along the viral genome showed an irregular distribution in key genes. The S gene was the most mutated gene, with missense and synonymous variant frequencies of 58.8% and 23.5%, respectively. Mutation rates were similar for the S and N genes, with one mutation every ∼788 nucleotides and every ∼712 nucleotides, respectively. The Nsp12 gene appeared to be more conserved, with one mutation every ∼1,270 nucleotides. The frequency of variant Y144F in the spike protein deviated from global values, with a higher prevalence of this mutation on the island. Conclusion: The analysis of the 92 viral genomes highlighted evolution over time and identified which mutations are more widespread than others. The high number of sequences also permits the identification of subclusters characterized by subtle differences, not only in terms of lineage, which may be used to reconstruct transmission clusters. The disclosure of viral genetic diversity and the timely identification of new variants are useful tools to guide public health intervention measures.


eLife ◽  
2022 ◽  
Vol 11 ◽  
Author(s):  
Lucie A Bergeron ◽  
Søren Besenbacher ◽  
Tychele Turner ◽  
Cyril J Versoza ◽  
Richard J Wang ◽  
...  

In the past decade, several studies have estimated the human per-generation germline mutation rate using large pedigrees. More recently, estimates for various non-human species have been published. However, methodological differences among studies in detecting germline mutations and estimating mutation rates make direct comparisons difficult. Here, we describe the many different steps involved in estimating pedigree-based mutation rates, including sampling, sequencing, mapping, variant calling, filtering, and how to appropriately account for false-positive and false-negative rates. For each step, we review the different methods and parameter choices that have been used in the recent literature. Additionally, we present the results from a 'Mutationathon', a competition organized among five research labs to compare germline mutation rate estimates for a single pedigree of rhesus macaques. We report almost a two-fold variation in the final estimated rate among groups using different post-alignment processing, calling, and filtering criteria and provide details into the sources of variation across studies. Though the difference among estimates is not statistically significant, this discrepancy emphasizes the need for standardized methods in mutation rate estimations and the difficulty in comparing rates from different studies. Finally, this work aims to provide guidelines for computational and statistical benchmarks for future studies interested in identifying germline mutations from pedigrees.
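As a rough illustration of how false-positive and false-negative rates can enter a pedigree-based estimate, the sketch below uses one commonly used form of the per-site, per-generation rate, μ = N(1 − FDR) / (2 · C · (1 − FNR)), where N is the number of candidate de novo mutations in the offspring and C is the number of callable sites. The function name, parameter names, and the numbers in the usage note are illustrative assumptions, not values or code from the study.

```python
def per_generation_mutation_rate(n_candidates, callable_sites, fdr=0.0, fnr=0.0):
    """Estimate a per-site, per-generation germline mutation rate for one trio.

    n_candidates   -- candidate de novo mutations that passed filtering
    callable_sites -- haploid genome positions where a mutation could be called
    fdr            -- estimated fraction of candidates that are false positives
    fnr            -- estimated fraction of true mutations that were missed
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    if not (0.0 <= fdr < 1.0 and 0.0 <= fnr < 1.0):
        raise ValueError("fdr and fnr must lie in [0, 1)")
    # Remove the expected number of false-positive candidates.
    true_mutations = n_candidates * (1.0 - fdr)
    # Factor 2: a diploid offspring carries two transmitted haploid genomes.
    # Missed mutations shrink the effectively surveyed genome.
    effective_sites = 2.0 * callable_sites * (1.0 - fnr)
    return true_mutations / effective_sites
```

With these hypothetical inputs, `per_generation_mutation_rate(40, 2_000_000_000)` gives a rate of about 1e-8 per site per generation. Changing the filters changes N, C, FDR, and FNR together, which is one way different pipelines can arrive at different estimates for the same pedigree.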


2022 ◽  
Author(s):  
Yoo-Jin Ha ◽  
Jisoo Kim ◽  
Seungseok Kang ◽  
Junhan Kim ◽  
Se-Young Jo ◽  
...  

The rapid advances in sequencing and analysis technologies have enabled the accurate detection of diverse forms of genomic variants, including germline, somatic, and mosaic mutations. However, unlike for the former two, the best practices for mosaic variant calling remain chaotic due to the technical and conceptual difficulties faced in evaluation. Here, we present our benchmark of nine feasible strategies for mosaic variant detection based on a systematically designed reference standard that mimics mosaic samples, with 390,153 control positive and 35,208,888 negative single-nucleotide variants and insertion–deletion mutations. Rather than a single winner, we identified condition-dependent strengths and weaknesses of the current strategies with respect to variant allele frequencies, variant sharing, and the usage of control samples. Moreover, feature-level investigation points the way to short-term and long-term improvements in mosaic variant calling. Our results will guide researchers in selecting suitable calling algorithms and suggest future strategies for developers.
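A benchmark against positive and negative control sets typically reduces to counting true positives, false positives, and false negatives. The sketch below is a minimal illustration of that bookkeeping, assuming variants are represented as hashable `(chrom, pos, ref, alt)` tuples; the function name and the representation are assumptions for illustration and are not taken from the study.

```python
def benchmark_calls(called, positives, negatives):
    """Score a call set against control-positive and control-negative variants.

    Calls that fall in neither control set are ignored, because the
    reference standard says nothing about them.
    """
    called = set(called)
    tp = len(called & positives)   # control positives that were called
    fp = len(called & negatives)   # control negatives that were called
    fn = len(positives - called)   # control positives that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Running the same function separately on subsets of the controls (for example, binned by variant allele frequency) is one simple way to expose condition-dependent behavior instead of a single overall score.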


2022 ◽  
Author(s):  
Jun Ma ◽  
Manuel Cáceres ◽  
Leena Salmela ◽  
Veli Mäkinen ◽  
Alexandru I. Tomescu

Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit (Garrison et al., Nature Biotechnology, 2018) is a popular aligner of short reads, GraphAligner (Rautiainen and Marschall, Genome Biology, 2020) is the state-of-the-art aligner of long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds. We present a new algorithm to co-linearly chain a set of seeds in an acyclic variation graph, together with the first efficient implementation of such a co-linear chaining algorithm in a new aligner of long reads to variation graphs, GraphChainer. Compared to GraphAligner, at a normalized edit distance threshold of 40%, it aligns 9% to 12% more reads, and 15% to 19% more total read length, on real PacBio reads from human chromosomes 1 and 22. On both simulated and real data, GraphChainer aligns between 97% and 99% of all reads, and of total read length. At the more stringent normalized edit distance threshold of 30%, GraphChainer aligns up to 29% more total real read length than GraphAligner. GraphChainer is freely available at https://github.com/algbio/GraphChainer
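To make the idea of co-linear chaining concrete, here is a deliberately simplified sketch on a linear reference rather than a variation graph. It is a textbook quadratic-time dynamic program, not GraphChainer's algorithm, and the seed representation `(read_start, ref_start, length)` and the function name are assumptions chosen for illustration. A seed may follow another in a chain only if it starts after the other ends in both the read and the reference.

```python
def chain_seeds(seeds):
    """Return the co-linear chain of seeds with the largest total seed length.

    seeds -- iterable of (read_start, ref_start, length) exact matches,
             each with length > 0
    """
    order = sorted(seeds)            # by read_start, so predecessors come first
    n = len(order)
    if n == 0:
        return []
    best = [s[2] for s in order]     # best[i]: score of best chain ending at seed i
    prev = [-1] * n                  # back-pointers for reconstructing the chain
    for i in range(n):
        read_i, ref_i, len_i = order[i]
        for j in range(i):
            read_j, ref_j, len_j = order[j]
            # Seed j may precede seed i only if it ends before i starts
            # in both the read and the reference (no overlaps allowed).
            precedes = read_j + len_j <= read_i and ref_j + len_j <= ref_i
            if precedes and best[j] + len_i > best[i]:
                best[i] = best[j] + len_i
                prev[i] = j
    # Walk back from the highest-scoring chain end.
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]
```

For example, given the seeds `(0, 100, 10)`, `(12, 115, 8)`, `(5, 500, 20)`, and `(25, 130, 10)`, the chain keeps the three seeds that are consistent with each other and drops the longer but isolated seed at reference position 500, which a best-seed-only strategy might have extended instead.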


2022 ◽  
Vol 15 (1) ◽  
Author(s):  
Vinh Hoa Pham ◽  
Van Lam Nguyen ◽  
Hye-Eun Jung ◽  
Yong-Soon Cho ◽  
Jae-Gook Shin

Background: Few studies have annotated the whole mitochondrial DNA (mtDNA) genome associated with drug responses in Asian populations. This study aimed to characterize mtDNA genetic profiles, especially the distribution and frequency of well-known genetic biomarkers associated with diseases and drug-induced toxicity, in a Korean population. Methods: The whole mitochondrial genome was sequenced for 118 Korean subjects using a next-generation sequencing approach. A bioinformatic pipeline was constructed for variant calling, haplogroup classification, and annotation of mitochondrial mutations. Results: A total of 681 variants were identified among all subjects. The MT-TRNP gene and the displacement loop showed the highest numbers of variants (113 and 74 variants, respectively). The m.16189T > C allele, which is known to reduce the mtDNA copy number in human cells, was detected in 25.4% of subjects. The variants m.2706A > G, m.3010A > G, and m.1095T > C, which are associated with drug-induced toxicity, were observed at frequencies of 99.15%, 30.51%, and 0.08%, respectively. The m.2150T > A variant, a genotype associated with highly disruptive effects on mitochondrial ribosomes, was identified in five subjects. The D and M haplogroups were the most dominant groups, with frequencies of 34.74% and 16.1%, respectively. Conclusions: Our findings were consistent with the Korean Genome Project and reflected well the unique profile of mitochondrial haplogroup distribution. This is the first study to annotate the whole mitochondrial genome with respect to drug-induced toxicity in order to predict adverse drug reactions (ADRs) in clinical implementation for Korean subjects. This approach could be extended in further studies to validate potential ethnic-specific mitochondrial genetic biomarkers in the Korean population.


GigaScience ◽  
2022 ◽  
Vol 11 (1) ◽  
Author(s):  
Dries Decap ◽  
Louise de Schaetzen van Brienen ◽  
Maarten Larmuseau ◽  
Pascal Costanza ◽  
Charlotte Herzeel ◽  
...  

Background: The accurate detection of somatic variants from sequencing data is of key importance for cancer treatment and research. Somatic variant calling requires a high sequencing depth of the tumor sample, especially when the detection of low-frequency variants is also desired. In turn, this leads to large volumes of raw sequencing data to process and hence, large computational requirements. For example, calling the somatic variants according to the GATK best practices guidelines requires days of computing time for a typical whole-genome sequencing sample. Findings: We introduce Halvade Somatic, a framework for somatic variant calling from DNA sequencing data that takes advantage of multi-node and/or multi-core compute platforms to reduce runtime. It relies on Apache Spark to provide scalable I/O and to create and manage data streams that are processed on different CPU cores in parallel. Halvade Somatic contains all required steps to process the tumor and matched normal sample according to the GATK best practices recommendations: read alignment (BWA), sorting of reads, preprocessing steps such as marking duplicate reads and base quality score recalibration (GATK), and, finally, calling the somatic variants (Mutect2). Our approach reduces the runtime on a single 36-core node to 19.5 h compared to a runtime of 84.5 h for the original pipeline, a speedup of 4.3 times. Runtime can be further decreased by scaling to multiple nodes, e.g., we observe a runtime of 1.36 h using 16 nodes, an additional speedup of 14.4 times. Halvade Somatic supports variant calling from both whole-genome sequencing and whole-exome sequencing data and also supports Strelka2 as an alternative or complementary variant calling tool. We provide a Docker image to facilitate single-node deployment. Halvade Somatic can be executed on a variety of compute platforms, including Amazon EC2 and Google Cloud. Conclusions: To our knowledge, Halvade Somatic is the first somatic variant calling pipeline that leverages Big Data processing platforms and provides reliable, scalable performance. Source code is freely available.


2021 ◽  
Author(s):  
Zhenxian Zheng ◽  
Shumin Li ◽  
Junhao Su ◽  
Amy Wing-Sze Leung ◽  
Tak-Wah Lam ◽  
...  

Deep learning-based variant callers are becoming the standard and have achieved superior SNP calling performance using long reads. In this paper, we present Clair3, which makes the best of two major method categories: pile-up calling handles most variant candidates with speed, and full-alignment tackles complicated candidates to maximize precision and recall. Clair3 ran faster than any of the other state-of-the-art variant callers and performed the best, especially at lower coverage.
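The division of labor described above, a fast pass over every candidate followed by a slower pass over the difficult ones, can be sketched as a simple routing function. This is only a schematic of the control flow: the two caller arguments are hypothetical stand-ins supplied by the user, the quality threshold is an arbitrary assumption, and none of the names correspond to Clair3's actual interface or models.

```python
def two_stage_call(candidates, pileup_caller, full_alignment_caller, min_quality=20.0):
    """Call each candidate with a fast model, re-calling low-confidence ones.

    pileup_caller, full_alignment_caller -- callables taking a candidate and
        returning (genotype, quality); both are hypothetical placeholders.
    min_quality -- illustrative cutoff below which the fast call is not trusted.
    """
    calls = {}
    for cand in candidates:
        # Fast path: summary (pileup-style) features only.
        genotype, quality = pileup_caller(cand)
        if quality < min_quality:
            # Slow path: only the hard candidates pay for the expensive model.
            genotype, quality = full_alignment_caller(cand)
        calls[cand] = (genotype, quality)
    return calls
```

The design intent is that total runtime is dominated by the cheap model while accuracy on difficult sites is set by the expensive one; where to place the cutoff is a tuning decision that this sketch does not attempt to answer.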


Biology ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 49
Author(s):  
Samathmika Ravi ◽  
Mahdi Hassani ◽  
Bahram Heidari ◽  
Saptarathi Deb ◽  
Elena Orsini ◽  
...  

Rhizoctonia solani, causing Rhizoctonia crown and root rot, is a major risk to sugar beet (Beta vulgaris L.) cultivation. The development of resistant varieties accelerated by marker-assisted selection is a priority of breeding programs. We report the identification of a single-nucleotide polymorphism (SNP) marker linked to Rhizoctonia resistance using restriction site-associated DNA (RAD) sequencing of two geographically discrete sets of plant materials with different degrees of resistance/susceptibility to enable a wider selection of superior genotypes. The variant calling pipeline utilized SAMtools for variant calling and the resulting raw SNPs from RAD sequencing (15,988 and 22,439 SNPs) were able to explain 13.40% and 25.45% of the phenotypic variation in the two sets of material from different sources of origin, respectively. An association analysis was carried out independently on both the datasets and mutually occurring significant SNPs were filtered depending on their contribution to the phenotype using principal component analysis (PCA) biplots. To provide a ready-to-use marker for the breeding community, a systematic molecular validation of significant SNPs distributed across the genome was undertaken to combine high-resolution melting, Sanger sequencing, and rhAmp SNP genotyping. We report that RsBv1 located on Chromosome 6 (9000093 bp) is significantly associated with Rhizoctonia resistance (p < 0.01) and able to explain 10% of the phenotypic disease variance. The related SNP assay is thus ready for marker-assisted selection in sugar beet breeding for Rhizoctonia resistance.


2021 ◽  
pp. gr.275579.121
Author(s):  
Daniel P Cooke ◽  
David C Wedge ◽  
Gerton Lunter

Genotyping from sequencing is the basis of emerging strategies in the molecular breeding of polyploid plants. However, compared with the situation for diploids, where genotyping accuracies are confidently determined with comprehensive benchmarks, polyploids have been neglected; there are no benchmarks measuring genotyping error rates for small variants using real sequencing reads. We previously introduced a variant calling method, Octopus, that accurately calls germline variants in diploids and somatic mutations in tumors. Here, we evaluate Octopus and other popular tools on whole-genome tetraploid and hexaploid datasets created using in silico mixtures of diploid Genome In a Bottle (GIAB) samples. We find that genotyping errors are abundant for typical sequencing depths, but that Octopus makes 25% fewer errors than other methods on average. We supplement our benchmarks with concordance analysis in real autotriploid banana datasets.
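One way to see why polyploid genotyping is harder than diploid genotyping is a simple per-site binomial model of allele dosage: with ploidy P there are P + 1 possible alternate-allele copy numbers, and their expected read fractions lie close together. The sketch below picks the maximum-likelihood dosage under that model. It is a generic textbook-style illustration, not the model used by Octopus or any other tool in the benchmark, and the function name and default error rate are assumptions.

```python
import math

def call_dosage(alt_reads, total_reads, ploidy=4, error_rate=0.01):
    """Return the alt-allele copy number (0..ploidy) maximizing a binomial likelihood."""
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must lie in (0, 0.5)")
    best_k, best_loglik = 0, -math.inf
    for k in range(ploidy + 1):
        frac = k / ploidy
        # Probability that a read shows the alt allele, allowing for
        # symmetric sequencing error.
        p_alt = frac * (1.0 - error_rate) + (1.0 - frac) * error_rate
        # The binomial coefficient is the same for every k, so it is omitted.
        loglik = (alt_reads * math.log(p_alt)
                  + (total_reads - alt_reads) * math.log(1.0 - p_alt))
        if loglik > best_loglik:
            best_k, best_loglik = k, loglik
    return best_k
```

In a tetraploid, 10 alternate reads out of 40 points to a single alternate copy, but the neighboring dosages differ in expected read fraction by only 25 percentage points, so sampling noise at moderate depth can easily tip the call to an adjacent genotype.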


Genes ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 45
Author(s):  
Ben Braiek ◽  
Carole Moreno-Romieux ◽  
Charlotte Allain ◽  
Philippe Bardou ◽  
Arnaud Bordes ◽  
...  

We recently demonstrated that the Lacaune deficient homozygous haplotype 6 (LDHH6) potentially hosts a recessive perinatal lethal mutation in Lacaune dairy sheep, mapped on OAR3. In the present study, we analyzed the whole-genome sequences of two Lacaune rams that are heterozygous carriers of LDHH6. After variant calling and filtering against the variants of 86 non-carrier rams, we identified a single nucleotide variant (SNV) in the two LDHH6 carriers whose variant allele induces a premature stop codon (p.Glu111*) in the Coiled-Coil Domain Containing 65 (CCDC65) gene. CCDC65 is involved in the assembly of the nexin-dynein regulatory complex for the formation of microtubules in ciliated cells. In order to identify the phenotype in homozygous sheep, we generated at-risk matings (n = 17) between rams and ewes heterozygous for the candidate variant in CCDC65. A total of 16 lambs were born alive, five of which were genotyped as homozygous carriers. The homozygous lambs suffered from respiratory problems, and four of them died within the first month of life. At necropsy, we observed a broad hepatization of lung lobes, possibly induced by infectious pneumonia. The management of this lethal recessive allele (frequency of 0.06) through reasoned mating in the Lacaune sheep selection schemes could reduce lamb mortality by 2%.
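The filtering step described above amounts to set logic over per-animal call sets: keep variants present in every carrier and absent from every non-carrier. The sketch below shows one plausible form of that filter. The requirement that carriers be heterozygous at the site is an assumption inferred from their carrier status, and the data layout (dicts mapping a variant key to a VCF-style genotype string) and function name are hypothetical, not the authors' pipeline.

```python
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}

def candidate_variants(carrier_genotypes, noncarrier_variants):
    """Variants heterozygous in all carriers and unseen in all non-carriers.

    carrier_genotypes   -- list of dicts, one per carrier: variant -> genotype
    noncarrier_variants -- list of sets, one per non-carrier: variants called
    """
    shared = None
    for genotypes in carrier_genotypes:
        het = {v for v, gt in genotypes.items() if gt in HET_GENOTYPES}
        # Intersect across carriers: the causal variant must be in all of them.
        shared = het if shared is None else shared & het
    if shared is None:               # no carriers supplied
        return set()
    # Anything observed in a non-carrier cannot explain the carrier haplotype.
    seen_in_controls = set().union(*noncarrier_variants)
    return shared - seen_in_controls
```

In practice such a filter would usually be restricted to the mapped haplotype region first, and the surviving variants would then be ranked by predicted functional consequence, as with the premature stop codon reported here.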

