Assessment of inter-laboratory differences in SARS-CoV-2 consensus genome assemblies between public health laboratories in Australia

Author(s):  
Charles S.P. Foster ◽  
Sacha Stelzer-Braid ◽  
Ira W. Deveson ◽  
Rowena A. Bull ◽  
Malinna Yeang ◽  
...  

Whole-genome sequencing of viral isolates is critical for informing transmission patterns and ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability in the early stages of a pandemic, the impact of technical and/or sequencing errors increases. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. Identified differences were predominantly driven by ambiguous site content. Ignoring these produced differences in only 2.3% (5/216) of pairwise comparisons, each differing by a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons, and were mostly assigned to the same phylogenetic clade. However, epidemiological inference based only on single nucleotide variant distances may lead to significant differences in the number of defined clusters if variant allele frequency thresholds for consensus genome generation differ between laboratories. These results underscore the need for a unified, best-practices approach to bioinformatics between laboratories working on a common outbreak problem.
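The comparison described above (pairwise identity, and differences counted with and without ambiguous sites) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `compare_consensus`, the assumption that the two consensus genomes are already aligned to equal length, and the choice to treat every non-ACGT character (including gaps) as ambiguous are all assumptions made here for illustration.

```python
# Hypothetical helper, not taken from the study: compares two consensus
# genomes that are assumed to be pre-aligned to the same length.
UNAMBIGUOUS = frozenset("ACGT")


def compare_consensus(seq_a: str, seq_b: str) -> dict:
    """Count differences between two aligned consensus sequences.

    Returns the percent pairwise identity over all sites, the total number
    of differing sites, and the number of differing sites at which both
    sequences carry an unambiguous base (A, C, G or T).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")

    all_diffs = 0
    unambiguous_diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        all_diffs += 1
        # Assumption: any non-ACGT symbol (N, IUPAC codes, gaps) counts
        # as ambiguous and is ignored in the stricter count.
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            unambiguous_diffs += 1

    identity = 100.0 * (len(seq_a) - all_diffs) / len(seq_a)
    return {
        "identity_pct": identity,
        "all_diffs": all_diffs,
        "unambiguous_diffs": unambiguous_diffs,
    }


if __name__ == "__main__":
    # Toy sequences: one true SNV (T vs A) and one ambiguous site (N vs A).
    print(compare_consensus("ACGTN", "ACGAA"))
```

Under these assumptions, two laboratories that apply different variant allele frequency thresholds would mainly differ in `all_diffs` (ambiguous calls), while `unambiguous_diffs` is the count that corresponds to the stricter comparison.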

2018 ◽  
Author(s):  
Roger Ros-Freixedes ◽  
Battagin Mara ◽  
Martin Johnsson ◽  
Gregor Gorjanc ◽  
Alan J Mileham ◽  
...  

Abstract Background: Inherent sources of error and bias that affect the quality of the sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing there is a need to understand the impact of these errors and bias on resulting genotype calls. Results: We used a dataset of 26 pigs sequenced both at 2x with multiplexing and at 30x without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, a default and desired step for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. Conclusions: We propose a simple pipeline to correct this bias and we recommend that users of low-coverage sequencing be wary of unexpected biases produced by tools designed for high-coverage sequencing.


2019 ◽  
Author(s):  
Christina Nieuwoudt ◽  
Angela Brooks-Wilson ◽  
Jinko Graham

Abstract Summary: Family-based studies have several advantages over case-control studies for finding causal rare variants for a disease; these include increased power, smaller sample size requirements, and improved detection of sequencing errors. However, collecting suitable families and compiling their data is time-consuming and expensive. To evaluate methodology to identify causal rare variants in family-based studies, one can use simulated data. For this purpose we present the R package SimRVSequences. Users supply a sample of pedigrees and single-nucleotide variant data from a sample of unrelated individuals representing the pedigree founders. Users may also model genetic heterogeneity among families. For ease of use, SimRVSequences offers methods to import and format single-nucleotide variant data and pedigrees from existing software. Availability and Implementation: SimRVSequences is available as a library for R ≥ 3.5.0 on the Comprehensive R Archive Network.


2021 ◽  
Vol 149 ◽  
Author(s):  
Jing Wang ◽  
Mian Wang ◽  
Zihao Li ◽  
Xinyin Wu ◽  
Xian Zhang ◽  
...  

Abstract The aim of this study was to explore the impact of polymorphisms of the PD-1 gene and their interaction with tea drinking on susceptibility to tuberculosis (TB). A total of 503 patients with TB and 494 controls were enrolled in this case–control study. Three single-nucleotide polymorphisms of PD-1 (rs7568402, rs2227982 and rs36084323) were genotyped and unconditional logistic regression analysis was used to identify the association between PD-1 polymorphisms and TB, while marginal structural linear odds models were used to estimate the interactions. Genotypes GA (OR 1.434), AA (OR 1.891) and GA + AA (OR 1.493) at rs7568402 were more prevalent in the TB patients than in the controls (P < 0.05). The relative excess risk of interaction (RERI) between rs7568402 of the PD-1 gene and tea drinking was −0.3856 (95% confidence interval −0.7920 to −0.0209, P < 0.05), which showed a negative interaction. However, the RERIs between tea drinking and both rs2227982 and rs36084323 of the PD-1 gene were not statistically significant. Our data demonstrate that rs7568402 of the PD-1 gene was associated with susceptibility to TB, and there was a significant negative interaction between rs7568402 and tea drinking. Therefore, preventive measures through promoting the consumption of tea should be emphasised in the high-risk populations.
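For readers unfamiliar with the RERI reported above, its usual textbook definition on the additive scale is sketched below, with odds ratios standing in for relative risks. The subscript notation is an assumption introduced here, and the abstract's own estimate came from marginal structural linear odds models rather than from this direct calculation.

```latex
% OR_{11}: both exposures present (risk genotype and tea drinking)
% OR_{10}: risk genotype only;  OR_{01}: tea drinking only
% Reference group: neither exposure (OR_{00} = 1)
\[
  \mathrm{RERI} = \mathrm{OR}_{11} - \mathrm{OR}_{10} - \mathrm{OR}_{01} + 1
\]
```

A value of zero corresponds to no additive interaction; a negative value, such as the one reported for rs7568402, indicates that the joint effect is smaller than the sum of the separate effects.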


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 477-477
Author(s):  
Leah K Treffer ◽  
Edward S Rice ◽  
Anna M Fuller ◽  
Samuel Cutler ◽  
Jessica L Petersen

Abstract Domestic yak (Bos grunniens) are bovids native to the Asian Qinghai-Tibetan Plateau. Studies of Asian yak have revealed that introgression with domestic cattle has contributed to the evolution of the species. When imported to North America (NA), some hybridization with B. taurus did occur. The objective of this study was to use mitochondrial (mt) DNA sequence data to better understand the mtDNA origin of NA yak and their relationship to Asian yak and related species. The complete mtDNA sequence of 14 individuals (12 NA yak, 1 Tibetan yak, 1 Tibetan B. indicus) was generated and compared with sequences of similar species from GenBank (B. indicus, B. grunniens (Chinese), B. taurus, B. gaurus, B. primigenius, B. frontalis, Bison bison, and Ovis aries). Individuals were aligned to the B. grunniens reference genome (ARS_UNL_BGru_maternal_1.0), which was also included in the analyses. The mtDNA genes were annotated using the ARS-UCD1.2 cattle sequence as a reference. Ten unique NA yak haplotypes were identified, which a haplotype network separated into two clusters. Variation among the NA haplotypes included 93 nonsynonymous single nucleotide polymorphisms. A maximum likelihood tree including all taxa was made using IQ-TREE after the data were partitioned into twenty-two subgroups using PartitionFinder2. Notably, six NA yak haplotypes formed a clade with B. indicus; the other four haplotypes grouped with B. grunniens and fell as a sister clade to bison, gaur and gayal. These data demonstrate two mitochondrial origins of NA yak with genetic variation in protein coding genes. Although these data suggest yak introgression with B. indicus, it appears to date prior to importation into NA. In addition to contributing to our understanding of the species history, these results suggest the two major mtDNA haplotypes in NA yak may functionally differ. Characterization of the impact of these differences on cellular function is currently underway.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture between the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Sebastian Carrasco Pro ◽  
Katia Bulekova ◽  
Brian Gregor ◽  
Adam Labadorf ◽  
Juan Ignacio Fuxman Bass

Abstract Single nucleotide variants (SNVs) located in transcriptional regulatory regions can result in gene expression changes that lead to adaptive or detrimental phenotypic outcomes. Here, we predict gain or loss of binding sites for 741 transcription factors (TFs) across the human genome. We calculated ‘gainability’ and ‘disruptability’ scores for each TF that represent the likelihood of binding sites being created or disrupted, respectively. We found that functional cis-eQTL SNVs are more likely to alter TF binding sites than rare SNVs in the human population. In addition, we show that cancer somatic mutations have different effects on TF binding sites from different TF families on a cancer-type basis. Finally, we discuss the relationship between these results and cancer mutational signatures. Altogether, we provide a blueprint to study the impact of SNVs derived from genetic variation or disease association on TF binding to gene regulatory regions.


2015 ◽  
Vol 308 (9) ◽  
pp. C758-C766 ◽  
Author(s):  
Xinjun Cindy Zhu ◽  
Rafiquel Sarker ◽  
John R. Horton ◽  
Molee Chakraborty ◽  
Tian-E Chen ◽  
...  

Genetic determinants appear to play a role in susceptibility to chronic diarrhea, but the genetic abnormalities involved have only been identified in a few conditions. The Na+/H+ exchanger 3 (NHE3) accounts for a large fraction of physiologic intestinal Na+ absorption. It is highly regulated through effects on its intracellular COOH-terminal regulatory domain. The impact of genetic variation in the NHE3 gene, such as single nucleotide polymorphisms (SNPs), on transporter activity remains unexplored. From a total of 458 SNPs identified in the entire NHE3 gene, we identified three nonsynonymous mutations (R474Q, V567M, and R799C), which were all in the protein's intracellular COOH-terminal domain. Here we evaluated whether these SNPs affect NHE3 activity by expressing them in a mammalian cell line that is null for all plasma membrane NHEs. These variants significantly reduced basal NHE3 transporter activity through a reduction in intrinsic NHE3 function in variant R474Q, abnormal trafficking in variant V567M, or defects in both intrinsic NHE3 function and trafficking in variant R799C. In addition, variants NHE3 R474Q and R799C failed to respond to acute dexamethasone stimulation, suggesting cells with these mutant proteins might be defective in NHE3 function during postprandial stimulation and perhaps under stressful conditions. Finally, variant R474Q was shown to exhibit an aberrant interaction with calcineurin B homologous protein (CHP), an NHE3 regulatory protein required for basal NHE3 activity. Taken together, these results demonstrate decreased transport activity in three SNPs of NHE3 and provide mechanistic insight into how these SNPs impact NHE3 function.


Author(s):  
Jacqueline Neubauer ◽  
Shouyu Wang ◽  
Giancarlo Russo ◽  
Cordula Haas

Abstract Sudden unexplained death (SUD) accounts for a considerable proportion of overall sudden death cases, especially in adolescents and young adults. During the past decade, many channelopathy- and cardiomyopathy-associated single nucleotide variants (SNVs) have been identified in SUD studies by means of postmortem molecular autopsy, yet the number of cases that remain inconclusive is still high. Recent studies have suggested that structural variants (SVs) might play an important role in SUD, but there is no consensus on the impact of SVs on inherited cardiac diseases. In this study, we searched for potentially pathogenic SVs in 244 genes associated with cardiac diseases. Whole-exome sequencing and appropriate data analysis were performed in 45 SUD cases. Re-analysis of the exome data according to the current ACMG guidelines identified 14 pathogenic or likely pathogenic variants in 10 (22.2%) out of the 45 SUD cases, whereof 2 (4.4%) individuals had variants with likely functional effects in the channelopathy-associated genes SCN5A and TRDN and 1 (2.2%) individual in the cardiomyopathy-associated gene DTNA. In addition, 18 SVs were identified in 15 out of the 45 individuals. Two SVs with likely functional impairment were found in the coding regions of PDSS2 and TRPM4 in 2 SUD cases (4.4%). Both were identified as heterozygous deletions, which were confirmed by multiplex ligation-dependent probe amplification. In conclusion, our findings support that SVs could contribute to the pathology of the sudden death event in some of the cases and therefore should be investigated on a routine basis in suspected SUD cases.


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Weifan Gao ◽  
Sukumar Saha ◽  
Din-Pow Ma ◽  
Yufang Guo ◽  
Johnie N. Jenkins ◽  
...  

A cotton fiber cDNA and its genomic sequences encoding an A-type cyclin-dependent kinase (GhCDKA) were cloned and characterized. The encoded GhCDKA protein contains the conserved cyclin-binding, ATP binding, and catalytic domains. Northern blot and RT-PCR analysis revealed that the GhCDKA transcript was high in 5–10 DPA fibers, moderate in 15 and 20 DPA fibers and roots, and low in flowers and leaves. GhCDKA protein levels in fibers increased from 5–15 DPA, peaked at 15 DPA, and decreased from 15 to 20 DPA. The differential expression of GhCDKA suggested that the gene might play an important role in fiber development. The GhCDKA sequence data were used to develop single nucleotide polymorphism (SNP) markers specific for the CDKA gene in cotton. A primer specific to one of the SNPs was used to locate the CDKA gene to chromosome 16 by deletion analysis using a series of hypoaneuploid interspecific hybrids.

