Whole genome sequencing of orofacial cleft trios from the Gabriella Miller Kids First Pediatric Research Consortium identifies a new locus on chromosome 21

Mapping Intimacies ◽

10.1101/743526 ◽

2019 ◽

Author(s):

Nandita Mukhopadhyay ◽

Madison Bishop ◽

Michael Mortillo ◽

Pankaj Chopra ◽

Jacqueline B. Hetmanski ◽

...

Keyword(s):

Birth Defects ◽

Large Scale ◽

Pediatric Research ◽

Chromosome 21 ◽

Whole Genome Sequence ◽

Whole Genome ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Family Based ◽

Kids First

Abstract: Orofacial clefts (OFCs) are one of the most common birth defects worldwide and create a significant health burden. The majority of OFCs are non-syndromic, and the genetic component has been only partially determined. Here, we analyze whole genome sequence (WGS) data for association with risk of OFCs in European and Colombian families selected from a multicenter family-based OFC study. Part of the Gabriella Miller Kids First Pediatric Research Program, this is the first large-scale WGS study of OFC in parent-offspring trios. WGS provides deeper and more specific genetic data than currently available using imputation on single nucleotide polymorphic (SNP) marker panels. Here, association analysis of genome-wide single nucleotide variants (SNV) and short insertions and deletions (indels) identified a new locus on chromosome 21 in Colombian families, within a region known to be expressed during craniofacial development. This study reinforces the ancestry differences seen in the genetic etiology of OFCs, and the need for larger samples when studying OFCs and other birth defects in admixed populations.

Whole genome sequencing of orofacial cleft trios from the Gabriella Miller Kids First Pediatric Research Consortium identifies a new locus on chromosome 21

Human Genetics ◽

10.1007/s00439-019-02099-1 ◽

2019 ◽

Vol 139 (2) ◽

pp. 215-226 ◽

Cited By ~ 3

Author(s):

Nandita Mukhopadhyay ◽

Madison Bishop ◽

Michael Mortillo ◽

Pankaj Chopra ◽

Jacqueline B. Hetmanski ◽

...

Keyword(s):

Latin American ◽

Birth Defects ◽

Pediatric Research ◽

Chromosome 21 ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genetic Etiology ◽

Single Nucleotide ◽

Public Health Burden ◽

Kids First

Abstract: Orofacial clefts (OFCs) are among the most prevalent craniofacial birth defects worldwide and create a significant public health burden. The majority of OFCs are non-syndromic, and the genetic etiology of non-syndromic OFCs is only partially determined. Here, we analyze whole genome sequence (WGS) data for association with risk of OFCs in European and Colombian families selected from a multicenter family-based OFC study. This is the first large-scale WGS study of OFC in parent–offspring trios, and a part of the Gabriella Miller Kids First Pediatric Research Program created for the study of childhood cancers and structural birth defects. WGS provides deeper and more specific genetic data than using imputation on present-day single nucleotide polymorphic (SNP) marker panels. Genotypes of case–parent trios at single nucleotide variants (SNV) and short insertions and deletions (indels) spanning the entire genome were called from their sequences using the human GRCh38 genome assembly, and analyzed for association using the transmission disequilibrium test. Among genome-wide significant associations, we identified a new locus on chromosome 21 in Colombian families, not previously observed in other larger OFC samples of Latin American ancestry. This locus is situated within a region known to be expressed during craniofacial development. Based on deeper investigation of this locus, we concluded that it contributed risk for OFCs exclusively in the Colombians. This study reinforces the ancestry differences seen in the genetic etiology of OFCs, and underscores the need for larger samples when studying OFCs and other birth defects in populations with diverse ancestry.
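
The transmission disequilibrium test named in the abstract above can be illustrated with a minimal sketch. It assumes the textbook McNemar-type form of the test for a single biallelic variant, where heterozygous parents either transmit or do not transmit the candidate allele to the affected child. The function name `tdt` and the counts in the usage line are hypothetical; this is not the authors' pipeline and the numbers are not from the study.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic TDT for one biallelic variant.

    transmitted   -- times heterozygous parents passed the candidate allele
                     to the affected child
    untransmitted -- times they passed the other allele instead
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    # McNemar-type statistic: (b - c)^2 / (b + c)
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p_value = tdt(transmitted=60, untransmitted=40)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

Because only transmissions from heterozygous parents are counted, the statistic compares alleles within families, which is why trio designs are often described as robust to population stratification.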

Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits

Scientific Reports ◽

10.1038/s41598-021-86871-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chao-Yu Guo ◽

Reng-Hong Wang ◽

Hsin-Chou Yang

Keyword(s):

Complex Traits ◽

Association Studies ◽

Association Test ◽

Whole Genome Sequence ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequence Kernel Association Test ◽

Gene Environment ◽

Family Based

Abstract: After the genome-wide association studies (GWAS) era, whole-genome sequencing is widely used to identify associations between complex traits and rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. The FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered concordant and discordant regions compared to methods that do not consider gene-by-environment interactions.

Whole-genome sequence data suggests environmental adaptation of Ethiopian sheep populations

Genome Biology and Evolution ◽

10.1093/gbe/evab014 ◽

2021 ◽

Author(s):

Pamela Wiener ◽

Christelle Robert ◽

Abulgasim Ahbara ◽

Mazdak Salavati ◽

Ayele Abebe ◽

...

Keyword(s):

High Altitude ◽

Environmental Variables ◽

Large Scale ◽

Sequence Data ◽

Strong Association ◽

Environmental Adaptation ◽

Whole Genome Sequence ◽

Single Nucleotide Variants ◽

High Altitude Adaptation ◽

Altitude Adaptation

Abstract: Great progress has been made over recent years in the identification of selection signatures in the genomes of livestock species. This work has primarily been carried out in commercial breeds for which the dominant selection pressures are associated with artificial selection. As agriculture and food security are likely to be strongly affected by climate change, a better understanding of environment-imposed selection on agricultural species is warranted. Ethiopia is an ideal setting to investigate environmental adaptation in livestock due to its wide variation in geo-climatic characteristics and the extensive genetic and phenotypic variation of its livestock. Here, we identified over three million single nucleotide variants across 12 Ethiopian sheep populations and applied landscape genomics approaches to investigate the association between these variants and environmental variables. Our results suggest that environmental adaptation for precipitation-related variables is stronger than that related to altitude or temperature, consistent with large-scale meta-analyses of selection pressure across species. The set of genes showing association with environmental variables was enriched for genes highly expressed in human blood and nerve tissues. There was also evidence of enrichment for genes associated with high-altitude adaptation although no strong association was identified with hypoxia-inducible-factor (HIF) genes. One of the strongest altitude-related signals was for a collagen gene, consistent with previous studies of high-altitude adaptation. Several altitude-associated genes also showed evidence of adaptation with temperature, suggesting a relationship between responses to these environmental factors. These results provide a foundation to investigate further the effects of climatic variables on small ruminant populations.

Accurate detection of subclonal single nucleotide variants in whole genome amplified and pooled cancer samples using HaloPlex target enrichment

BMC Genomics ◽

10.1186/1471-2164-14-856 ◽

2013 ◽

Vol 14 (1) ◽

pp. 856 ◽

Cited By ~ 15

Author(s):

Eva C Berglund ◽

Carl Lindqvist ◽

Shahina Hayat ◽

Elin Övernäs ◽

Niklas Henriksson ◽

...

Keyword(s):

Whole Genome ◽

Target Enrichment ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Accurate Detection

Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants

10.1101/316976 ◽

2018 ◽

Cited By ~ 4

Author(s):

Maxime Garcia ◽

Szilveszter Juhos ◽

Malin Larsson ◽

Pall I. Olason ◽

Marcel Martin ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Open Source ◽

Genome Sequencing ◽

Development Project ◽

Whole Genome ◽

Sequencing Analysis ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Insertion And Deletion ◽

Sample Heterogeneity

Abstract: Summary: Whole-genome sequencing (WGS) is a cornerstone of precision medicine, but portable and reproducible open-source workflows for WGS analyses of germline and somatic variants are lacking. We present Sarek, a modular, comprehensive, and easy-to-install workflow, combining a range of software for the identification and annotation of single-nucleotide variants (SNVs), insertion and deletion variants (indels), structural variants, tumor sample heterogeneity, and karyotyping from germline or paired tumor/normal samples. Sarek is implemented in a bioinformatics workflow language (Nextflow) with Docker and Singularity compatible containers, ensuring easy deployment and full reproducibility at any Linux based compute cluster or cloud computing environment. Sarek supports the human reference genomes GRCh37 and GRCh38, and can readily be used both as a core production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. Availability: Source code and instructions for local installation are available at GitHub (https://github.com/SciLifeLab/Sarek) under the MIT open-source license, and we invite the research community to contribute additional functionality as a collaborative open-source development project.

Whole-Genome Sequencing Allows for Improved Identification of Persistent Listeria monocytogenes in Food-Associated Environments

Applied and Environmental Microbiology ◽

10.1128/aem.01049-15 ◽

2015 ◽

Vol 81 (17) ◽

pp. 6024-6037 ◽

Cited By ~ 76

Author(s):

Matthew J. Stasiewicz ◽

Haley F. Oliver ◽

Martin Wiedmann ◽

Henk C. den Bakker

Keyword(s):

Listeria Monocytogenes ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genetic Determinants ◽

Single Nucleotide ◽

Content Type ◽

Food Borne ◽

Clonal Spread

Abstract: While the food-borne pathogen Listeria monocytogenes can persist in food-associated environments, there are no whole-genome sequence (WGS) based methods to differentiate persistent from sporadic strains. Whole-genome sequencing of 188 isolates from a longitudinal study of L. monocytogenes in retail delis was used to (i) apply single-nucleotide polymorphism (SNP)-based phylogenetics for subtyping of L. monocytogenes, (ii) use SNP counts to differentiate persistent from repeatedly reintroduced strains, and (iii) identify genetic determinants of L. monocytogenes persistence. WGS analysis revealed three prophage regions that explained differences between three pairs of phylogenetically similar populations with pulsed-field gel electrophoresis types that differed by ≤3 bands. WGS-SNP-based phylogenetics found that putatively persistent L. monocytogenes represent SNP patterns (i) unique to a single retail deli, supporting persistence within the deli (11 clades), (ii) unique to a single state, supporting clonal spread within a state (7 clades), or (iii) spanning multiple states (5 clades). Isolates that formed one of 11 deli-specific clades differed by a median of 10 SNPs or fewer. Isolates from 12 putative persistence events differed by significantly fewer SNPs (median, 2 to 22 SNPs) than did isolates of the same subtype from other delis (median up to 77 SNPs), supporting persistence of the strain. In 13 events, nearly indistinguishable isolates (0 to 1 SNP) were found across multiple delis. No individual genes were enriched among persistent isolates compared to sporadic isolates. Our data show that WGS analysis improves food-borne pathogen subtyping and identification of persistent bacterial pathogens in food-associated environments.
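
The idea of comparing SNP counts between isolates, as described in the abstract above, can be sketched as a pairwise count of differing positions followed by a median. This is a simplified illustration that assumes the isolates are already aligned to the same coordinates; the function names and the toy sequences are hypothetical and are not the data or the method used in the study.

```python
from itertools import combinations
from statistics import median


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Positions with an ambiguous or missing call (anything other than
    A, C, G, T) in either sequence are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in valid and b in valid
    )


def median_pairwise_snps(isolates: dict[str, str]) -> float:
    """Median SNP distance over all pairs of isolates in a group."""
    if len(isolates) < 2:
        raise ValueError("need at least two isolates to form a pair")
    names = sorted(isolates)
    return median(
        snp_distance(isolates[x], isolates[y])
        for x, y in combinations(names, 2)
    )


# Toy aligned sequences, invented for illustration only.
toy_isolates = {
    "isolate_1": "ACGTACGT",
    "isolate_2": "ACGTACGA",
    "isolate_3": "ACGAACGA",
}
print(median_pairwise_snps(toy_isolates))
```

A group of isolates with a low median distance is the kind of pattern the abstract interprets as persistence, whereas a larger distance would point toward separate introductions.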

Whole Genome Sequencing of the Mutamouse Model Reveals Strain- and Colony-Level Variation, and Genomic Features of the Transgene Integration Site

Scientific Reports ◽

10.1038/s41598-019-50302-0 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Matthew J. Meier ◽

Marc A. Beal ◽

Andrew Schoenrock ◽

Carole L. Yauk ◽

Francesco Marchetti

Keyword(s):

Integration Site ◽

Whole Genome Sequence ◽

Comparative Genomic ◽

Whole Genome ◽

Missense Mutations ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

High Coverage ◽

Deletion Event ◽

Transgene Integration

Abstract: The MutaMouse transgenic rodent model is widely used for assessing in vivo mutagenicity. Here, we report the characterization of MutaMouse’s whole genome sequence and its genetic variants compared to the C57BL/6 reference genome. High coverage (>50X) next-generation sequencing (NGS) of whole genomes from multiple MutaMouse animals from the Health Canada (HC) colony showed ~5 million SNVs per genome, ~20% of which are putatively novel. Sequencing of two animals from a geographically separated colony at Covance indicated that, over the course of 23 years, the colonies accumulated 47,847 (HC) and 17,677 (Covance) non-parental homozygous single nucleotide variants, respectively. We found no novel nonsense or missense mutations that impair the MutaMouse response to genotoxic agents. Pairing sequencing data with array comparative genomic hybridization (aCGH) improved the accuracy and resolution of copy number variant (CNV) calls and identified 300 genomic regions with CNVs. We also used long-read sequence technology (PacBio) to show that the transgene integration site involved a large deletion event with multiple inversions and rearrangements near a retrotransposon. The MutaMouse genome gives important genetic context to studies using this model, offers insight into the mechanisms of structural variant formation, and contributes a framework to analyze aCGH results alongside NGS data.

Twenty-Five Years of Propagation in Suspension Cell Culture Results in Substantial Alterations of the Arabidopsis Thaliana Genome

Genes ◽

10.3390/genes10090671 ◽

2019 ◽

Vol 10 (9) ◽

pp. 671 ◽

Cited By ~ 2

Author(s):

Pucker ◽

Rückert ◽

Stracke ◽

Viehöver ◽

Kalinowski ◽

...

Keyword(s):

Cell Culture ◽

Arabidopsis Thaliana ◽

Large Scale ◽

Suspension Cell ◽

Reference Sequence ◽

Model Organisms ◽

Suspension Cell Culture ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Haploid Genome Size

Arabidopsis thaliana is one of the best studied plant model organisms. Besides cultivation in greenhouses, cells of this plant can also be propagated in suspension cell culture. At7 is one such cell line that was established about 25 years ago. Here, we report the sequencing and the analysis of the At7 genome. Large scale duplications and deletions compared to the Columbia-0 (Col-0) reference sequence were detected. The number of deletions exceeds the number of insertions, thus indicating that a haploid genome size reduction is ongoing. Patterns of small sequence variants differ from the ones observed between A. thaliana accessions, e.g., the number of single nucleotide variants matches the number of insertions/deletions. RNA-Seq analysis reveals that disrupted alleles are less frequent in the transcriptome than the native ones.

Cytosine base editor 4 but not adenine base editor generates off-target mutations in mouse embryos

Communications Biology ◽

10.1038/s42003-019-0745-3 ◽

2020 ◽

Vol 3 (1) ◽

Cited By ~ 15

Author(s):

Hye Kyung Lee ◽

Harold E. Smith ◽

Chengyu Liu ◽

Michaela Willi ◽

Lothar Hennighausen

Keyword(s):

Point Mutations ◽

Mouse Embryos ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Base Editing ◽

Genome Wide ◽

Wide Range ◽

Family Based ◽

Correct Point ◽

Adenine Base

Abstract: Deaminase base editing has emerged as a tool to install or correct point mutations in the genomes of living cells in a wide range of organisms. However, the genome-wide off-target effects introduced by base editors in the mammalian genome have been examined in only one study. Here, we have investigated the fidelity of cytosine base editor 4 (BE4) and adenine base editors (ABE) in mouse embryos using unbiased whole-genome sequencing of a family-based trio cohort. The same sgRNA was used for BE4 and ABE. We demonstrate that BE4-edited mice carry an excess of single-nucleotide variants and deletions compared to ABE-edited mice and controls. Therefore, optimization of cytosine base editors is required to improve their fidelity. While the remarkable fidelity of ABE has implications for a wide range of applications, the occurrence of rare aberrant C-to-T conversions at specific target sites needs to be addressed.

Large-Scale Whole-Genome Sequencing Reveals the Genetic Architecture of Primary Membranoproliferative GN and C3 Glomerulopathy

Journal of the American Society of Nephrology ◽

10.1681/asn.2019040433 ◽

2020 ◽

Vol 31 (2) ◽

pp. 365-373 ◽

Cited By ~ 7

Author(s):

Adam P. Levine ◽

Melanie M.Y. Chan ◽

Omid Sadeghi-Alavijeh ◽

Edwin K.S. Wong ◽

H. Terence Cook ◽

...

Keyword(s):

Large Scale ◽

Rare Variants ◽

Alternative Pathway ◽

Atypical Hemolytic Uremic Syndrome ◽

Gene Mutations ◽

Whole Genome Sequence ◽

European Ancestry ◽

Whole Genome ◽

C3 Glomerulopathy ◽

Complement Gene

Background: Primary membranoproliferative GN, including complement 3 (C3) glomerulopathy, is a rare, untreatable kidney disease characterized by glomerular complement deposition. Complement gene mutations can cause familial C3 glomerulopathy, and studies have reported rare variants in complement genes in nonfamilial primary membranoproliferative GN. Methods: We analyzed whole-genome sequence data from 165 primary membranoproliferative GN cases and 10,250 individuals without the condition (controls) as part of the National Institutes of Health Research BioResource–Rare Diseases Study. We examined copy number, rare, and common variants. Results: Our analysis included 146 primary membranoproliferative GN cases and 6442 controls who were unrelated and of European ancestry. We observed no significant enrichment of rare variants in candidate genes (genes encoding components of the complement alternative pathway and other genes associated with the related disease atypical hemolytic uremic syndrome; 6.8% in cases versus 5.9% in controls) or exome-wide. However, a significant common variant locus was identified at 6p21.32 (rs35406322) (P=3.29×10^−8; odds ratio [OR], 1.93; 95% confidence interval [95% CI], 1.53 to 2.44), overlapping the HLA locus. Imputation of HLA types mapped this signal to a haplotype incorporating DQA1*05:01, DQB1*02:01, and DRB1*03:01 (P=1.21×10^−8; OR, 2.19; 95% CI, 1.66 to 2.89). This finding was replicated by analysis of HLA serotypes in 338 individuals with membranoproliferative GN and 15,614 individuals with nonimmune renal failure. Conclusions: We found that HLA type, but not rare complement gene variation, is associated with primary membranoproliferative GN. These findings challenge the paradigm of complement gene mutations typically causing primary membranoproliferative GN and implicate an underlying autoimmune mechanism in most cases.