Genome Wide Variant Analysis of Simplex Autism Families with an Integrative Clinical-Bioinformatics Pipeline

Autism spectrum disorders (ASD) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. A large body of evidence now suggests a complex role for genetics in ASD, with many different loci involved. Although many current population-scale genomic studies have been demonstrably fruitful, they generally analyze a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome them, we have developed and used an integrative clinical and bioinformatics pipeline that generates a more complete and reliable set of genomic variants for downstream analyses. Our study analyzes three simplex autism families, each consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and extensively phenotyped. Genotyping arrays and whole genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that segregated according to de novo, autosomal recessive, X-linked, mitochondrial and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous CNVs, a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that combine rich clinical data with whole genome sequencing data can generate reliable results for use in downstream investigations. We are moving to apply our framework to larger cohorts of families, where statistical rigor can accompany genetic findings.
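The segregation models named in this abstract can be illustrated with a small genotype filter. The abstract does not describe how the authors' pipeline implements them, so this is a minimal sketch under stated assumptions: genotypes are encoded as alternate-allele counts (0, 1, 2), the family is a quad, and the member labels and function names are hypothetical. Only the de novo and autosomal recessive models are shown.

```python
from typing import Dict, List

# Assumed encoding: number of alternate alleles per family member (0, 1 or 2).
# The member labels below are hypothetical, chosen for this illustration.
Genotypes = Dict[str, int]


def is_de_novo(gt: Genotypes) -> bool:
    """Proband carries the alternate allele; both parents and the sibling do not."""
    return (
        gt["proband"] > 0
        and gt["father"] == 0
        and gt["mother"] == 0
        and gt["sibling"] == 0
    )


def is_autosomal_recessive(gt: Genotypes) -> bool:
    """Proband is homozygous alternate, both parents are heterozygous carriers,
    and the unaffected sibling is not homozygous alternate."""
    return (
        gt["proband"] == 2
        and gt["father"] == 1
        and gt["mother"] == 1
        and gt["sibling"] < 2
    )


def matching_models(gt: Genotypes) -> List[str]:
    """Return the names of the transmission models a variant is consistent with."""
    models = []
    if is_de_novo(gt):
        models.append("de_novo")
    if is_autosomal_recessive(gt):
        models.append("autosomal_recessive")
    return models


# Made-up genotypes for illustration only.
example_de_novo = {"proband": 1, "father": 0, "mother": 0, "sibling": 0}
example_recessive = {"proband": 2, "father": 1, "mother": 1, "sibling": 1}
example_inherited = {"proband": 1, "father": 1, "mother": 0, "sibling": 1}
```

A real analysis would also need genotype quality, read depth, and population frequency filters, which are omitted here.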

Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

BMC Bioinformatics ◽

10.1186/s12859-017-1927-y ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 21

Author(s):

Kosai Al-Nakeeb ◽

Thomas Nordahl Petersen ◽

Thomas Sicheritz-Pontén

Keyword(s):

Mitochondrial Dna ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data

Genome Biology ◽

10.1186/s13059-017-1216-0 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 46

Author(s):

Yang Wu ◽

Zhili Zheng ◽

Peter M. Visscher ◽

Jian Yang

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Association Studies ◽

Genome Wide Association ◽

Whole Genome Sequencing Data ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide

De novo indels within introns contribute to ASD incidence

10.1101/137471 ◽

2017 ◽

Cited By ~ 2

Author(s):

Adriana Munoz ◽

Boris Yamrom ◽

Yoon-ha Lee ◽

Peter Andrews ◽

Steven Marks ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Target Genes ◽

De Novo ◽

Whole Genome Sequencing Data ◽

P Value ◽

Whole Genome ◽

Sequencing Data ◽

Control Sets ◽

The Difference

Abstract: Copy number profiling and whole-exome sequencing have allowed us to make remarkable progress in our understanding of the genetics of autism over the past ten years, but major aspects of the genetics remain unresolved. Whole-genome sequencing makes additional types of genetic variants observable. These variants are abundant, and determining which of them are functional is challenging. We have analyzed whole-genome sequencing data from 510 of the Simons Simplex Collection quad families and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected children and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p-value = 0.01), and reasonable extrapolation suggests that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict the analysis to the half of the autism targets that are intolerant to damaging variants in the normal human population, a half that we expect to be even more enriched for autism genes. For these 273 targets we observe 43 and 20 events in affected and unaffected siblings, respectively (p-value = 0.005). There was no significant signal in the number of de novo intronic indels in any of the control sets of genes analyzed. We see no signal from de novo substitutions in the introns of target genes.
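The affected-versus-unaffected comparison can be sketched as an exact binomial test. The abstract does not say which test produced the reported p-values, so the choice here is an assumption: under the null hypothesis each de novo indel is equally likely to fall in the affected child or the unaffected sibling (probability 0.5), and the test is two-sided. The function name is hypothetical. With the counts from the abstract (63 vs 37, and 43 vs 20), this sketch gives p-values of the same order as those reported.

```python
from math import comb


def two_sided_binomial_p(affected: int, unaffected: int) -> float:
    """Exact two-sided binomial p-value under a 50/50 null.

    Because the null distribution is symmetric, the two-sided p-value is
    twice the upper tail at the larger of the two counts, capped at 1.
    """
    n = affected + unaffected
    k = max(affected, unaffected)
    # P(X >= k) for X ~ Binomial(n, 0.5), computed with exact integers.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)


# Counts taken from the abstract above.
p_all_targets = two_sided_binomial_p(63, 37)         # 546 target genes
p_intolerant_targets = two_sided_binomial_p(43, 20)  # 273 intolerant targets

print(f"all targets:        p = {p_all_targets:.4f}")
print(f"intolerant targets: p = {p_intolerant_targets:.4f}")
```

The binomial model conditions on the total number of events and ignores family structure, so it is only an approximation of whatever procedure the authors used.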

Harmonization of whole-genome sequencing for outbreak surveillance of Enterobacteriaceae and Enterococci

Microbial Genomics ◽

10.1099/mgen.0.000567 ◽

2021 ◽

Vol 7 (7) ◽

Author(s):

Casper Jamin ◽

Sien De Koster ◽

Stefanie van Koeveringe ◽

Dieter De Coninck ◽

Klaas Mensaert ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Type Species ◽

De Novo ◽

Whole Genome ◽

Data Generation ◽

Sequencing Data ◽

Content Type ◽

Link Type ◽

Antimicrobial Resistance Genes

Whole-genome sequencing (WGS) is becoming the de facto standard for bacterial typing and outbreak surveillance of resistant bacterial pathogens. However, interoperability for WGS of bacterial outbreaks is poorly understood. We hypothesized that harmonization of WGS for outbreak surveillance is achievable through the use of identical protocols for both data generation and data analysis. A set of 30 bacterial isolates, comprising various species belonging to the Enterobacteriaceae family and the Enterococcus genus, was selected and sequenced using the same protocol on the Illumina MiSeq platform in each individual centre. All generated sequencing data were analysed by one centre using BioNumerics (6.7.3) for (i) genotyping origins of replication and antimicrobial resistance genes, (ii) core-genome multi-locus sequence typing (cgMLST) for Escherichia coli and Klebsiella pneumoniae, and (iii) whole-genome multi-locus sequence typing (wgMLST) for all species. Additionally, a split k-mer analysis was performed to determine the number of SNPs between samples. A precision of 99.0% and an accuracy of 99.2% were achieved for genotyping. Based on cgMLST, a discrepant allele was called in only 2/27 and 3/15 comparisons between two genomes, for E. coli and K. pneumoniae, respectively. Based on wgMLST, the number of discrepant alleles ranged from 0 to 7 (average 1.6). For SNPs, the number ranged from 0 to 11 (average 3.4). Furthermore, we demonstrate that using different de novo assemblers to analyse the same dataset introduces up to 150 SNPs, which surpasses most thresholds for bacterial outbreaks. This shows the importance of harmonizing data processing in the surveillance of bacterial outbreaks. In summary, multi-centre WGS for bacterial surveillance is achievable, but only if protocols are harmonized.
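Counting discrepant alleles between two genomes, as in the cgMLST and wgMLST comparisons above, can be sketched as a comparison of allele profiles. This is a generic illustration and not the BioNumerics procedure, which the abstract does not describe. The assumptions are that a profile maps a locus name to an integer allele number, that a missing call is represented as `None`, and that only loci called in both genomes are compared. The locus names and function name are hypothetical.

```python
from typing import Dict, Optional

# Assumed representation: locus name -> allele number, or None if not called.
AlleleProfile = Dict[str, Optional[int]]


def count_discrepant_alleles(a: AlleleProfile, b: AlleleProfile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    shared_loci = [
        locus
        for locus in a.keys() & b.keys()
        if a[locus] is not None and b[locus] is not None
    ]
    return sum(1 for locus in shared_loci if a[locus] != b[locus])


# Made-up profiles for the same isolate sequenced at two centres.
centre_1 = {"locus_001": 4, "locus_002": 17, "locus_003": 2, "locus_004": None}
centre_2 = {"locus_001": 4, "locus_002": 18, "locus_003": 2, "locus_004": 9}

print(count_discrepant_alleles(centre_1, centre_2))
```

Excluding loci with a missing call keeps assembly gaps from being counted as allele differences, which is one possible design choice among several.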

A Bioinformatics Pipeline for Estimating Mitochondria DNA Copy Number and Heteroplasmy Levels from Whole Genome Sequencing Data

10.1101/2021.12.28.21268452 ◽

2021 ◽

Author(s):

Stephanie L Battle ◽

Daniela Puiu ◽

Eric Boerwinkle ◽

Kent Taylor ◽

Jerome Rotter ◽

...

Keyword(s):

Mitochondrial Genome ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Dna Molecules ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Accurate Identification

Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules have a variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and to quantify mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions and are better able to capture genetic variation in the D-loop of chrM compared to existing software. Compared with existing pipelines, our bioinformatics pipeline more accurately captures the features of mitochondrial genetics that are important in understanding how mitochondrial dysfunction contributes to disease.
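The two quantities defined above can be sketched numerically. The abstract does not give the pipeline's formulas, so this uses a commonly cited coverage-ratio estimate as an assumption: mtDNA-CN is taken as twice the ratio of mean chrM coverage to mean autosomal coverage (two autosomal copies per diploid cell), and the heteroplasmy level is the fraction of reads at a site that carry the variant. The function names and the classification thresholds are hypothetical.

```python
def mtdna_copy_number(mean_chrm_coverage: float, mean_autosomal_coverage: float) -> float:
    """Estimate mtDNA copies per diploid cell from sequencing coverage.

    Assumes the autosomes are present in two copies per cell.
    """
    if mean_autosomal_coverage <= 0:
        raise ValueError("autosomal coverage must be positive")
    return 2.0 * mean_chrm_coverage / mean_autosomal_coverage


def heteroplasmy_fraction(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total reads must be positive")
    return variant_reads / total_reads


def classify_site(fraction: float, low: float = 0.03, high: float = 0.97) -> str:
    """Label a site using hypothetical thresholds chosen for illustration."""
    if fraction < low:
        return "reference"
    if fraction > high:
        return "homoplasmic"
    return "heteroplasmic"


# Made-up numbers for illustration only.
copy_number = mtdna_copy_number(mean_chrm_coverage=3000.0, mean_autosomal_coverage=30.0)
fraction = heteroplasmy_fraction(variant_reads=250, total_reads=1000)
print(copy_number, fraction, classify_site(fraction))
```

The detection threshold matters in practice because sequencing errors and reads from nuclear-encoded mitochondrial sequences can mimic low-level heteroplasmy.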

Whole Genome Sequencing Reveals a De Novo SHANK3 Mutation in Familial Autism Spectrum Disorder

PLoS ONE ◽

10.1371/journal.pone.0116358 ◽

2015 ◽

Vol 10 (2) ◽

pp. e0116358 ◽

Cited By ~ 32

Author(s):

Sergio I. Nemirovsky ◽

Marta Córdoba ◽

Jonathan J. Zaiat ◽

Sabrina P. Completa ◽

Patricia A. Vega ◽

...

Keyword(s):

Autism Spectrum Disorder ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Whole Genome

Genome-Wide Identification of Microsatellites and Transposable Elements in the Dromedary Camel Genome Using Whole-Genome Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2019.00692 ◽

2019 ◽

Vol 10 ◽

Cited By ~ 1

Author(s):

Reza Khalkhali-Evrigh ◽

Nemat Hedayat-Evrigh ◽

Seyed Hasan Hafezian ◽

Ayoub Farhadi ◽

Mohammad Reza Bakhtiarizadeh

Keyword(s):

Transposable Elements ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Dromedary Camel ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide

Review of "Uniparental disomy analysis in trios using genome-wide SNP array and whole-genome sequencing data imply segmental uniparental isodisomy in general populations"

Publons reviews and discussion ◽

10.14322/publons.r63528 ◽

2015 ◽

Author(s):

Thomas Liehr

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Snp Array ◽

Uniparental Disomy ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide ◽

Uniparental Isodisomy ◽

General Populations

Identification of de novo and rare inherited mutations in autism spectrum disorder probands from Qatar by whole genome sequencing.

Qatar Foundation Annual Research Forum Proceedings ◽

10.5339/qfarf.2013.bioo-07 ◽

2013 ◽

pp. BIOO 07

Author(s):

Hatem El-Shanti

Keyword(s):

Autism Spectrum Disorder ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Whole Genome ◽

Inherited Mutations

Crambled: A Shiny application to enable intuitive resolution of conflicting cellularity estimates

F1000Research ◽

10.12688/f1000research.7453.1 ◽

2015 ◽

Vol 4 ◽

pp. 1407 ◽

Cited By ~ 2

Author(s):

Andy G. Lynch

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome ◽

Sequencing Data ◽

Data Set ◽

Tumour Sample ◽

Copy Numbers ◽

Genome Wide ◽

Insight Into ◽

Shiny Application

It is now commonplace to investigate tumour samples using whole-genome sequencing, and some commonly performed tasks are the estimation of cellularity (or sample purity), the genome-wide profiling of copy numbers, and the assessment of sub-clonal behaviours. Several tools are available to undertake these tasks, but they often give conflicting results, not least because there is often genuine uncertainty due to a lack of model identifiability. Presented here is a tool, "Crambled", that allows for an intuitive visual comparison of the conflicting solutions. Crambled is implemented as a Shiny application within R, and is accompanied by example images from two use cases (one tumour sample with matched normal sequencing, and one standalone cell line example) as well as functions to generate the necessary images from any sequencing data set. Through the use of Crambled, a user may gain insight into why each tool has offered its given solution and, combined with knowledge of the disease being studied, can choose between the competing solutions in an informed manner.
