scholarly journals Genome Wide Variant Analysis of Simplex Autism Families with an Integrative Clinical-Bioinformatics Pipeline

2015 ◽  
Author(s):  
Laura T Jiménez-Barrón ◽  
Jason A O'Rawe ◽  
Yiyang Wu ◽  
Margaret Yoon ◽  
Han Fang ◽  
...  

Autism spectrum disorders (ASD) are a group of developmental disabilities that affect social interaction, communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASD, in which many different loci are involved. Although many current population scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and widely phenotyped. Genotyping arrays and whole genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de-novo, autosomal recessive, x-linked, mitochondrial and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous CNVs, a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole genome sequencing data can generate reliable results for use in downstream investigations. We are moving to implement our framework for the analysis and study of larger cohorts of families, where statistical rigor can accompany genetic findings.

2017 ◽  
Author(s):  
Adriana Munoz ◽  
Boris Yamrom ◽  
Yoon-ha Lee ◽  
Peter Andrews ◽  
Steven Marks ◽  
...  

AbstractCopy number profiling and whole-exome sequencing has allowed us to make remarkable progress in our understanding of the genetics of autism over the past ten years, but there are major aspects of the genetics that are unresolved. Through whole-genome sequencing, additional types of genetic variants can be observed. These variants are abundant and to know which are functional is challenging. We have analyzed whole-genome sequencing data from 510 of the Simons Simplex Collections quad families and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p-val = 0.01) and using reasonable extrapolation shows that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict to the half of the autism targets that are intolerant to damaging variants in the normal human population, which half we expect to be even more enriched for autism genes. For these 273 targets we observe 43 and 20 events in affected and unaffected siblings, respectively (p-value of 0.005). There was no significant signal in the number of de novo intronic indels in any of the control sets of genes analyzed. We see no signal from de novo substitutions in the introns of target genes.


2021 ◽  
Vol 7 (7) ◽  
Author(s):  
Casper Jamin ◽  
Sien De Koster ◽  
Stefanie van Koeveringe ◽  
Dieter De Coninck ◽  
Klaas Mensaert ◽  
...  

Whole-genome sequencing (WGS) is becoming the de facto standard for bacterial typing and outbreak surveillance of resistant bacterial pathogens. However, interoperability for WGS of bacterial outbreaks is poorly understood. We hypothesized that harmonization of WGS for outbreak surveillance is achievable through the use of identical protocols for both data generation and data analysis. A set of 30 bacterial isolates, comprising of various species belonging to the Enterobacteriaceae family and Enterococcus genera, were selected and sequenced using the same protocol on the Illumina MiSeq platform in each individual centre. All generated sequencing data were analysed by one centre using BioNumerics (6.7.3) for (i) genotyping origin of replications and antimicrobial resistance genes, (ii) core-genome multi-locus sequence typing (cgMLST) for Escherichia coli and Klebsiella pneumoniae and whole-genome multi-locus sequencing typing (wgMLST) for all species. Additionally, a split k-mer analysis was performed to determine the number of SNPs between samples. A precision of 99.0% and an accuracy of 99.2% was achieved for genotyping. Based on cgMLST, a discrepant allele was called only in 2/27 and 3/15 comparisons between two genomes, for E. coli and K. pneumoniae, respectively. Based on wgMLST, the number of discrepant alleles ranged from 0 to 7 (average 1.6). For SNPs, this ranged from 0 to 11 SNPs (average 3.4). Furthermore, we demonstrate that using different de novo assemblers to analyse the same dataset introduces up to 150 SNPs, which surpasses most thresholds for bacterial outbreaks. This shows the importance of harmonization of data-processing surveillance of bacterial outbreaks. In summary, multi-centre WGS for bacterial surveillance is achievable, but only if protocols are harmonized.


2021 ◽  
Author(s):  
Stephanie L Battle ◽  
Daniela Puiu ◽  
Eric Boerwinkle ◽  
Kent Taylor ◽  
Jerome Rotter ◽  
...  

Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules have a variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantify mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants, and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions and, are better able to capture genetic variation in the D-loop of chrM compared to existing software. Our bioinformatics pipeline more accurately captures features of mitochondrial genetics than existing pipelines that are important in understanding how mitochondrial dysfunction contributes to disease.


PLoS ONE ◽  
2015 ◽  
Vol 10 (2) ◽  
pp. e0116358 ◽  
Author(s):  
Sergio I. Nemirovsky ◽  
Marta Córdoba ◽  
Jonathan J. Zaiat ◽  
Sabrina P. Completa ◽  
Patricia A. Vega ◽  
...  

F1000Research ◽  
2015 ◽  
Vol 4 ◽  
pp. 1407 ◽  
Author(s):  
Andy G. Lynch

It is now commonplace to investigate tumour samples using whole-genome sequencing, and some commonly performed tasks are the estimation of cellularity (or sample purity), the genome-wide profiling of copy numbers, and the assessment of sub-clonal behaviours. Several tools are available to undertake these tasks, but often give conflicting results – not least because there is often genuine uncertainty due to a lack of model identifiability. Presented here is a tool, "Crambled", that allows for an intuitive visual comparison of the conflicting solutions. Crambled is implemented as a Shiny application within R, and is accompanied by example images from two use cases (one tumour sample with matched normal sequencing, and one standalone cell line example) as well as functions to generate the necessary images from any sequencing data set. Through the use of Crambled, a user may gain insight into why each tool has offered its given solution and combined with a knowledge of the disease being studied can choose between the competing solutions in an informed manner.


Sign in / Sign up

Export Citation Format

Share Document