Robust identification of deletions in exome and genome sequence data based on clustering of Mendelian errors

2017 ◽  
Author(s):  
Kathryn B. Manheimer ◽  
Nihir Patel ◽  
Felix Richter ◽  
Joshua Gorham ◽  
Angela C. Tai ◽  
...  

Abstract Multiple tools have been developed to identify copy number variants (CNVs) from whole exome sequencing (WES) and whole genome sequencing (WGS) data. Current tools such as XHMM for WES and CNVnator for WGS identify CNVs from changes in read depth. For WGS, other approaches use discordant read pairs and split reads (e.g., Lumpy) or genome-wide local assembly (e.g., SvABA). Here, we introduce a new method to identify deletion CNVs from WES and WGS trio data based on the clustering of Mendelian errors (MEs). Using our Mendelian Error Method (MEM), we identified 127 deletions (inherited and de novo) in 2,601 WES trios from the Pediatric Cardiac Genomics Consortium, with a validation rate of 88% by digital droplet PCR. Compared with XHMM, MEM identified additional de novo deletions; it also identified sample switches, DNA contamination, eight cases of uniparental disomy, and a significant enrichment of 15q11.2 deletions compared with controls. We applied MEM to WGS data from the Genome in a Bottle Ashkenazi trio and identified deletions with 97% specificity. MEM provides a robust, computationally inexpensive method for identifying deletions and an orthogonal approach for verifying deletions called by other tools.
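A deletion transmitted by one parent leaves the child with only the other parent's allele, so wherever the transmitting parent is homozygous for the opposite allele the child's genotype breaks Mendelian inheritance. A minimal sketch of "clustering of Mendelian errors", assuming unphased biallelic genotype calls for each trio member and sorted positions on one chromosome. The function names, the `max_gap` and `min_errors` thresholds, and the toy sites are hypothetical illustrations, not MEM's implementation or parameters.

```python
from typing import List, Sequence, Tuple

# Genotype as a pair of allele strings, e.g. ("A", "G")
Genotype = Tuple[str, str]


def is_mendelian_error(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child's genotype cannot be built from one maternal and one paternal allele."""
    a, b = child
    return not ((a in mother and b in father) or (b in mother and a in father))


def cluster_errors(positions: Sequence[int], max_gap: int, min_errors: int) -> List[Tuple[int, int, int]]:
    """Group sorted ME positions into runs whose neighbours lie within max_gap bp.

    Returns (first_pos, last_pos, n_errors) for runs with at least min_errors MEs
    (hypothetical thresholds; a candidate deletion, not a validated call).
    """
    clusters: List[Tuple[int, int, int]] = []
    if not positions:
        return clusters
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev <= max_gap:
            count += 1
        else:
            if count >= min_errors:
                clusters.append((start, prev, count))
            start, count = pos, 1
        prev = pos
    if count >= min_errors:
        clusters.append((start, prev, count))
    return clusters


if __name__ == "__main__":
    # Toy trio: (position, child, mother, father). The father transmits a deletion,
    # so the child looks homozygous for the maternal allele where the father is
    # homozygous for the other allele.
    sites = [
        (100, ("A", "A"), ("A", "A"), ("G", "G")),
        (250, ("C", "C"), ("C", "C"), ("T", "T")),
        (400, ("G", "G"), ("G", "G"), ("A", "A")),
        (9000, ("A", "G"), ("A", "A"), ("G", "G")),  # consistent site, no error
    ]
    me_positions = [pos for pos, c, m, f in sites if is_mendelian_error(c, m, f)]
    print(cluster_errors(me_positions, max_gap=1000, min_errors=3))
    # [(100, 400, 3)]
```

Requiring several nearby errors, rather than one, is what separates a deletion signal from isolated genotyping errors; the gap and count thresholds trade sensitivity against false clusters.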

2018 ◽  
Author(s):  
Whitney Whitford ◽  
Klaus Lehnert ◽  
Russell G. Snell ◽  
Jessie C. Jacobsen

Abstract Background: The popularisation and decreased cost of genome resequencing have increased its use in molecular diagnostics. While there are a number of established, high-quality bioinformatic tools for identifying small genetic variants, including single nucleotide variants and indels, there is currently no established standard for detecting copy number variants (CNVs) from sequence data. The need for CNV detection from high-throughput sequencing has led to the development of a large number of software packages. These tools typically rely on read depth, split reads, read pairs, or assembly-based techniques. However, read balance, defined as the relative proportion of reads supporting each allele at each position, is an additional source of information that has been underutilised in existing applications. Results: We present Read Balance Validator (RBV), a bioinformatic tool that uses read balance to prioritise and validate putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, this report demonstrates the utility of RBV for testing the inheritance of CNVs. Conclusions: RBV is a CNV validation and prioritisation tool for both genome and exome sequencing, available as a Python package from https://github.com/whitneywhitford/RBV
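At a heterozygous site the alternate-allele read balance is expected near 0.5 in a diploid region and near 1/3 or 2/3 where three copies are present, while heterozygous sites disappear across a hemizygous deletion. A minimal sketch of that reasoning, assuming `(ref_reads, alt_reads)` counts at variable sites inside one nominated region; the thresholds and function names are hypothetical and are not RBV's. A lack of heterozygous sites can also reflect a run of homozygosity, so this is a prioritisation signal rather than a call.

```python
from statistics import median
from typing import List, Tuple


def read_balance(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele at one site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return alt_reads / total


def classify_region(counts: List[Tuple[int, int]],
                    het_low: float = 0.15,
                    het_high: float = 0.85,
                    min_het_fraction: float = 0.05) -> str:
    """Crude read-balance summary of one region (all thresholds hypothetical)."""
    if not counts:
        raise ValueError("region has no sites")
    balances = [read_balance(r, a) for r, a in counts]
    het = [b for b in balances if het_low <= b <= het_high]
    # A hemizygous deletion leaves one allele, so heterozygous-looking sites vanish.
    if len(het) / len(balances) < min_het_fraction:
        return "possible deletion"
    # Diploid heterozygotes sit near 0.5; three copies push balances toward 1/3 or 2/3,
    # i.e. a deviation of about 1/6, so 1/12 is used as the midpoint cut-off.
    deviation = median(abs(b - 0.5) for b in het)
    return "possible duplication" if deviation > 1 / 12 else "diploid-like"


if __name__ == "__main__":
    print(classify_region([(10, 10), (12, 8), (9, 11)]))   # diploid-like
    print(classify_region([(20, 10), (10, 20), (21, 10)]))  # possible duplication
    print(classify_region([(30, 0), (0, 25), (28, 1)]))     # possible deletion
```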


Blood ◽  
2007 ◽  
Vol 110 (11) ◽  
pp. 107-107
Author(s):  
Matthew J. Walter ◽  
R. Ries ◽  
X. Li ◽  
W. Shannon ◽  
J. Payton ◽  
...  

Abstract To test whether small deletions or amplifications (i.e., below the resolution of cytogenetics) exist in bone marrow-derived tumor DNA from acute myeloid leukemia (AML) patients (pts), we used a dense tiling-path array comparative genomic hybridization (aCGH) platform consisting of 386,165 unique oligomers spaced evenly at ∼6 kb intervals across the genome. We analyzed 144 adult de novo AML pts; 64 had normal karyotypes, and 80 had 1 or 2 clonal aberrations. Similar numbers of FAB M0/1, M2, M3, and M4 pts were included, and all samples had >30% blasts (median=72%). To generate a cancer-free control data set, we also analyzed 23 DNA samples from normal individuals matched for age and ethnicity, with no history of cancer. Both the tumor and cancer-free control DNA samples were co-hybridized with a pool of control DNAs from the blood of 4 healthy young males. To define the sensitivity and specificity of the aCGH platform, we examined its ability to detect cytogenetically defined chromosome gains and losses. Of the 33 gains and losses present in >20% of metaphases, 29 (88%) were detected by aCGH. Of the 20 gains and losses present in ≤20% of metaphases, aCGH detected only 5 (25%). Three of 63 (4.8%) balanced translocations [t(15;17), t(8;21), t(9;11)] were detected by aCGH, indicating that the breakpoints of some translocations contain small deletions. Further, we identified many previously described germline copy number variants (CNVs) in both the AML pts and cancer-free controls. To improve our ability to define even smaller somatic microdeletions and amplifications, we tested 20 AML pts using CGH arrays containing 1.5 million probes per genome (average probe spacing 1.5 kb). To preclude detection of germline CNVs, the higher-resolution CGH experiments compared tumor and skin-derived DNA from the same patient. These same sample pairs were also analyzed individually with Affymetrix 500K SNP arrays.
Using stringent criteria to define abnormal segments, we identified 64 altered loci in the 20 AML pts that were not apparent cytogenetically and that contained ≥1 gene. SNP arrays confirmed aCGH findings in 7/9 loci >100 kb, and in 1/55 loci <100 kb in size. In addition, SNP arrays revealed copy-number-neutral loss of heterozygosity of the 11p arm in 2/20 AML pts, indicating partial uniparental disomy (UPD) involving this region. We also detected somatic deletions in the T cell receptor (TCR) (n=3/20) and immunoglobulin heavy chain (n=1/20) genes, including a homozygous deletion measuring 4.3 kb in size. The remaining loci identified with the 1.5M oligo aCGH platform were tested by quantitative PCR with matched tumor and germline DNA. Only 5/60 putative calls were validated using this approach; these include a deletion of IGFBP2 and amplifications of CROP, CPEB4, HOMER1, and ZNF148. In summary, 13 loci containing genes have been validated by SNP arrays or qPCR. No recurrent deletions or amplifications were found in the 20 AML pts. Thus, an additional 74 AML pts are being screened for evidence of recurrence at these loci. Our data suggest that an ultra-dense platform may be required to detect the majority of somatic copy number changes in AML genomes, and that UPD is relatively rare in AML pts, occurring in ∼10% of pts, and recurrent only in the 11p region.
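The percentages and the final count of 13 validated loci follow directly from the counts reported in this abstract; a small arithmetic check (variable names are illustrative only):

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Sensitivity for cytogenetically defined gains and losses
high_clone = percent(29, 33)     # present in >20% of metaphases
low_clone = percent(5, 20)       # present in <=20% of metaphases
translocations = percent(3, 63)  # balanced translocations detected by aCGH

# Validation tally for the 1.5M-probe platform
loci_tested_by_snp = 9 + 55      # loci >100 kb plus loci <100 kb
snp_confirmed = 7 + 1            # confirmed by SNP arrays
qpcr_confirmed = 5               # confirmed by qPCR
total_validated = snp_confirmed + qpcr_confirmed
upd_rate = percent(2, 20)        # copy-neutral LOH of 11p

print(high_clone, low_clone, translocations, loci_tested_by_snp, total_validated, upd_rate)
# 87.9 25.0 4.8 64 13 10.0
```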


2018 ◽  
Author(s):  
Andrew M Gross ◽  
Subramanian S. Ajay ◽  
Vani Rajan ◽  
Carolyn Brown ◽  
Krista Bluske ◽  
...  

Abstract Purpose: Current diagnostic testing for genetic disorders involves serial use of specialized assays spanning multiple technologies. In principle, whole genome sequencing (WGS) can detect all types of genomic mutation on a single platform and workflow. Here we sought to evaluate copy number variant (CNV) calling as part of a clinically accredited WGS test. Methods: Using a depth-based copy number caller, we performed analytical validation of CNV calling on a reference panel of 17 samples, compared the sensitivity of WGS-based variant calls with that of a clinical microarray, and set a bound on precision using orthogonal technologies. We developed a protocol for family-based analysis, annotation, filtering, and visualization of WGS-based CNV calls, and deployed it across a clinical cohort of 79 rare and undiagnosed cases. Results: We found that CNV calls from WGS are at least as sensitive as those from microarrays, while only modestly increasing the number of variants to interpret (~10 CNVs per case). We identified clinically significant CNVs in 15% of the first 79 cases analyzed. This pipeline also enabled identification of cases of uniparental disomy (UPD) and a 50% mosaic trisomy 14. Directed analysis of some CNVs enabled breakpoint-level resolution of genomic rearrangements and phasing of de novo CNVs. Conclusion: Robust identification of CNVs by WGS is possible within a clinical testing environment, and further developments will improve the resolution of smaller and more complex CNVs.
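The core idea of a depth-based copy number caller is that read depth in a genomic bin scales with the number of copies of that bin. A minimal sketch, assuming equal-sized bins on an autosome and that the median bin is diploid; real callers add GC and mappability correction and a proper segmentation model, and the functions here are hypothetical, not the caller used in this study.

```python
from statistics import median
from typing import List, Tuple


def estimate_copy_numbers(bin_depths: List[float]) -> List[int]:
    """Integer copy number per bin, assuming the median bin is diploid."""
    baseline = median(bin_depths)
    return [round(2 * depth / baseline) for depth in bin_depths]


def call_segments(copy_numbers: List[int], bin_size: int) -> List[Tuple[int, int, int]]:
    """Merge consecutive non-diploid bins with equal copy number into (start, end, cn)."""
    segments = []
    i = 0
    while i < len(copy_numbers):
        cn = copy_numbers[i]
        if cn == 2:
            i += 1
            continue
        j = i
        while j + 1 < len(copy_numbers) and copy_numbers[j + 1] == cn:
            j += 1
        # Half-open interval [start, end) in bp, assuming bins start at 0
        segments.append((i * bin_size, (j + 1) * bin_size, cn))
        i = j + 1
    return segments


if __name__ == "__main__":
    depths = [30, 31, 29, 15, 14, 16, 30, 45, 46, 30]  # toy mean depth per 1 kb bin
    cns = estimate_copy_numbers(depths)
    print(call_segments(cns, bin_size=1000))
    # [(3000, 6000, 1), (7000, 9000, 3)]
```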


2020 ◽  
Vol 57 (12) ◽  
pp. 851-857
Author(s):  
Brooke Sadler ◽  
Gabe Haller ◽  
Lilian Antunes ◽  
Momchil Nikolov ◽  
Ina Amarillo ◽  
...  

Introduction: Congenital clubfoot is a common birth defect that affects at least 0.1% of all births. Nearly 25% of cases are familial, and the remainder are sporadic. Copy number variants (CNVs) involving transcriptional regulators of limb development, including PITX1 and TBX4, have previously been shown to cause familial clubfoot, but much of the heritability remains unexplained. Methods: Exome sequence data from 816 unrelated clubfoot cases and 2645 in-house controls were analysed using coverage data to identify rare CNVs. The precise size and location of duplications were then determined using high-density Affymetrix Cytoscan chromosomal microarray (CMA). Segregation in families and de novo status were determined using quantitative PCR. Results: Chromosome Xp22.33 duplications involving SHOX were identified in 1.1% of cases (9/816) compared with 0.07% of in-house controls (2/2645) (p=7.98×10−5, OR=14.57) and 0.27% (38/13592) of Atherosclerosis Risk in Communities/Wellcome Trust Case Control Consortium 2 controls (p=0.001, OR=3.97). CMA validation confirmed an overlapping 180.28 kb duplicated region that included SHOX exons as well as downstream non-coding regions. In four of six sporadic cases where DNA was available from unaffected parents, the duplication was de novo. The probability of four de novo mutations in SHOX occurring by chance in a cohort of 450 sporadic clubfoot cases is 5.4×10−10. Conclusions: Microduplications of the pseudoautosomal chromosome Xp22.33 region (PAR1) containing SHOX and downstream enhancer elements occur in ~1% of patients with clubfoot. SHOX and its regulatory regions have previously been implicated in skeletal dysplasia as well as idiopathic short stature, but have not yet been reported in clubfoot. SHOX duplications likely contribute to clubfoot pathogenesis by altering early limb development.
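The abstract does not say which test produced p=7.98×10−5 or how the odds ratio was estimated. As an assumption, the sketch below applies a two-sided Fisher's exact test to the case/control carrier counts (9/816 vs 2/2645), written out with `math.comb` rather than a statistics library. The simple cross-product odds ratio it prints (about 14.7) is close to, but not identical to, the reported 14.57, which may come from a different estimator.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, carriers: int, cases: int) -> float:
    """P(k carriers among cases) if carriers were spread at random over all samples."""
    return comb(carriers, k) * comb(total - carriers, cases - k) / comb(total, cases)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: cases, controls. Columns: carriers, non-carriers.
    """
    total = a + b + c + d
    carriers = a + c
    cases = a + b
    observed = hypergeom_pmf(a, total, carriers, cases)
    lo = max(0, cases - (total - carriers))
    hi = min(carriers, cases)
    p_value = 0.0
    # Sum the probabilities of every table no more likely than the observed one.
    for k in range(lo, hi + 1):
        p_k = hypergeom_pmf(k, total, carriers, cases)
        if p_k <= observed * (1 + 1e-7):
            p_value += p_k
    return min(p_value, 1.0)


case_carriers, case_total = 9, 816
ctrl_carriers, ctrl_total = 2, 2645
p = fisher_exact_two_sided(case_carriers, case_total - case_carriers,
                           ctrl_carriers, ctrl_total - ctrl_carriers)
odds_ratio = (case_carriers * (ctrl_total - ctrl_carriers)) / (
    (case_total - case_carriers) * ctrl_carriers)
print(f"carrier frequency in cases: {100 * case_carriers / case_total:.2f}%")
print(f"Fisher p = {p:.2e}, cross-product OR = {odds_ratio:.1f}")
```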


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Whitney Whitford ◽  
Klaus Lehnert ◽  
Russell G. Snell ◽  
Jessie C. Jacobsen

Abstract The popularisation and decreased cost of genome resequencing have increased its use in molecular diagnostics. While there are a number of established, high-quality bioinformatic tools for identifying small genetic variants, including single nucleotide variants and indels, there is currently no established standard for detecting copy number variants (CNVs) from sequence data. The need for CNV detection from high-throughput sequencing has led to the development of a large number of software packages. These tools typically rely on read depth, split reads, read pairs, or assembly-based techniques. However, read balance (defined as the relative proportion of reads supporting each allele at each position) is an additional source of information that has been underutilised in existing applications. Here we present Read Balance Validator (RBV), a bioinformatic tool that uses read balance to prioritise and validate putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, this report demonstrates the utility of RBV for testing the inheritance of CNVs. RBV is a CNV validation and prioritisation tool for both genome and exome sequencing, available as a Python package from https://github.com/whitneywhitford/RBV.
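One way to make "differentiate larger CNVs from diploid regions" concrete is a likelihood comparison of allele counts at heterozygous sites. The sketch below swaps in a simple binomial model of our own choosing (alternate-allele fraction 1/2 for two copies versus 1/3 or 2/3, equally likely, for three copies); it is not RBV's published algorithm, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def _log_binom_kernel(ref: int, alt: int, p_alt: float) -> float:
    """log(p_alt**alt * (1 - p_alt)**ref); the binomial coefficient cancels in the ratio."""
    return alt * math.log(p_alt) + ref * math.log(1 - p_alt)


def duplication_llr(counts: Iterable[Tuple[int, int]]) -> float:
    """Summed log-likelihood ratio (three copies vs two) over heterozygous sites.

    counts holds (ref_reads, alt_reads). Positive totals favour a duplication,
    negative totals favour a diploid region.
    """
    total = 0.0
    for ref, alt in counts:
        diploid = _log_binom_kernel(ref, alt, 0.5)
        low = _log_binom_kernel(ref, alt, 1 / 3)
        high = _log_binom_kernel(ref, alt, 2 / 3)
        top = max(low, high)
        # log(0.5 * e**low + 0.5 * e**high), computed without underflow
        triploid = math.log(0.5) + top + math.log(math.exp(low - top) + math.exp(high - top))
        total += triploid - diploid
    return total


if __name__ == "__main__":
    print(duplication_llr([(20, 10), (10, 20)]))  # positive: duplication-like
    print(duplication_llr([(15, 15), (14, 16)]))  # negative: diploid-like
```

Summing over many sites is what gives larger CNVs their power in such a model: each site contributes weak evidence, but the totals separate quickly as the region grows.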


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Livia O. Loureiro ◽  
Jennifer L. Howe ◽  
Miriam S. Reuter ◽  
Alana Iaboni ◽  
Kristina Calli ◽  
...  

Abstract Autism Spectrum Disorder (ASD) is genetically complex, with ~100 copy number variants and genes implicated. To try to establish more definitive genotype and phenotype correlations in ASD, we searched genome sequence data, and the literature, for recurrent predicted damaging sequence-level variants affecting single genes. We identified 18 individuals from 16 unrelated families carrying a heterozygous guanine duplication (c.3679dup; p.Ala1227Glyfs*69) occurring within a string of 8 guanines (genomic location [hg38]g.50,721,512dup) affecting SHANK3, a prototypical ASD gene (0.08% of ASD-affected individuals carried the predicted p.Ala1227Glyfs*69 frameshift variant). Most probands carried de novo mutations, but five individuals in three families inherited the variant through somatic mosaicism. We scrutinized the phenotype of p.Ala1227Glyfs*69 carriers, and while everyone (17/17) formally tested for ASD carried a diagnosis, there was variable expression of core ASD features both within and between families. Defining such recurrent mutational mechanisms underlying an ASD outcome is important for genetic counseling and early intervention.
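Two sequence-level points explain why this variant is described as a recurrent frameshift in an 8-guanine string. Duplicating any base inside a homopolymer produces the same sequence, so HGVS nomenclature places the duplication at the most 3' position of the run; and a 1-bp insertion shifts the reading frame because 1 is not a multiple of 3. A minimal sketch on a toy sequence (not the SHANK3 reference), with hypothetical function names:

```python
def normalise_single_base_dup(seq: str, pos: int) -> int:
    """Shift a 1-based single-base duplication to its most 3' equivalent position."""
    base = seq[pos - 1]
    # seq[pos] is the base immediately 3' of 1-based position pos
    while pos < len(seq) and seq[pos] == base:
        pos += 1
    return pos


def apply_dup(seq: str, pos: int) -> str:
    """Insert a copy of the base at 1-based position pos immediately after it."""
    return seq[:pos] + seq[pos - 1] + seq[pos:]


def is_frameshift(indel_length: int) -> bool:
    """A coding indel shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


if __name__ == "__main__":
    toy = "ACGGGGGGGGTA"  # toy sequence with a run of 8 Gs at positions 3-10
    shifted = normalise_single_base_dup(toy, 3)
    print(shifted, apply_dup(toy, 3) == apply_dup(toy, shifted), is_frameshift(1))
    # 10 True True
```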


2021 ◽  
pp. 1-10
Author(s):  
Sophie E. Legge ◽  
Marcos L. Santoro ◽  
Sathish Periyasamy ◽  
Adeniran Okewole ◽  
Arsalan Arsalan ◽  
...  

Abstract Schizophrenia is a severe psychiatric disorder with high heritability. Consortium efforts and technological advances have led to a substantial increase in knowledge of the genetic architecture of schizophrenia over the past decade. In this article, we provide an overview of the current understanding of the genetics of schizophrenia, outline remaining challenges, and summarise future directions of research. Worldwide collaborations have resulted in genome-wide association studies (GWAS) of over 56 000 schizophrenia cases and 78 000 controls, which identified 176 distinct genetic loci. The latest GWAS from the Psychiatric Genomics Consortium, available as a pre-print, indicates that 270 distinct common genetic loci have now been associated with schizophrenia. Polygenic risk scores can currently explain around 7.7% of the variance in schizophrenia case-control status. Rare variant studies have implicated eight rare copy-number variants, and an increased burden of loss-of-function variants in SETD1A, in increasing the risk of schizophrenia. The latest exome sequencing study, available as a pre-print, implicates a burden of rare coding variants in a further nine genes. Gene-set analyses have demonstrated significant enrichment of both common and rare genetic variants associated with schizophrenia in synaptic pathways. To address current challenges, future genetic studies of schizophrenia need larger sample sizes from more diverse populations. Continued expansion of international collaboration will likely identify new genetic regions, improve fine-mapping to identify causal variants, and increase our understanding of the biology and mechanisms of schizophrenia.
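A polygenic risk score is, at its simplest, a weighted sum of an individual's risk-allele dosages, with per-variant weights such as GWAS log odds ratios. A minimal sketch of that calculation; the weights and genotypes below are hypothetical, and real pipelines add variant selection (e.g. clumping and thresholding, or shrinkage methods) that this sketch omits.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2) and per-variant weights."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardise(scores: List[float]) -> List[float]:
    """Convert raw scores to z-scores within the cohort."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


if __name__ == "__main__":
    # Hypothetical per-variant odds ratios converted to log-odds weights
    weights = [math.log(1.10), math.log(1.05), math.log(0.95)]
    cohort = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]  # dosages per individual
    raw = [polygenic_score(person, weights) for person in cohort]
    print(standardise(raw))
```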


Author(s):  
Guangtu Gao ◽  
Susana Magadan ◽  
Geoffrey C Waldbieser ◽  
Ramey C Youngblood ◽  
Paul A Wheeler ◽  
...  

Abstract There is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the de novo assembly of the Arlee line genome from high-coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with an N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% is in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout, the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype, the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and an additional 15 smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.
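N50 summarises assembly contiguity: it is the length L such that scaffolds of length L or longer together cover at least half of the total assembly. A minimal sketch on toy lengths (not the Arlee scaffolds), plus the karyotype arithmetic as we read it from the abstract: splitting each of three chromosomes into two acrocentric chromosomes adds three to a haploid count of 29.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # defensive; not reached for non-negative lengths


def haploid_count(baseline: int, split_chromosomes: int) -> int:
    """Each chromosome split into two acrocentrics adds one to the haploid count."""
    return baseline + split_chromosomes


if __name__ == "__main__":
    print(n50([10, 8, 5, 3, 2]))  # 8: 10 + 8 = 18 >= 28 / 2
    arlee_n = haploid_count(29, 3)  # Omy04, Omy14 and Omy25 each split in two
    print(arlee_n, 2 * arlee_n)     # 32 64
```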

