Automated prediction of the clinical impact of structural copy number variations

Abstract: Copy number variants (CNVs) play an important role in many biological processes, including the development of genetic diseases, making them attractive targets for genetic analyses. Interpreting the effect of these structural variants is challenging because the number of genes, regulatory elements, and other genomic elements affected by a CNV is highly variable. This has created demand for interpretation tools that would relieve researchers, laboratory diagnosticians, genetic counselors, and clinical geneticists of the laborious process of annotating and classifying CNVs. We designed and validated a prediction method (ISV; Interpretation of Structural Variants) that is based on boosted trees and takes into account annotations of CNVs from several publicly available databases. The presented approach achieved more than 98% prediction accuracy on both copy number loss and copy number gain variants while also allowing CNVs to be assigned “uncertain” significance in predictions. We believe that ISV’s prediction capability and explainability have great potential to guide users to more precise interpretations and classifications of CNVs.

Automated prediction of the clinical impact of structural copy number variations

10.1101/2020.07.30.228601 ◽

2020 ◽

Author(s):

Michaela Gaziova ◽

Tomas Sladecek ◽

Ondrej Pos ◽

Martin Stevko ◽

Werner Krampl ◽

...

Keyword(s):

Copy Number ◽

Genetic Diseases ◽

Copy Number Variants ◽

Copy Number Gain ◽

Regulatory Elements ◽

Copy Number Variations ◽

Clinical Impact ◽

Copy Number Loss ◽

Pathogenicity Prediction

Introduction: Copy number variants (CNVs) play an important role in many biological processes, including the development of genetic diseases, making them attractive targets for genetic analyses. Interpreting the effect of structural variants is challenging because the number of genes, regulatory elements, and other genomic elements affected by a CNV is highly variable. This has created demand for interpretation tools that would relieve researchers, laboratory diagnosticians, genetic counselors, and clinical geneticists of the laborious process of annotating and classifying CNVs. Materials and Methods: We designed a classifier based on annotations of CNVs from several publicly available databases. The attributes take into account the gene elements and regulatory elements affected by the CNV, as well as other CNVs with known clinical significance that overlap the candidate CNV. We also describe the process of model selection and the construction of the training, validation, and test sets. Results: The presented approach achieved more than 98% prediction accuracy on both copy number loss and copy number gain variants, and this can be improved by imposing probability thresholds to eliminate low-confidence predictions. Discussion: The method has shown considerable performance in predicting the clinical impact of CNVs and therefore has great potential to guide users to more precise conclusions. The CNV annotation and pathogenicity prediction can be fully automated, relieving users of tedious interpretation processes. Availability and Implementation: The results can be reproduced by following the instructions at https://github.com/tsladecek/isv.
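The probability-threshold step mentioned in the Results can be sketched as follows. This is only a minimal illustration and not the ISV implementation: the function names, the 0.95 cutoff, and the label strings are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def label_with_threshold(p_pathogenic: float, threshold: float = 0.95) -> str:
    """Map a predicted pathogenicity probability to a label.

    Predictions that are not confident in either direction are labelled
    "uncertain". The 0.95 default is an illustrative assumption.
    """
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_pathogenic >= threshold:
        return "pathogenic"
    if p_pathogenic <= 1.0 - threshold:
        return "benign"
    return "uncertain"


def confident_accuracy(
    probs: List[float], truths: List[str], threshold: float = 0.95
) -> Tuple[Optional[float], int]:
    """Return accuracy over confident calls only, and the number left uncertain."""
    correct = 0
    confident = 0
    for p, truth in zip(probs, truths):
        label = label_with_threshold(p, threshold)
        if label == "uncertain":
            continue  # low-confidence prediction is excluded from the accuracy
        confident += 1
        correct += int(label == truth)
    uncertain = len(probs) - confident
    accuracy = correct / confident if confident else None
    return accuracy, uncertain


if __name__ == "__main__":
    probs = [0.99, 0.60, 0.02, 0.97]
    truths = ["pathogenic", "benign", "benign", "benign"]
    print(confident_accuracy(probs, truths))
```

Raising the threshold trades coverage for accuracy: more CNVs fall into the "uncertain" band, and the accuracy is computed on fewer, more confident calls.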

Automated classification of copy number variants based on 2019 ACMG standards

Molecular Genetics and Metabolism ◽

10.1016/s1096-7192(21)00531-x ◽

2021 ◽

Vol 132 ◽

pp. S287-S288

Author(s):

Jianling Ji ◽

Ryan Schmidt ◽

Westley Sherman ◽

Ryan Peralta ◽

Megan Roytman ◽

...

Keyword(s):

Copy Number ◽

Copy Number Variants ◽

Automated Classification

Combination of Genome-Wide Polymorphisms and Copy Number Variations of Pharmacogenes in Koreans

Journal of Personalized Medicine ◽

10.3390/jpm11010033 ◽

2021 ◽

Vol 11 (1) ◽

pp. 33

Author(s):

Nayoung Han ◽

Jung Mi Oh ◽

In-Wha Kim

Keyword(s):

Copy Number ◽

Genome Wide Association Study ◽

Copy Number Gain ◽

Copy Number Variations ◽

Gene Gain ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Haplotype Blocks ◽

Genome Wide ◽

Control And Prevention

For predicting phenotypes and practicing precision medicine, combined analysis of single nucleotide variant (SNV) genotyping and copy number variations (CNVs) is required. The aim of this study was to discover SNVs or common copy CNVs and examine the combined frequencies of SNVs and CNVs in pharmacogenes using the Korean Genome and Epidemiology Study (KoGES), a consortium project. The genotypes (N = 72,299) and CNV data (N = 1000) were provided by the Korean National Institute of Health, Korea Centers for Disease Control and Prevention. The allele frequencies of SNVs, CNVs, and combined SNVs with CNVs were calculated and haplotype analysis was performed. CYP2D6 rs1065852 (c.100C>T, p.P34S) was the most common variant allele (48.23%). A total of 8454 haplotype blocks in 18 pharmacogenes were estimated. DMD ranked the highest in frequency for gene gain (64.52%), while TPMT ranked the highest in frequency for gene loss (51.80%). Copy number gain of CYP4F2 was observed in 22 subjects; 13 of those subjects were carriers with CYP4F2*3 gain. In the case of TPMT, approximately one-half of the participants (N = 308) had loss of the TPMT*1*1 diplotype. The frequencies of SNVs and CNVs in pharmacogenes were determined using the Korean cohort-based genome-wide association study.

Genotype-Phenotype correlation in Dravet Syndrome with SCN1A mutation increase efficiency of molecular diagnosis

Journal of Epilepsy and Clinical Neurophysiology ◽

10.1590/s1676-26492012000200009 ◽

2012 ◽

Vol 18 (2) ◽

pp. 60-62

Author(s):

MC Gonsales ◽

P Preto ◽

MA Montenegro ◽

MM Guerreiro ◽

I Lopes-Cendes

Keyword(s):

Protein Function ◽

Copy Number ◽

Mutation Screening ◽

Copy Number Variants ◽

Febrile Seizures ◽

Copy Number Variations ◽

Dravet Syndrome ◽

Missense Mutations ◽

The Impact ◽

Doose Syndrome

OBJECTIVES: The purpose of this study was to advance knowledge on the clinical use of SCN1A testing for severe epilepsies within the spectrum of generalized epilepsy with febrile seizures plus by performing genetic screening in patients with Dravet and Doose syndromes and establishing genotype-phenotype correlations. METHODS: Mutation screening in SCN1A was performed in 15 patients with Dravet syndrome and 13 with Doose syndrome. Eight prediction algorithms were used to analyze the impact of the mutations on putative protein function. Furthermore, all previously published SCN1A mutations were compiled and analyzed. In addition, the Multiplex Ligation-Dependent Probe Amplification (MLPA) technique was used to detect possible copy number variations within SCN1A. RESULTS: Twelve mutations were identified in patients with Dravet syndrome, while patients with Doose syndrome showed no mutations. Our results show that the most common type of mutation found is missense, and that these mutations are mostly located in the pore region and the N- and C-terminal regions of the protein. No copy number variants in SCN1A were identified in our cohort. CONCLUSIONS: SCN1A testing is clinically useful for patients with Dravet syndrome, but not for those with Doose syndrome, since the two syndromes do not seem to share the same genetic basis. Our results indicate that missense mutations can indeed cause severe phenotypes, depending on their location and the type of amino-acid substitution. Moreover, our strategy for predicting the deleterious effect of mutations using multiple computational algorithms was efficient for most of the mutations identified.

CNV-P: a machine-learning framework for predicting high confident copy number variations

PeerJ ◽

10.7717/peerj.12564 ◽

2021 ◽

Vol 9 ◽

pp. e12564

Author(s):

Taifu Wang ◽

Jinghua Sun ◽

Xiuqing Zhang ◽

Wen-Jing Wang ◽

Qing Zhou

Keyword(s):

Machine Learning ◽

False Positive ◽

Copy Number ◽

Genetic Disorders ◽

Genetic Diseases ◽

Basic Research ◽

Read Depth ◽

Copy Number Variations ◽

Sequencing Data ◽

Learning Framework

Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates and needs further improvement. Methods: Here, we propose a novel post-processing approach for CNV prediction (CNV-P), a machine-learning framework that can efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals, such as read depth (RD), split reads (SR), and read pairs (RP) around the putative CNV fragments, were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could classify CNVs with over 90% precision and 85% recall, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. Conclusions: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.
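For readers unfamiliar with the two metrics quoted in the Results, the sketch below shows how precision and recall are computed from counts of true-positive, false-positive, and false-negative CNV calls. The function name and the example counts are hypothetical and are not taken from the CNV-P study.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute precision and recall from confusion counts.

    tp: putative CNVs kept by the filter that are real
    fp: putative CNVs kept by the filter that are not real
    fn: real CNVs that the filter discarded
    """
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    p, r = precision_recall(tp=90, fp=10, fn=15)
    print(f"precision={p:.3f} recall={r:.3f}")
```

A post-processing filter of this kind raises precision by discarding false-positive calls, at the cost of some recall whenever a real CNV is discarded along with them.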

A Simple, Universal, and Cost-Efficient Digital PCR Method for the Targeted Analysis of Copy Number Variations

Clinical Chemistry ◽

10.1373/clinchem.2019.304246 ◽

2019 ◽

Vol 65 (9) ◽

pp. 1153-1160 ◽

Cited By ~ 5

Author(s):

Kévin Cassinari ◽

Olivier Quenez ◽

Géraldine Joly-Hélas ◽

Ludivine Beaussire ◽

Nathalie Le Meur ◽

...

Keyword(s):

Copy Number ◽

Segregation Analysis ◽

Genetic Diseases ◽

Genomic Medicine ◽

Digital Pcr ◽

Copy Number Variations ◽

Locked Nucleic Acid ◽

Targeted Analysis ◽

Cost Efficient ◽

High Flexibility

BACKGROUND: Rare copy number variations (CNVs) are a major cause of genetic diseases. Simple targeted methods are required for their confirmation and segregation analysis. We developed a simple and universal CNV assay based on digital PCR (dPCR) and universal locked nucleic acid (LNA) hydrolysis probes. METHODS: We analyzed the mapping of the 90 LNA hydrolysis probes from the Roche Universal ProbeLibrary (UPL). For each CNV, selection of the optimal primers and LNA probe was almost automated; probes were reused across assays and each dPCR assay included the CNV amplicon and a reference amplicon. We assessed the assay performance on 93 small and large CNVs and performed a comparative cost-efficiency analysis. RESULTS: UPL-LNA probes presented nearly 20,000,000 occurrences on the human genome and were homogeneously distributed with a mean interval of 156 bp. The assay accurately detected all the 93 CNVs, except one (<200 bp), with coefficient of variation <10%. The assay was more cost-efficient than all the other methods. CONCLUSIONS: The universal dPCR CNV assay is simple, robust, and cost-efficient because it combines a straightforward design allowed by universal probes and end point PCR, the advantages of a relative quantification of the target to the reference within the same reaction, and the high flexibility of the LNA hydrolysis probes. This method should be a useful tool for genomic medicine, which requires simple methods for the interpretation and segregation analysis of genomic variations.
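The relative quantification of target to reference within one reaction can be illustrated with the standard Poisson correction used in digital PCR, where the mean number of template copies per partition is estimated as -ln(1 - fraction of positive partitions). The sketch below is a generic illustration under that assumption, not the authors' analysis pipeline; the function names, the partition counts, and the assumption of a two-copy reference locus are hypothetical.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean number of template copies per partition."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def estimate_copy_number(
    target_positive: int,
    reference_positive: int,
    total_partitions: int,
    reference_copies: int = 2,
) -> float:
    """Estimate target copy number relative to a reference amplicon.

    Assumes both amplicons are read out in the same partitions and that the
    reference locus is present in `reference_copies` copies per genome.
    """
    target = copies_per_partition(target_positive, total_partitions)
    reference = copies_per_partition(reference_positive, total_partitions)
    if reference == 0.0:
        raise ValueError("reference amplicon was not detected")
    return reference_copies * target / reference


if __name__ == "__main__":
    # Hypothetical counts: about half as much target as reference template,
    # as expected for a heterozygous deletion.
    print(round(estimate_copy_number(2679, 5000, 20000), 2))
```

Because target and reference are measured in the same reaction, variation in input DNA amount cancels out in the ratio, which is the advantage the abstract attributes to relative quantification.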

Whole exome sequencing is necessary to clarify ID/DD cases with de novo copy number variants of uncertain significance: Two proof-of-concept examples

American Journal of Medical Genetics Part A ◽

10.1002/ajmg.a.37649 ◽

2016 ◽

Vol 170 (7) ◽

pp. 1772-1779 ◽

Cited By ~ 13

Author(s):

Elisa Giorgio ◽

Andrea Ciolfi ◽

Elisa Biamino ◽

Viviana Caputo ◽

Eleonora Di Gregorio ◽

...

Keyword(s):

Exome Sequencing ◽

Whole Exome Sequencing ◽

Copy Number ◽

De Novo ◽

Copy Number Variants ◽

Proof Of Concept ◽

Variants Of Uncertain Significance ◽

Whole Exome ◽

Uncertain Significance

Molecular karyotyping in routine diagnostics – a view back and forth

LaboratoriumsMedizin ◽

10.1515/labmed-2013-0042 ◽

2013 ◽

Vol 36 (5) ◽

Author(s):

Uwe Heinrich ◽

Meike Gabert ◽

Imma Rost

Keyword(s):

Copy Number ◽

Copy Number Variants ◽

Uniparental Disomy ◽

Simultaneous Detection ◽

Molecular Karyotyping ◽

Comparative Genomic ◽

Detection Rates ◽

Routine Diagnostics ◽

Indispensable Tool

Abstract: Since its introduction in the routine diagnostics of patients with mental retardation/developmental delay, array-comparative genomic hybridization (aCGH) has become an indispensable tool for the detection of clinically relevant copy number variants (CNVs). Despite the current tendency toward higher-resolution arrays, the growing number of public internet databases as well as better calling algorithms allow safe reporting and a better classification of CNVs. The application of combined aCGH plus single nucleotide polymorphism (SNP) arrays will increase detection rates by revealing copy-number-neutral changes, such as uniparental disomy. In the future, next generation sequencing techniques will lead to a further increase in resolution with the simultaneous detection of unbalanced and even balanced chromosomal aberrations.

Molecular profiling of lung adenocarcinoma using pleural effusion specimens.

Journal of Clinical Oncology ◽

10.1200/jco.2020.38.15_suppl.e21709 ◽

2020 ◽

Vol 38 (15_suppl) ◽

pp. e21709-e21709

Author(s):

Wei Zhang ◽

Bei Zhang ◽

Yifan Zhou ◽

Xiaochen Zhao ◽

Yuezong Bai

Keyword(s):

Targeted Therapy ◽

Next Generation Sequencing ◽

Pleural Effusion ◽

Lung Adenocarcinoma ◽

Copy Number ◽

Copy Number Gain ◽

Molecular Profiling ◽

Copy Number Loss ◽

Next Generation ◽

Generation Sequencing

e21709 Background: Molecular profiling of lung adenocarcinoma is essential for therapeutic decision-making and prognosis prediction. Pleural effusion may provide an opportunity for molecular profiling and thereby possibly provide information enabling targeted therapy. In this study, we performed next generation sequencing (NGS) on pleural effusion samples in order to study the molecular profile of lung adenocarcinoma. Methods: 45 Chinese lung adenocarcinoma patients with pleural effusion specimens were included. The pleural effusion samples were centrifuged, then cell pellets were collected and prepared into cell blocks. Genetic mutations were assessed using a validated targeted next generation sequencing assay. Immunohistochemistry (IHC) of PD-L1 was performed with the 22C3 kit. Results: Of the 45 pleural effusion samples collected, 43 (95.5%) had at least one mutation classed as pathogenic or likely pathogenic. A total of 245 somatic mutations and 160 germline mutations were detected, with an average of 8.0 mutations per patient. Of the 45 specimens with somatic mutations, seventeen (37.8%) harbored EGFR mutations. The most frequent mutations were the deletion mutation in exon 19 (15/17, 40.9%), the point mutation (L858R) in exon 21 (13/17, 76.5%), and the resistance mutation (T790M) in exon 20 (4/17, 23.5%). Aside from the EGFR mutations, 1 case exhibited a KRAS mutation (G12C), 1 case harbored an ERBB2 mutation (Y772_A775dup), 1 case harbored a TP53 mutation, and 2 cases exhibited fusions (EML4-ALK, KIF5B-RET). Two cases exhibited CD274 copy number gain, 2 cases exhibited CDK4 copy number gain, one case carried CDK6 copy number gain, and one case carried CDK6 copy number loss. The most frequently mutated germline genes were APC (5/45), ALK (4/45), ARID1A (4/45) and BARD1 (4/45). Regarding biomarkers for immunotherapy, three samples showed TMB-H (6.7%), and one sample showed MSI-H (2.2%). Of the 29 samples that underwent PD-L1 IHC testing, 21 (72.4%) showed positive PD-L1 expression, in concordance with previously reported rates. Conclusions: These results suggest that pleural effusions are important specimens for oncogene mutation analysis and enable targeted therapy for patients with lung adenocarcinoma.

Whole Genome Detection of Sequence and Structural Polymorphism in Six Diverse Horses

10.1101/545111 ◽

2019 ◽

Author(s):

Mohammed Ali Al Abri ◽

Heather Marie Holl ◽

Sara E Kalla ◽

Nate Sutter ◽

Samantha Brooks

Keyword(s):

Copy Number ◽

Copy Number Gain ◽

Copy Number Variations ◽

Nucleotide Polymorphisms ◽

Structural Variations ◽

Long Distance ◽

Single Nucleotide ◽

Evolutionary Selection ◽

Climatic Environment ◽

Physiological Adaptations

Abstract: The domesticated horse has played a unique role in human history, serving not just as a source of animal protein, but also as a catalyst for long-distance migration and military conquest. As a result, the horse developed unique physiological adaptations to meet the demands of both its climatic environment and its relationship with man. Completed in 2009, the first domesticated horse reference genome assembly (EquCab 2.0) produced most of the publicly available genetic variation annotations in this species. Yet, there are around 400 geographically and physiologically diverse breeds of horse. To enrich the current collection of genetic variants in the horse, we sequenced whole genomes from six horses of six different breeds: an American Miniature, a Percheron, an Arabian, a Mangalarga Marchador, a Native Mongolian Chakouyi, and a Tennessee Walking Horse. Aside from extreme contrasts in body size, these breeds originate from diverse global locations and each possesses unique adaptive physiology. A total of 1.3 billion reads were generated for the six horses, with coverage between 15x and 24x per horse. After applying rigorous filtration, we identified and functionally annotated 8,128,658 Single Nucleotide Polymorphisms (SNPs) and 830,370 Insertions/Deletions (INDELs), as well as novel Copy Number Variations (CNVs) and Structural Variations (SVs). Our results revealed putatively functional variants, including genes associated with size variation such as ANKRD1 and HMGA1 in the very large Percheron and the ZFAT gene in the American Miniature horse. We detected a copy number gain in the Latherin gene that may be the result of evolutionary selection for thermoregulation by sweating, an important component of athleticism and heat tolerance. The newly discovered variants were formatted into user-friendly browser tracks and will provide a foundational database for future studies of the genetic underpinnings of diverse phenotypes within the horse.

Author Summary: The domesticated horse played a unique role in human history, serving not just as a source of dietary animal protein, but also as a catalyst for long-distance migration and military conquest. As a result, the horse developed unique physiological adaptations to meet the demands of both its climatic environment and its relationship with man. Although the completion of the horse reference genome yielded the discovery of many genetic variants, the remarkable diversity across breeds of horse calls for additional effort in quantifying the breadth of genetic polymorphism within this unique species. Here, we present genome re-sequencing and variant detection analysis for six horses belonging to geographically and physiologically diverse breeds. We identified and annotated not just single nucleotide polymorphisms (SNPs), but also large insertions and deletions (INDELs), copy number variations (CNVs) and structural variations (SVs). Our results illustrate novel sources of polymorphism and highlight potentially impactful variations for phenotypes of body size and conformation. We also detected a copy number gain in the Latherin gene that could be the result of evolutionary selection for thermoregulation through sweating. Our newly discovered variants were formatted into easy-to-use tracks that can be readily accessed by researchers around the globe.
