Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion

Abstract. Background: Genomic instability plays a major role in the development of cancer. Tumor mutational burden (TMB), an important manifestation of genomic instability, is closely related to immunotherapy outcome. However, TMB detection is costly, which limits its use in clinical practice. Another indicator of genomic instability, CNVA (the average copy number variation), is calculated from copy number changes in 0.5 Mb chromosomal fragments, requires only low sequencing depth, and is expected to replace TMB as a marker of immunotherapy efficacy. Methods: A total of 50 samples (23 of which came from patients who received immunotherapy) were subjected to low-depth (10X) chromosome sequencing on the MGI platform. CNVA was calculated by the formula avg(abs(copy number - 2)). We then analyzed the relationship between CNVA and immune infiltration or immunotherapy efficacy. In addition, using whole-genome sequencing data of 509 lung adenocarcinomas in the TCGA database, we compared CNVA with the classic marker TMB to evaluate the value of CNVA as an immune evaluation index. Results: Compared with the low-CNVA group, the high-CNVA group had higher expression of PD-L1, CD39 and CD19, and more infiltration of CD8+ T cells and CD3+ T cells. Among the 23 patients treated with immunotherapy, the average CNVA value of the SD (stable disease)/PR (partial response) group was higher than that of the PD (progressive disease) group (P < 0.05). Whole-genome sequencing data of 509 lung adenocarcinomas from TCGA and real-time quantitative PCR results from 22 frozen specimens showed that CNVA correlated more strongly with CD8 and PD-L1 than TMB did. In addition, CNVA showed a positive correlation with TMB (r = 0.2728, p < 0.0001). Conclusion: CNVA can be a good indicator of immune infiltration and a predictor of immunotherapy efficacy. With its low cost and potential for clinical testing, it is expected to become a substitute for TMB.
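
The CNVA formula quoted in the abstract, avg(abs(copy number - 2)), can be illustrated with a minimal Python sketch. It assumes that one copy-number estimate is already available for each 0.5 Mb fragment; the function name `cnva`, the `baseline` parameter, and the example values are hypothetical and not taken from the paper.

```python
from statistics import mean


def cnva(bin_copy_numbers, baseline=2.0):
    """Average absolute deviation of per-bin copy number from the diploid baseline.

    bin_copy_numbers: one copy-number estimate per genomic bin
    (assumed here to be 0.5 Mb fragments, as described in the abstract).
    """
    values = list(bin_copy_numbers)
    if not values:
        raise ValueError("at least one bin is required")
    return mean(abs(cn - baseline) for cn in values)


# Hypothetical copy-number estimates for five bins (illustrative only).
example_bins = [2.0, 2.0, 3.0, 1.0, 4.0]
print(cnva(example_bins))
```

In this toy input the absolute deviations are 0, 0, 1, 1 and 2, so the score is their average, 0.8. A fully diploid genome would score 0 under this definition.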

Investigation of copy number variation in subjects with major depression based on whole-genome sequencing data

Journal of Affective Disorders ◽

10.1016/j.jad.2017.05.044 ◽

2017 ◽

Vol 220 ◽

pp. 38-42 ◽

Cited By ~ 6

Author(s):

Chenglong Yu ◽

Bernhard T. Baune ◽

Ma-Li Wong ◽

Julio Licinio

Keyword(s):

Major Depression ◽

Copy Number Variation ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Number Variation

Effective normalization for copy number variation detection from whole genome sequencing

BMC Genomics ◽

10.1186/1471-2164-13-s6-s16 ◽

2012 ◽

Vol 13 (Suppl 6) ◽

pp. S16 ◽

Cited By ~ 10

Author(s):

Angel Janevski ◽

Vinay Varadan ◽

Sitharthan Kamalakaran ◽

Nilanjana Banerjee ◽

Nevenka Dimitrova

Keyword(s):

Copy Number Variation ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Number Variation ◽

Copy Number Variation Detection

Detection of Autosomal Hemizygous Regions in the Fleckvieh Population Based on SNP-chip Data and Parent Offspring Pairs

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis ◽

10.11118/actaun201967061447 ◽

2019 ◽

Vol 67 (6) ◽

pp. 1447-1452

Author(s):

Judith Himmelbauer ◽

Gábor Mészáros ◽

Johann Sölkner

Keyword(s):

Dna Sequence ◽

Genome Sequencing ◽

Copy Number ◽

Population Based ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Chip Data ◽

Snp Chip ◽

Number Variation

A copy number variation (CNV) is a loss or a gain of DNA sequence, ranging from 50 base pairs to a few megabase pairs. Most studies use whole-genome sequencing data to detect deletions. Because SNP-chip data are more commonly used in livestock, especially in cattle, detecting deletions from SNP-chip data is of interest. In the present study, an approach based on SNP-chip data and the analysis of Mendelian mismatches in parent-offspring pairs was developed. It exploits the fact that deletions appear as homozygous genotypes after SNP-chip genotyping. For some SNPs with a high number of mismatches, the inheritance of the mismatches could be traced back to one or a few bulls, and regions of possible deletions were defined on that basis. The study shows that an approach based on Mendelian mismatches and SNP-chip data is a promising way of detecting deletions.
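
The idea of counting Mendelian mismatches per SNP can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes genotypes are coded as "AA", "AB" or "BB", treats a parent-offspring pair as mismatched when the two are opposite homozygotes, and uses hypothetical names (`is_mendelian_mismatch`, `mismatch_counts`) and made-up SNP identifiers.

```python
from collections import Counter


def is_mendelian_mismatch(parent_gt, offspring_gt):
    """True when parent and offspring are opposite homozygotes (AA vs BB).

    Without a deletion this pattern is impossible, because the offspring
    must carry one of the parent's two alleles.
    """
    return {parent_gt, offspring_gt} == {"AA", "BB"}


def mismatch_counts(pairs):
    """Count mismatches per SNP.

    pairs: iterable of (snp_id, parent_genotype, offspring_genotype).
    """
    counts = Counter()
    for snp_id, parent_gt, offspring_gt in pairs:
        if is_mendelian_mismatch(parent_gt, offspring_gt):
            counts[snp_id] += 1
    return counts


# Hypothetical genotype calls for two SNPs across parent-offspring pairs.
example_pairs = [
    ("snp1", "AA", "BB"),
    ("snp1", "BB", "AA"),
    ("snp1", "AA", "AB"),
    ("snp2", "AB", "BB"),
]
print(mismatch_counts(example_pairs).most_common())
```

SNPs with unusually high counts would then be candidates for follow-up, for example by tracing the mismatching animals back to shared sires as the abstract describes.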

Identification of Copy Number Variation in Domestic Chicken Using Whole-Genome Sequencing Reveals Evidence of Selection in the Genome

Animals ◽

10.3390/ani9100809 ◽

2019 ◽

Vol 9 (10) ◽

pp. 809

Author(s):

Donghyeok Seol ◽

Byung June Ko ◽

Bongsang Kim ◽

Han-Ha Chai ◽

Dajeong Lim ◽

...

Keyword(s):

Copy Number Variation ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Breeding Systems ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Genetic Characteristics ◽

Red Jungle Fowl ◽

Number Variation

Copy number variation (CNV) has great significance both functionally and evolutionarily. Various CNV studies are in progress to find the causes of human disease and to understand the population structure of livestock. Recent advances in next-generation sequencing (NGS) technology have made CNV detection more reliable and accurate at the whole-genome level. However, there is a lack of CNV studies on chickens using NGS. Therefore, we obtained whole-genome sequencing data of 65 chickens, including Red Jungle Fowl, Cornish (CN, broiler), Rhode Island Red (RIR, hybrid), and White Leghorn (WL, layer), from public databases for CNV region (CNVR) detection. Using CNVnator, a read-depth-based software, a total of 663 domesticated-specific CNVRs were identified across autosomes. Gene ontology analysis of genes annotated in CNVRs showed that the enriched terms were mainly involved in organ development, metabolism, and immune regulation. Population analysis revealed that CN and RIR are closer to each other than to WL, and many genes (LOC772271, OR52R1, RD3, ADH6, TLR2B, PRSS2, TPK1, POPDC3, etc.) with different copy numbers between breeds were found. In conclusion, this study helps to understand the genetic characteristics of domestic chickens at the CNV level, which may provide useful information for the development of breeding systems in chickens.
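
A CNV region (CNVR) is commonly built by merging overlapping CNV calls from several samples into one interval. The abstract does not state how the authors constructed their CNVRs, so the sketch below is only an illustration under that assumption; the function name `merge_cnvs` and the coordinates are hypothetical.

```python
def merge_cnvs(calls):
    """Merge overlapping CNV calls into regions.

    calls: iterable of (chromosome, start, end) tuples, possibly from
    many samples. Returns a sorted list of merged (chromosome, start, end).
    """
    regions = []
    for chrom, start, end in sorted(calls):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps the previous region on the same chromosome: extend it.
            regions[-1] = (chrom, last[1], max(last[2], end))
        else:
            regions.append((chrom, start, end))
    return regions


# Hypothetical calls pooled across samples.
example_calls = [
    ("chr1", 150, 300),
    ("chr1", 100, 200),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
print(merge_cnvs(example_calls))
```

Here the two overlapping chr1 calls collapse into a single region from 100 to 300, while the non-overlapping calls stay separate.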

Copy Number Variation Identification on 3,800 Alzheimer’s Disease Whole Genome Sequencing Data from the Alzheimer’s Disease Sequencing Project

Frontiers in Genetics ◽

10.3389/fgene.2021.752390 ◽

2021 ◽

Vol 12 ◽

Author(s):

Wan-Ping Lee ◽

Albert A. Tucci ◽

Mitchell Conery ◽

Yuk Yee Leung ◽

Amanda B. Kuzma ◽

...

Keyword(s):

Alzheimer's Disease ◽

Copy Number Variation ◽

Copy Number ◽

Whole Genome Sequence ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project ◽

Number Variation

Alzheimer’s Disease (AD) is a progressive neurologic disease and the most common form of dementia. While the causes of AD are not completely understood, genetics plays a key role in its etiology, and thus finding genetic factors holds the potential to uncover novel AD mechanisms. In this study, we focus on copy number variation (CNV) detection and burden analysis. Leveraging whole-genome sequence (WGS) data released by the Alzheimer’s Disease Sequencing Project (ADSP), we developed a scalable bioinformatics pipeline to identify CNVs. This pipeline was applied to 1,737 AD cases and 2,063 cognitively normal controls. As a result, we observed 237,306 and 42,767 deletions and duplications, respectively, with an average of 2,255 deletions and 1,820 duplications per subject. The burden tests show that Non-Hispanic-White cases on average have 16 more duplications than controls do (p-value 2e-6), and Hispanic cases have larger deletions than controls do (p-value 6.8e-5).

Copy number variation detection in Chinese indigenous cattle by whole genome sequencing

Genomics ◽

10.1016/j.ygeno.2019.05.023 ◽

2020 ◽

Vol 112 (1) ◽

pp. 831-836 ◽

Cited By ~ 2

Author(s):

Chugang Mei ◽

Zainaguli Junjvlieke ◽

Sayed Haidar Abbas Raza ◽

Hongbao Wang ◽

Gong Cheng ◽

...

Keyword(s):

Copy Number Variation ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome ◽

Indigenous Cattle ◽

Number Variation ◽

Copy Number Variation Detection

PS1146 LOW-COVERAGE WHOLE GENOME SEQUENCING OUTPERFORMS FISH IN COPY NUMBER VARIATION DETECTION IN CHRONIC LYMPHOCYTIC LEUKEMIA

HemaSphere ◽

10.1097/01.hs9.0000562868.12249.a2 ◽

2019 ◽

Vol 3 (S1) ◽

pp. 519

Author(s):

B. Ariceta ◽

A. Aguilera-Díaz ◽

I. Vázquez ◽

M.J. Larrayoz ◽

A. Mañú ◽

...

Keyword(s):

Chronic Lymphocytic Leukemia ◽

Copy Number Variation ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Lymphocytic Leukemia ◽

Whole Genome ◽

Number Variation ◽

Low Coverage ◽

Copy Number Variation Detection

Insights into dispersed duplications and complex structural mutations from whole genome sequencing 706 families

10.1101/2020.08.03.235358 ◽

2020 ◽

Author(s):

Christopher W. Whelan ◽

Robert E. Handsaker ◽

Giulio Genovese ◽

Seva Kashin ◽

Monkol Lek ◽

...

Keyword(s):

Gene Expression ◽

Copy Number Variation ◽

Copy Number ◽

De Novo ◽

Whole Genome ◽

Sequencing Data ◽

Number Variation ◽

Structural Mutations ◽

Or Gene ◽

Genomic Locations

Abstract: Two intriguing forms of genome structural variation (SV) – dispersed duplications, and de novo rearrangements of complex, multi-allelic loci – have long escaped genomic analysis. We describe a new way to find and characterize such variation by utilizing identity-by-descent (IBD) relationships between siblings together with high-precision measurements of segmental copy number. Analyzing whole-genome sequence data from 706 families, we find hundreds of “IBD-discordant” (IBDD) CNVs: loci at which siblings’ CNV measurements and IBD states are mathematically inconsistent. We found that commonly-IBDD CNVs identify dispersed duplications; we mapped 95 of these common dispersed duplications to their true genomic locations through family-based linkage and population linkage disequilibrium (LD), and found several to be in strong LD with genome-wide association (GWAS) signals for common diseases or gene expression variation at their revealed genomic locations. Other CNVs that were IBDD in a single family appear to involve de novo mutations in complex and multi-allelic loci; we identified 26 de novo structural mutations that had not been previously detected in earlier analyses of the same families by diverse SV analysis methods. These included a de novo mutation of the amylase gene locus and multiple de novo mutations at chromosome 15q14. Combining these complex mutations with more-conventional CNVs, we estimate that segmental mutations larger than 1kb arise in about one per 22 human meioses. These methods are complementary to previous techniques in that they interrogate genomic regions that are home to segmental duplication, high CNV allele frequencies, and multi-allelic CNVs.

Author Summary: Copy number variation is an important form of genetic variation in which individuals differ in the number of copies of segments of their genomes. Certain aspects of copy number variation have traditionally been difficult to study using short-read sequencing data. For example, standard analyses often cannot tell whether the duplicated copies of a segment are located near the original copy or are dispersed to other regions of the genome. Another aspect of copy number variation that has been difficult to study is the detection of mutations in the copy number of DNA segments passed down from parents to their children, particularly when the mutations affect genome segments which already display common copy number variation in the population. We develop an analytical approach to solving these problems when sequencing data is available for all members of families with at least two children. This method is based on determining the number of parental haplotypes the two siblings share at each location in their genome, and using that information to determine the possible inheritance patterns that might explain the copy numbers we observe in each family member. We show that dispersed duplications and mutations can be identified by looking for copy number variants that do not follow these expected inheritance patterns. We use this approach to determine the location of 95 common duplications which are dispersed to distant regions of the genome, and demonstrate that these duplications are linked to genetic variants that affect disease risk or gene expression levels. We also identify a set of copy number mutations not detected by previous analyses of sequencing data from a large cohort of families, and show that repetitive and complex regions of the genome undergo frequent mutations in copy number.
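
The notion of an IBD-discordant locus can be made concrete with a small brute-force check. The sketch below is not the authors' method; it is a simplified illustration that assumes integer total copy numbers for both parents and two siblings, no de novo mutation, and a known number of parental haplotypes shared by the siblings (0, 1 or 2). The function name `ibd_consistent` is hypothetical.

```python
from itertools import product


def ibd_consistent(cn_father, cn_mother, cn_sib1, cn_sib2, ibd_state):
    """Return True if some inheritance pattern explains the observed copy numbers.

    Every way of splitting each parent's total copy number across their two
    haplotypes is tried, together with every choice of which haplotype each
    sibling inherits, restricted to choices that match the IBD state.
    """
    for f1 in range(cn_father + 1):
        father = (f1, cn_father - f1)
        for m1 in range(cn_mother + 1):
            mother = (m1, cn_mother - m1)
            # a, b: haplotypes sibling 1 inherits; c, d: those of sibling 2.
            for a, b, c, d in product((0, 1), repeat=4):
                shared = int(a == c) + int(b == d)
                if shared != ibd_state:
                    continue
                if (father[a] + mother[b] == cn_sib1
                        and father[c] + mother[d] == cn_sib2):
                    return True
    return False


# Siblings sharing both parental haplotypes must have equal copy number.
print(ibd_consistent(2, 2, 2, 3, ibd_state=2))
# Siblings sharing no haplotypes: their copy numbers must sum to the parents' total.
print(ibd_consistent(3, 2, 3, 2, ibd_state=0))
```

Under these assumptions, a locus for which the function returns False would be flagged as discordant, pointing either to a copy located elsewhere in the genome or to a new mutation, which are the two explanations the summary discusses.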
