Genome-Wide Assessment Characteristics of Genes Overlapping Copy Number Variation Regions in Duroc Purebred Population

2021 ◽  
Vol 12 ◽  
Author(s):  
Zhipeng Wang ◽  
Yuanyuan Guo ◽  
Shengwei Liu ◽  
Qingli Meng

Copy number variations (CNVs) are important structural variations that can cause significant phenotypic diversity. Reliable CNV mapping can be achieved by identifying CNVs from different genetic backgrounds. Investigating the overlap between CNV regions (CNVRs) and protein-coding genes (CNV genes) or miRNAs (CNV-miRNAs) can reveal the potential mechanisms of their regulation. In this study, we used 50 K SNP arrays to detect CNVs in purebred Duroc pigs. A total of 211 CNVRs were detected with a total length of 118.48 Mb, accounting for 5.23% of the autosomal genome sequence. Of these CNVRs, 32 were gains, 175 were losses, and four contained both types (loss and gain within the same region). The detected CNVRs were non-randomly distributed in the swine genome and were significantly enriched in segmental duplications and gene-dense regions. These CNVRs overlapped 1,096 protein-coding genes (CNV genes) and 39 miRNAs (CNV-miRNAs). The CNV genes were enriched for genes on the dosage-sensitive gene list. The expression of the CNV genes was significantly higher than that of the non-CNV genes in the adult Duroc prostate. Of all detected CNV genes, 22.99% were tissue-specific (TSI > 0.9). Strong negative selection has been acting on the CNV genes, as those located entirely within the loss CNVRs appeared to be evolving rapidly, judging by the median dN plus dS values. Non-CNV genes were more likely than CNV genes to be miRNA targets. Furthermore, CNV-miRNAs tended to target more genes than non-CNV-miRNAs, and pairs of CNV-miRNAs preferentially regulated the same target genes synergistically. We also examined the functions of CNV genes and CNV-miRNAs involved in lipid metabolism, including DGAT1, DGAT2, MOGAT2, miR143, miR335, and miRLET7. Further molecular experiments and independent large studies are needed to confirm our findings.
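The overlap step described in this abstract (finding genes that share sequence with a CNVR, or that lie entirely inside one) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Interval` type, the function names, and all coordinates below are hypothetical, and intervals are assumed to be 1-based and inclusive. As a side check on the reported figures, 118.48 Mb at 5.23% implies an autosomal length of roughly 2,265 Mb.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval; coordinates are assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    name: str


def overlapping_genes(cnvrs: List[Interval], genes: List[Interval]) -> List[str]:
    """Names of genes sharing at least one base with any CNVR."""
    hits = []
    for gene in genes:
        for region in cnvrs:
            same_chrom = gene.chrom == region.chrom
            if same_chrom and gene.start <= region.end and region.start <= gene.end:
                hits.append(gene.name)
                break  # count each gene once
    return hits


def contained_genes(cnvrs: List[Interval], genes: List[Interval]) -> List[str]:
    """Names of genes located entirely within a single CNVR."""
    return [
        gene.name
        for gene in genes
        if any(
            gene.chrom == r.chrom and r.start <= gene.start and gene.end <= r.end
            for r in cnvrs
        )
    ]


# Hypothetical coordinates, for illustration only.
cnvrs = [Interval("1", 100, 200, "cnvr1")]
genes = [
    Interval("1", 150, 300, "geneA"),  # partial overlap
    Interval("1", 120, 180, "geneB"),  # fully inside the CNVR
    Interval("1", 201, 250, "geneC"),  # adjacent, no shared base
    Interval("2", 150, 160, "geneD"),  # different chromosome
]
print(overlapping_genes(cnvrs, genes))
print(contained_genes(cnvrs, genes))
```

The nested loop is quadratic in the number of intervals; for genome-scale inputs a sorted sweep or an interval tree would be the usual choice.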

2020 ◽  
Author(s):  
Zhipeng Wang ◽  
Yuanyuan Guo ◽  
Tao Wang ◽  
Chaoxin Zhang ◽  
Shengwei Liu ◽  
...  

Abstract Background: Copy number variations (CNVs) are important structural variations that can cause significant phenotypic diversity. Reliable CNV mapping can be achieved by identifying CNVs from different genetic backgrounds. Investigating the overlap between CNV regions (CNVRs) and protein-coding genes (CNV genes) or miRNAs (CNV-miRNAs) can reveal the potential mechanisms of their regulation. Results: In this study, we used 55K SNP arrays to detect CNVs in purebred Duroc pigs. A total of 211 CNVRs were detected with a total length of 118.48 Mb, accounting for 5.23% of the autosomal genome sequence. Of these CNVRs, 32 were gains, 175 were losses, and 4 contained both types (loss and gain within the same region). The detected CNVRs were non-randomly distributed in the swine genome and were significantly enriched in segmental duplications and gene-dense regions. These CNVRs overlapped 1096 protein-coding genes and 39 miRNAs. The CNV genes were enriched for genes on the dosage-sensitive gene list. The expression of the CNV genes was significantly higher than that of the non-CNV genes in the adult Duroc liver and prostate. Of all detected CNV genes, 252 (22.99%) were tissue-specific (TSI > 0.9). Strong negative selection has been acting on the CNV genes, as those located entirely within the loss CNVRs appeared to be evolving rapidly, judging by the median dN plus dS values. Non-CNV genes were more likely than CNV genes to be miRNA targets. Furthermore, CNV-miRNAs tended to target more genes than non-CNV-miRNAs, and pairs of CNV-miRNAs preferentially regulated the same target genes synergistically. We also examined CNV genes and CNV-miRNAs involved in lipid metabolism, including DGAT1, DGAT2, MOGAT2, miR143, miR335, and miRLET7. Conclusions: Our analyses of CNV genes and CNV-miRNAs provide new insights into the characteristics of CNVRs in the purebred Duroc population. Further molecular experiments and independent large studies are needed to confirm our findings.
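The abstract applies a tissue-specificity threshold (TSI > 0.9) without stating how the index is computed. The sketch below assumes the widely used tau formulation, tau = sum(1 - x_i / x_max) / (n - 1) over n tissues. That choice is an assumption, not something the abstract confirms, and the function name and expression values are hypothetical.

```python
from typing import Sequence


def tissue_specificity_index(expression: Sequence[float]) -> float:
    """Tau-style index: 0 for uniform expression, 1 for a single-tissue gene.

    ASSUMPTION: the abstract does not define TSI; tau is used here only
    as a common formulation of a tissue-specificity index.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values from at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical expression profiles across four tissues.
single_tissue = [10.0, 0.0, 0.0, 0.0]
uniform = [5.0, 5.0, 5.0, 5.0]

print(tissue_specificity_index(single_tissue))
print(tissue_specificity_index(uniform))

# Consistency check on the reported counts: 252 of 1096 CNV genes.
print(round(252 / 1096 * 100, 2))
```

Under this formulation a gene passes the TSI > 0.9 cut-off only when its expression in all but the top tissue is, on average, below 10% of the maximum.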


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Swati Agarwala ◽  
Avinash M. Veerappa ◽  
Nallur B. Ramachandra

Abstract Background Autism is a neurodevelopmental condition with genetic heterogeneity. It is characterized by difficulties in reciprocal social interactions with strong repetitive behaviors and stereotyped interests. Copy number variations (CNVs) are genomic structural variations that alter the genomic structure by either duplication or deletion. De novo or inherited CNVs are found in 5–10% of autistic subjects and range in size from a few kilobases to several megabases. CNVs predispose humans to various diseases by altering gene regulation, generating chimeric genes, and disrupting the coding region, or through position effects. However, CNVs are not the initiating event in pathogenesis; additional preceding mutations might be essential for disease manifestation. The present study aims to identify the primary CNVs responsible for autism susceptibility in healthy cohorts to sensitize secondary-hits. In the current investigation, primary-hit autism gene CNVs are characterized in 1715 healthy cohorts of varying ethnicities across 12 populations using Affymetrix high-resolution arrays. Thirty-eight individuals from twelve families residing in Karnataka, India, aged 13–73 years, are included for the comparative CNV analysis. The findings are validated against 179 global autism whole-exome sequence datasets derived from the Simons Simplex Collection. These datasets are deposited in the Simons Foundation Autism Research Initiative (SFARI) database. Results The study revealed that 34.8% of the subjects carried a 2% primary-hit CNV burden with 73 singleton-autism genes in different clusters. Of these, three conserved CNV breakpoints were identified with ARHGAP11B, DUSP22, and CHRNA7 as the target genes across 12 populations. Enrichment analysis of the population-specific autism genes revealed two signaling pathways, calcium and mitogen-activated protein kinase (MAPK), in the CNV-identified regions. These impaired pathways affected the downstream cascades of neuronal function and physiology, leading to autism behavior. The pathway analysis of enriched genes unravelled complex protein interaction networks, which sensitized secondary sites for autism. Further, the identification of miRNA targets associated with autism gene CNVs added severity to the condition. Conclusion These findings contribute to an atlas of primary-hit genes to detect autism susceptibility in healthy cohorts, indicating their impact on secondary sites for manifestation.


2019 ◽  
Vol 62 (2) ◽  
pp. 571-578 ◽  
Author(s):  
Xiaogang Wang ◽  
Xiukai Cao ◽  
Yifan Wen ◽  
Yilei Ma ◽  
Ibrahim Elsaeid Elnour ◽  
...  

Abstract. Copy number variations (CNVs) are gains and losses of genomic sequence of more than 50 bp between two individuals of a species. CNV is also considered to be one of the main elements affecting the phenotypic diversity and evolutionary adaptation of animals. ORMDL sphingolipid biosynthesis regulator 1 (ORMDL1) is a protein-coding gene associated with diseases and development. In our study, the polymorphism of ORMDL1 gene copy numbers in four Chinese sheep breeds (abbreviated CK, HU, STH, and LTH) was detected. In addition, we analyzed the transcriptional expression level of the ORMDL1 gene in different tissues of sheep and examined the association of ORMDL1 CNV with growth traits. The statistical analysis revealed that ORMDL1 CNV was significantly correlated with body height, heart girth, and circumference of cannon bone in HU sheep (P<0.05), and had significant effects on body weight, body height, body length, chest depth, and height of hip cross in STH sheep (P<0.05). In conclusion, our results provide a basis for the relationship between CNV of the ORMDL1 gene and sheep growth traits, suggesting that ORMDL1 CNV may be considered a promising marker for the molecular breeding of Chinese sheep.


2019 ◽  
Vol 32 (19) ◽  
pp. 15281-15299 ◽  
Author(s):  
Md. Rezaul Karim ◽  
Ashiqur Rahman ◽  
João Bosco Jares ◽  
Stefan Decker ◽  
Oya Beyan

Abstract An accurate diagnosis and prognosis for cancer are specific to patients with particular cancer types and molecular traits, which need to be addressed carefully. The discovery of important biomarkers is becoming an important step toward understanding the molecular mechanisms of carcinogenesis, in which genomics data and clinical outcomes need to be analyzed before making any clinical decision. Copy number variations (CNVs) are found to be associated with the risk of individual cancers and hence can be used to reveal genetic predispositions before cancer develops. In this paper, we collect CNV data from about 8000 cancer patients covering 14 different cancer types from The Cancer Genome Atlas. Then, two different sparse representations of CNVs based on 578 oncogenes and 20,308 protein-coding genes, including genomic deletions and duplications across the samples, are prepared. Then, we train Conv-LSTM and convolutional autoencoder (CAE) networks using both representations and create snapshot models. While the Conv-LSTM can capture locally and globally important features, the CAE can utilize unsupervised pretraining to initialize the weights in the subsequent convolutional layers against the sparsity. Model averaging ensemble (MAE) is then applied to combine the snapshot models in order to make a single prediction. Finally, we identify the most significant CNV biomarkers using guided-gradient class activation map plus (GradCAM++) and rank the top genes for different cancer types. Results covering several experiments show fairly high prediction accuracies for the majority of cancer types. In particular, using protein-coding genes, Conv-LSTM and CAE networks can predict cancer types correctly in at least 72.96% and 76.77% of cases, respectively. In contrast, using oncogenes gives moderately higher accuracies of 74.25% and 78.32%, whereas the snapshot model based on MAE shows an overall accuracy improvement of 2.5%.
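The idea of combining snapshot models by model averaging can be sketched in a few lines of Python. The abstract does not say whether the authors average probabilities or logits, or whether the snapshots are weighted, so this sketch assumes an unweighted mean of per-class probabilities. The function names and the probability values are hypothetical.

```python
from typing import List, Sequence


def average_probabilities(snapshot_probs: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of per-class probabilities across snapshot models.

    ASSUMPTION: equal weights and probability-level averaging; the
    abstract does not specify the exact scheme.
    """
    if not snapshot_probs:
        raise ValueError("need at least one snapshot model")
    n_models = len(snapshot_probs)
    n_classes = len(snapshot_probs[0])
    if any(len(p) != n_classes for p in snapshot_probs):
        raise ValueError("all snapshots must predict the same classes")
    return [sum(p[c] for p in snapshot_probs) / n_models for c in range(n_classes)]


def ensemble_predict(snapshot_probs: Sequence[Sequence[float]]) -> int:
    """Index of the class with the highest averaged probability."""
    avg = average_probabilities(snapshot_probs)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs of three snapshots for one sample and two cancer types.
snapshots = [
    [0.6, 0.4],
    [0.2, 0.8],
    [0.3, 0.7],
]
print(average_probabilities(snapshots))
print(ensemble_predict(snapshots))
```

In this toy case the first snapshot alone would pick class 0, while the averaged prediction picks class 1, which is the kind of disagreement an ensemble is meant to smooth out.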


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Joonhyung Jung ◽  
Changkyun Kim ◽  
Joo-Hwan Kim

Abstract Background Commelinaceae (Commelinales) comprise 41 genera and are widely distributed in both the Old and New Worlds, except in Europe. The relationships among genera in this family have been suggested in several morphological and molecular studies. However, it is difficult to explain their relationships due to high morphological variation and low support values. Many researchers now use complete chloroplast genome data to infer the evolution of land plants. In this study, we completed 15 new plastid genome sequences of subfamily Commelinoideae using the Mi-seq platform. We utilized genome data to reveal the structural variations and reconstruct the problematic positions of genera for the first time. Results All examined species of Commelinoideae have three pseudogenes (accD, rpoA, and ycf15), and the former two might be a synapomorphy within Commelinales. Only four species in tribe Commelineae presented IR expansion, which affected duplication of the rpl22 gene. We identified inversions that range from approximately 3 to 15 kb in four taxa (Amischotolype, Belosynapsis, Murdannia, and Streptolirion). The phylogenetic analysis using 77 chloroplast protein-coding genes with maximum parsimony, maximum likelihood, and Bayesian inference suggests that Palisota is most closely related to tribe Commelineae, with high support values. This result differs significantly from the current classification of Commelinaceae. Also, we resolved the unclear position of Streptoliriinae and the monophyly of Dichorisandrinae. Among the ten CDS (ndhH, rpoC2, ndhA, rps3, ndhG, ndhD, ccsA, ndhF, matK, and ycf1) with high nucleotide diversity values (Pi > 0.045) and a length of over 500 bp, four (ndhH, rpoC2, matK, and ycf1) are congruent with the topology derived from 77 chloroplast protein-coding genes. Conclusions In this study, we provide detailed information on the 15 complete plastid genomes of Commelinoideae taxa. We identified characteristic pseudogenes and nucleotide diversity, which can be used to infer the evolutionary history of the family. Also, further research is needed to revise the position of Palisota in the current classification of Commelinaceae.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Weihua Pan ◽  
Desheng Gong ◽  
Da Sun ◽  
Haohui Luo

Abstract Due to the high complexity of the cancer genome, it has so far been too difficult to generate a complete cancer genome map containing the sequence of every DNA molecule. Nevertheless, phasing each chromosome in the cancer genome into two haplotypes according to germline mutations provides a suboptimal solution for understanding the cancer genome. However, phasing the cancer genome is also a challenging problem, due to the limits of experimental and computational technologies. Hi-C data has been widely used in phasing in recent years due to its long-range linkage information, and it provides an opportunity to solve the problem of phasing the cancer genome. The existing Hi-C based phasing methods cannot be applied to the cancer genome directly, because somatic mutations in the cancer genome, such as somatic SNPs, copy number variations and structural variations, greatly reduce correctness and completeness. Here, we propose a new Hi-C based pipeline for phasing the cancer genome called HiCancer. HiCancer handles different kinds of somatic mutations and variations, and takes advantage of allelic copy number imbalance and linkage disequilibrium to improve the correctness and completeness of phasing. According to our experiments in K562 and KBM-7 cell lines, HiCancer is able to generate very high-quality chromosome-level haplotypes for the cancer genome using only Hi-C data.


Author(s):  
Wenhui Li ◽  
Wanjun Lei ◽  
Xiaopei Chao ◽  
Xiaochen Song ◽  
Yalan Bi ◽  
...  

Abstract The association between human papillomavirus (HPV) integration and relevant genomic changes in uterine cervical adenocarcinoma is poorly understood. This study depicts the genomic mutational landscape in a cohort of 20 patients. HPV+ and HPV− groups were defined as patients with and without HPV integration in the host genome. The genetic changes between these two groups were described and compared by whole-genome sequencing (WGS) and whole-exome sequencing (WES). WGS identified 2916 copy number variations and 743 structural variations. WES identified 6113 somatic mutations, with a mutational burden of 2.4 mutations/Mb. Six genes were predicted as driver genes: PIK3CA, KRAS, TRAPPC12, NDN, GOLGA6L4 and BAIAP3. PIK3CA, NDN, GOLGA6L4, and BAIAP3 were recognized as significantly mutated genes (SMGs). HPV was detected in 95% (19/20) of patients with cervical adenocarcinoma, 7 of whom (36.8%) had HPV integration (HPV+ group). In total, 1036 genes with somatic mutations were confirmed in the HPV+ group, while 289 genes with somatic mutations were confirmed in the group without HPV integration (HPV− group); only 2.1% were shared between the two groups. In the HPV+ group, GOLGA6L4 and BAIAP3 were confirmed as SMGs, while PIK3CA, NDN, KRAS, FUT1, and GOLGA6L64 were identified in the HPV− group. ZDHHC3, PKD1P1, and TGIF2 showed copy number amplifications after HPV integration. In addition, the HPV+ group had significantly more neoantigens. HPV integration, rather than HPV infection, results in different genomic changes in cervical adenocarcinoma.


2010 ◽  
Vol 13 (5) ◽  
pp. 455-460 ◽  
Author(s):  
Shinji Ono ◽  
Akira Imamura ◽  
Shinya Tasaki ◽  
Naohiro Kurotaki ◽  
Hiroki Ozawa ◽  
...  

Copy number variations (CNVs) are common structural variations in the human genome that strongly affect genomic diversity and can play a role in the development of several diseases, including neurodevelopmental disorders. Recent reports indicate that monozygotic twins can show differential CNV profiles. We searched for CNVs in monozygotic twins discordant for schizophrenia to identify susceptibility loci for schizophrenia. Three pairs of monozygotic twins discordant for schizophrenia were subjected to analysis. Genomic DNA samples were extracted from peripheral blood lymphocytes. We adopted the Affymetrix Genome-Wide Human SNP (Single Nucleotide Polymorphism) Array 6.0 to detect copy number discordance using Partek Genomics Suite 6.5 beta. However, validation by quantitative PCR and DNA sequencing revealed that none of the regions showed any discordance within the three twin pairs. Our results support the hypothesis that epigenetic changes or fluctuations in developmental processes triggered by environmental factors mainly contribute to the pathogenesis of schizophrenia. Schizophrenia caused by strong genetic factors, including copy number alteration or gene mutation, may be a small subset of the clinical population.


2013 ◽  
Vol 54 ◽  
pp. 1-16 ◽  
Author(s):  
Michael B. Clark ◽  
Anupma Choudhary ◽  
Martin A. Smith ◽  
Ryan J. Taft ◽  
John S. Mattick

The ability to sequence genomes and characterize their products has begun to reveal the central role for regulatory RNAs in biology, especially in complex organisms. It is now evident that the human genome contains not only protein-coding genes, but also tens of thousands of non–protein coding genes that express small and long ncRNAs (non-coding RNAs). Rapid progress in characterizing these ncRNAs has identified a diverse range of subclasses, which vary widely in size, sequence and mechanism-of-action, but share a common functional theme of regulating gene expression. ncRNAs play a crucial role in many cellular pathways, including the differentiation and development of cells and organs and, when mis-regulated, in a number of diseases. Increasing evidence suggests that these RNAs are a major area of evolutionary innovation and play an important role in determining phenotypic diversity in animals.


Blood ◽  
2013 ◽  
Vol 122 (21) ◽  
pp. 233-233
Author(s):  
Cai Chen ◽  
Christoph Bartenhagen ◽  
Michael Gombert ◽  
Vera Okpanyi ◽  
Vera Binder ◽  
...  

Abstract High hyperdiploid acute lymphoblastic leukemia (HeH-ALL) is characterized by 51-67 chromosomes and nonrandom gains of specific chromosomes (X, 4, 6, 10, 14, 17, 18, and 21). It represents the most frequent numerical cytogenetic alteration in childhood pre B-cell ALL, occurring in 25-30% of cases. Recurrent disease will affect 15-20%. Pre-leukemic HeH clones are generated in utero, but cooperating oncogenic lesions are necessary for overt leukemia and remain to be determined. Recently, a phenomenon termed chromothripsis has been described in which massive structural variations occur in a single aberrant mitosis. Whole or partial chromosomes are shattered and some fragments are lost in the process of rejoining. Thus, characteristically, chromosomal copy numbers oscillate between two copy number states. Chromothripsis has been suggested to be a tumor-driving alteration that may be present in 2-3% of all human cancers. Its role as a potential cooperating or initiating lesion in HeH-ALL has not been determined. We applied state-of-the-art whole-genome next-generation sequencing to analyze structural variations in six pediatric patients with recurrent HeH-ALL. Matched sample sets taken at diagnosis, remission and/or relapse were compared. Paired-end sequencing was carried out on a Genome Analyzer IIx or a HiSeq 2000 (Illumina). Reads were aligned against the human reference genome (GRCh37) using BWA. Translocations were detected by GASV. Copy number variations were analyzed by FREEC. Structural variations were validated by PCR/Sanger sequencing and FISH. Of the six patients analyzed, five harbored on average one interchromosomal translocation or intrachromosomal inversion, but one patient presented with massive genomic rearrangements (Figure). These affected chromosomes 3, 11, 12 and 20. Ten copy number shifts on chromosome 3 oscillating between two copy number states (2 and 3) indicated that these rearrangements were caused by chromothripsis. Breakpoint sequencing revealed that one of the identified translocations (t(12;20)(p13.1;p12.3)) was in fact a three-locus rearrangement composed of small fragments derived from chromosomes 3, 12 and 20. Characteristically for chromothripsis, the breakpoints clustered closely. Three breakpoints separated by 224 bp and 64 kb were located in the transducin (beta)-like X-linked receptor 1 (TBL1XR1) gene. Other genes repeatedly targeted included the MACRO domain-containing protein 2 (MACROD2) gene (a deacetylase involved in deacetylation of lysine residues in histones and other proteins), the KIAA1467 gene (a transmembrane protein of the integrin alpha FG-GAP repeat containing 3 (ITFG3) family), and a novel regulatory lincRNA (ENSG00000243276). MACROD2 was previously observed as a target of chromothripsis in a colorectal carcinoma. Thus, the characterized breakpoints may identify fragile genomic sites prone to chromothriptic rearrangement. DNA repair was effected by non-homologous end joining, as demonstrated by the typical addition of non-template nucleotides with microhomologies of two to four nucleotides at the breakpoints. Copy number profiles of this patient showed that at least two distinct leukemic clones could be identified at diagnosis. One had acquired chromothriptic alterations and represented the dominant clone at relapse, indicating chemotherapy resistance and tumor-driving potential. Prior whole-exome sequencing did not reveal mutations in known oncogenes or tumor suppressor genes. Therefore, loss of function or expression of genes affected by chromosomal rearrangements, such as TBL1XR1, which is recurrently mutated in childhood ALL with ETV6-RUNX1 translocation, may account for the tumor-driving effect. All leukemic cells at diagnosis showed conformity in the number and pattern of whole chromosome gains, demonstrating that chromothripsis was not an initiating oncogenic event but occurred secondary to high hyperdiploidy. Further aberrations (t(4;7), loss of 4q) were gained by the chromothriptic clone and could be detected by FISH in minor subclones, pointing to ongoing clonal evolution. Taken together, our study reveals chromothripsis as a novel assisting and tumor-driving lesion in HeH-ALL.

Figure. Chromothripsis in HeH-ALL. Copy number variations and translocations at diagnosis (left) and relapse (right). (magenta: chromothriptic translocations; green: other translocations)

Disclosures: No relevant conflicts of interest to declare.
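The copy-number signature mentioned in this abstract (repeated shifts between two copy number states along one chromosome) can be illustrated with a short Python sketch. It is not the authors' analysis and it is no substitute for a caller such as FREEC. It only counts state changes along an ordered list of segment copy numbers, and the segment values below are hypothetical.

```python
from typing import Sequence


def count_copy_number_shifts(segment_states: Sequence[int]) -> int:
    """Number of changes in copy number between consecutive segments."""
    return sum(
        1 for prev, curr in zip(segment_states, segment_states[1:]) if prev != curr
    )


def oscillates_between_two_states(segment_states: Sequence[int]) -> bool:
    """True if exactly two distinct copy number states occur."""
    return len(set(segment_states)) == 2


# Hypothetical ordered segment copy numbers along one chromosome:
# eleven segments alternating between states 2 and 3.
chromosome_segments = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2]

print(count_copy_number_shifts(chromosome_segments))
print(oscillates_between_two_states(chromosome_segments))
```

A real assessment would also consider breakpoint clustering and the randomness of fragment order and orientation, which this sketch ignores.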

