Probabilistic method corrects previously uncharacterized Hi-C artifact

2020 ◽  
Author(s):  
Yihang Shen ◽  
Carl Kingsford

Abstract: Three-dimensional chromosomal structure plays an important role in gene regulation. Chromosome conformation capture techniques, especially the high-throughput, sequencing-based technique Hi-C, provide new insights into the spatial architecture of chromosomes. However, Hi-C data contains artifacts and systemic biases that substantially influence subsequent analysis. Computational models have been developed to address these biases explicitly; however, it is difficult to enumerate and eliminate all the biases in such models. Other models are designed to correct biases implicitly, but they are also invalid in some situations, such as in the presence of copy number variations. We characterize a new kind of artifact in Hi-C data. We find that this artifact is caused by incorrect alignment of Hi-C reads against approximate repeat regions and can lead to erroneous chromatin contact signals. The artifact cannot be corrected by current Hi-C correction methods. We design a probabilistic method and develop a new Hi-C processing pipeline by integrating our probabilistic method with the HiC-Pro pipeline. We find that the new pipeline can remove this new artifact effectively, while preserving important features of the original Hi-C matrices.
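
To make the idea of misalignment against approximate repeats more concrete, the following is a minimal toy sketch in Python. It is not the authors' probabilistic method, which the abstract does not specify. It assumes two hypothetical loci (`copy_A`, `copy_B`) that differ at a single base, a uniform prior over loci, and independent per-base sequencing errors at an assumed rate `error_rate`. Under those assumptions it computes the posterior probability that a read originated from each locus, rather than committing to the single best alignment.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def locus_posteriors(read, loci, error_rate=0.01):
    """Posterior probability that `read` came from each candidate locus.

    Assumptions (illustrative only): uniform prior over loci and
    independent per-base errors with probability `error_rate`.
    """
    length = len(read)
    likelihoods = {}
    for name, seq in loci.items():
        m = hamming(read, seq)
        likelihoods[name] = (error_rate ** m) * ((1.0 - error_rate) ** (length - m))
    total = sum(likelihoods.values())
    return {name: lik / total for name, lik in likelihoods.items()}


# Two hypothetical loci forming an approximate repeat (they differ at the last base).
loci = {"copy_A": "ACGTACGTAC", "copy_B": "ACGTACGTAA"}

# A read from copy_A carrying one sequencing error (T -> G at index 3).
read = "ACGGACGTAC"
post = locus_posteriors(read, loci)
print({name: round(p, 4) for name, p in post.items()})
# copy_A receives about 0.99 of the weight, copy_B about 0.01
```

With a higher error rate or a read that does not cover the distinguishing base, the weights move toward an even split, which is the situation in which a hard best-hit assignment is most likely to place a contact at the wrong locus.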

2019 ◽  
Author(s):  
Su Wang ◽  
Soohyun Lee ◽  
Chong Chu ◽  
Dhawal Jain ◽  
Geoff Nelson ◽  
...  

Abstract: The three-dimensional conformation of a genome can be profiled using Hi-C, a technique that combines chromatin conformation capture with high-throughput sequencing. However, structural variations (SVs) often yield features that can be mistaken for chromosomal interactions. Here, we describe a computational method, HiNT (Hi-C for copy Number variation and Translocation detection), which detects copy number variations and inter-chromosomal translocations within Hi-C data with breakpoints at single base-pair resolution. We demonstrate that HiNT outperforms existing methods on both simulated and real data. We also show that Hi-C can supplement whole-genome sequencing in SV detection by locating breakpoints in repetitive regions.


2017 ◽  
Author(s):  
N. Servant ◽  
N. Varoquaux ◽  
E. Heard ◽  
JP. Vert ◽  
E. Barillot

Abstract: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data. Chromosome conformation data, such as Hi-C, is no different. The most widely used type of normalization of Hi-C data casts the estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact as much as any other. Here, we show that these approaches, while very effective on fully haploid or diploid genomes, fail to correct for unwanted effects in the presence of copy number variations. We propose a simple extension to matrix balancing methods that properly models the copy-number variation effects. Our approach can either retain the copy-number variation effects or remove them. We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes.
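
The matrix balancing assumption mentioned above can be sketched as follows. This is a minimal pure-Python illustration of plain balancing only, not the copy-number-aware extension the abstract proposes. It assumes a small symmetric, non-negative contact matrix and uses a damped (square-root) multiplicative update, chosen here for stable convergence on diagonal-heavy matrices; the function name `balance` and the example values are hypothetical.

```python
def balance(matrix, max_iter=500, tol=1e-12):
    """Rescale a symmetric, non-negative contact matrix so that all
    non-empty rows (and columns) have the same sum.

    Returns (balanced_matrix, bias), where
    balanced[i][j] == matrix[i][j] / (bias[i] * bias[j]).
    """
    n = len(matrix)
    m = [[float(v) for v in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        covered = [s for s in sums if s > 0]
        if not covered:
            break
        mean = sum(covered) / len(covered)
        # Damped update: square root of each bin's relative coverage.
        step = [(s / mean) ** 0.5 if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= step[i] * step[j]
        bias = [b * s for b, s in zip(bias, step)]
        if max(abs(s - 1.0) for s in step) < tol:
            break
    return m, bias


# Hypothetical "true" contacts in which every bin interacts equally overall.
truth = [[4, 1, 1],
         [1, 4, 1],
         [1, 1, 4]]
# Hypothetical multiplicative per-bin bias (e.g. mappability or GC content).
true_bias = [1.0, 2.0, 4.0]
observed = [[true_bias[i] * true_bias[j] * truth[i][j] for j in range(3)]
            for i in range(3)]

balanced, bias = balance(observed)
row_sums = [sum(row) for row in balanced]
```

In this toy case the recovered `bias` is proportional to `true_bias` and the balanced matrix is proportional to `truth`. If the per-bin differences came from copy number rather than technical bias, the same procedure would flatten them as well, which is the failure mode the abstract describes.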


2020 ◽  
Vol 160 (11-12) ◽  
pp. 634-642
Author(s):  
Shiqiang Luo ◽  
Xingyuan Chen ◽  
Tizhen Yan ◽  
Jiaolian Ya ◽  
Zehui Xu ◽  
...  

High-throughput sequencing based on copy number variation (CNV-seq) is commonly used to detect chromosomal abnormalities. This study identifies chromosomal abnormalities in aborted embryos/fetuses in early and middle pregnancy and explores the application value of CNV-seq in determining the causes of pregnancy termination. High-throughput sequencing was used to detect chromosome copy number variations (CNVs) in 116 aborted embryos in early and middle pregnancy. The detection data were compared with the Database of Genomic Variants (DGV), the Database of Chromosomal Imbalance and Phenotype in Humans using Ensemble Resources (DECIPHER), and the Online Mendelian Inheritance in Man (OMIM) database to determine the CNV type and the clinical significance. High-throughput sequencing results were successfully obtained in 109 out of 116 specimens, with a detection success rate of 93.97%. In brief, there were 64 cases with abnormal chromosome numbers and 23 cases with CNVs, in which 10 were pathogenic mutations and 13 were variants of uncertain significance. An abnormal chromosome number is the most important reason for embryo termination in early and middle pregnancy, followed by pathogenic chromosome CNVs. CNV-seq can quickly and accurately detect chromosome abnormalities and identify microdeletion and microduplication CNVs that cannot be detected by conventional chromosome analysis, which is convenient and efficient for genetic etiology diagnosis in miscarriage.


Author(s):  
Yunpeng Sui ◽  
Shuanghong Peng

In recent years, more and more evidence has emerged showing that changes in copy number variations (CNVs) correlated with the transcriptional level can be found during evolution, embryonic development, and oncogenesis. However, the underlying mechanisms remain largely unknown. The success of induced pluripotent stem cells suggests that genome changes could bring about transformations in protein expression and cell status; conversely, genome alterations generated during embryonic development and senescence might also be the result of genome changes. With rapid developments in science and technology, evidence of changes in the genome affected by transcriptional level has gradually been revealed, and a rational and concrete explanation is needed. Given the preference of the HIV-1 genome to insert into transposons of genes with high transcriptional levels, we propose a mechanism based on retrotransposons facilitated by specific pre-mRNA splicing style and homologous recombination (HR) to explain changes in CNVs in the genome. This mechanism is similar to that of the group II intron that originated much earlier. Under this proposed mechanism, CNVs in the genome are dynamically and spontaneously extended in a manner that is positively correlated with transcriptional level or contract as the cell divides during evolution, embryonic development, senescence, and oncogenesis, propelling alterations in them. In addition, this mechanism explains several critical puzzles in these processes. From evidence collected to date, it can be deduced that the information contained in the genome is not just three-dimensional but will become four-dimensional, carrying more genetic information.


2018 ◽  
Author(s):  
Whitney Whitford ◽  
Klaus Lehnert ◽  
Russell G. Snell ◽  
Jessie C. Jacobsen

Abstract
Background: The popularisation and decreased cost of genome resequencing have resulted in its increased use in molecular diagnostics. While there are a number of established and high quality bioinformatic tools for identifying small genetic variants including single nucleotide variants and indels, currently there is no established standard for the detection of copy number variants (CNVs) from sequence data. The requirement for CNV detection from high throughput sequencing has resulted in the development of a large number of software packages. These tools typically utilise the sequence data characteristics: read depth, split reads, read pairs, and assembly-based techniques. However, the additional source of information from read balance, defined as the relative proportion of reads of each allele at each position, has been underutilised in the existing applications.
Results: We present Read Balance Validator (RBV), a bioinformatic tool which uses read balance for prioritisation and validation of putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, the utility of RBV to test for inheritance of CNVs is demonstrated in this report.
Conclusions: RBV is a CNV validation and prioritisation bioinformatic tool for both genome and exome sequencing available as a Python package from https://github.com/whitneywhitford/RBV
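
The read balance definition above can be illustrated with a short Python sketch. This is not RBV's implementation or API; the function names and the nearest-fit rule are hypothetical and serve only to show why read balance carries copy-number information. It relies on the fact that a heterozygous site in a region with `n` copies is expected to show allele fractions of `k/n` for `k` from 1 to `n - 1` (about 0.5 for a diploid region, about 1/3 or 2/3 for a three-copy region).

```python
def read_balance(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele at one position."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this position")
    return alt_reads / total


def expected_balances(copy_number: int):
    """Allele fractions expected at heterozygous sites for a given copy number."""
    return [k / copy_number for k in range(1, copy_number)]


def closest_copy_number(balances, candidates=(2, 3)):
    """Pick the candidate copy number whose expected fractions best fit the data.

    Toy rule (an assumption of this sketch): sum of absolute distances from
    each observed balance to the nearest expected fraction.
    """
    def cost(copy_number):
        expected = expected_balances(copy_number)
        return sum(min(abs(b - e) for e in expected) for b in balances)
    return min(candidates, key=cost)


# Hypothetical (ref, alt) read counts at heterozygous sites in two regions.
diploid_sites = [(24, 26), (26, 24), (25, 25)]
duplicated_sites = [(40, 20), (21, 39), (38, 22)]

diploid_call = closest_copy_number([read_balance(r, a) for r, a in diploid_sites])
duplicated_call = closest_copy_number([read_balance(r, a) for r, a in duplicated_sites])
```

A single-copy deletion has no expected heterozygous fractions at all, so in practice the absence of heterozygous sites across a region is itself the signal; the sketch above does not model that case.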


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1229
Author(s):  
David Salgado ◽  
Irina M. Armean ◽  
Michael Baudis ◽  
Sergi Beltran ◽  
Salvador Capella-Gutierrez ◽  
...  

Copy number variations (CNVs) are major causative contributors in the genesis of both genetic diseases and human neoplasias. While “High-Throughput” sequencing technologies are increasingly becoming the primary choice for genomic screening analysis, their ability to efficiently detect CNVs is still heterogeneous and remains to be developed. The aim of this white paper is to provide a guiding framework for the future contributions of ELIXIR’s recently established human CNV Community, with implications beyond human disease diagnostics and population genomics. This white paper is the direct result of a strategy meeting that took place in September 2018 in Hinxton (UK) and involved representatives of 11 ELIXIR Nodes. The meeting led to the definition of priority objectives and tasks, to address a wide range of CNV-related challenges ranging from detection and interpretation to sharing and training. Here, we provide suggestions on how to align these tasks within the ELIXIR Platforms strategy, and on how to frame the activities of this new ELIXIR Community in the international context.


2021 ◽  
Vol 17 (2) ◽  
pp. e1009207
Author(s):  
Sage Z. Davis ◽  
Thomas Hollin ◽  
Todd Lenz ◽  
Karine G. Le Roch

The recent Coronavirus Disease 2019 pandemic has once again reminded us of the importance of understanding infectious diseases. One important but understudied area in infectious disease research is the role of nuclear architecture, or the physical arrangement of the genome in the nucleus, in controlling gene regulation and pathogenicity. Recent advances in research methods, such as genome-wide chromosome conformation capture using high-throughput sequencing (Hi-C), have allowed for easier analysis of nuclear architecture and chromosomal reorganization in both the infectious disease agents themselves and their host cells. This review broadly discusses what is known about nuclear architecture in infectious disease, with an emphasis on chromosomal reorganization, and briefly discusses what steps are required next in the field.


Author(s):  
Suresh Kumar ◽  
Simardeep Kaur ◽  
Karishma Seem ◽  
Santosh Kumar ◽  
Trilochan Mohapatra

The genome of a eukaryotic organism is comprised of a supra-molecular complex of chromatin fibers and intricately folded three-dimensional (3D) structures. Chromosomal interactions and topological changes in response to developmental and/or environmental stimuli affect gene expression. Chromatin architecture plays important roles in DNA replication, gene expression, and genome integrity. Higher-order chromatin organizations like chromosome territories (CTs), A/B compartments, topologically associating domains (TADs), and chromatin loops vary among cells, tissues, and species depending on the developmental stage and/or environmental conditions (4D genomics). Every chromosome occupies a separate territory in the interphase nucleus and forms the top layer of hierarchical structure (CTs) in most eukaryotes. While the A and B compartments are associated with active (euchromatic) and inactive (heterochromatic) chromatin, respectively, having well-defined genomic/epigenomic features, TADs are the structural units of chromatin. Chromatin architecture like TADs, as well as the local interactions between promoter and regulatory elements, correlates with chromatin activity, which changes during environmental stresses due to relocalization of the architectural proteins. Moreover, chromatin looping brings the gene and regulatory elements into close proximity for interactions. The intricate relationship between nucleotide sequence and chromatin architecture requires a more comprehensive understanding to unravel the genome organization and genetic plasticity. During the last decade, advances in chromatin conformation capture techniques for unravelling 3D genome organizations have improved our understanding of genome biology. However, the recent advances, such as Hi-C and ChIA-PET, have substantially increased the resolution and throughput, as well as our interest in analysing genome organizations.
The present review provides an overview of the historical and contemporary perspectives of chromosome conformation capture technologies, their applications in functional genomics, and the constraints in predicting 3D genome organization. We also discuss the future perspectives of understanding higher-order chromatin organizations in deciphering transcriptional regulation of gene expression under environmental stress (4D genomics). These might help design climate-smart crops to meet the ever-growing demands for food, feed, and fodder.


2017 ◽  
Author(s):  
Enrique Vidal ◽  
François le Dily ◽  
Javier Quilez ◽  
Ralph Stadhouders ◽  
Yasmina Cuartero ◽  
...  

Abstract: The three-dimensional conformation of genomes is an essential component of their biological activity. The advent of the Hi-C technology enabled unprecedented progress in our understanding of genome structures. However, Hi-C is subject to systematic biases that can compromise downstream analyses. Several strategies have been proposed to remove those biases, but the issue of abnormal karyotypes has received little attention. Many experiments are performed in cancer cell lines, which typically harbor large-scale copy number variations that create visible defects on the raw Hi-C maps. The consequences of these widespread artifacts on the normalized maps are mostly unexplored. We observed that current normalization methods are not robust to the presence of large-scale copy number variations, potentially obscuring biological differences and enhancing batch effects. To address this issue, we developed an alternative approach designed to take into account chromosomal abnormalities. The method, called OneD, increases reproducibility among replicates of Hi-C samples with abnormal karyotype, outperforming previous methods significantly. On normal karyotypes, OneD fared as well as state-of-the-art methods, making it a safe choice for Hi-C normalization. OneD is fast and scales well in terms of computing resources for resolutions up to 1 kbp. OneD is implemented as an R package available at http://www.github.com/qenvio/dryhic.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Whitney Whitford ◽  
Klaus Lehnert ◽  
Russell G. Snell ◽  
Jessie C. Jacobsen

Abstract: The popularisation and decreased cost of genome resequencing have resulted in its increased use in molecular diagnostics. While there are a number of established and high quality bioinformatic tools for identifying small genetic variants including single nucleotide variants and indels, currently there is no established standard for the detection of copy number variants (CNVs) from sequence data. The requirement for CNV detection from high throughput sequencing has resulted in the development of a large number of software packages. These tools typically utilise the sequence data characteristics: read depth, split reads, read pairs, and assembly-based techniques. However, the additional source of information from read balance (defined as the relative proportion of reads of each allele at each position) has been underutilised in the existing applications. Here we present Read Balance Validator (RBV), a bioinformatic tool that uses read balance for prioritisation and validation of putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, the utility of RBV to test for inheritance of CNVs is demonstrated in this report. RBV is a CNV validation and prioritisation bioinformatic tool for both genome and exome sequencing available as a Python package from https://github.com/whitneywhitford/RBV.

