MaxHiC: robust estimation of chromatin interaction frequency in Hi-C and capture Hi-C experiments

Author(s):  
Hamid Alinejad-Rokny ◽  
Rassa Ghavami ◽  
Hamid R. Rabiee ◽  
Narges Rezaei ◽  
Kin Tung Tam ◽  
...  

Hi-C is a genome-wide chromosome conformation capture technology that detects interactions between pairs of genomic regions and can be used to explore higher-order chromatin structure. Conceptually, Hi-C data count interaction frequencies between every position in the genome and every other position. Biologically functional interactions are expected to occur more frequently than random (background) interactions. To identify biologically relevant interactions, several background models that take into account biases such as distance, GC content and mappability have been proposed. Here we introduce MaxHiC, a background correction tool that deals with these complex biases and robustly identifies statistically significant interactions in both Hi-C and capture Hi-C experiments. MaxHiC uses a negative binomial distribution model and a maximum likelihood technique to correct biases in both Hi-C and capture Hi-C libraries. We systematically benchmark MaxHiC against major Hi-C background correction tools and demonstrate, using published Hi-C and capture Hi-C datasets, that (1) interacting regions identified by MaxHiC overlap known regulatory features (e.g. active chromatin histone marks, CTCF binding sites, DNase sensitivity) and disease-associated SNPs from genome-wide association studies significantly more than those identified by currently existing models, and (2) the pairs of interacting regions are more likely to be linked by eQTL pairs and more likely to identify known enhancer-promoter pairs than those from any of the existing methods. We also demonstrate that interactions between different genomic region types have distinct distance distributions that only MaxHiC reveals. MaxHiC is publicly available as a python package for the analysis of Hi-C and capture Hi-C data.
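To make the negative binomial idea concrete, here is a minimal Python sketch of how a background model of this kind can score a single bin pair. It assumes the expected background count `mu` and the dispersion `r` are already known; MaxHiC estimates its parameters by maximum likelihood, which is not reproduced here. The power-law `expected_contacts` helper and all parameter values are hypothetical illustrations, not MaxHiC's model or API.

```python
import math


def nb_log_pmf(k: int, mu: float, r: float) -> float:
    """log P(X = k) for a negative binomial with mean mu and size (dispersion) r.

    Var(X) = mu + mu**2 / r, so a smaller r allows more overdispersion than a
    Poisson model with the same mean.
    """
    p = r / (r + mu)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_upper_tail(observed: int, mu: float, r: float, rel_tol: float = 1e-15) -> float:
    """P(X >= observed): chance that the background alone yields at least this many contacts."""
    total, k = 0.0, observed
    while True:
        term = math.exp(nb_log_pmf(k, mu, r))
        total += term
        # Once k >= mu the terms only decrease, so stop when they no longer matter.
        if k >= mu and term <= rel_tol * total:
            return min(total, 1.0)
        k += 1


def expected_contacts(distance_bp: float, bias_i: float, bias_j: float,
                      scale: float, decay: float) -> float:
    """Hypothetical background mean: power-law distance decay times per-bin biases."""
    return scale * distance_bp ** (-decay) * bias_i * bias_j


# Hypothetical bin pair 50 kb apart with illustrative bias factors.
mu = expected_contacts(50_000, bias_i=1.2, bias_j=0.9, scale=5e5, decay=1.0)
for observed in (12, 40):
    print(observed, nb_upper_tail(observed, mu, r=2.0))
```

A count well above the expected mean receives a small upper-tail probability and would be a candidate significant interaction; a real tool would also correct for the very large number of bin pairs tested.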

2021 ◽  
Vol 7 (26) ◽  
pp. eabf8962
Author(s):  
Ke Xiao ◽  
Dan Xiong ◽  
Gong Chen ◽  
Jinsong Yu ◽  
Yue Li ◽  
...  

Like most DNA viruses, herpesviruses precisely deliver their genomes into the highly organized nuclei of infected host cells to initiate subsequent transcription and replication. However, how the viral genome specifically interacts with the host genome and hijacks the host transcription machinery remains elusive. Using pseudorabies virus (PRV) as a model virus, we performed chromosome conformation capture assays that demonstrate a genome-wide, specific trans-species chromatin interaction between virus and host. Our data show that the PRV genome is delivered by the host DNA-binding protein RUNX1 into open chromatin and the active transcription zone. This allows the virus to hijack host RNAPII to efficiently transcribe viral genes, a process that is significantly inhibited by either a RUNX1 inhibitor or RNA interference. Together, these findings provide insights into the chromatin interaction between viral and host genomes and identify new areas of research to advance the understanding of herpesvirus genome transcription.


Agronomy ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 27
Author(s):  
Archana Khadgi ◽  
Courtney A. Weber

Red raspberry (Rubus idaeus L.) is an expanding high-value berry crop worldwide. Prickles, outgrowths of epidermal tissue lacking vasculature, occur on the canes, petioles, and undersides of leaves and complicate both field management and harvest, so cultivars with fewer prickles or prickle-free canes simplify production. A previously generated population from a cross between the prickle-free cultivar Joan J (ss) and the prickled cultivar Caroline (Ss), segregating for prickles at the s locus, was analyzed to identify the genomic region associated with prickle development in red raspberry. Genotyping-by-sequencing (GBS) was combined with a genome-wide association study (GWAS) using fixed and random model circulating probability unification (FarmCPU) to analyze 8474 single nucleotide polymorphisms (SNPs) and identify markers significantly associated with the prickle-free trait. A total of four SNPs on chromosome 4 were identified as associated with the phenotype; these were located near or within annotated genes. This study demonstrates how association genetics can be used to decipher the genetic control of important horticultural traits in Rubus, and provides valuable information about the genomic region and potential genes underlying the prickle-free trait.


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 1065
Author(s):  
Reinhard Mischke ◽  
Julia Metzger ◽  
Ottmar Distl

Congenital fibrinogen disorders are very rare in dogs. Cases of afibrinogenemia have been reported in Bernese Mountain, Bichon Frise, Cocker Spaniel, Collie, Lhasa Apso, Vizsla, and St. Bernard dogs. In the present study, we examined four miniature wire-haired Dachshunds with afibrinogenemia and ascertained their pedigree. Homozygosity mapping and a genome-wide association study identified a candidate genomic region at 50,188,932–64,187,680 bp on CFA15 harboring FGB (fibrinogen beta chain), FGA (fibrinogen alpha chain), and FGG (fibrinogen gamma-B chain). Sanger sequencing of all three fibrinogen genes in two cases, followed by validation of the FGA-associated mutation (FGA:g.6296delT, NC_006597.3:g.52240694delA, rs1152388481) in pedigree members, showed perfect co-segregation with afibrinogenemia-affected phenotypes, obligate carriers, and healthy animals. In addition, the rs1152388481 variant was validated in 393 Dachshunds and in samples from 33 other dog breeds. The rs1152388481 variant is predicted to modify the protein sequence of both FGA transcripts (FGA-201:p.Ile486Met and FGA-202:p.Ile555Met), leading to proteins truncated by 306 amino acids. The present data provide evidence for a novel FGA truncating frameshift mutation that very likely explains the cases of severe bleeding due to afibrinogenemia in a Dachshund family. This mutation had already spread in Dachshunds through carriers before the cases were ascertained. Genetic testing allows selective breeding to prevent afibrinogenemia-affected puppies in the future.
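The co-segregation check described above amounts to asking whether each genotyped animal's deletion genotype is compatible with its phenotype class. A minimal Python sketch, assuming an autosomal recessive model (an assumption of this sketch, not a statement of the authors' exact criteria); the pedigree, animal IDs and class labels are hypothetical:

```python
from typing import Dict, List, Tuple

# Assumed autosomal recessive model. Genotype = number of copies of the
# deletion allele an animal carries: 0 (wild type), 1 (heterozygous), 2 (homozygous).
ALLOWED_GENOTYPES = {
    "affected": {2},          # affected dogs must be homozygous for the deletion
    "obligate_carrier": {1},  # unaffected parents of affected dogs must be heterozygous
    "healthy": {0, 1},        # other unaffected dogs may be clear or unrecognised carriers
}


def cosegregation_conflicts(pedigree: Dict[str, Tuple[str, int]]) -> List[str]:
    """Return the IDs of animals whose genotype contradicts their phenotype class."""
    return sorted(
        animal_id
        for animal_id, (status, deletion_copies) in pedigree.items()
        if deletion_copies not in ALLOWED_GENOTYPES[status]
    )


# Hypothetical pedigree members: (phenotype class, deletion-allele copies).
pedigree = {
    "case1": ("affected", 2),
    "case2": ("affected", 2),
    "sire": ("obligate_carrier", 1),
    "dam": ("obligate_carrier", 1),
    "littermate": ("healthy", 1),
    "unrelated": ("healthy", 0),
}
print(cosegregation_conflicts(pedigree))  # [] means no animal contradicts the model
```

An empty result corresponds to perfect co-segregation; any returned ID would point to a genotyping error, a misclassified phenotype, or a variant that is not causal.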


2013 ◽  
Author(s):  
Benjamin P. Berman ◽  
Yaping Liu ◽  
Theresa K. Kelly

Background: Nucleosome organization and DNA methylation are two mechanisms that are important for the proper control of mammalian transcription and are implicated in the epigenetic dysregulation associated with cancer. Whole-genome DNA methylation sequencing studies have found that methylation levels in the human genome show a periodicity of approximately 190 bp, suggesting a genome-wide relationship between the two marks. A recent report (Chodavarapu et al., 2010) attributed this to higher methylation levels of DNA within nucleosomes. Here, we analyzed a number of published datasets and found a more compelling alternative explanation, namely that methylation levels are highest in the linker regions between nucleosomes. Results: Reanalyzing the data from Chodavarapu et al. (2010), we found that nucleosome-associated methylation could be strongly confounded by known sequence-related biases of next-generation sequencing technologies. By accounting for these biases and using an unrelated nucleosome profiling technology, NOMe-seq, we found that genome-wide methylation was actually highest within the linker regions between nucleosomes in multi-nucleosome arrays. This effect was consistent across several methylation datasets generated independently with two unrelated methylation assays. Linker-associated methylation was most prominent within long partially methylated domains (PMDs) and at the positioned nucleosomes that flank CTCF binding sites. CTCF-adjacent nucleosomes retained correct positioning in regions completely devoid of CpG dinucleotides, suggesting that DNA methylation is not required for proper nucleosome positioning. Conclusions: The biological mechanisms responsible for DNA methylation patterns outside of gene promoters remain poorly understood. We identified a significant genome-wide relationship between nucleosome organization and DNA methylation, which can be used to more accurately analyze and understand the epigenetic changes that accompany cancer and other diseases.
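The ~190 bp periodicity mentioned above is the kind of signal that an autocorrelation of methylation level against genomic offset picks up. The Python sketch below is an illustration only, not the authors' analysis: it builds a synthetic per-base methylation profile with a 190 bp period (an assumption made purely for demonstration) and recovers the period as the lag, within a chosen search window, that maximizes the autocorrelation.

```python
import math
from typing import List, Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Pearson correlation between x and x shifted by `lag` positions."""
    n = len(x) - lag
    a, b = x[:n], x[lag:lag + n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def dominant_period(x: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))


# Synthetic profile: mean methylation 0.5 with a 190 bp oscillation (illustrative only).
profile: List[float] = [0.5 + 0.2 * math.cos(2 * math.pi * i / 190) for i in range(2000)]
print(dominant_period(profile, 120, 300))
```

Real methylation data are observed only at CpGs and are noisy, so an actual analysis would aggregate correlations over CpG pairs at each separation rather than use a dense per-base profile.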


2014 ◽  
Vol 23 (03) ◽  
pp. 1460008
Author(s):  
Kevin Byron ◽  
Jason T. L. Wang ◽  
Dongrong Wen

Developing effective artificial intelligence tools to find motifs in DNA, RNA and proteins poses a challenging yet important problem in life science research. In this paper, we present a computational approach for finding RNA tertiary motifs in genomic sequences. Specifically, we predict genomic coordinate locations for coaxial helical stackings in 3-way RNA junctions. These predictions are provided by our tertiary motif search package, named CSminer, which utilizes two versatile methodologies: random forests and covariance models. A coaxial helical stacking tertiary motif occurs in a 3-way RNA junction where two separate helical elements form a pseudocontiguous helix and provide thermodynamic stability to the RNA molecule as a whole. Our CSminer tool first uses a genome-wide search method based on covariance models to find a genomic region that may potentially contain a coaxial helical stacking tertiary motif. CSminer then uses a random forests classifier to predict whether the genomic region indeed contains the tertiary motif. Experimental results demonstrate the effectiveness of our approach.


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Suhua Feng ◽  
Zhenhui Zhong ◽  
Ming Wang ◽  
Steven E. Jacobsen

Background: Methylation of cytosines at the C5 position (5-methylcytosine) is an important epigenetic mark in eukaryotes. Bisulfite sequencing is the gold standard for DNA methylation detection, and whole-genome bisulfite sequencing (WGBS) has been widely used to detect methylation at single-nucleotide resolution on a genome-wide scale. However, sodium bisulfite is known to severely degrade DNA, which, combined with biases introduced during PCR amplification, leads to unbalanced base representation in the final sequencing libraries. Enzymatic conversion of unmethylated cytosines to uracils can achieve the same end product for sequencing as bisulfite treatment without affecting the integrity of the DNA; enzymatic methylation sequencing may therefore offer advantages over bisulfite sequencing. Results: Using an enzymatic methyl-seq (EM-seq) technique to selectively deaminate unmethylated cytosines to uracils, we generated and sequenced libraries from different amounts of Arabidopsis input DNA with different numbers of PCR cycles, and compared these data with results from traditional whole-genome bisulfite sequencing. We found that EM-seq libraries were more consistent between replicates and had higher mapping rates, lower duplication rates, lower background noise, higher average coverage, and higher coverage of total cytosines. Differentially methylated region (DMR) analysis showed that WGBS tended to overestimate methylation levels, especially in CHG and CHH contexts, whereas EM-seq detected higher CG methylation levels in certain highly methylated areas. These differences can mostly be explained by a correlation of WGBS methylation estimates with GC content and methylated cytosine density. We used EM-seq to compare methylation between leaves and flowers and found that CHG methylation levels are greatly elevated in flowers, especially in pericentromeric regions. Conclusion: We suggest that EM-seq is a more accurate and reliable approach than WGBS for detecting methylation. Compared with WGBS, EM-seq results are less affected by differences in library preparation conditions or by the skewed base composition of converted DNA. It may therefore be preferable to use EM-seq in methylation studies.
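Both WGBS and EM-seq read out methylation the same way after conversion: an unmethylated C is read as T, so at each reference cytosine the methylation level is estimated as C / (C + T) across aligned reads. The Python sketch below assumes per-cytosine read counts are already available (the input layout and function names are hypothetical), classifies each forward-strand cytosine as CG, CHG or CHH (H = A, C or T), and pools the counts per context.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cytosine_context(ref: str, pos: int) -> str:
    """Context of a forward-strand C at 0-based `pos`: 'CG', 'CHG', 'CHH' or 'unknown'."""
    if ref[pos] != "C":
        raise ValueError(f"position {pos} is not a forward-strand cytosine")
    nxt, nxt2 = ref[pos + 1:pos + 2], ref[pos + 2:pos + 3]
    if nxt == "G":
        return "CG"
    if nxt not in ("A", "C", "T") or nxt2 not in ("A", "C", "G", "T"):
        return "unknown"  # sequence end or ambiguous base
    return "CHG" if nxt2 == "G" else "CHH"


def methylation_by_context(ref: str, counts: Iterable[Tuple[int, int, int]]) -> Dict[str, float]:
    """Pooled methylation level C / (C + T) per context.

    `counts` holds (pos, n_c, n_t): reads showing C (protected, methylated) or
    T (converted, unmethylated) at each forward-strand cytosine.
    """
    methylated: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for pos, n_c, n_t in counts:
        ctx = cytosine_context(ref, pos)
        methylated[ctx] += n_c
        total[ctx] += n_c + n_t
    return {ctx: methylated[ctx] / total[ctx] for ctx in total if total[ctx] > 0}


ref = "ACGTTCAGCTTACCA"
counts = [(1, 9, 1), (5, 3, 7), (8, 1, 9), (12, 0, 10)]
print(methylation_by_context(ref, counts))
```

Pooling reads weights well-covered cytosines more heavily than averaging per-site levels would; either choice is defensible, and reverse-strand cytosines would be handled the same way on the reverse complement.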


2019 ◽  
Author(s):  
Qiang Wu ◽  
Ya Guo ◽  
Yujia Lu ◽  
Jingwei Li ◽  
Yonghu Wu ◽  
...  

CTCF is a key insulator-binding protein, and mammalian genomes contain numerous CTCF-binding sites (CBSs), many of which are organized in tandem arrays. Here we provide direct evidence that CBSs located between enhancers and promoters in the Pcdhα and β-globin clusters function as enhancer-blocking insulators by forming distinct directional chromatin loops, regardless of whether the enhancers themselves contain CBSs. Moreover, computational simulation and experimental capture revealed that large arrays of tandem variable CBSs produce balanced promoter usage in cell populations and stochastic monoallelic expression in single cells. Finally, on a genome-wide scale, gene expression levels are negatively correlated with CBS insulators located between enhancers and promoters. Thus, single CBS insulators ensure proper enhancer insulation and promoter activation, while tandem-arrayed CBS insulators determine balanced promoter usage. This finding has interesting implications for the role of topological insulators in 3D genome folding and developmental gene regulation.


2014 ◽  
Author(s):  
Marcus M Dillon ◽  
Way Sung ◽  
Michael Lynch ◽  
Vaughn S Cooper

Spontaneous mutations are ultimately essential for evolutionary change and are also the root cause of many diseases. However, until recently, both biological and technical barriers have prevented detailed analyses of mutation profiles, confining our understanding of the mutation process to a few model organisms and leaving major gaps in our understanding of how genome content and structure affect mutation. Here, we present a genome-wide view of the molecular mutation spectrum in Burkholderia cenocepacia, a clinically relevant pathogen with high %GC-content and multiple chromosomes. We find that B. cenocepacia has low genome-wide mutation rates, with insertion-deletion mutations biased towards deletions, consistent with the idea that deletion pressure reduces prokaryotic genome sizes. Unlike in previously studied organisms, mutations in B. cenocepacia are not AT-biased, which suggests that at least some genomes with high %GC-content experience unusual base-substitution mutation pressure. Importantly, we also observe variation in both the rates and spectra of mutations among chromosomes, and elevated G:C>T:A transversions in late-replicating regions. Thus, although some patterns of mutation appear to be highly conserved across cellular life, others vary between species and even between chromosomes of the same species, potentially influencing the evolution of nucleotide composition and genome architecture.
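A base-substitution spectrum like the one described above is usually tallied in six strand-collapsed classes, because a C>A change on one strand is the same event as G>T on the other (both are G:C>T:A). The Python sketch below, using hypothetical input, collapses substitutions into those classes and then applies one common way to ask whether mutation is AT-biased: compute per-site rates u (G:C to A:T or T:A) and v (A:T to G:C or C:G), and the GC content v / (u + v) expected if mutation alone set base composition. The site counts are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def collapse(ref: str, alt: str) -> str:
    """Strand-collapse a substitution, written from the G:C or A:T base pair."""
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("C", "T"):  # read the event from the other strand so ref is G or A
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def spectrum(subs: Iterable[Tuple[str, str]]) -> Counter:
    """Count substitutions per strand-collapsed class."""
    return Counter(collapse(ref, alt) for ref, alt in subs)


def equilibrium_gc(spec: Counter, gc_sites: int, at_sites: int) -> float:
    """GC content expected if base composition were set by mutation alone.

    G:C>C:G and A:T>T:A transversions leave GC content unchanged, so they are ignored.
    """
    u = (spec["G:C>A:T"] + spec["G:C>T:A"]) / gc_sites  # per-site GC-to-AT rate
    v = (spec["A:T>G:C"] + spec["A:T>C:G"]) / at_sites  # per-site AT-to-GC rate
    if u + v == 0:
        raise ValueError("no GC-altering substitutions observed")
    return v / (u + v)


# Hypothetical substitutions and site counts (67% of sites are G:C).
subs = [("C", "T"), ("G", "A"), ("G", "T"), ("A", "G"), ("T", "C"), ("A", "C")]
spec = spectrum(subs)
print(spec)
print(equilibrium_gc(spec, gc_sites=670, at_sites=330))
```

If the expected GC content is close to the genome's actual GC content, mutation is not pushing composition towards AT; an AT-biased spectrum would give a value well below it.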


2021 ◽  
Vol 12 ◽  
Author(s):  
Xuhao Song ◽  
Tingbang Yang ◽  
Xinyi Zhang ◽  
Ying Yuan ◽  
Xianghui Yan ◽  
...  

Microsatellite or simple sequence repeat (SSR) instability within genes can induce genetic variation. SSR signatures remain largely unknown across the different clades of Euarchontoglires, one of the most successful mammalian radiations. Here, we conducted a genome-wide characterization of microsatellite distribution patterns at different taxonomic levels in 153 Euarchontoglires genomes. The abundance and density of SSRs were significantly positively correlated with genome size in primates, whereas no significant relationship with genome size was found in rodents. Furthermore, perfect SSR (P-SSR) attributes showed a higher level of complexity in rodents than in primates. Mononucleotide P-SSRs were the most frequent type in the genomes of primates, tree shrews, and colugos, while mononucleotide or dinucleotide motif types dominated the genomes of rodents and lagomorphs. In addition, (A)n was the most abundant motif in primate genomes, whereas the most abundant motif in rodent genomes was (A)n, (AC)n, or (AG)n, varying even within the same genus. The GC content and repeat copy numbers of P-SSRs varied among species when compared at different taxonomic levels, reflecting underlying differences in SSR mutation processes. Notably, coding sequences (CDSs) containing P-SSRs were categorized by function and pathway using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes annotations, highlighting their roles in transcription regulation. Overall, this work should aid future studies of the functional roles of the taxonomic features of microsatellites during mammalian evolution in Euarchontoglires.
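Perfect SSRs are uninterrupted tandem copies of a short motif, such as (A)n or (AC)n. A minimal Python sketch of how they can be scanned for with a regular-expression back-reference; the minimum copy numbers per motif length are illustrative thresholds chosen here, not the settings used in the study, and compound or imperfect SSRs are ignored.

```python
import re
from typing import List, NamedTuple


class SSR(NamedTuple):
    start: int   # 0-based start position in the sequence
    motif: str
    copies: int


# Illustrative minimum copy numbers for mono- to hexanucleotide motifs (assumed).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA' or 'ACAC')."""
    return re.fullmatch(r"(.+?)\1+", motif) is None


def find_perfect_ssrs(seq: str) -> List[SSR]:
    """Report perfect tandem repeats whose copy number meets MIN_COPIES."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # A k-mer followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. (AA)n, already counted as (A)n
                hits.append(SSR(m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


seq = "TTGC" + "A" * 12 + "GT" + "AC" * 7 + "TT" + "CAG" * 5 + "TT"
for ssr in find_perfect_ssrs(seq):
    print(ssr)
```

Filtering non-primitive motifs keeps a homopolymer run from also being reported as a dinucleotide or longer repeat; motif rotations (AC versus CA) are reported as they first appear and would need normalizing before motifs are counted across genomes.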

