Mapping of scaffold/matrix attachment regions in human genome: a data mining exercise

Abstract: Scaffold/matrix attachment regions (S/MARs) are DNA elements that compartmentalize chromatin into structural and functional domains. These elements are involved in the control of gene expression, which governs phenotype, and also play a role in disease biology. Therefore, a genome-wide understanding of these elements holds great therapeutic promise. Several attempts have been made to identify S/MARs in the genomes of various organisms, including human. However, a comprehensive genome-wide map of human S/MARs is not yet available. Toward this objective, ChIP-Seq data for 14 S/MAR-binding proteins were analyzed, and the binding site coordinates of these proteins were used to prepare a non-redundant S/MAR dataset for the human genome. Along with the coordinates (locations) of S/MARs, the dataset also revealed details of S/MAR features, namely length, inter-S/MAR length (the chromatin loop size), nucleotide repeats, motif abundance, chromosomal distribution and genomic context. The S/MARs identified in the present study and their subsequent analysis also suggest that these elements act as hotspots for the integration of retroviruses. Therefore, these data will help toward a better understanding of genome functioning and the design of effective anti-viral therapeutics. To facilitate user-friendly browsing and retrieval of the data obtained in the present study, a web interface, MARome (http://bioinfo.net.in/MARome), has been developed.
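One plausible way to read "non-redundant dataset" and "inter-S/MAR length" is as interval merging followed by gap measurement. The sketch below is only an illustration under that assumption, not the authors' pipeline: the function names, the half-open coordinate convention, and the toy coordinates are all hypothetical.

```python
from collections import defaultdict


def merge_sites(sites):
    """Merge overlapping or touching (chrom, start, end) intervals per chromosome.

    Coordinates are assumed to be half-open [start, end); this is an
    assumption of the sketch, not something stated in the abstract.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in sites:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps (or touches) the previous region: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def inter_region_lengths(regions):
    """Distances between consecutive merged regions on one chromosome."""
    return [nxt[0] - cur[1] for cur, nxt in zip(regions, regions[1:])]


# Toy binding sites pooled from several hypothetical proteins.
sites = [
    ("chr1", 100, 250),
    ("chr1", 200, 400),
    ("chr1", 1000, 1200),
    ("chr2", 50, 80),
]
merged = merge_sites(sites)
gaps_chr1 = inter_region_lengths(merged["chr1"])
```

With the toy input, the two overlapping chr1 sites collapse into one region, and the remaining gap on chr1 stands in for a loop size.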


Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease

10.1101/078865 ◽

2016 ◽

Cited By ~ 1

Author(s):

Qiongshi Lu ◽

Ryan L. Powles ◽

Sarah Abdallah ◽

Derek Ou ◽

Qian Wang ◽

...

Keyword(s):

Human Genome ◽

Functional Annotation ◽

Late Onset ◽

Complex Diseases ◽

Tissue Specific ◽

Functional Regions ◽

Genome Wide ◽

Dna Elements ◽

Human Complex

Abstract: Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer’s disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson’s disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.

Author Summary: After years of community efforts, many experimental and computational approaches have been developed and applied for functional annotation of the human genome, yet proper annotation still remains challenging, especially in non-coding regions. As complex disease research rapidly advances, increasing evidence suggests that non-coding regulatory DNA elements may be the primary regions harboring risk variants in human complex diseases. In this paper, we introduce GenoSkyline-Plus, a principled annotation framework to identify tissue and cell type-specific functional regions in the human genome through integration of diverse high-throughput epigenomic and transcriptomic data. Through validation of known non-coding tissue-specific regulatory regions, enrichment analyses on 45 complex traits, and an in-depth case study of neurodegenerative diseases, we demonstrate the ability of GenoSkyline-Plus to accurately identify tissue-specific functionality in the human genome and provide unbiased, genome-wide insights into the genetic basis of human complex diseases.


Discovering candidate imprinted genes and Imprinting Control Regions in the human genome

10.1101/678151 ◽

2019 ◽

Cited By ~ 1

Author(s):

Minou Bina

Keyword(s):

Human Genome ◽

Genomic Dna ◽

Specific Gene ◽

Imprinted Genes ◽

Chromosomal Dna ◽

Genome Wide ◽

A Genome ◽

Dna Elements ◽

Parent Of Origin ◽

Control Regions

Abstract: Genomic imprinting is a process whereby a subset of genes is expressed in a parent-of-origin specific manner. This evolutionary novelty is restricted to mammals and controlled by genomic DNA segments known as Imprinting Control Regions (ICRs). The known imprinted genes function in many important developmental and postnatal processes including organogenesis, neurogenesis, and fertility. Furthermore, defects in imprinted genes could cause severe diseases and abnormalities. Because of the importance of the ICRs to the regulation of parent-of-origin specific gene expression, I developed a genome-wide strategy for their localization. This strategy located clusters of the ZFBS-Morph overlaps along the entire human genome. Previously, I showed that in the mouse genome, clusters of 2 or more of these overlaps correctly located ~90% of the fully characterized ICRs and germline Differentially Methylated Regions (gDMRs). The ZFBS-Morph overlaps are composite DNA elements composed of the ZFP57 binding site (ZFBS) overlapping a subset of the MLL1 morphemes. My strategy consists of creating plots to display the density of ZFBS-Morph overlaps along genomic DNA. Peaks in these plots pinpointed several of the known ICRs/gDMRs within relatively long genomic DNA sections and even along entire chromosomal DNA. Therefore, peaks in the density plots are likely to reflect the positions of known or candidate ICRs. I also found that by locating the genes in the vicinity of candidate ICRs, I could discover potential novel human imprinted genes. Additionally, my exploratory assessments revealed a connection between several of the potential imprinted genes and human developmental anomalies, including syndromes.
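The density-plot idea can be illustrated with a simple windowed count. This is a minimal sketch under stated assumptions rather than the author's method: the fixed 1,000 bp window, the function names, and the toy positions are hypothetical, and only the "2 or more overlaps" threshold is taken from the abstract.

```python
from collections import Counter


def window_density(positions, window):
    """Count element positions per fixed-size window.

    Returns a dict mapping each occupied window's start coordinate to the
    number of positions falling inside it. The window size is a free
    parameter of this sketch.
    """
    counts = Counter(pos // window for pos in positions)
    return {idx * window: n for idx, n in sorted(counts.items())}


def candidate_windows(density, min_count=2):
    """Window starts whose count reaches the cluster threshold."""
    return [start for start, n in density.items() if n >= min_count]


# Toy positions of ZFBS-Morph overlaps along one stretch of DNA.
positions = [120, 480, 910, 5300, 9100, 9400, 9800]
density = window_density(positions, window=1000)
peaks = candidate_windows(density, min_count=2)
```

Plotting `density` against window start would give a crude density plot; `peaks` lists the windows that a threshold of two overlaps would flag as candidates.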


DNA methylation in satellite repeats disorders

Essays in Biochemistry ◽

10.1042/ebc20190028 ◽

2019 ◽

Vol 63 (6) ◽

pp. 757-771 ◽

Cited By ~ 4

Author(s):

Claire Francastel ◽

Frédérique Magdinier

Keyword(s):

Dna Methylation ◽

Human Genome ◽

Repetitive Dna ◽

Dna Sequences ◽

Satellite Repeats ◽

Tremendous Progress ◽

Genes Encoding ◽

Dna Elements ◽

Near Future

Abstract Despite the tremendous progress made in recent years in assembling the human genome, tandemly repeated DNA elements remain poorly characterized. These sequences account for the vast majority of methylated sites in the human genome and their methylated state is necessary for this repetitive DNA to function properly and to maintain genome integrity. Furthermore, recent advances highlight the emerging role of these sequences in regulating the functions of the human genome and its variability during evolution, among individuals, or in disease susceptibility. In addition, a number of inherited rare diseases are directly linked to the alteration of some of these repetitive DNA sequences, either through changes in the organization or size of the tandem repeat arrays or through mutations in genes encoding chromatin modifiers involved in the epigenetic regulation of these elements. Although largely overlooked so far in the functional annotation of the human genome, satellite elements play key roles in its architectural and topological organization. This includes functions as boundary elements delimitating functional domains or assembly of repressive nuclear compartments, with local or distal impact on gene expression. Thus, the consideration of satellite repeats organization and their associated epigenetic landmarks, including DNA methylation (DNAme), will become unavoidable in the near future to fully decipher human phenotypes and associated diseases.


Faculty Opinions recommendation of A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome?

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.722875088.793531669 ◽

2017 ◽

Author(s):

Ian D Hickson

Keyword(s):

Human Genome ◽

Chromosomal Instability ◽

Fragile Sites ◽

Common Fragile Sites ◽

Genome Wide Analysis ◽

Genome Wide ◽

A Genome


Identifying Human Genome-Wide CNV, LOH and UPD by Targeted Sequencing of Selected Regions

PLoS ONE ◽

10.1371/journal.pone.0123081 ◽

2015 ◽

Vol 10 (4) ◽

pp. e0123081 ◽

Cited By ~ 4

Author(s):

Yu Wang ◽

Wei Li ◽

Yingying Xia ◽

Chongzhi Wang ◽

Y. Tom Tang ◽

...

Keyword(s):

Human Genome ◽

Targeted Sequencing ◽

Genome Wide


Erratum for Sivan et al., “Identification of Restriction Factors by Human Genome-Wide RNA Interference Screening of Viral Host Range Mutants Exemplified by Discovery of SAMD9 and WDR6 as Inhibitors of the Vaccinia Virus K1L−C7L− Mutant”

mBio ◽

10.1128/mbio.01735-17 ◽

2017 ◽

Vol 8 (5) ◽

Author(s):

Gilad Sivan ◽

Pinar Ormanoglu ◽

Eugen C. Buehler ◽

Scott E. Martin ◽

Bernard Moss

Keyword(s):

Rna Interference ◽

Vaccinia Virus ◽

Host Range ◽

Human Genome ◽

Restriction Factors ◽

Genome Wide ◽

Viral Host Range


Intergenic RNA mainly derives from nascent transcripts of known genes

10.1101/2020.01.08.898478 ◽

2020 ◽

Author(s):

Agostini Federico ◽

Zagalak Julian ◽

Attig Jan ◽

Ule Jernej ◽

Nicholas M. Luscombe

Keyword(s):

Rna Polymerase Ii ◽

Human Genome ◽

Genomic Context ◽

Alternative Processing ◽

Intergenic Regions ◽

Transcriptional Units ◽

And Function ◽

Cellular Compartments ◽

Nascent Transcripts ◽

Transcriptional Start Sites

Abstract: Background: Eukaryotic genomes undergo pervasive transcription, leading to the production of many types of stable and unstable RNAs. Transcription is not restricted to regions with annotated gene features but includes almost any genomic context. Currently, the source and function of most RNAs originating from intergenic regions in the human genome remain unclear. Results: We hypothesised that many intergenic RNAs can be ascribed to the presence of as-yet unannotated genes or the ‘fuzzy’ transcription of known genes that extends beyond the annotated boundaries. To elucidate the contributions of these two sources, we assembled a dataset of >2.5 billion publicly available RNA-seq reads across 5 human cell lines and multiple cellular compartments to annotate transcriptional units in the human genome. About 80% of transcripts from unannotated intergenic regions can be attributed to the fuzzy transcription of existing genes; the remaining transcripts originate mainly from putative long non-coding RNA loci that are rarely spliced. We validated the transcriptional activity of these intergenic RNAs using independent measurements, including transcriptional start sites, chromatin signatures, and genomic occupancies of RNA polymerase II in various phosphorylation states. We also analysed the nuclear localisation and sensitivities of intergenic transcripts to nucleases to illustrate that they tend to be rapidly degraded either ‘on-chromatin’ by XRN2 or ‘off-chromatin’ by the exosome. Conclusions: We provide a curated atlas of intergenic RNAs that distinguishes alternative processing of well-annotated genes from independent transcriptional units, based on the combined analysis of chromatin signatures, nuclear RNA localisation and degradation pathways.


Mapping ribonucleotides embedded in genomic DNA to single-nucleotide resolution using Ribose-Map

10.1101/2020.08.27.267153 ◽

2020 ◽

Author(s):

Alli L. Gombolay ◽

Francesca Storici

Keyword(s):

Computing Time ◽

Wide Distribution ◽

Sequencing Analysis ◽

Sequencing Data ◽

Single Nucleotide ◽

Genome Wide ◽

Hands On ◽

Nucleotide Resolution ◽

User Friendly ◽

Single Nucleotide Resolution

Abstract: Ribose-Map is a user-friendly, standardized bioinformatics toolkit for the comprehensive analysis of ribonucleotide sequencing experiments. It allows researchers to map the locations of ribonucleotides in DNA to single-nucleotide resolution and identify biological signatures of ribonucleotide incorporation. In addition, it can be applied to data generated using any currently available high-throughput ribonucleotide sequencing technique, thus standardizing the analysis of ribonucleotide sequencing experiments and allowing direct comparisons of results. This protocol describes in detail how to use Ribose-Map to analyze raw ribonucleotide sequencing data, including preparing the reads for analysis, locating the genomic coordinates of ribonucleotides, exploring the genome-wide distribution of ribonucleotides, determining the nucleotide sequence context of ribonucleotides, and identifying hotspots of ribonucleotide incorporation. Ribose-Map does not require background knowledge of ribonucleotide sequencing analysis and assumes only basic command-line skills. The protocol requires less than 3 hr of computing time for most datasets and about 30 min of hands-on time.


DNA methylation marks inter-nucleosome linker regions throughout the human genome

10.7287/peerj.preprints.27 ◽

2013 ◽

Author(s):

Benjamin P. Berman ◽

Yaping Liu ◽

Theresa K. Kelly

Keyword(s):

Dna Methylation ◽

Human Genome ◽

Ctcf Binding ◽

Gene Promoters ◽

Cpg Dinucleotides ◽

Sequencing Technologies ◽

Nucleosome Organization ◽

Genome Wide ◽

A Genome ◽

Sequencing Studies

Background: Nucleosome organization and DNA methylation are two mechanisms that are important for proper control of mammalian transcription, as well as for the epigenetic dysregulation associated with cancer. Whole-genome DNA methylation sequencing studies have found that methylation levels in the human genome show periodicities of approximately 190 bp, suggesting a genome-wide relationship between the two marks. A recent report (Chodavarapu et al., 2010) attributed this to higher methylation levels of DNA within nucleosomes. Here, we analyzed a number of published datasets and found a more compelling alternative explanation, namely that methylation levels are highest in linker regions between nucleosomes. Results: Reanalyzing the data from Chodavarapu et al. (2010), we found that nucleosome-associated methylation could be strongly confounded by known sequence-related biases of next-generation sequencing technologies. By accounting for these biases and using an unrelated nucleosome profiling technology, NOMe-seq, we found that genome-wide methylation was actually highest within linker regions occurring between nucleosomes in multi-nucleosome arrays. This effect was consistent among several methylation datasets generated independently using two unrelated methylation assays. Linker-associated methylation was most prominent within long Partially Methylated Domains (PMDs) and the positioned nucleosomes that flank CTCF binding sites. CTCF-adjacent nucleosomes retained the correct positioning in regions completely devoid of CpG dinucleotides, suggesting that DNA methylation is not required for proper nucleosome positioning. Conclusions: The biological mechanisms responsible for DNA methylation patterns outside of gene promoters remain poorly understood. We identified a significant genome-wide relationship between nucleosome organization and DNA methylation, which can be used to more accurately analyze and understand the epigenetic changes that accompany cancer and other diseases.


Genome-Wide Identification and Characterization of CIPK Family and Analysis Responses to Various Stresses in Apple (Malus domestica)

International Journal of Molecular Sciences ◽

10.3390/ijms19072131 ◽

2018 ◽

Vol 19 (7) ◽

pp. 2131 ◽

Cited By ~ 8

Author(s):

Lili Niu ◽

Biying Dong ◽

Zhihua Song ◽

Dong Meng ◽

Yujie Fu

Keyword(s):

Growth Stages ◽

Hormone Signaling ◽

Chromosomal Distribution ◽

Interacting Protein ◽

C Terminus ◽

Genome Wide ◽

Enhanced Expression ◽

Interaction Domains ◽

Identification And Characterization

CBL-interacting protein kinases (CIPKs) play crucial roles in hormone signal transduction and in the response to abiotic stress during plant developmental processes. The CIPK family is characterized by conserved NAF/FISL (Asn-Ala-Phe) and PPI (protein-phosphatase interaction) domains in the C-terminus. However, little has been reported about the CIPK family in apple. In this study, a total of 34 MdCIPK genes were identified in the apple genome, divided into two groups according to their CIPK domains, characterized by gene structure and chromosomal distribution, and mapped onto 17 chromosomes. All MdCIPK genes were expressed in the four apple tissues examined (leaf, root, flower, and fruit). In addition, expression profiling showed that five MdCIPK genes had enhanced expression during the pollen tube growth stages. MdCIPK4 was the most highly expressed throughout the fruit development stages. Under stress conditions, the transcript levels of 21 MdCIPK genes were up-regulated in response to fungal and salt treatments, suggesting that these genes may be involved in the response of apple to stress. Our findings provide new insight into the roles of CIPK genes in apple, which could contribute to the cloning and functional analysis of CIPK genes in the future.
