An Unbiased Predictive Model to Detect DNA Methylation Propensity of CpG Islands in the Human Genome

2020 ◽  
Vol 15 ◽  
Author(s):  
Dicle Yalcin ◽  
Hasan H. Otu

Background: Epigenetic repression mechanisms play an important role in gene regulation, specifically in cancer development. In many cases, a CpG island’s (CGI) susceptibility or resistance to methylation is shown to be influenced by local DNA sequence features. Objective: To develop unbiased machine learning models, individually and combined for different biological features, that predict the methylation propensity of a CGI. Methods: We developed our model consisting of CGI sequence features on a dataset of 75 sequences (28 prone, 47 resistant) representing a genome-wide methylation structure. We tested our model on two independent datasets, one chromosome-specific (132 sequences) and one disease-specific (70 sequences). Results: We improved prediction accuracy over previous models. Our results indicate that combined features better predict the methylation propensity of a CGI (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on independent datasets, reaching an AUC of ~0.82 for the complete model and an AUC of ~0.88 for the model using select sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that are consistently better in separating prone and resistant CGIs. Conclusion: Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes based on their tendency to contain methylation-prone CGIs, which may lead to preventative treatment strategies. MATLAB and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.
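The abstract does not list the specific CGI sequence features used, so the following is only an illustrative sketch of what a sequence-derived feature can look like. It computes GC content and the observed/expected CpG ratio, two descriptors commonly used to characterize CpG islands. The function name `cgi_sequence_features` and the returned keys are assumptions for illustration and are not taken from the authors' repository.

```python
def cgi_sequence_features(seq: str) -> dict:
    """Return simple composition features for a DNA sequence.

    Illustrative only: these are generic CpG-island descriptors,
    not the feature set of the published model.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("sequence must not be empty")

    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact

    gc_content = (c + g) / n
    # Observed/expected CpG ratio: (#CpG * N) / (#C * #G).
    # Defined as 0.0 here when the sequence has no C or no G.
    cpg_obs_exp = (cpg * n) / (c * g) if c and g else 0.0

    return {"length": n, "gc_content": gc_content, "cpg_obs_exp": cpg_obs_exp}


if __name__ == "__main__":
    print(cgi_sequence_features("ACGTCGCGGGCCATCG"))
```

Features of this kind would typically be assembled into one row per CGI before being passed to a classifier; which features and which classifier the authors used is documented in their repository, not here.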

2011 ◽  
Vol 21 (2) ◽  
pp. 269-279 ◽  
Author(s):  
Rachel Michaelson-Cohen ◽  
Ilana Keshet ◽  
Ravid Straussman ◽  
Merav Hecht ◽  
Howard Cedar ◽  
...  

Background: DNA methylation regulates gene expression during development. The methylation pattern is established at the time of implantation. CpG islands are genome regions usually protected from methylation; however, selected islands are methylated later. Many undergo methylation in cancer, causing epigenetic gene silencing. Aberrant methylation occurs early in tumorigenesis, in a specific pattern, inhibiting differentiation. Although methylation of specific genes in ovarian tumors has been demonstrated in numerous studies, they represent only a fraction of all methylated genes in tumorigenesis. Objectives: To explore the hypermethylation design in ovarian cancer compared with the methylation profile of normal ovaries, on a genome-wide scale, thus shedding light on the role of gene silencing in ovarian carcinogenesis. Identifying genes that undergo de novo methylation in ovarian cancer may assist in creating biomarkers for disease diagnosis, prognosis, and treatment responsiveness. Methods: DNA was collected from human epithelial ovarian cancers and normal ovaries. Methylation was detected by immunoprecipitation using 5-methyl-cytosine-antibodies. DNA was hybridized to a CpG island microarray containing 237,220 gene promoter probes. Results were analyzed by hybridization intensity, validated by bisulfite analysis. Results: A total of 367 CpG islands were specifically methylated in cancer cells. There was enrichment of methylated genes in functional categories related to cell differentiation and proliferation inhibition. It seems that their silencing enables tumor proliferation. Conclusions: This study provides new perspectives on methylation in ovarian carcinoma, genome-wide. It illustrates how methylation of CpG islands causes silencing of genes that have a role in cell differentiation and functioning. It creates potential biomarkers for diagnosis, prognosis, and treatment responsiveness.


2018 ◽  
Author(s):  
Taeyoung Hwang ◽  
Dimitrios Mathios ◽  
Kerrie L McDonald ◽  
Irene Daris ◽  
Sung-Hye Park ◽  
...  

Abstract The study of survival outliers of glioblastoma (GBM) can have important implications for gliomagenesis as well as for the identification of ways to alter the clinical course of this almost uniformly lethal cancer type. However, currently studied epigenetic and genetic signatures of the GBM outliers have failed to identify unifying criteria to characterize this unique group of patients. In this study, we profiled the global DNA methylation pattern of mainly IDH1 wild type survival outliers of glioblastoma and performed comprehensive enrichment analyses with genomic and epigenomic signatures. We found that the genome of long-term survivors in glioblastoma is differentially methylated relative to short-term survivor patients depending on CpG density: hypermethylation near CpG islands (CGIs) and hypomethylation far from CGIs. Interestingly, these two patterns are associated with distinct oncogenic aspects in gliomagenesis. The hypomethylation pattern at regions distant from CGIs is associated with lower rates of de novo mutations, while the hypermethylation at CGIs correlates with transcriptional downregulation of genes involved in cancer progression pathways. These results extend our understanding of DNA methylation of survival outliers in glioblastoma at a genome-wide level, and provide insight into the potential impact of DNA hypomethylation in the cancer genome.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 246
Author(s):  
Xiaomeng Chen ◽  
Rui Li ◽  
Yonglin Wang ◽  
Aining Li

An emerging poplar canker caused by the gram-negative bacterium, Lonsdalea populi, has led to high mortality of hybrid poplars Populus × euramericana in China and Europe. The molecular bases of pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study revealed the whole genome sequence and identified putative virulence factors of L. populi. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly encoding for adhesion, extracellular enzymes, secretory systems, and two-component transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. The antibiotic-resistance database annotation listed that L. populi was resistant to penicillin, fluoroquinolone, and kasugamycin. Analysis of comparative genomics found that L. populi exhibited the highest homology with the L. britannica genome and L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and the poplar-bacterium interaction.


2021 ◽  
Author(s):  
Otília Menyhárt ◽  
János Tibor Fekete ◽  
Balázs Győrffy

Abstract Despite advances in molecular characterization of glioblastoma multiforme (GBM), only a handful of predictive biomarkers exist with limited clinical relevance. We aimed to identify differentially expressed genes in tumor samples collected at surgery associated with response to subsequent treatment, including temozolomide (TMZ) and nitrosoureas. Gene expression was collected from multiple independent datasets. Patients were categorized as responders/nonresponders based on their survival status at 16 months post-surgery. For each gene, the expression was compared between responders and nonresponders with a Mann-Whitney U test and receiver operating characteristic. The package "roc" was used to calculate the area under the curve (AUC). The integrated database comprises 454 GBM patients from three independent datasets and 10,103 genes. The highest proportion of responders (68%) were among patients treated with TMZ combined with nitrosoureas, where FCGR2B upregulation provided the strongest predictive value (AUC=0.72, p < 0.001). Elevated expression of CSTA and MRPS17 was associated with a lack of response to multiple treatment strategies. DLL3 upregulation was present in subsequent responders to any treatment combination containing TMZ. Three genes (PLSCR1, MX1, and MDM2) upregulated both in the younger cohort and in patients expressing low MGMT delineate a subset of patients with worse prognosis within a population generally associated with a favorable outcome. The identified transcriptomic changes provide biomarkers of responsiveness, offer avenues for preclinical studies, and may enhance future GBM patient stratifications. The described methodology provides a reliable pipeline for the initial testing of potential biomarker candidates for future validation studies.
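The Mann-Whitney U statistic and the ROC AUC mentioned above are closely related: for two groups of sizes n1 and n2, AUC = U / (n1 · n2), where U counts the responder/nonresponder pairs in which the responder has the higher value (ties count one half). The sketch below shows that relationship directly. It is not the authors' pipeline and does not reproduce the "roc" package they cite; the function name and the toy expression values are hypothetical, and no p-value is computed.

```python
def mann_whitney_auc(responders, nonresponders):
    """Return (U, AUC) for one gene's expression in two patient groups.

    U is the number of (responder, nonresponder) pairs in which the
    responder value is higher, with ties counted as 0.5.
    AUC is U divided by the total number of pairs.
    """
    n1, n2 = len(responders), len(nonresponders)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")

    u = 0.0
    for x in responders:
        for y in nonresponders:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u, u / (n1 * n2)


if __name__ == "__main__":
    # Hypothetical expression values, for illustration only.
    u, auc = mann_whitney_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
    print(f"U = {u}, AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the group sizes, which is acceptable for a few hundred patients per gene; a rank-based formulation would be the usual choice for larger inputs.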


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Huihui Li ◽  
Mingzhe Xie ◽  
Yan Wang ◽  
Ludong Yang ◽  
Zhi Xie ◽  
...  

Abstract riboCIRC is a translatome data-oriented circRNA database specifically designed for hosting, exploring, analyzing, and visualizing translatable circRNAs from multiple species. The database provides a comprehensive repository of computationally predicted ribosome-associated circRNAs; a manually curated collection of experimentally verified translated circRNAs; an evaluation of cross-species conservation of translatable circRNAs; a systematic de novo annotation of putative circRNA-encoded peptides, including sequence, structure, and function; and a genome browser to visualize the context-specific occupant footprints of circRNAs. It represents a valuable resource for the circRNA research community and is publicly available at http://www.ribocirc.com.


2021 ◽  
Author(s):  
Daniel N. Weinberg ◽  
Phillip Rosenbaum ◽  
Xiao Chen ◽  
Douglas Barrows ◽  
Cynthia Horth ◽  
...  
Keyword(s):  
De Novo ◽  

2020 ◽  
Vol 12 (6) ◽  
pp. 905-910 ◽  
Author(s):  
Ruoyu Liu ◽  
Kun Wang ◽  
Jun Liu ◽  
Wenjie Xu ◽  
Yang Zhou ◽  
...  

Abstract Cold seeps, characterized by methane, hydrogen sulfide, and other hydrocarbon chemicals, foster one of the most widespread chemosynthetic ecosystems in the deep sea, densely populated by specialized benthos. However, scarce genomic resources severely limit our knowledge about the origin and adaptation of life in this unique ecosystem. Here, we present a genome of a deep-sea limpet Bathyacmaea lactea, a common species associated with the dominant mussel beds in cold seeps. We generated 54.6 gigabases (Gb) of Nanopore reads and 77.9 Gb of BGI-seq raw reads. The assembly yielded a 754.3-Mb genome for B. lactea, with 3,720 contigs and a contig N50 of 1.57 Mb, covering 94.3% of metazoan Benchmarking Universal Single-Copy Orthologs. In total, 23,574 protein-coding genes and 463.4 Mb of repetitive elements were identified. We analyzed the phylogenetic position, substitution rate, demographic history, and TE activity of B. lactea. We also identified 80 expanded gene families and 87 rapidly evolving Gene Ontology categories in the B. lactea genome. Many of these genes were associated with heterocyclic compound metabolism, membrane-bounded organelle, metal ion binding, and nitrogen and phosphorus metabolism. The high-quality assembly and in-depth characterization suggest the B. lactea genome will serve as an essential resource for understanding the origin and adaptation of life in the cold seeps.
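The contig N50 quoted above is a standard assembly-contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the function name `contig_n50` and the toy contig lengths are illustrative assumptions, not data from the B. lactea assembly.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    contig added is the N50.
    """
    total = sum(lengths)
    if total <= 0:
        raise ValueError("need at least one contig with positive length")

    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running against total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy lengths for illustration only.
    print(contig_n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract reports it alongside ortholog completeness.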


BMC Genetics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 24 ◽  
Author(s):  
Samuel G Younkin ◽  
Robert B Scharpf ◽  
Holger Schwender ◽  
Margaret M Parker ◽  
Alan F Scott ◽  
...  

2013 ◽  
Vol 94 (5) ◽  
pp. 960-970 ◽  
Author(s):  
Gernot Wolf ◽  
Anders Lade Nielsen ◽  
Jacob Giehm Mikkelsen ◽  
Finn Skou Pedersen

Endogenous retroviruses (ERVs) are remnants of retroviral germ line infections and have been identified in all mammals investigated so far. Although the majority of ERVs are degenerated, some mammalian species, such as mice and pigs, carry replication-competent ERVs capable of forming infectious viral particles. In mice, ERVs are silenced by DNA methylation and histone modifications and some exogenous retroviruses were shown to be transcriptionally repressed after integration by a primer-binding site (PBS) targeting mechanism. However, epigenetic repression of porcine ERVs (PERVs) has remained largely unexplored so far. In this study, we screened the pig genome for PERVs using LTRharvest, a tool for de novo detection of ERVs, and investigated various aspects of epigenetic repression of three unrelated PERV families. We found that these PERV families are differentially up- or downregulated upon chemical inhibition of DNA methylation and histone deacetylation in cultured porcine cells. Furthermore, chromatin immunoprecipitation analysis revealed repressive histone methylation marks at PERV loci in primary porcine embryonic germ cells and immortalized embryonic kidney cells. PERV elements belonging to the PERV-γ1 family, which is the only known PERV family that has remained active up to the present, were marked by significantly higher levels of histone methylations than PERV-γ2 and PERV-β3 proviruses. Finally, we tested three PERV-associated PBS sequences for repression activity in murine and porcine cells using retroviral transduction experiments and showed that none of these PBS sequences induced immediate transcriptional silencing in the tested primary porcine cells.


2014 ◽  
Vol 70 (a1) ◽  
pp. C609-C609
Author(s):  
Patrick Gourhant ◽  
Beatriz Guimaraes ◽  
Tatiana Isabet ◽  
Sebastian Klinke ◽  
Pierre Legrand ◽  
...  

PROXIMA 1, a beamline for macro-molecular crystallography at the 3rd generation synchrotron source SOLEIL, is equipped with a multi-circle goniometer (alpha 50 degrees) as well as a PILATUS 6M detector. These features, along with the extended energy range of the beam line towards the low energies (down to 5.5 keV) and the possibility to adapt the source size to the sample in order to optimize signal to noise ratio, have made the beam line very attractive for S-SAD phasing with more than seven examples of successful de novo phasing achieved over the last two years. The use of low energies has also proved a significant aid in assisting with model building. The technical capabilities of the beam line for low energy data collections will be presented, along with a number of examples of the successful use of low wavelengths on the beam line. The importance of combining data from multiple sample orientations in order to achieve "true multiplicity" will be highlighted, as well as the importance of combining data from multiple crystals in order to achieve high multiplicity.

