mCSEA: detecting subtle differentially methylated regions

2019 ◽  
Vol 35 (18) ◽  
pp. 3257-3262 ◽  
Author(s):  
Jordi Martorell-Marugán ◽  
Víctor González-Rumayor ◽  
Pedro Carmona-Sáez

Abstract Motivation The identification of differentially methylated regions (DMRs) among phenotypes is one of the main goals of epigenetic analysis. Although several methods have been developed to detect DMRs, most of them focus on detecting relatively large differences in methylation levels and fail to detect moderate, but consistent, methylation changes that might be associated with complex disorders. Results We present mCSEA, an R package that implements a Gene Set Enrichment Analysis method to identify DMRs from Illumina 450K and EPIC array data. It is especially useful for detecting subtle, but consistent, methylation differences in complex phenotypes. mCSEA also implements functions to integrate gene expression data and to detect genes with significant correlations between methylation and gene expression patterns. Using simulated datasets, we show that mCSEA outperforms other tools in detecting DMRs. In addition, we applied mCSEA to a previously published dataset of sibling pairs discordant for intrauterine hyperglycemia exposure. We found several differentially methylated promoters in genes related to metabolic disorders like obesity and diabetes, demonstrating the potential of mCSEA to identify DMRs not detected by other methods. Availability and implementation mCSEA is freely available from the Bioconductor repository. Supplementary information Supplementary data are available at Bioinformatics online.
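To make the Gene Set Enrichment Analysis idea in this abstract concrete, the following is a minimal sketch of a GSEA-style running-sum enrichment score applied to one candidate region. It is not mCSEA's implementation: the function name `enrichment_score`, the probe identifiers, and the choice of a weighted running sum with exponent `p` are illustrative assumptions.

```python
from typing import Dict, Set


def enrichment_score(stats: Dict[str, float], region: Set[str], p: float = 1.0) -> float:
    """GSEA-style running-sum score for one set of probes (hypothetical helper).

    stats  : per-probe ranking statistic (e.g. a t-statistic between phenotypes)
    region : probes annotated to the candidate region (e.g. one promoter)
    """
    # Rank all probes from the most positive to the most negative statistic.
    ranked = sorted(stats, key=stats.get, reverse=True)
    hits = [name in region for name in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_weight = sum(abs(stats[n]) ** p for n, h in zip(ranked, hits) if h)
    if n_miss == 0 or hit_weight == 0:
        raise ValueError("need probes both inside and outside the region")

    running, best = 0.0, 0.0
    for name, is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(stats[name]) ** p / hit_weight  # step up on a region probe
        else:
            running -= 1.0 / n_miss                        # step down otherwise
        if abs(running) > abs(best):
            best = running                                 # keep the largest deviation from zero
    return best


# Toy data: three probes of one region all shifted slightly in the same direction.
toy_stats = {"cg1": 2.0, "cg2": 1.5, "cg3": 1.0, "cg4": -0.5, "cg5": -1.0, "cg6": -2.0}
toy_score = enrichment_score(toy_stats, {"cg1", "cg2", "cg3"})
```

A score near +1 or -1 indicates that the region's probes cluster at one end of the ranking, which is how many small but consistent shifts can add up to a detectable signal. In practice significance would be assessed by permutation, which this sketch omits.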

2018 ◽  
Author(s):  
Jordi Martorell-Marugán ◽  
Víctor González-Rumayor ◽  
Pedro Carmona-Sáez

Abstract Motivation The identification of differentially methylated regions (DMRs) among phenotypes is one of the main goals of epigenetic analysis. Although several methods have been developed to detect DMRs, most of them focus on detecting relatively large differences in methylation levels and fail to detect moderate, but consistent, methylation changes that might be associated with complex disorders. Results We present mCSEA, an R package that implements a Gene Set Enrichment Analysis method to identify differentially methylated regions from Illumina 450K and EPIC array data. It is especially useful for detecting subtle, but consistent, methylation differences in complex phenotypes. mCSEA also implements functions to integrate gene expression data and to detect genes with significant correlations between methylation and gene expression patterns. Using simulated datasets, we show that mCSEA outperforms other tools in detecting DMRs. In addition, we applied mCSEA to a previously published dataset of sibling pairs discordant for intrauterine hyperglycemia exposure. We found several differentially methylated promoters in genes related to metabolic disorders like obesity and diabetes, demonstrating the potential of mCSEA to identify differentially methylated regions not detected by other methods. Availability mCSEA is freely available from the Bioconductor repository.


2018 ◽  
Author(s):  
Nikita Mukhitov ◽  
Michael G. Roper

Abstract In vivo levels of insulin are oscillatory with a period of ~5-10 minutes, implying that the numerous islets of Langerhans within the pancreas are synchronized. While the synchronizing factors are still under investigation, one result of this behavior is expected to be coordinated intracellular [Ca2+] ([Ca2+]i) oscillations throughout the islet population. The role that coordinated [Ca2+]i oscillations play in controlling gene expression within pancreatic islets was examined by comparing gene expression levels in islets that were synchronized using a low-amplitude glucose wave with those in an unsynchronized population. The [Ca2+]i oscillations in the synchronized population were homogeneous and had a significantly lower drift in their oscillation period than those of unsynchronized islets. This reduced drift in the synchronized population was verified by comparing the drift of in vivo and in vitro profiles from published reports. Microarray profiling indicated that a number of Ca2+-dependent genes were differentially regulated between the two islet populations. Gene set enrichment analysis revealed that the synchronized population had reduced expression of gene sets related to protein translation, protein turnover, energy expenditure, and insulin synthesis, while those related to maintenance of cell morphology were increased. It is speculated that these gene expression patterns in the synchronized islets result in a more efficient utilization of intracellular resources and response to environmental changes.


2018 ◽  
Vol 35 (11) ◽  
pp. 1901-1906 ◽  
Author(s):  
Mary D Fortune ◽  
Chris Wallace

Abstract Motivation Methods for analysis of GWAS summary statistics have encouraged data sharing and democratized the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some ‘truth’ is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. Results We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes, with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. Availability and implementation Our method is available under a GPL license as an R package from http://github.com/chr1swallace/simGWAS. Supplementary information Supplementary data are available at Bioinformatics online.
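The step of "simulating random variates about these expected values" can be sketched as follows. This is only an illustration, not the simGWAS code: it takes the expected Z-scores as a given input rather than deriving them from causal variants and haplotype frequencies, and it assumes that the correlation between SNP statistics is supplied as a matrix `ld`. The names `cholesky` and `simulate_z` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def cholesky(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_z(expected_z: Sequence[float],
               ld: Sequence[Sequence[float]],
               rng: random.Random) -> List[float]:
    """Draw one vector of Z-scores with mean expected_z and correlation ld (assumed inputs)."""
    lower = cholesky(ld)
    noise = [rng.gauss(0.0, 1.0) for _ in expected_z]
    # Correlate the independent normal draws, then shift them by the expected values.
    return [mu + sum(lower[i][k] * noise[k] for k in range(i + 1))
            for i, mu in enumerate(expected_z)]


# Toy example: a causal SNP, a correlated neighbour, and an unlinked null SNP.
toy_expected = [5.0, 4.0, 0.0]
toy_ld = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_rng = random.Random(0)
toy_draws = [simulate_z(toy_expected, toy_ld, toy_rng) for _ in range(5000)]
```

Because no individual genotypes are generated, the cost of each draw depends on the number of SNPs rather than on the number of individuals in the study, which is the source of the speed gain the abstract describes.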


2018 ◽  
Author(s):  
Mary D. Fortune ◽  
Chris Wallace

Abstract Motivation Methods for analysis of GWAS summary statistics have encouraged data sharing and democratised the analysis of different diseases. Ideal validation for such methods is application to simulated data, where some “truth” is known. As GWAS increase in size, so does the computational complexity of such evaluations; standard practice repeatedly simulates and analyses genotype data for all individuals in an example study. Results We have developed a novel method based on an alternative approach, directly simulating GWAS summary data, without individual data as an intermediate step. We mathematically derive the expected statistics for any set of causal variants and their effect sizes, conditional upon control haplotype frequencies (available from public reference datasets). Simulation of GWAS summary output can be conducted independently of sample size by simulating random variates about these expected values. Across a range of scenarios, our method produces very similar output to that from simulating individual genotypes, with a substantial gain in speed even for modest sample sizes. Fast simulation of GWAS summary statistics will enable more complete and rapid evaluation of summary statistic methods as well as opening new potential avenues of research in fine mapping and gene set enrichment analysis. Availability and Implementation Our method is available under a GPL license as an R package from http://github.com/chr1swallace/[email protected] Supplementary Information Supplementary Information is appended.


2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Suzana Makpol ◽  
Azalina Zainuddin ◽  
Kien Hui Chua ◽  
Yasmin Anum Mohd Yusof ◽  
Wan Zurinah Wan Ngah

The effect of γ-tocotrienol, a vitamin E isomer, in modulating gene expression in cellular aging of human diploid fibroblasts was studied. Senescent cells at passage 30 were incubated with 70 μM of γ-tocotrienol for 24 h. Gene expression patterns were evaluated using Sentrix HumanRef-8 Expression BeadChip from Illumina, analysed using GeneSpring GX10 software, and validated using quantitative RT-PCR. A total of 100 genes were differentially expressed (P<0.001) by at least 1.5 fold in response to γ-tocotrienol treatment. Amongst the genes were IRAK3, SelS, HSPA5, HERPUD1, DNAJB9, SEPR1, C18orf55, ARF4, RINT1, NXT1, CADPS2, COG6, and GLRX5. The significant gene list was further analysed by Gene Set Enrichment Analysis (GSEA), and the Normalized Enrichment Score (NES) showed that biological processes such as inflammation, protein transport, apoptosis, and cell redox homeostasis were modulated in senescent fibroblasts treated with γ-tocotrienol. These findings revealed that γ-tocotrienol may prevent cellular aging of human diploid fibroblasts by modulating gene expression.


2021 ◽  
Vol 12 ◽  
Author(s):  
Qian Zhang ◽  
Yang Yu ◽  
Zheng Luo ◽  
Jianhai Xiang ◽  
Fuhua Li

Acute hepatopancreatic necrosis disease (AHPND) has caused heavy losses to shrimp aquaculture since its outbreak. Vibrio parahaemolyticus (VPAHPND) is regarded as one of the main pathogens that cause AHPND in the Pacific white shrimp Litopenaeus vannamei. In order to learn more about the mechanism of resistance to AHPND, resistant and susceptible shrimp families were obtained through genetic breeding, and a comparative transcriptome approach was used to analyze the gene expression patterns between resistant and susceptible families. A total of 95 families were subjected to a VPAHPND challenge test, and significant variations in the resistance of these families were observed. Three pairs of resistant and susceptible families were selected for transcriptome sequencing. A total of 489 differentially expressed genes (DEGs) that were present in at least two pairwise comparisons were screened, including 196 DEGs highly expressed in the susceptible families and 293 DEGs in the resistant families. Among these DEGs, 16 genes demonstrated significant differences in all three pairwise comparisons. Gene set enrichment analysis (GSEA) of all 27,331 expressed genes indicated that some energy metabolism processes were enriched in the resistant families, while signal transduction and immune system were enriched in the susceptible families. A total of 32 DEGs were further tested in the offspring of the detected families, among which 19 genes were successfully verified. The genes identified in this study will be useful for clarifying the genetic mechanism of shrimp resistance against Vibrio and will further provide molecular markers for evaluating the disease resistance of shrimp in the breeding program.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Vincent Branders ◽  
Pierre Schaus ◽  
Pierre Dupont

Abstract Background Transcriptome analysis aims at gaining insight into cellular processes through discovering gene expression patterns across various experimental conditions. Biclustering is a standard approach to discover gene subsets with similar expression across subgroups of samples to be identified. The result is a set of biclusters, each forming a specific submatrix of rows (e.g. genes) and columns (e.g. samples). Relevant biclusters can, however, be missed when, due to the presence of a few outliers, they lack the assumed homogeneity of expression values among a few gene/sample combinations. The Max-Sum SubMatrix problem addresses this issue by looking at highly expressed subsets of genes and of samples, without enforcing such homogeneity. Results We present here the algorithm to identify K relevant submatrices. Our main contribution is to show that this approach outperforms biclustering algorithms to identify several gene subsets representative of specific subgroups of samples. Experiments are conducted on 35 gene expression datasets from human tissues and yeast samples. We report comparative results with those obtained by several biclustering algorithms, including , , , , and . Gene enrichment analysis demonstrates the benefits of the proposed approach to identify more statistically significant gene subsets. The most significant Gene Ontology terms identified with are shown consistent with the controlled conditions of each dataset. This analysis supports the biological relevance of the identified gene subsets. An additional contribution is the statistical validation protocol proposed here to assess the relative performances of biclustering algorithms and of the proposed method. It relies on a Friedman test and Hochberg’s sequential procedure to report critical differences of ranks among all algorithms. Conclusions We propose here the method, a computationally efficient algorithm to identify K max-sum submatrices in a large gene expression matrix. Comparisons show that it identifies more significantly enriched subsets of genes and specific subgroups of samples which are easily interpretable by biologists. Experiments also show its ability to identify more reliable GO terms. These results illustrate the benefits of the proposed approach in terms of interpretability and of biological enrichment quality. Open implementation of this algorithm is available as an R package.
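The max-sum submatrix objective mentioned in this abstract can be illustrated on a tiny matrix. The sketch below is not the authors' algorithm: it is an exhaustive search over column subsets, feasible only for a handful of columns, and finds a single submatrix rather than K of them. The function name `max_sum_submatrix` and the toy values are assumptions; the values are imagined to be centred so that entries can be negative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def max_sum_submatrix(matrix: Sequence[Sequence[float]]) -> Tuple[float, List[int], List[int]]:
    """Return (best sum, row indices, column indices) by brute force over column subsets."""
    n_cols = len(matrix[0])
    best: Tuple[float, List[int], List[int]] = (0.0, [], [])
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # For fixed columns, the optimal rows are exactly those with a positive partial sum.
            rows = [i for i, row in enumerate(matrix) if sum(row[j] for j in cols) > 0]
            total = sum(matrix[i][j] for i in rows for j in cols)
            if total > best[0]:
                best = (total, rows, list(cols))
    return best


# Rows are genes, columns are samples.
toy_matrix = [
    [3.0, -1.0, 2.0],
    [2.0, -4.0, 1.0],
    [-5.0, 6.0, -2.0],
]
toy_best = max_sum_submatrix(toy_matrix)
```

Note that the selected submatrix may contain individual low entries as long as the total is maximal, which is the sense in which homogeneity is not enforced.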


2017 ◽  
Author(s):  
Mingze He ◽  
Peng Liu ◽  
Carolyn J. Lawrence-Dill

Abstract Genome-wide molecular gene expression studies generally compare expression values for each gene across multiple conditions followed by cluster and gene set enrichment analysis to determine whether differentially expressed genes are enriched in specific biochemical pathways, cellular components, biological processes, and/or molecular functions, etc. This approach to analyzing differences in gene expression enables discovery of gene function, but is not useful to determine whether pre-defined groups of genes share or diverge in their expression patterns in response to treatments nor to assess the correctness of pre-defined gene set groupings. Here we present a simple method that changes the dimension of comparison by treating genes as variable traits to directly assess significance of differences in expression levels among pre-defined gene groups. Because expression distributions are typically skewed (thus unfit for direct assessment using Gaussian statistical methods) our method involves transforming expression data to approximate a normal distribution followed by dividing the genes into groups, then applying Gaussian parametric methods to assess significance of observed differences. This method enables the assessment of differences in gene expression distributions within and across samples, enabling hypothesis-based comparison among groups of genes. We demonstrate this method by assessing the significance of specific gene groups’ differential response to heat stress conditions in maize. Abbreviations GO – gene ontology; HSP – heat shock protein; KEGG – Kyoto Encyclopedia of Genes and Genomes; HSF TF – heat shock factor transcription factor; HSBP – heat shock binding protein; RNA – ribonucleic acid; TE – transposable element; TF – transcription factor; TPM – transcripts per kilobase million
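The transform-then-compare procedure described above can be sketched in a few lines. The abstract does not say which transformation or which Gaussian test is used, so the log2(x + 1) transform and the Welch t-statistic below are assumptions chosen for illustration, as are the function names and the toy expression values.

```python
import math
from statistics import mean, variance
from typing import List, Sequence


def log_transform(values: Sequence[float]) -> List[float]:
    """Reduce right skew with log2(x + 1); an assumed choice of transformation."""
    return [math.log2(v + 1.0) for v in values]


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch t-statistic between two pre-defined gene groups after transformation."""
    a, b = log_transform(group_a), log_transform(group_b)
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy expression values for two hypothetical gene groups in one sample.
heat_shock_group = [100.0, 200.0, 400.0]
background_group = [1.0, 2.0, 4.0]
toy_t = welch_t(heat_shock_group, background_group)
```

Here each gene is one observation and the groups are fixed in advance, so the comparison tests a hypothesis about the groups rather than about any single gene.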


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gulden Olgun ◽  
Afshan Nabi ◽  
Oznur Tastan

Abstract Background While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close-by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. Results We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding & ncRNAs specific to cancer. Additionally, the users can utilize custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast. Conclusions NoRCE is a platform-independent, user-friendly, comprehensive R package that can be used to gain insight into the functional importance of a list of ncRNAs of any type. The tool offers flexibility to conduct the users’ preferred set of analyses by designing their own pipeline of analysis. NoRCE is available in Bioconductor and https://github.com/guldenolgun/NoRCE.
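As a rough illustration of cis enrichment, the sketch below collects coding genes lying within a fixed window of the input ncRNAs and scores one annotation term with a hypergeometric tail probability. It does not use NoRCE's API: the function names, the coordinate representation, and the choice of a one-sided hypergeometric test are assumptions.

```python
from math import comb
from typing import Dict, Sequence, Set, Tuple

Locus = Tuple[str, int]  # (chromosome, position); a simplified, assumed representation


def proximal_genes(ncrnas: Sequence[Locus], genes: Dict[str, Locus], window: int) -> Set[str]:
    """Coding genes on the same chromosome within `window` bases of any input ncRNA."""
    return {
        name
        for name, (chrom, pos) in genes.items()
        if any(chrom == nc_chrom and abs(pos - nc_pos) <= window for nc_chrom, nc_pos in ncrnas)
    }


def hypergeom_tail(population: int, annotated: int, drawn: int, overlap: int) -> float:
    """P(X >= overlap) when drawing `drawn` genes from `population`, `annotated` of which carry the term."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, drawn)


# Toy genome: two of the three genes sit near the single input ncRNA.
toy_genes = {"geneA": ("chr1", 1000), "geneB": ("chr1", 9000), "geneC": ("chr2", 1200)}
toy_neighbours = proximal_genes([("chr1", 1500)], toy_genes, window=1000)
toy_p = hypergeom_tail(population=10, annotated=3, drawn=3, overlap=3)
```

A small tail probability suggests that the term is over-represented among the neighbouring coding genes, which is then read as a hint about the function of the ncRNA set.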


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

Abstract DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.

